
Improving Clustering Method Performance Using K-Means, Mini Batch K-Means, BIRCH and Spectral


Abstract:

The most pressing problem of the k-Nearest Neighbor (KNN) classification method is its voting technique, which leads to poor accuracy on some randomly distributed complex data sets. To overcome this weakness of KNN, we add a step before the KNN classification phase: a new scheme that groups the data set so that the number of clusters is greater than the number of data classes. In addition, a committee is selected from each cluster, so the scheme does not use a voting technique as the standard KNN method does. This study uses two methods in sequence, namely a clustering method and the KNN method. The clustering method groups records into multiple clusters, from which the committees are then selected. Five clustering methods were tested: K-Means, K-Means with Principal Component Analysis (PCA), Mini Batch K-Means, Spectral, and Balanced Iterative Reducing and Clustering using Hierarchies (BIRCH). All tested clustering methods are of the centroid-based cluster type. According to the results, the BIRCH method has the lowest error rate of the five clustering methods (2.13), and K-Means produces the largest clusters (156.63).
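As an illustrative aside, the five tested clustering methods could be instantiated as follows. This is a minimal sketch assuming scikit-learn; the paper does not name its implementation, and the n_clusters value and PCA dimensionality below are hypothetical placeholders, not the paper's settings.

    # Sketch only: scikit-learn is assumed; n_clusters and the PCA
    # dimensionality are placeholders, not values from the paper.
    from sklearn.cluster import KMeans, MiniBatchKMeans, Birch, SpectralClustering
    from sklearn.datasets import make_blobs
    from sklearn.decomposition import PCA
    from sklearn.pipeline import make_pipeline

    n_clusters = 6  # per the paper's scheme, set above the number of classes

    methods = {
        "K-Means": KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
        "K-Means + PCA": make_pipeline(
            PCA(n_components=2),
            KMeans(n_clusters=n_clusters, n_init=10, random_state=0),
        ),
        "Mini Batch K-Means": MiniBatchKMeans(n_clusters=n_clusters, n_init=10,
                                              random_state=0),
        "Spectral": SpectralClustering(n_clusters=n_clusters, random_state=0),
        "BIRCH": Birch(n_clusters=n_clusters),
    }

    # Fit each method on synthetic data and report how many clusters it found.
    X, _ = make_blobs(n_samples=300, centers=3, random_state=0)
    for name, model in methods.items():
        labels = model.fit_predict(X)
        print(name, "->", len(set(labels)), "clusters")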
Date of Conference: 16-17 December 2021
Date Added to IEEE Xplore: 11 February 2022
Conference Location: Yogyakarta, Indonesia
Department of Informatics, Institut Teknologi Telkom Purwokerto, Banyumas, Indonesia
Department of Data Science, Institut Teknologi Telkom Purwokerto, Banyumas, Indonesia
School of Computing, Telkom University, Bandung, Indonesia
School of Computing, Telkom University, Bandung, Indonesia
School of Computing, Telkom University, Bandung, Indonesia
School of Computing, Telkom University, Bandung, Indonesia

I. Introduction

The k-Nearest Neighbor (KNN) method, also known as the k-Nearest Neighbor Rule (KNNR), is a non-parametric classification method known to be the simplest, effective, well-performing, and robust [1], [2]. The method works by finding the k patterns (among all the training patterns in all classes) closest to the input pattern and then deciding the class using a voting technique. Among the weaknesses of the KNN method are its sensitivity to less relevant features and to the neighborhood size k [3], [4]. It is relatively difficult to determine the exact k, because in some cases it should be high and in others very low. The most urgent problem in KNN is the voting technique, which makes it low-accuracy for several complex data sets that are randomly distributed [5]. To overcome this weakness of KNN, we created a new scheme that clusters the data set so that the number of clusters is greater than the number of data classes. Furthermore, committees are selected from each cluster, so the scheme does not use a voting technique like the standard KNN method.
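A minimal sketch of this clustering-before-classification scheme is given below, assuming scikit-learn. The committee rule used here (each cluster represented by the majority class of its training members, with a query point taking the label of its nearest cluster centroid) is an assumption for illustration; the paper does not spell out its committee mechanism at this point.

    # Sketch only: the committee rule below (majority class per cluster,
    # nearest-centroid assignment at prediction time) is assumed, not
    # taken from the paper.
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.datasets import load_iris
    from sklearn.model_selection import train_test_split

    X, y = load_iris(return_X_y=True)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

    n_classes = len(np.unique(y_tr))
    n_clusters = 3 * n_classes  # more clusters than classes, as the scheme requires

    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(X_tr)

    # Assumed committee rule: each cluster is represented by the majority
    # class among its training members.
    committee = np.array([
        np.bincount(y_tr[km.labels_ == c], minlength=n_classes).argmax()
        for c in range(n_clusters)
    ])

    # A test point takes the label of its nearest cluster centroid, so no
    # KNN-style voting over individual neighbors is performed.
    y_pred = committee[km.predict(X_te)]
    print("accuracy:", (y_pred == y_te).mean())

Because each query consults cluster representatives rather than k raw neighbors, the decision no longer depends on a neighbor vote, which is the weakness the scheme targets.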


