
An approach to enhance KNN based on data clustering using K-medoid



Abstract:

K-nearest-neighbor (KNN) is one of the state-of-the-art machine learning algorithms used for classification and regression tasks. In addition to being simple to understand, KNN is versatile, spanning various applications. Despite its simplicity, it is considered a lazy classifier: it does not build a trained model but instead stores, or memorizes, the training examples. Consequently, prediction with KNN becomes costly in resources and time, especially when the dataset is large. Moreover, there is no general way to choose the best distance metric for prediction. This paper proposes a new algorithm, K-nearest Medoid KNN (KMKNN), which improves the prediction-time efficiency of KNN without a major effect on its accuracy. The core idea of KMKNN is to cluster the dataset before prediction so that distance computations are limited to the data instances belonging to the cluster nearest to the new data point. Compared to traditional KNN and other similar extended versions of KNN, KMKNN achieves a noticeable improvement on 15 benchmark datasets. This work matters primarily for large datasets or when the distance measure is computationally expensive, as is common in the computer vision and pattern recognition domains.
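The clustering-then-prediction idea described above can be sketched as follows. This is a minimal illustration of the prediction stage only, assuming the medoids and cluster assignments have already been computed offline (e.g., by a k-medoids/PAM run); the function and variable names are illustrative, not from the paper.

```python
from collections import Counter
import math

def kmknn_predict(medoids, clusters, x, k=3):
    """Sketch of KMKNN prediction: restrict the neighbor search to the
    cluster whose medoid is closest to the query point x.

    medoids     -- list of medoid points (tuples of floats)
    clusters[i] -- list of (point, label) pairs assigned to medoid i
    """
    # Step 1: locate the nearest cluster via its medoid. This costs one
    # distance per medoid instead of one per training instance.
    nearest = min(range(len(medoids)), key=lambda i: math.dist(medoids[i], x))
    # Step 2: ordinary brute-force KNN, but only over that cluster's members.
    dists = sorted((math.dist(p, x), y) for p, y in clusters[nearest])
    # Step 3: majority vote among the k nearest neighbors.
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

With c clusters of roughly n/c points each, prediction drops from n distance computations to about c + n/c, which is where the claimed time savings on large datasets come from.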
Date of Conference: 08-09 May 2022
Date Added to IEEE Xplore: 01 June 2022
Conference Location: Cairo, Egypt

I. Introduction

Machine Learning (ML) techniques are widely regarded as the dominant player in a variety of application domains [1]–[8]. Defining a single state-of-the-art algorithm that can solve all kinds of ML problems is very difficult, and some algorithms, such as the well-known KNN, fall out of the competition. The disadvantage of the KNN algorithm is that it merely stores the training set and does not learn from the training data. The KNN algorithm, shown in Table I, is an instance-based learning algorithm. Its two prediction steps are costly: each time unseen data is to be predicted, the KNN classifier brute-forces through all training instances to find the k nearest neighbors and then performs a majority vote. This is expensive in resources and time, particularly for larger datasets [9]; for this reason, the KNN classifier is called a lazy classifier [10].
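The two prediction steps described above can be sketched as follows. This is a minimal brute-force KNN classifier, not the paper's Table I verbatim; the names are illustrative.

```python
from collections import Counter
import math

def knn_predict(train_X, train_y, x, k=3):
    """Brute-force KNN prediction for a single query point x.

    Step 1: compute the distance from x to every stored training
            instance (no model is learned beforehand).
    Step 2: take a majority vote among the k nearest neighbors.
    """
    dists = sorted(
        (math.dist(xi, x), yi) for xi, yi in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]
```

Because step 1 scans the entire training set for every query, prediction cost grows linearly with the number of stored instances, which is exactly the weakness the paper targets.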

