We study the problem of classifying surnames by country of origin using a collection of feature-based learning algorithms. As training data for the classifiers, we compiled a database of surnames and their countries of origin from publicly available databases. We propose a feature selection algorithm that dynamically determines the most prominent features of the names from the training data. Using the selected features, we apply a number of supervised and unsupervised learning algorithms to classify the surnames by country. Finally, we compare the accuracy and performance of the classifiers under different parameters and metrics, and demonstrate that the reduced feature set works well with well-known classifiers.
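The abstract does not specify the name features, the selection criterion, or the classifiers used. Purely as a hedged illustration of the overall pipeline, the sketch below assumes character bigrams as surname features, a one-vs-rest chi-square filter as a stand-in for the dynamic feature selection step, and naive Bayes with add-one smoothing as one example of a well-known classifier. The functions `char_ngrams`, `chi2_scores`, `select_features`, `train_nb`, and `predict_nb`, and the six toy training surnames, are hypothetical and are not the paper's data or method.

```python
import math
from collections import Counter, defaultdict


def char_ngrams(name, n=2):
    """Return the set of character n-grams of a padded, lower-cased surname."""
    padded = f"^{name.lower()}$"  # ^ and $ mark the start and end of the name
    return {padded[i:i + n] for i in range(len(padded) - n + 1)}


def chi2_scores(samples):
    """Score each n-gram by its largest one-vs-rest chi-square statistic.

    samples: list of (feature_set, country) pairs.
    """
    total = len(samples)
    label_counts = Counter(label for _, label in samples)
    joint = defaultdict(Counter)  # joint[f][c]: names of country c containing f
    for feats, label in samples:
        for f in feats:
            joint[f][label] += 1
    scores = {}
    for f, per_label in joint.items():
        f_total = sum(per_label.values())
        best = 0.0
        for c, c_total in label_counts.items():
            a = per_label[c]           # contains f, country c
            b = f_total - a            # contains f, other country
            cc = c_total - a           # lacks f, country c
            d = total - f_total - cc   # lacks f, other country
            denom = (a + b) * (cc + d) * (a + cc) * (b + d)
            if denom:  # a zero margin means f cannot discriminate
                best = max(best, total * (a * d - b * cc) ** 2 / denom)
        scores[f] = best
    return scores


def select_features(samples, k):
    """Keep the k highest-scoring n-grams (ties broken alphabetically)."""
    scores = chi2_scores(samples)
    ranked = sorted(scores, key=lambda f: (-scores[f], f))
    return set(ranked[:k])


def train_nb(samples, vocab):
    """Count selected n-grams per country for a naive Bayes model."""
    label_counts = Counter(label for _, label in samples)
    feat_counts = defaultdict(Counter)
    for feats, label in samples:
        for f in feats & vocab:
            feat_counts[label][f] += 1
    return label_counts, feat_counts


def predict_nb(model, vocab, name, n=2):
    """Return the country with the highest add-one-smoothed log posterior."""
    label_counts, feat_counts = model
    total = sum(label_counts.values())
    feats = char_ngrams(name, n) & vocab  # only the selected features are used
    best_label, best_score = None, -math.inf
    for c, c_count in label_counts.items():
        denom = sum(feat_counts[c].values()) + len(vocab)
        score = math.log(c_count / total)
        for f in feats:
            score += math.log((feat_counts[c][f] + 1) / denom)
        if score > best_score:
            best_label, best_score = c, score
    return best_label


# Hypothetical toy training data (not from the paper's database).
TRAINING = [
    ("Kowalski", "Poland"), ("Zielinski", "Poland"), ("Lewandowski", "Poland"),
    ("Rossini", "Italy"), ("Bellini", "Italy"), ("Mancini", "Italy"),
]
samples = [(char_ngrams(name), country) for name, country in TRAINING]
vocab = select_features(samples, k=3)  # reduced feature set: {"ki", "ni", "sk"}
model = train_nb(samples, vocab)
print(predict_nb(model, vocab, "Nowakowski"))  # → Poland
print(predict_nb(model, vocab, "Paganini"))    # → Italy
```

In this toy run the filter discards uninformative bigrams such as `i$`, which occurs in every training name, and keeps only three n-grams, yet the classifier still separates the two countries. Any other scoring function or classifier could be substituted at the `select_features` and `train_nb` steps.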