The popularity of SVMs has grown tremendously in the last few years across many different classification problems due to their generalization properties; however, training SVMs requires high computational power. Platt's SMO is one of the fastest algorithms for training support vector machines: it takes the decomposition technique to the extreme by selecting a working set of only two points and solving for them analytically. However, SMO becomes slow for large training data sets. In this paper we present a MapReduce-based distributed implementation of SMO using Hadoop. The distributed SMO uses multiple processor cores to process the training data. The training data set is partitioned into smaller subsets, each of which is allocated to a single Map task; each Map task optimizes its partition in parallel, and finally the reducer combines the results. Experiments show that the efficiency of the distributed SMO increases with the number of processors: the training speed of distributed SMO with 12 Map tasks is about 11 times that of standalone SMO, with no significant difference in accuracy between the distributed and standalone versions.
2010 Seventh International Conference on Fuzzy Systems and Knowledge Discovery (FSKD), Volume 6
Date of Conference: 10-12 Aug. 2010
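The partition/Map/Reduce flow described in the abstract can be sketched in a few lines. The following is a toy single-machine illustration, not the paper's Hadoop implementation: it runs a heavily simplified SMO (linear kernel, deterministic choice of the second index rather than Platt's heuristics) on each partition as a stand-in for a Map task, then combines the per-partition linear models by averaging as a stand-in for the Reduce step. The function names, the partition layout, and model averaging in the reducer are all illustrative assumptions.

```python
# Toy sketch of the distributed-SMO idea: partition the training set,
# optimize each partition independently ("Map"), then combine the
# partial models ("Reduce"). Assumptions are noted in the lead-in.
import numpy as np

def smo_partition(X, y, C=1.0, tol=1e-3, max_passes=5, max_iters=200):
    """Simplified SMO on one partition (linear kernel).

    Unlike Platt's SMO, the second index j is picked deterministically
    instead of by the maximal-|E_i - E_j| heuristic.
    """
    n = len(y)
    alpha = np.zeros(n)
    b = 0.0
    K = X @ X.T                      # linear-kernel Gram matrix
    passes, iters = 0, 0
    while passes < max_passes and iters < max_iters:
        iters += 1
        changed = 0
        for i in range(n):
            Ei = (alpha * y) @ K[:, i] + b - y[i]
            if (y[i] * Ei < -tol and alpha[i] < C) or \
               (y[i] * Ei > tol and alpha[i] > 0):
                j = (i + 1) % n      # simplistic second-point choice
                Ej = (alpha * y) @ K[:, j] + b - y[j]
                ai, aj = alpha[i], alpha[j]
                if y[i] != y[j]:
                    L, H = max(0.0, aj - ai), min(C, C + aj - ai)
                else:
                    L, H = max(0.0, ai + aj - C), min(C, ai + aj)
                if L == H:
                    continue
                eta = 2 * K[i, j] - K[i, i] - K[j, j]
                if eta >= 0:
                    continue
                # Analytic solution of the two-variable subproblem
                alpha[j] = np.clip(aj - y[j] * (Ei - Ej) / eta, L, H)
                if abs(alpha[j] - aj) < 1e-5:
                    continue
                alpha[i] = ai + y[i] * y[j] * (aj - alpha[j])
                b1 = b - Ei - y[i] * (alpha[i] - ai) * K[i, i] \
                       - y[j] * (alpha[j] - aj) * K[i, j]
                b2 = b - Ej - y[i] * (alpha[i] - ai) * K[i, j] \
                       - y[j] * (alpha[j] - aj) * K[j, j]
                if 0 < alpha[i] < C:
                    b = b1
                elif 0 < alpha[j] < C:
                    b = b2
                else:
                    b = (b1 + b2) / 2
                changed += 1
        passes = passes + 1 if changed == 0 else 0
    w = (alpha * y) @ X              # recover the primal weight vector
    return w, b

# Linearly separable toy data, split into two partitions that each
# contain both classes; each partition plays the role of one Map task.
parts = [
    (np.array([[2., 2.], [3., 1.], [-2., -2.], [-3., -1.]]),
     np.array([1., 1., -1., -1.])),
    (np.array([[2., 3.], [3., 3.], [-2., -3.], [-3., -3.]]),
     np.array([1., 1., -1., -1.])),
]

# "Map" phase: optimize each partition independently.
models = [smo_partition(X, y) for X, y in parts]

# "Reduce" phase: combine partial models (simple averaging here).
w = np.mean([m[0] for m in models], axis=0)
b = np.mean([m[1] for m in models])

X_all = np.vstack([X for X, _ in parts])
y_all = np.concatenate([y for _, y in parts])
preds = np.sign(X_all @ w + b)
```

In the paper's setting each `smo_partition` call would run as a Hadoop Map task over its data split, with the Reduce task merging the partial results; the list comprehension above merely stands in for that parallel execution.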