Support vector machines (SVMs) are state-of-the-art machine learning algorithms that can be used for classification problems such as DNA splice site identification. However, the large number of samples in biological data sets often leads to slow training. Training can be sped up by removing non-support vectors beforehand. This paper proposes a method that predicts non-support vectors with high accuracy using strict-constrained gradient ascent optimisation. Unlike other data preselection methods, the proposed gradient-based method is itself a training algorithm for SVMs and is very simple to implement. Experiments with comparable results are conducted on a DNA splice-site detection problem. Results show significant speed improvements over other algorithms. The relationship between speed improvement and cache memory size is also exploited. The generalisation capability of the proposed algorithm is also shown to be better than that of some other reformulated SVMs.
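
To make the preselection idea concrete, the following is a minimal sketch and not the paper's algorithm, whose details are not given in this abstract. It assumes a bias-free SVM dual with box constraints 0 <= alpha_i <= C and runs a generic clipped, coordinate-wise gradient ascent (in the style of the Kernel-Adatron); samples whose multiplier ends at zero are treated as predicted non-support vectors. The function names, the toy data, and the values of `C`, `lr`, and `epochs` are all illustrative assumptions.

```python
from typing import Callable, List, Sequence

Vector = Sequence[float]


def linear_kernel(a: Vector, b: Vector) -> float:
    """Plain dot product; any other kernel function could be passed instead."""
    return sum(p * q for p, q in zip(a, b))


def clipped_gradient_ascent(
    X: Sequence[Vector],
    y: Sequence[int],
    kernel: Callable[[Vector, Vector], float] = linear_kernel,
    C: float = 10.0,      # assumed box-constraint upper bound
    lr: float = 0.1,      # assumed step size; needs lr * K[i][i] < 2 to be stable
    epochs: int = 100,    # assumed number of passes over the data
) -> List[float]:
    """Maximise sum(alpha) - 0.5 * sum_ij alpha_i alpha_j y_i y_j K_ij
    subject to 0 <= alpha_i <= C, one coordinate at a time."""
    n = len(X)
    # Precompute the Gram matrix once.
    K = [[kernel(X[i], X[j]) for j in range(n)] for i in range(n)]
    alpha = [0.0] * n
    for _ in range(epochs):
        for i in range(n):
            # Partial derivative of the dual objective with respect to alpha_i.
            margin = y[i] * sum(alpha[j] * y[j] * K[i][j] for j in range(n))
            grad = 1.0 - margin
            # Gradient step followed by clipping back into the box [0, C].
            alpha[i] = min(C, max(0.0, alpha[i] + lr * grad))
    return alpha


def predict_non_support_vectors(alpha: Sequence[float], tol: float = 1e-12) -> List[int]:
    """Indices whose multiplier is (numerically) zero after the ascent."""
    return [i for i, a in enumerate(alpha) if a <= tol]


# Toy data (hypothetical): six points on a line, separable through the origin.
X = [(-3.0, 0.0), (-2.0, 0.0), (-1.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
y = [-1, -1, -1, 1, 1, 1]

alpha = clipped_gradient_ascent(X, y)
print(predict_non_support_vectors(alpha))  # → [0, 1, 4, 5]
```

In this toy run only the two points closest to the boundary keep a non-zero multiplier, so the four outer points could be dropped before handing the data to a full SVM solver. On real data the zero set after a limited number of passes is only a prediction, which is why the accuracy of that prediction matters.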