This paper introduces a novel representation of protein sequences, the amino acid composition distribution (AACD), for predicting subcellular localization. First, a protein sequence is divided equally into multiple segments. Then, the amino acid composition of each segment is calculated in turn, and the per-segment compositions together represent the sequence as a feature vector. Finally, the feature vectors of all sequences are input into multi-class support vector machines to predict subcellular localization. The results show that AACD is more effective at representing protein sequences and is insensitive to sequence similarity, because it better reflects the information relevant to protein subcellular localization.
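The feature-extraction steps described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `aacd_features`, the default of three segments, the integer-division rule for lengths that do not divide evenly, and the choice to ignore non-standard residues are all assumptions made for the example.

```python
# Hypothetical sketch of AACD feature extraction (names and defaults are assumptions).
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard amino acids


def aacd_features(sequence, n_segments=3):
    """Return the concatenated per-segment amino acid compositions.

    The sequence is split into n_segments nearly equal parts; for each part
    the fraction of each standard amino acid is computed, giving a vector
    of length 20 * n_segments.
    """
    seq = sequence.upper()
    length = len(seq)
    features = []
    for i in range(n_segments):
        # Integer boundaries keep segment lengths within one residue of each other.
        start = i * length // n_segments
        end = (i + 1) * length // n_segments
        segment = seq[start:end]
        counts = [segment.count(aa) for aa in AMINO_ACIDS]
        total = sum(counts)  # non-standard residues are ignored (assumption)
        features.extend(c / total if total else 0.0 for c in counts)
    return features


vector = aacd_features("AACCDD", n_segments=3)
print(len(vector))
```

The resulting vectors would then be passed to a multi-class support vector machine; that step is omitted here because the abstract does not specify the kernel or the multi-class scheme.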