Data mining is the process of extracting hidden predictive information from large databases. In bioinformatics, data mining enables researchers to meet the challenge of mining large amounts of biomolecular data to discover real knowledge. Major research efforts in bioinformatics involve sequence analysis, protein structure prediction, and gene finding. Proteins are prominent molecules in our cells and are involved in virtually all cell functions. The activities and functions of proteins can be determined from protein sequence motifs, which are identified from segments of protein sequences. Not all segments are important for producing good motif patterns, and the generated sequence segments have no classes or labels. Hence, an unsupervised segment selection technique, the Singular Value Decomposition (SVD) entropy method, is adopted to select significant sequence segments. In this proposed work, Weighted K-Means and Adaptive Fuzzy C-Means are applied to the selected segments to generate granules, since such a large number of segments cannot be clustered directly. Each granule generated by the Weighted K-Means algorithm is further clustered using the K-Means algorithm, and each granule generated by the Adaptive Fuzzy C-Means algorithm is further clustered using Weighted K-Means. The two proposed models are compared with the K-Means granular computing model. The experimental results show that Adaptive Fuzzy C-Means with Weighted K-Means produces better results than the K-Means and Weighted K-Means granular computing methods.
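The SVD entropy selection step described above can be illustrated with a minimal sketch. It assumes one common leave-one-out formulation: the entropy of the normalized squared singular values is computed for the full segment matrix, and a segment's contribution is the drop in entropy when its row is removed. The function names, the `n_std` threshold parameter, and the random input matrix are hypothetical choices made for illustration, and the authors' exact procedure may differ.

```python
import numpy as np


def svd_entropy(A):
    """Normalized SVD entropy of matrix A, a value in [0, 1]."""
    s = np.linalg.svd(np.asarray(A, dtype=float), compute_uv=False)
    energy = np.sum(s ** 2)
    if len(s) < 2 or energy == 0.0:
        return 0.0  # entropy is undefined here; treated as 0 by convention
    v = s ** 2 / energy          # relative weight of each singular value
    v = v[v > 0]                 # 0 * log(0) is taken as 0, so skip zeros
    return float(-np.sum(v * np.log(v)) / np.log(len(s)))


def segment_contributions(A):
    """Leave-one-out entropy contribution of each row (segment) of A."""
    A = np.asarray(A, dtype=float)
    total = svd_entropy(A)
    return np.array(
        [total - svd_entropy(np.delete(A, i, axis=0)) for i in range(A.shape[0])]
    )


def select_segments(A, n_std=0.0):
    """Indices of rows whose contribution exceeds mean + n_std * std.

    The threshold rule is an assumption made for this sketch.
    """
    ce = segment_contributions(A)
    threshold = ce.mean() + n_std * ce.std()
    return np.flatnonzero(ce > threshold)


# Hypothetical input: 8 segments, each encoded as 5 numeric features.
rng = np.random.default_rng(0)
segments = rng.random((8, 5))
keep = select_segments(segments)
print(keep)
```

This version runs one SVD per segment, so it is only practical for small inputs. A data set with a very large number of segments would need a cheaper approximation of the contribution scores.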