Abstract:
Advances in modern bio-sequencing techniques have led to a proliferation of raw genomic data, creating an unprecedented opportunity for data mining. To analyze such large-volume, high-dimensional scientific data, many high-performance dimension reduction and clustering algorithms have been developed. Among the known algorithms, we use Multidimensional Scaling (MDS) to reduce the dimension of the original data and Pairwise Clustering to classify the data. We have shown that interpolative MDS, an online technique for real-time streaming in Big Data, can be applied to get better performance on massive data. However, the SMACOF MDS approach is directly applicable only to cases where all pairwise distances are used and where the weight of each term is one. In this paper, we propose a robust and scalable MDS and interpolation algorithm that uses the Deterministic Annealing technique to solve problems with either missing distances or a non-trivial weight function. We compare our method to three state-of-the-art techniques. Experiments on three common types of bioinformatics datasets show that the precision of our algorithms is better than that of the other algorithms, and that the weighted solutions also have a lower computational time cost.
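The role of the weights can be made concrete with the weighted stress function that SMACOF-style MDS minimizes, the sum over pairs i < j of w_ij (d_ij(X) - delta_ij)^2. The following is a minimal sketch of that objective only, not the algorithm proposed in the paper. The function name `weighted_stress`, the use of `None` for a missing distance, and the convention of giving a missing pair a weight of zero are assumptions made for illustration.

```python
import math

def weighted_stress(points, delta, weights):
    """Weighted MDS stress: sum over i < j of w_ij * (d_ij - delta_ij)^2.

    points  : list of low-dimensional coordinates, one tuple per item
    delta   : square matrix of target dissimilarities (None where missing)
    weights : square matrix of non-negative weights (0 where delta is missing)
    """
    n = len(points)
    total = 0.0
    for i in range(n):
        for j in range(i + 1, n):
            w = weights[i][j]
            if w == 0:
                # A zero weight drops the pair, which is how a missing
                # distance is excluded from the objective in this sketch.
                continue
            d = math.dist(points[i], points[j])  # Euclidean distance in the embedding
            total += w * (d - delta[i][j]) ** 2
    return total

# Hypothetical three-point example: the distance between items 1 and 2 is missing.
points = [(0.0, 0.0), (3.0, 4.0), (0.0, 1.0)]
delta = [[0.0, 4.0, 1.0],
         [4.0, 0.0, None],
         [1.0, None, 0.0]]
weights = [[0.0, 1.0, 2.0],
           [1.0, 0.0, 0.0],
           [2.0, 0.0, 0.0]]
stress = weighted_stress(points, delta, weights)
```

With all weights equal to one and no missing entries, this reduces to the unweighted case to which the abstract says SMACOF is directly applicable.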
Published in: 2013 IEEE 9th International Conference on e-Science
Date of Conference: 22-25 October 2013
Date Added to IEEE Xplore: 16 December 2013
Electronic ISBN: 978-0-7695-5083-1