Mutations in the HIV reverse transcriptase and protease found in patients may be related to drug resistance. Several issues, such as cross-resistance and the limitations in detecting the relevance of resistance, make it difficult to fully elucidate the relationship between these mutations and drug resistance. Lookup tables and rule-based systems attempt to classify sequences and predict treatment failure. However, they depend on the scientific literature and on its quality and reliability. Data-intensive analysis of HIV mutation databases may help to corroborate or improve the knowledge scattered across the literature. Pattern recognition algorithms classify data by extracting information from different data domains. Clustering and biclustering algorithms have been used to group scientific and business data according to similarity measures. K-means is a popular clustering algorithm, and Bimax is a biclustering algorithm for binary data. In this context, the main contribution of this work is a new methodology based on K-means and Bimax that uses a binary representation of reverse transcriptase and protease sequences to obtain an unsupervised classification of sequences that may be related to drug resistance. We analyze 14,393 sequences with pattern recognition algorithms, keeping only selected protein positions known to be related to drug resistance and representing each sequence as a binary vector in an 82-dimensional space. We produce a visualization of these vectors suitable for medical interpretation. It shows some correspondence with the drug resistance predictions of the Brazilian lookup table used by Brazilian physicians. That table, however, depends on the HIV literature and its quality.
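A minimal sketch of the binary representation and of K-means clustering on it follows. It assumes each sequence has already been reduced to the set of selected positions at which it carries a mutation relative to a reference. The position labels in `POSITIONS` are hypothetical placeholders, since the 82 positions are not listed here. The plain Lloyd-style K-means below uses random initialization and squared Euclidean distance. It is a generic implementation, not necessarily the configuration used in this work.

```python
import random

# Hypothetical position labels for illustration only; the work uses 82
# selected reverse transcriptase (RT) and protease (PR) positions.
POSITIONS = ["RT41", "RT65", "RT103", "RT184", "PR30", "PR90"]


def encode(mutated_positions, positions=POSITIONS):
    """Return a 0/1 vector: 1 where the sequence is mutated at that
    position (assumed relative to a reference sequence), else 0."""
    return [1 if p in mutated_positions else 0 for p in positions]


def sq_dist(a, b):
    # Squared Euclidean distance; for two 0/1 vectors it equals
    # their Hamming distance.
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, max_iter=100, seed=0):
    """Plain Lloyd-style K-means; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialize centroids with k input vectors drawn without replacement.
    centroids = [list(map(float, v)) for v in rng.sample(vectors, k)]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each vector goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: sq_dist(v, centroids[c]))
            for v in vectors
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: each centroid becomes the mean of its members;
        # an empty cluster keeps its previous centroid.
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids
```

For example, encoding three toy sequences mutated mainly at RT positions and three mutated mainly at PR positions, then calling `kmeans(vectors, 2)`, groups them into those two sets. Because each centroid is the mean of 0/1 vectors, its entries can be read as the fraction of sequences in the cluster that are mutated at each position. That reading may help with the medical interpretation of clusters.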
In summary, in this work we describe a methodology that applies pattern recognition algorithms to binary data in order to suggest clusters of mutations and their relationship with drug resistance, presented through a different cluster visualization scheme.
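Biclustering groups sequences and positions at the same time. A bicluster is a subset of sequences that all carry mutations at the same subset of positions. Bimax searches a binary matrix for inclusion-maximal biclusters, that is, submatrices containing only 1s that cannot be enlarged by adding a row or a column. The sketch below does not implement Bimax's own divide-and-conquer procedure. It is a brute-force stand-in, used here only for illustration, that enumerates the same kind of maximal all-1 biclusters by trying every subset of columns. It is therefore practical only for a handful of columns, not for 82. The function names and the `min_rows`/`min_cols` size thresholds are hypothetical.

```python
from itertools import combinations


def rows_with_all(matrix, cols):
    # Rows (sequences) that have a 1 in every column of `cols`.
    return frozenset(i for i, row in enumerate(matrix) if all(row[j] for j in cols))


def cols_with_all(matrix, rows, n_cols):
    # Columns (positions) that have a 1 in every row of `rows`.
    return frozenset(j for j in range(n_cols) if all(matrix[i][j] for i in rows))


def maximal_biclusters(matrix, min_rows=1, min_cols=1):
    """Brute-force enumeration of inclusion-maximal all-1 biclusters.

    For each column subset C, R = rows_with_all(C) and C' = cols_with_all(R).
    The pair (R, C') is maximal: no column can be added (C' already holds every
    column that is all 1 on R) and no row can be added (rows_with_all(C') == R).
    """
    n_cols = len(matrix[0])
    found = set()
    for size in range(1, n_cols + 1):
        for cols in combinations(range(n_cols), size):
            rows = rows_with_all(matrix, cols)
            if not rows:
                continue  # no sequence shares all these mutations
            closed_cols = cols_with_all(matrix, rows, n_cols)
            if len(rows) >= min_rows and len(closed_cols) >= min_cols:
                found.add((rows, closed_cols))
    # Deterministic order: by sorted row indices, then sorted column indices.
    return sorted(found, key=lambda b: (sorted(b[0]), sorted(b[1])))
```

With rows as sequences and columns as positions, each returned pair is a group of sequences together with the largest set of positions at which all of them are mutated. Such pairs are candidate co-occurring mutation patterns that could then be compared with a lookup table.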