This study develops a classification model that relates the binding pockets of 70 HIV-1 protease crystal structures, described by their structural descriptors, to the HIV-1 protease inhibitors with which they are complexed. A Random Forest classification model is used to reduce the chemical descriptor space from 456 descriptors to the 12 most relevant ones, ranked by the Gini importance measure. The 12 selected descriptors are then used to build classification models with linear discriminant analysis (LDA) and logistic regression (LR). The best LDA model uses the top eight descriptors and has an overall error of 30% and a leave-one-out cross-validation error of 44.29%; the best LR model uses the top five descriptors and has an overall error of 28.57% and a leave-one-out cross-validation error of 41.43%. Hierarchical clustering is performed on the top five and top eight descriptors to verify whether the descriptors selected by Random Forest group the binding pockets according to their complexed ligands. The selected descriptors are expected to play a crucial role in understanding the structure of the HIV-1 protease binding pocket in terms of its chemical descriptors.
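The gap between the overall error and the leave-one-out cross-validation error reported above can be illustrated with a minimal sketch. It is not the study's code: a nearest-centroid classifier stands in for LDA and LR so that the example needs only the Python standard library, and the function names and toy data are hypothetical. The helper `misclassified` assumes the reported percentages are fractions of the 70 structures, which is consistent with whole-number counts (for example, 30% of 70 is 21).

```python
from math import dist

def nearest_centroid_fit(X, y):
    """Return {label: centroid} computed from the training rows."""
    sums, counts = {}, {}
    for row, label in zip(X, y):
        acc = sums.setdefault(label, [0.0] * len(row))
        for j, value in enumerate(row):
            acc[j] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}

def nearest_centroid_predict(model, row):
    """Predict the label whose centroid is closest to the row."""
    return min(model, key=lambda label: dist(model[label], row))

def overall_error(X, y):
    """Error when the model is fitted and evaluated on the same samples."""
    model = nearest_centroid_fit(X, y)
    wrong = sum(nearest_centroid_predict(model, r) != t for r, t in zip(X, y))
    return wrong / len(y)

def loo_error(X, y):
    """Leave-one-out error: each sample is predicted by a model fitted without it."""
    wrong = 0
    for i in range(len(y)):
        model = nearest_centroid_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        wrong += nearest_centroid_predict(model, X[i]) != y[i]
    return wrong / len(y)

def misclassified(error_percent, n=70):
    """Number of misclassified samples implied by an error rate over n samples."""
    return round(error_percent / 100 * n)

# Toy, invented data: one descriptor per sample, two ligand classes.
X_toy = [[0.0], [0.45], [0.55], [1.0]]
y_toy = ["A", "A", "B", "B"]

print(overall_error(X_toy, y_toy))  # 0.0
print(loo_error(X_toy, y_toy))      # 0.5
print([misclassified(p) for p in (30, 44.29, 28.57, 41.43)])  # [21, 31, 20, 29]
```

In the toy data the two middle samples are classified correctly only when they contribute to their own class centroid, so the leave-one-out error is higher than the overall error, the same direction as the gap reported for the LDA and LR models.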