We present a trend-vector-based method for identifying inhibitors of a given protein target. When some initial information about active compounds is already available, the approach reduces the number of compounds that must be tested experimentally in costly validation studies. The machine learning method is trained here on compounds from the Elsevier Molecular Design Ltd (MDL) Information Systems Drug Data Report (MDDR) for five diverse protein targets (cyclooxygenase-2, dihydrofolate reductase, thrombin, HIV reverse transcriptase, and the estrogen receptor, antagonists only). Each classified ligand is represented by an optimized set of two-dimensional topological descriptors. The trend vectors are then used to divide the whole set of ligands into two groups: 1) molecules predicted to be active, and 2) molecules predicted to be inactive. Both training and predicted activities are treated as binary. The accuracy of the method is comparable to that of other prediction tools (such as support vector machines or random forests), while it offers significantly higher speed and portability. Prediction precision reaches 60% on heterogeneous source data. Consequently, the method can easily be applied to large commercial compound collections.
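The classification step described above can be illustrated with a minimal sketch. It assumes one common formulation of a trend vector, namely the sum of mean-centred descriptor vectors weighted by mean-centred binary activity, with a molecule scored by its dot product with that vector. The abstract does not give the exact formulation, descriptor set, or decision threshold used in this work, so the function names, the zero threshold, and the toy descriptor counts below are all hypothetical.

```python
from typing import List, Sequence, Tuple


def trend_vector(X: Sequence[Sequence[float]],
                 y: Sequence[int]) -> Tuple[List[float], List[float]]:
    """Return (trend vector, descriptor means) for binary activities y.

    Assumed formulation: t = sum_i (y_i - mean(y)) * (x_i - mean(x)).
    """
    n, d = len(X), len(X[0])
    y_mean = sum(y) / n
    x_mean = [sum(row[j] for row in X) / n for j in range(d)]
    t = [0.0] * d
    for row, label in zip(X, y):
        weight = label - y_mean
        for j in range(d):
            t[j] += weight * (row[j] - x_mean[j])
    return t, x_mean


def score(x: Sequence[float], t: Sequence[float],
          x_mean: Sequence[float]) -> float:
    """Project a mean-centred descriptor vector onto the trend vector."""
    return sum(tj * (xj - mj) for tj, xj, mj in zip(t, x, x_mean))


def classify(X: Sequence[Sequence[float]], t: Sequence[float],
             x_mean: Sequence[float], threshold: float = 0.0) -> List[int]:
    """Label a molecule active (1) if its score exceeds the threshold.

    The zero threshold is an illustrative assumption, not taken from the paper.
    """
    return [1 if score(x, t, x_mean) > threshold else 0 for x in X]


def precision(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predicted actives that are true actives."""
    tp = sum(1 for a, p in zip(y_true, y_pred) if a == 1 and p == 1)
    fp = sum(1 for a, p in zip(y_true, y_pred) if a == 0 and p == 1)
    return tp / (tp + fp) if (tp + fp) else 0.0


# Toy data: each row is a hypothetical vector of topological descriptor counts.
X_train = [[3, 0, 1], [2, 1, 0], [3, 1, 1], [0, 2, 1], [1, 3, 0], [0, 3, 1]]
y_train = [1, 1, 1, 0, 0, 0]
X_test = [[3, 0, 0], [0, 3, 0], [2, 2, 1], [1, 1, 1]]
y_test = [1, 0, 1, 0]

t, x_mean = trend_vector(X_train, y_train)
y_pred = classify(X_test, t, x_mean)
test_precision = precision(y_test, y_pred)
print(y_pred, test_precision)
```

Training and scoring each take a single pass over the descriptors, which is consistent with the speed advantage claimed for the method. Precision is computed here as true positives divided by all predicted positives, matching the parenthetical definition in the text.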