MicroRNAs (miRNAs) are small non-coding RNA molecules that post-transcriptionally regulate gene expression by base-pairing to mRNAs. Prediction of miRNA-target transcript pairs is now at the forefront of current research. A number of experimental and computational approaches have already detected thousands of targets for hundreds of human miRNAs. However, most computational target prediction methods suffer from high false positive and false negative rates. One reason for this is the marked deficiency of negative examples, or non-target data. Current machine-learning-based target prediction algorithms lack enough negative examples to train the classifier properly, because far fewer biologically verified negative miRNA-target pairs have been identified than true miRNA-target pairs. Hence researchers have to rely on artificial negative examples. However, it has been observed that such artificially generated negative examples do not yield good prediction accuracy on independent test data sets. It is therefore necessary to generate more reliable artificial negative examples. In this article we predict potential miRNA-target pairs with higher sensitivity and specificity, based on a new way of generating negative examples. First, we generate artificial miRNAs that are believed not to be true miRNAs, using a novel approach: k-mer exchange between the key and non-key regions of the miRNA. For these false miRNAs we then search for potential targets by scanning entire 3' untranslated regions (UTRs) with the target prediction algorithm miRanda. Using the newly generated negative examples together with a set of biologically verified positive examples, we train a support vector machine (SVM) classifier and classify a set of independent test samples. For this purpose we have generated a set of 90 experimentally verified context-specific features.
Our prediction algorithm has been validated on an independent experimental data set, on which we obtained a much higher prediction accuracy. The robust performance of the proposed method is mainly the result of using a large set of high-quality artificial negative examples and of integrating many biologically verified known and novel context-specific features.
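
The k-mer exchange step used to build artificial miRNAs can be illustrated with a minimal sketch. The abstract does not define the key region, the value of k, or the number of exchanges, so everything specific here is an assumption made for illustration only: the key region is taken to be nucleotides 2 to 8 from the 5' end, the non-key region is everything downstream of it, k is 3, and a single exchange is performed. The function name `kmer_exchange` and its parameters are hypothetical and do not come from the original work.

```python
import random


def kmer_exchange(mirna, k=3, key_start=1, key_end=8, rng=None):
    """Swap one k-mer from the key region with one from the non-key region.

    Assumptions (not taken from the source): the key region is the
    0-based half-open interval [key_start, key_end), and the non-key
    region is everything from key_end to the 3' end of the sequence.
    """
    if rng is None:
        rng = random.Random()
    n = len(mirna)
    if not (0 <= key_start < key_end <= n):
        raise ValueError("key region must lie inside the sequence")
    if key_end - key_start < k or n - key_end < k:
        raise ValueError("both regions must be at least k nucleotides long")

    # Pick the start of one k-mer inside each region; the two windows
    # cannot overlap because the first one ends at or before key_end.
    i = rng.randrange(key_start, key_end - k + 1)
    j = rng.randrange(key_end, n - k + 1)

    seq = list(mirna)
    seq[i:i + k], seq[j:j + k] = seq[j:j + k], seq[i:i + k]
    return "".join(seq)


# Illustrative 22-nt RNA sequence; a seeded generator makes the run repeatable.
example_mirna = "UGAGGUAGUAGGUUGUAUAGUU"
artificial = kmer_exchange(example_mirna, k=3, rng=random.Random(0))
print(example_mirna)
print(artificial)
```

Because the exchange only moves nucleotides around, the artificial sequence keeps the length and base composition of the original while disrupting the assumed key region. If the two chosen k-mers happen to be identical the output equals the input, so a practical pipeline would presumably discard such cases, as well as any output that matches a known miRNA, before the sequences are passed to a target scanner such as miRanda.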