Liquid chromatography tandem mass spectrometry (LC/MS/MS) based plasma proteomics profiling is a promising technology platform for studying candidate protein biomarkers of complex human diseases such as cancer. Factors such as inherent variability, limited protein detectability, and peptide discovery biases among LC/MS/MS platforms have made the classification and prediction of proteomics profiles challenging. In this paper, we developed a proteomics data analysis method, based on artificial neural networks, to identify multi-protein biomarker panels for breast cancer diagnosis. Using this method, we first applied standard analysis of variance (ANOVA) to plasma proteomics profiles to derive a list of single candidate biomarkers that differed significantly between breast cancer patients and controls. Next, we constructed a feed-forward neural network (FFNN) for each combination of single marker proteins and trained it on plasma proteomics results from 40 women with breast cancer and 40 control women. We evaluated the best five-marker and ten-marker panels on a testing data set from a similar cohort of 80 plasma proteomics profiles, half from women with breast cancer and half from controls, using both statistical methods (receiver operating characteristic curve comparisons) and biological literature validation. We found that the five-marker panel using a two-variable FFNN output achieved the best prediction performance on the testing data set, with 82.5% sensitivity and 82.5% specificity. Our computational method can serve as a general model for multi-biomarker panel discovery in other diseases.
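To make the reported performance figures concrete, the following minimal Python sketch computes sensitivity and specificity from binary labels and predictions. The function name `sensitivity_specificity` and the per-sample predictions are hypothetical: the abstract reports only the two rates, so the sketch assumes 33 of 40 samples classified correctly in each class, which is one way to arrive at 82.5% on a 40/40 test set.

```python
def sensitivity_specificity(y_true, y_pred):
    """Return (sensitivity, specificity) for binary labels.

    Labels: 1 = breast cancer, 0 = control.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # cancer called cancer
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # cancer called control
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # control called control
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # control called cancer
    return tp / (tp + fn), tn / (tn + fp)


# Test set layout from the abstract: 40 cancer and 40 control profiles.
y_true = [1] * 40 + [0] * 40

# Hypothetical predictions (an assumption, not data from the paper):
# 33 correct and 7 incorrect calls in each class.
y_pred = [1] * 33 + [0] * 7 + [0] * 33 + [1] * 7

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Sensitivity is the fraction of cancer profiles correctly flagged, and specificity is the fraction of control profiles correctly cleared, so equal values on a balanced test set imply the same number of errors in each class.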
Published in: 22nd IEEE International Symposium on Computer-Based Medical Systems (CBMS 2009)
Date of Conference: 2-5 Aug. 2009