This paper discusses the identification of potential contaminants in mass spectrometry data derived from proteomic studies. Contaminant masses are usually submitted along with valid peptide masses to protein identification algorithms, which can lead to false positive results. We present an approach for automatically identifying contaminant masses so that they can be removed from the peak list before it is submitted for protein identification. For this purpose we have developed an algorithm that clusters mass values. We calculate the frequencies of all masses and propose that masses occurring with high frequency are contaminants. In our analysis of 78,384 masses derived from 3,029 proteins, we identify 16 possible contaminants, four of which are known trypsin autolysis peptides. Removing these contaminant masses from the database search will lead to more accurate and reliable protein identification.
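The idea of clustering mass values and flagging high-frequency clusters can be illustrated with a minimal sketch. It is not the algorithm developed in the paper: the gap-based clustering rule, the tolerance `tol`, the cutoff `min_fraction`, and the function names `cluster_masses` and `flag_contaminants` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def cluster_masses(peak_lists: Dict[str, List[float]],
                   tol: float = 0.2) -> List[Tuple[float, int]]:
    """Group masses from all peak lists into clusters.

    Assumption: two neighbouring masses (after sorting) belong to the same
    cluster when they differ by at most `tol` Da. Returns one tuple per
    cluster: (mean mass, number of distinct peak lists contributing).
    """
    # Tag every mass with the id of the peak list it came from, then sort.
    tagged = sorted((mass, pid)
                    for pid, masses in peak_lists.items()
                    for mass in masses)

    clusters: List[Tuple[float, int]] = []
    current_masses: List[float] = []
    current_ids = set()

    for mass, pid in tagged:
        # A gap larger than the tolerance closes the current cluster.
        if current_masses and mass - current_masses[-1] > tol:
            mean = sum(current_masses) / len(current_masses)
            clusters.append((mean, len(current_ids)))
            current_masses, current_ids = [], set()
        current_masses.append(mass)
        current_ids.add(pid)

    if current_masses:
        mean = sum(current_masses) / len(current_masses)
        clusters.append((mean, len(current_ids)))
    return clusters


def flag_contaminants(peak_lists: Dict[str, List[float]],
                      tol: float = 0.2,
                      min_fraction: float = 0.5) -> List[float]:
    """Return mean masses of clusters seen in at least `min_fraction`
    of the peak lists (hypothetical cutoff, not taken from the paper)."""
    n = len(peak_lists)
    if n == 0:
        return []
    return [mean for mean, count in cluster_masses(peak_lists, tol)
            if count / n >= min_fraction]
```

Given a dictionary that maps each sample to its peak list, `flag_contaminants` returns candidate masses that could be stripped from every peak list before the database search. Counting distinct peak lists rather than raw occurrences keeps a mass that appears several times in one spectrum from inflating its frequency. Both parameters would need to be tuned to the instrument's mass accuracy and the size of the data set.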