To identify the source of Escherichia coli (E. coli) fecal bacterial contamination, we propose a fuzzy dissimilarity measure that quantifies the dissimilarity between E. coli DNA patterns. The measure preserves the dimension of the DNA patterns while allowing for variation among patterns from the same host. It produces a dissimilarity matrix, a form of relational data. To classify data in this representation, we present a weighted k-nearest neighbor algorithm. The weighted k-nearest neighbor technique uses the classical k-nearest neighbor rule but resolves ties among multiple classes. In addition, we suggest an ensemble data set method for sample sets with a large range of class sizes. The proposed system showed potential as a stable method for detecting the hosts of fecal bacteria and as a basis for future studies on interpreting DNA patterns.
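As a rough illustration of how weighting can break ties when classifying from a dissimilarity matrix, the following is a minimal sketch rather than the algorithm used in this work. It assumes inverse-dissimilarity weights, which the abstract does not specify; the function name `weighted_knn_predict`, the `eps` parameter, and the host labels are hypothetical.

```python
from collections import defaultdict


def weighted_knn_predict(dissim_row, train_labels, k=3, eps=1e-9):
    """Classify one query pattern from its dissimilarities to labeled patterns.

    dissim_row[i] is the dissimilarity between the query and training
    pattern i (one row of a precomputed dissimilarity matrix).
    Assumption: each neighbor votes with weight 1 / (dissimilarity + eps).
    """
    # Indices of the k training patterns closest to the query.
    nearest = sorted(range(len(dissim_row)), key=lambda i: dissim_row[i])[:k]

    # Accumulate weighted votes per class; closer neighbors count more.
    scores = defaultdict(float)
    for i in nearest:
        scores[train_labels[i]] += 1.0 / (dissim_row[i] + eps)

    # The class with the largest total weight wins.
    return max(scores, key=scores.get)


# With k=2 a plain majority vote ties one-to-one between the two hosts;
# the weights favor the closer neighbor.
row = [0.10, 0.40, 0.90]
labels = ["cow", "human", "dog"]  # hypothetical host classes
print(weighted_knn_predict(row, labels, k=2))
```

With unweighted voting, the two nearest neighbors in this example would each contribute one vote to a different class; weighting by closeness gives the nearer pattern the larger say, so the tie does not arise.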