Biclustering usually refers to the process of finding subsets of rows and columns in a given dataset. Each subset is a bicluster and corresponds to a sub-matrix whose elements tend to present a high degree of coherence with each other. To find such structures, the δ-biclustering problem was formulated as the problem of finding a set of biclusters that satisfy a coherence constraint, measured by the mean-squared residue, while maximizing the total bicluster size. Additionally, a reduced overlap among the biclusters in the set is expected, that is, the number of elements they share should be minimized. This also leads to a high coverage of the original dataset for a given number of biclusters. Most algorithms intended to find such biclusters focus only on the mean-squared residue and/or the bicluster size. This usually leads to a set of biclusters that does not fully cover the data and, as a consequence, exhibits a high overlap among its members. The result may be redundant information on some portions of the dataset and a lack of information on others. Also, some methods introduce noise into the dataset to promote better coverage, which can mislead the search. In this paper, a swarm-based approach named SwarmBcluster is proposed to find biclusters effectively without introducing noise, with the main objective of achieving maximum coverage. Experiments were performed on two well-known datasets, and a comparative analysis against other approaches indicates that SwarmBcluster is capable of finding a set of biclusters with high coverage while maintaining a high average volume and obeying the imposed coherence constraint.
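The quantities named in the abstract can be illustrated with a minimal sketch. It assumes the standard Cheng and Church definition of the mean-squared residue, takes volume to be the number of cells in a bicluster, and takes coverage to be the fraction of dataset cells that belong to at least one bicluster. The function names and the list-of-lists data layout are hypothetical choices for illustration and are not taken from SwarmBcluster itself.

```python
from typing import List, Sequence, Tuple

# A bicluster is represented here as (row indices, column indices).
# This representation is an assumption made for the sketch.
Bicluster = Tuple[Sequence[int], Sequence[int]]


def mean_squared_residue(data: List[List[float]],
                         rows: Sequence[int],
                         cols: Sequence[int]) -> float:
    """Mean-squared residue of the sub-matrix data[rows][cols].

    Each cell's residue is a_ij - rowmean_i - colmean_j + overall_mean.
    A value of 0 means the sub-matrix follows a perfect additive pattern.
    """
    n_rows, n_cols = len(rows), len(cols)
    sub = [[data[i][j] for j in cols] for i in rows]
    row_means = [sum(r) / n_cols for r in sub]
    col_means = [sum(sub[i][j] for i in range(n_rows)) / n_rows
                 for j in range(n_cols)]
    overall_mean = sum(row_means) / n_rows
    total = sum(
        (sub[i][j] - row_means[i] - col_means[j] + overall_mean) ** 2
        for i in range(n_rows)
        for j in range(n_cols)
    )
    return total / (n_rows * n_cols)


def volume(bicluster: Bicluster) -> int:
    """Number of cells in the bicluster (rows times columns)."""
    rows, cols = bicluster
    return len(rows) * len(cols)


def coverage(biclusters: Sequence[Bicluster],
             n_rows: int, n_cols: int) -> float:
    """Fraction of dataset cells that belong to at least one bicluster."""
    covered = {(i, j) for rows, cols in biclusters for i in rows for j in cols}
    return len(covered) / (n_rows * n_cols)


if __name__ == "__main__":
    # Rows differ only by a constant shift, so the residue is zero.
    additive = [[1.0, 2.0, 3.0],
                [2.0, 3.0, 4.0],
                [3.0, 4.0, 5.0]]
    print(mean_squared_residue(additive, [0, 1, 2], [0, 1, 2]))

    # Two overlapping 2x2 biclusters on a 3x3 dataset share one cell.
    found = [([0, 1], [0, 1]), ([1, 2], [1, 2])]
    print(coverage(found, 3, 3))
```

Under these definitions, a sub-matrix qualifies as a δ-bicluster when its mean-squared residue does not exceed δ, and two biclusters that share cells add less to coverage than the sum of their volumes.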
Evolutionary Computation (CEC), 2010 IEEE Congress on
Date of Conference: 18-23 July 2010