Functional classification groups genes according to their molecular functions or the biological processes in which they participate. This unsupervised classification task is essential for interpreting the gene datasets produced by post-genomic experiments. Because the functional annotation of genes relies mostly on the Gene Ontology (GO), many GO-based similarity measures have been described, but few of them have been used for clustering. In this paper we evaluate the functional classification of genes using our previously described IntelliGO semantic similarity measure against reference sets. These sets consist of genes taken from human and yeast KEGG (Kyoto Encyclopedia of Genes and Genomes) pathways and Pfam clans. Hierarchical clustering and heatmap visualization are used to illustrate the advantages of IntelliGO over several other measures. Because genes often belong to more than one reference set, the fuzzy C-means clustering algorithm is then applied to these datasets with IntelliGO as the similarity measure. The F-score method is used to estimate the quality of the clustering and the optimal number of clusters. The results are compared with those obtained from the state-of-the-art DAVID (Database for Annotation, Visualization and Integrated Discovery) functional classification method. Overlap analysis makes it possible to study how clusters match reference sets, and leads us to propose a set-difference method for discovering missing information. The IntelliGO similarity measure, the clustering tool and the reference sets used for evaluation are available at: http://plateforme-mbi.loria.fr/intelligo.
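The following minimal Python sketch makes the F-score evaluation and the set-difference idea concrete. It assumes that clusters are already plain gene sets, for example fuzzy C-means memberships thresholded into possibly overlapping sets. It uses the common cluster-versus-reference F-measure: for each reference set, take the best-matching cluster's F-score, then average those scores weighted by reference-set size. The aggregation, the function names (`pair_f`, `overall_f`, `best_cluster`, `set_difference`) and the toy gene identifiers are illustrative assumptions, not necessarily the exact formulation used with IntelliGO.

```python
from typing import Dict, Set


def pair_f(ref: Set[str], cluster: Set[str]) -> float:
    """Harmonic mean of precision and recall for one reference set / cluster pair."""
    overlap = len(ref & cluster)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)  # fraction of the cluster found in the reference set
    recall = overlap / len(ref)         # fraction of the reference set recovered by the cluster
    return 2 * precision * recall / (precision + recall)


def overall_f(refs: Dict[str, Set[str]], clusters: Dict[str, Set[str]]) -> float:
    """Size-weighted average, over reference sets, of the best-matching cluster's F-score.

    Assumption: overlapping reference sets are weighted by their own sizes,
    so a gene in two sets counts once in each.
    """
    total = sum(len(r) for r in refs.values())
    return sum(
        len(r) / total * max((pair_f(r, c) for c in clusters.values()), default=0.0)
        for r in refs.values()
    )


def best_cluster(ref: Set[str], clusters: Dict[str, Set[str]]) -> str:
    """Name of the cluster with the highest F-score against a reference set."""
    return max(clusters, key=lambda name: pair_f(ref, clusters[name]))


def set_difference(ref: Set[str], cluster: Set[str]) -> Set[str]:
    """Genes clustered with a reference set but absent from it: hypothetical
    candidates for annotation missing from the reference."""
    return cluster - ref


# Toy example with hypothetical gene identifiers.
refs = {"A": {"g1", "g2", "g3"}, "B": {"g4", "g5"}}
clusters = {"c1": {"g1", "g2", "g3", "g6"}, "c2": {"g4", "g5"}}
match = best_cluster(refs["A"], clusters)
print(match, round(overall_f(refs, clusters), 4), set_difference(refs["A"], clusters[match]))
```

Here cluster `c1` recovers all of set `A` but also contains `g6`, so precision drops to 3/4 and the pair F-score is 6/7. The overall score is 3/5 · 6/7 + 2/5 · 1 = 32/35. The set difference flags `g6` as a gene worth inspecting.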