Skip to Main Content
Many real-world datasets are of mixed types, having numeric and categorical attributes. Even though difficult, analyzing mixed-type datasets is important. In this paper, we propose an extended self-organizing map (SOM), called MixSOM, which utilizes a data structure distance hierarchy to facilitate the handling of numeric and categorical values in a direct, unified manner. Moreover, the extended model regularizes the prototype distance between neighboring neurons in proportion to their map distance so that structures of the clusters can be portrayed better on the map. Extensive experiments on several synthetic and real-world datasets are conducted to demonstrate the capability of the model and to compare MixSOM with several existing models including Kohonen's SOM, the generalized SOM and visualization-induced SOM. The results show that MixSOM is superior to the other models in reflecting the structure of the mixed-type data and facilitates further analysis of the data such as exploration at various levels of granularity.