Skip to Main Content
The difficulties in analyzing and clustering (synthesizing) multivariate data of the mixed type (discrete and continuous) are largely due to: 1) nonuniform scaling in different coordinates, 2) the lack of order in nominal data, and 3) the lack of a suitable similarity measure. This paper presents a new approach which bypasses these difficulties and can acquire statistical knowledge from incomplete mixed-mode data. The proposed method adopts an event-covering approach which covers a subset of statistically relevant outcomes in the outcome space of variable-pairs. And once the covered event patterns are acquired, subsequent analysis tasks such as probabilistic inference, cluster analysis, and detection of event patterns for each cluster based on the incomplete probability scheme can be performed. There are four phases in our method: 1) the discretization of the continuous components based on a maximum entropy criterion so that the data can be treated as n-tuples of discrete-valued features; 2) the estimation of the missing values using our newly developed inference procedure; 3) the initial formation of clusters by analyzing the nearest-neighbor distance on subsets of selected samples; and 4) the reclassification of the n-tuples into more reliable clusters based on the detected interdependence relationships. For performance evaluation, experiments have been conducted using both simulated and real life data.