Scoring levels of categorical variables with heterogeneous data

2 Author(s)
Tuv, E. ; Anal. & Control Technol., Intel Corp., Chandler, AZ, USA ; Runger, G.C.

Heterogeneous (mixed-type) data present significant challenges in both supervised and unsupervised learning. The situation is even more complicated when nominal variables have several levels (values) that make using indicator variables (for every categorical level) infeasible. With unsupervised learning, several fairly involved, computationally intensive, nonlinear multivariate techniques iteratively alternate data transformations with optimal scoring. These seek to optimize an objective on the basis of a covariance matrix. Our goal is to find a computationally efficient and flexible method for mapping categorical variables to numeric scores in mixed-type data. We attempt to go beyond optimizing second-order statistics (such as covariance) and enable distance-based methods by exploring mutual relationships or bumps of dependencies between variables. This is a new objective for a scoring method that's based on patterns learned from all the available variables.
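To make the motivation concrete, the sketch below contrasts the number of indicator columns needed for high-cardinality nominal variables with a single numeric score per level. The scoring used here (the mean of a co-occurring numeric variable for each level) is a simple stand-in chosen for illustration; it is not the method proposed in the article. The function names, level counts, and toy data are all hypothetical.

```python
from collections import defaultdict


def indicator_width(levels_per_variable):
    """Number of indicator (one-hot) columns needed: one per categorical level."""
    return sum(levels_per_variable)


def conditional_mean_scores(categories, numeric):
    """Map each categorical level to the mean of a numeric variable observed
    alongside it. This is an illustrative stand-in scoring, not the article's."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for level, value in zip(categories, numeric):
        totals[level] += value
        counts[level] += 1
    return {level: totals[level] / counts[level] for level in totals}


# Hypothetical level counts for three nominal variables.
columns_needed = indicator_width([500, 1200, 40])

# Hypothetical toy data: one nominal variable and one numeric variable.
machine = ["A", "B", "A", "C", "B", "C"]
temperature = [10.0, 20.0, 14.0, 30.0, 22.0, 34.0]

# One numeric score per level replaces one indicator column per level.
scores = conditional_mean_scores(machine, temperature)
encoded = [scores[m] for m in machine]
```

With scores like these, the nominal variable occupies one numeric column, so distance-based methods can be applied to the mixed-type data without expanding every level into its own dimension.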

Published in:

Intelligent Systems, IEEE (Volume: 19, Issue: 2)