Feature Selection using a Random Forests Classifier for the Integrated Analysis of Multiple Data Types

5 Author(s)
Reif, D.M. ; Dept. of Molecular Physiol. & Biophys., Vanderbilt Univ., Nashville, TN ; Motsinger, A.A. ; McKinney, B.A. ; Crowe, J.E.

Complex clinical phenotypes arise from the concerted interactions among the myriad components of a biological system. Therefore, comprehensive models can only be developed through the integrated study of multiple types of experimental data gathered from the system in question. The Random Forests™ (RF) method is adept at identifying relevant features having only slight main effects in high-dimensional data. This method is well-suited to integrated analysis, as relevant attributes may be selected from categorical or continuous data, and there may be interactions across data types. RF is a natural approach for studying gene-gene, gene-protein, or protein-protein interactions because importance scores for particular attributes take interactions into account. Thus, Random Forests is a promising solution to the analysis challenge posed by high-dimensional datasets including interactions among attributes of different types. In this study, we characterize the performance of RF on a range of simulated genetic and/or proteomic datasets. We compare the performance of RF in identifying relevant attributes when given genetic data alone, proteomic data alone, or a combined dataset of genetic plus proteomic data. Our results indicate that utilizing multiple data types is beneficial when the disease model is complex and the phenotypic outcome-associated data type is unknown. The results of this study also show that RF is adept at identifying relevant features in high-dimensional data with small main effects and low heritability.
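To make the idea of importance scores over mixed data types concrete, the following is a minimal sketch in plain Python, not the authors' implementation. It grows a small forest on a toy dataset that combines categorical genotypes with continuous protein levels, then ranks attributes by out-of-bag permutation importance. Everything specific here is an assumption made for illustration: the function names (`simulate`, `build_tree`, `rf_importance`), the feature names, the hyperparameters, and the disease model, in which the phenotype depends jointly on `snp_0` and `prot_0`. That toy effect is deliberately strong and noise-free, unlike the small main effects and low heritability examined in the study.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def build_tree(X, y, rows, rng, mtry, depth):
    """Grow one tree; a node is (feature, threshold, left, right), a leaf is a label."""
    labels = [y[i] for i in rows]
    majority = int(2 * sum(labels) >= len(labels))
    if depth == 0 or gini(labels) == 0.0:
        return majority
    best = None
    # Only a random subset of mtry features is considered at each split.
    for f in rng.sample(range(len(X[0])), mtry):
        # Dropping the largest value keeps both children non-empty.
        for t in sorted({X[i][f] for i in rows})[:-1]:
            left = [i for i in rows if X[i][f] <= t]
            right = [i for i in rows if X[i][f] > t]
            score = (len(left) * gini([y[i] for i in left])
                     + len(right) * gini([y[i] for i in right]))
            if best is None or score < best[0]:
                best = (score, f, t, left, right)
    if best is None:
        return majority
    _, f, t, left, right = best
    return (f, t, build_tree(X, y, left, rng, mtry, depth - 1),
            build_tree(X, y, right, rng, mtry, depth - 1))

def predict(tree, x):
    """Follow splits down to a leaf label."""
    while isinstance(tree, tuple):
        f, t, left, right = tree
        tree = left if x[f] <= t else right
    return tree

def rf_importance(X, y, n_trees=25, mtry=3, depth=4, seed=0):
    """Mean drop in out-of-bag accuracy when each feature is permuted."""
    rng = random.Random(seed)
    n, p = len(X), len(X[0])
    importance = [0.0] * p
    for _ in range(n_trees):
        boot = [rng.randrange(n) for _ in range(n)]   # bootstrap sample
        oob = sorted(set(range(n)) - set(boot))       # rows left out of it
        tree = build_tree(X, y, boot, rng, mtry, depth)
        base = sum(predict(tree, X[i]) == y[i] for i in oob)
        for f in range(p):
            shuffled = [X[i][f] for i in oob]
            rng.shuffle(shuffled)
            hits = 0
            for i, v in zip(oob, shuffled):
                row = list(X[i])
                row[f] = v
                hits += predict(tree, row) == y[i]
            importance[f] += (base - hits) / len(oob) / n_trees
    return importance

def simulate(n=200, seed=1):
    """Hypothetical toy model: phenotype requires both a risk genotype and high protein."""
    rng = random.Random(seed)
    names = ["snp_0", "snp_1", "snp_2", "prot_0", "prot_1", "prot_2"]
    X, y = [], []
    for _ in range(n):
        snps = [rng.choice([0, 1, 2]) for _ in range(3)]    # categorical genotypes
        prots = [rng.gauss(0.0, 1.0) for _ in range(3)]     # continuous protein levels
        X.append(snps + prots)
        y.append(int(snps[0] >= 1 and prots[0] > 0.0))
    return names, X, y

names, X, y = simulate()
scores = rf_importance(X, y)
ranked = sorted(zip(names, scores), key=lambda kv: -kv[1])
for name, s in ranked:
    print(f"{name}: {s:.3f}")
```

Passing only the genotype columns or only the protein columns to `rf_importance` would mimic the single-data-type comparisons described above. In this toy model neither data type alone can fully explain the phenotype, which is the situation where a combined dataset is expected to help.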

Published in:
Computational Intelligence and Bioinformatics and Computational Biology, 2006. CIBCB '06. 2006 IEEE Symposium on

Date of Conference: 28-29 Sept. 2006
