A generalized strategy is developed to predict the occurrence of a multicategory–multifactorial disease from the set of medical risk factors most often used to screen patients for the disease. The prediction problem is formulated as a multiclass classification problem. The strategy employs normalization to bring risk factors with different formats and ranges onto a common scale, fusion to combine the normalized risk factors into a single feature vector, rank-sum ordering for feature selection, a discrete Karhunen–Loève transform to facilitate parametric classifier development, and the design of parametric classifiers. Two methods, which differ in how the features are selected, are developed. In the first method, features are selected from a set consisting of linear combinations of all risk factors. In the second method, the features are linear combinations of a preselected subset of the risk factors. The methods are applied to the prediction of Alzheimer's disease (AD), with each case assigned to one of three classes: Probable-AD, Possible-AD, and Uncertain. It is shown that a classification accuracy of over 71% can be obtained. This result is encouraging given that AD is very difficult to diagnose clinically. Higher classification accuracies can be expected for diseases that are less complex to diagnose than AD. Most importantly, it is concluded that the generalized strategy can be applied not only to the multicategory–multifactorial disease prediction problem but also to other multiclass pattern recognition problems involving diverse information collected from different sources.
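As a rough illustration of how the stages named above might fit together, the following Python sketch chains normalization, fusion, a discrete Karhunen–Loève transform, and a Gaussian parametric classifier. It is a minimal sketch under stated assumptions, not the authors' implementation: z-score normalization, eigendecomposition of the sample covariance for the transform, and a per-class Gaussian model are all assumed choices, and every function name (`normalize`, `fuse`, `klt`, `fit_gaussians`, `predict`) is hypothetical. The rank-sum ordering step is omitted because the abstract does not describe it in enough detail to reproduce.

```python
import numpy as np


def normalize(block):
    """Z-score each column so risk factors with different ranges are comparable.

    Assumption: z-scoring stands in for the normalization the abstract mentions.
    """
    block = np.asarray(block, dtype=float)
    std = block.std(axis=0)
    std[std == 0] = 1.0  # leave constant columns at zero instead of dividing by 0
    return (block - block.mean(axis=0)) / std


def fuse(blocks):
    """Concatenate normalized blocks column-wise: one feature vector per row."""
    return np.hstack([normalize(b) for b in blocks])


def klt(X, k):
    """Discrete KLT: project centered data onto the top-k covariance eigenvectors."""
    mean = X.mean(axis=0)
    cov = np.cov(X - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)      # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1][:k]       # indices of the k largest
    basis = eigvecs[:, order]
    return (X - mean) @ basis, mean, basis


def fit_gaussians(Z, y):
    """Estimate mean, covariance, and prior for each class (assumed Gaussian)."""
    params = {}
    for c in np.unique(y):
        Zc = Z[y == c]
        # Small ridge term keeps the covariance invertible on tiny samples.
        cov = np.cov(Zc, rowvar=False) + 1e-6 * np.eye(Z.shape[1])
        params[c] = (Zc.mean(axis=0), cov, len(Zc) / len(Z))
    return params


def predict(Z, params):
    """Assign each row to the class with the largest Gaussian log-posterior."""
    classes = list(params)
    scores = []
    for c in classes:
        mu, cov, prior = params[c]
        d = Z - mu
        inv = np.linalg.inv(cov)
        _, logdet = np.linalg.slogdet(cov)
        maha = np.einsum("ij,jk,ik->i", d, inv, d)  # squared Mahalanobis distance
        scores.append(-0.5 * (maha + logdet) + np.log(prior))
    return np.array(classes)[np.argmax(scores, axis=0)]
```

With two hypothetical risk-factor arrays `a` and `b` (rows are patients) and integer class labels `y`, the pieces would be used as `Z, mean, basis = klt(fuse([a, b]), k)` followed by `predict(Z, fit_gaussians(Z, y))`. For new patients, the normalization statistics, `mean`, and `basis` learned on the training data would have to be reused, which this sketch does not handle.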