BACKGROUND: Sub-health state is a low-quality state between health and disease. The aim of this study was to determine which factors, alone or in combination, predict sub-health state in women, using the random forest method. METHODS: Data were collected through a clinical epidemiology survey, yielding 2992 cases (2507 in sub-health state and 485 healthy), of which 1285 were female sub-health cases and 177 were female healthy cases. Based on associations measured by mutual information, we used a classification technique called Random Forest to predict sub-health state in women from the clinical data. RESULTS: The overall out-of-bag (OOB) error rate was 20.06%, i.e., a correct classification rate of 79.94%. Ten variables were the most powerful in discriminating between health state and sub-health state in women. They were the following symptoms: fatigue, myasthenia of limbs, amnesia, dizziness, dysphoria, sighing, hypochondriac distension and pain, constipation, swollen sore throat, and premenstrual distension of the breast. CONCLUSIONS: We suggest the random forest data mining method for feature selection in female sub-health state; its main advantage is that it selects important features while retaining high predictive accuracy.
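To illustrate the kind of association measure mentioned in METHODS, the following is a minimal sketch of mutual information between a binary symptom and the health label. It is not the authors' implementation: the function name `mutual_information`, the variable names, and the toy data are all assumptions made for illustration, and the values are invented rather than taken from the survey.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("xs and ys must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))  # counts of each (x, y) pair
    px = Counter(xs)              # marginal counts of x
    py = Counter(ys)              # marginal counts of y
    mi = 0.0
    for (x, y), c in joint.items():
        # p(x, y) * log2( p(x, y) / (p(x) * p(y)) ), written with raw counts
        mi += (c / n) * log2(c * n / (px[x] * py[y]))
    return mi


# Toy, made-up data (NOT from the study): 1 = symptom present / sub-health state.
fatigue = [1, 1, 1, 1, 0, 0, 0, 0]
label_associated = [1, 1, 1, 1, 0, 0, 0, 0]   # label follows the symptom exactly
label_independent = [1, 0, 1, 0, 1, 0, 1, 0]  # label unrelated to the symptom

mi_associated = mutual_information(fatigue, label_associated)
mi_independent = mutual_information(fatigue, label_independent)
print(mi_associated, mi_independent)
```

A symptom that shares more information with the label scores higher, so a ranking of this kind could plausibly serve as a screening step before fitting a classifier. How the study actually combined mutual information with the random forest is not specified in the abstract.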