Clinical prediction rules play an important role in medical practice: they expedite diagnosis and limit unnecessary tests. However, creating these rules is time-consuming and expensive. With recent advances in efficient data mining algorithms and the growing accessibility of medical data, the creation of clinical rules can be supported by automated rule induction from data. A data-driven method based on the reuse of previously collected medical records and clinical trial statistics is cost-effective; however, it requires well-defined and intelligent methods for data analysis. This paper presents a new framework for knowledge representation in secondary data analysis and for the generation of a new typicality measure, which integrates medical knowledge into statistical analysis. The framework is based on a semiotic approach to contextual knowledge and on fuzzy logic for approximate knowledge. This semio-fuzzy framework was applied to the analysis of predictors for the diagnosis of obstructive sleep apnea and was tested on two clinical data sets. Medical knowledge was represented by a set of facts and fuzzy rules and was used to perform the statistical analysis. Statistical methods provided several candidate outliers. Our new typicality measure identified those that were medically significant, in the sense that removing them improved the descriptive model. This is a critical preprocessing step towards automated induction of predictive rules from data. These experimental results demonstrate that knowledge-based methods integrated with statistical approaches provide a practical framework to support the generation of clinical prediction rules.