DPPred: An Effective Prediction Framework with Concise Discriminative Patterns


Abstract:

In the literature, two families of models have been proposed to address prediction problems, including classification and regression. Simple models, such as generalized linear models, have ordinary performance but strong interpretability on a set of simple features. The other family, including tree-based models, organizes numerical, categorical, and high-dimensional features into a comprehensive structure that captures rich interpretable information in the data. In this paper, we propose a novel Discriminative Pattern-based Prediction framework (DPPred) that accomplishes prediction tasks by combining the advantages of both families in effectiveness and interpretability. Specifically, DPPred adopts concise discriminative patterns that lie on the prefix paths from the root to leaf nodes in tree-based models. DPPred then selects a limited number of useful discriminative patterns by searching for the most effective pattern combination to fit generalized linear models. Extensive experiments show that in many scenarios, DPPred provides accuracy competitive with the state of the art as well as valuable interpretability for developers and experts. In particular, taking a clinical application dataset as a case study, DPPred outperforms the baselines using only 40 concise discriminative patterns out of a potentially exponentially large set of patterns.
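
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the toy tree `TREE`, the data, the helper names, and the greedy forward selection by log loss are all assumptions made for illustration. The abstract states only that patterns come from prefix paths of tree-based models and that a small pattern combination is chosen to fit a generalized linear model.

```python
import math

# Hypothetical toy tree: an internal node tests x[feat] < thr; None marks a leaf.
TREE = {"feat": 0, "thr": 5.0,
        "left": {"feat": 1, "thr": 2.0, "left": None, "right": None},
        "right": None}

def prefix_patterns(node, prefix=()):
    """One pattern per non-root node: the conjunction of tests on the path to it."""
    if node is None:
        return []
    patterns = []
    for branch, is_less in (("left", True), ("right", False)):
        path = prefix + ((node["feat"], node["thr"], is_less),)
        patterns.append(path)
        patterns.extend(prefix_patterns(node[branch], path))
    return patterns

def matches(pattern, x):
    """True if instance x satisfies every test in the pattern."""
    return all((x[f] < t) == is_less for f, t, is_less in pattern)

def encode(patterns, x):
    """Map an instance to binary pattern-indicator features."""
    return [1.0 if matches(p, x) else 0.0 for p in patterns]

def score(w, row):
    return w[0] + sum(wi * xi for wi, xi in zip(w[1:], row))

def fit_logistic(rows, labels, steps=500, lr=0.5):
    """Logistic regression (a GLM) fitted by plain batch gradient descent."""
    w = [0.0] * (len(rows[0]) + 1)  # w[0] is the bias
    for _ in range(steps):
        grad = [0.0] * len(w)
        for row, y in zip(rows, labels):
            err = 1.0 / (1.0 + math.exp(-score(w, row))) - y
            grad[0] += err
            for j, xi in enumerate(row):
                grad[j + 1] += err * xi
        w = [wi - lr * g / len(rows) for wi, g in zip(w, grad)]
    return w

def log_loss(w, rows, labels):
    total = 0.0
    for row, y in zip(rows, labels):
        p = 1.0 / (1.0 + math.exp(-score(w, row)))
        p = min(max(p, 1e-12), 1.0 - 1e-12)
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(rows)

def select_patterns(patterns, X, y, k):
    """Assumed selection rule: greedily add the pattern that most lowers log loss."""
    chosen = []
    for _ in range(min(k, len(patterns))):
        best = None
        for i in range(len(patterns)):
            if i in chosen:
                continue
            cols = [patterns[c] for c in chosen + [i]]
            rows = [encode(cols, x) for x in X]
            loss = log_loss(fit_logistic(rows, y), rows, y)
            if best is None or loss < best[0]:
                best = (loss, i)
        chosen.append(best[1])
    return [patterns[i] for i in chosen]

# Toy data: the label is 1 exactly when x[0] < 5 and x[1] >= 2.
X = [[1, 1], [2, 3], [3, 4], [4, 1], [6, 3], [7, 1], [8, 5], [2, 5]]
y = [0, 1, 1, 0, 0, 0, 0, 1]

patterns = prefix_patterns(TREE)
selected = select_patterns(patterns, X, y, k=1)
rows = [encode(selected, x) for x in X]
weights = fit_logistic(rows, y)
predictions = [1 if score(weights, row) > 0 else 0 for row in rows]
print(selected, predictions)
```

In this toy setting a single conjunctive pattern is enough for the linear model to separate the classes, and the model stays readable because each weight attaches to one explicit rule.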
Published in: IEEE Transactions on Knowledge and Data Engineering (Volume: 30, Issue: 7, 01 July 2018)
Page(s): 1226 - 1239
Date of Publication: 28 September 2017

PubMed ID: 30745791

1 Introduction

Accuracy and interpretability are two desired goals in predictive modeling, which includes both classification and regression. Previous work falls into two lines. One line, which includes linear regression, logistic regression [18], and support vector machines [34], has ordinary performance with strong interpretability on a set of simple features, but meets a serious bottleneck when modeling complex high-order interactions between features. The other line consists of models that are more often studied for their high accuracy, for example, tree-based models such as random forests [2] and gradient boosted trees [16], as well as neural network models [20], which model nonlinear relationships with high-order combinations of different features. However, their lower interpretability and high complexity prevent practitioners from deploying them in practice [18]. In real-world scientific and medical applications, which require both an intuitive understanding of the features and high accuracy, practitioners are satisfied with neither line of models. It is therefore important and challenging to develop an effective prediction framework that remains highly interpretable when dealing with high-order interactions between features.
