Proceedings 2001 IEEE International Conference on Data Mining

Nov. 29 2001-Dec. 2 2001

  • Proceedings 2001 IEEE International Conference on Data Mining

    Publication Year: 2001
    Freely Available from IEEE
  • AINE: an immunological approach to data mining

    Publication Year: 2001, Page(s):297 - 304
    Cited by:  Papers (15)
  • Author index

    Publication Year: 2001, Page(s):675 - 677
    Freely Available from IEEE
  • Visualizing association mining results through hierarchical clusters

    Publication Year: 2001, Page(s):425 - 432
    Cited by:  Papers (3)  |  Patents (1)

    We propose a new methodology for visualizing association mining results. Inter-item distances are computed from combinations of itemset supports. The new distances retain a simple pairwise structure, and are consistent with important frequently occurring itemsets. Thus standard tools of visualization, e.g., hierarchical clustering dendrograms, can still be applied, while the distance information upo...
  • Preprocessing opportunities in optimal numerical range partitioning

    Publication Year: 2001, Page(s):115 - 122

    We show that only segment borders have to be taken into account as cut point candidates when searching for the optimal multisplit of a numerical value range with respect to convex attribute evaluation functions. Segment borders can be found efficiently in a linear-time preprocessing step. With training set error, which is not strictly convex, the data can be preprocessed into an even smaller numbe...
  • Data analysis and mining in ordered information tables

    Publication Year: 2001, Page(s):497 - 504
    Cited by:  Papers (6)

    Many real-world problems deal with ordering objects instead of classifying objects, although the majority of the research in machine learning and data mining has been focused on the latter. For the modeling of ordering problems, we generalize the notion of information tables to ordered information tables by adding order relations on attribute values. The problem of mining ordering rules is formula...
  • Incremental learning of Bayesian networks with hidden variables

    Publication Year: 2001, Page(s):651 - 652
    Cited by:  Papers (1)

    An incremental method for learning Bayesian networks based on evolutionary computing, IEMA, is put forward. IEMA introduces the evolutionary algorithm and EM algorithm into the process of incremental learning; it can avoid getting into local maxima, and also incrementally learn Bayesian networks with high accuracy in the presence of missing values and hidden variables. In addition, we improved the...
  • Evolutionary structure learning algorithm for Bayesian network and Penalized Mutual Information metric

    Publication Year: 2001, Page(s):615 - 616
    Cited by:  Papers (1)

    The paper formulates the problem of learning Bayesian network structures from data as determining the structure that best approximates the probability distribution indicated by the data. A new metric, Penalized Mutual Information metric, is proposed, and an evolutionary algorithm is designed to search for the best structure among alternatives. The experimental results show that this approach is re...
  • Inexact field learning: an approach to induce high quality rules from low quality data

    Publication Year: 2001, Page(s):586 - 588

    To avoid low quality problems caused by low quality data, the paper introduces an inexact field learning approach which derives rules by working on the fields of attributes with respect to classes, rather than on individual point values of attributes. The experimental results show that field learning achieved a higher prediction accuracy rate on new unseen test cases which is particularly true whe...
  • The EQ framework for learning equivalence classes of Bayesian networks

    Publication Year: 2001, Page(s):417 - 424
    Cited by:  Papers (12)

    This paper proposes a theoretical and an algorithmic framework for the analysis and the design of efficient learning algorithms which explore the space of equivalence classes of Bayesian network structures. This framework is composed of a generic learning model which uses essential graphs and more general partially directed graphs in order to represent the equivalence classes evaluated during sear...
  • Who links to whom: mining linkage between Web sites

    Publication Year: 2001, Page(s):51 - 58
    Cited by:  Papers (30)  |  Patents (10)

    Previous studies of the Web graph structure have focused on the graph structure at the level of individual pages. In actuality the Web is a hierarchically nested graph, with domains, hosts and Web sites introducing intermediate levels of affiliation and administrative control. To better understand the growth of the Web we need to understand its macro-structure, in terms of the linkage between Web ...
  • A min-max cut algorithm for graph partitioning and data clustering

    Publication Year: 2001, Page(s):107 - 114
    Cited by:  Papers (191)  |  Patents (13)

    An important application of graph partitioning is data clustering using a graph model - the pairwise similarities between all data objects form a weighted graph adjacency matrix that contains all necessary information for clustering. In this paper, we propose a new algorithm for graph partitioning with an objective function that follows the min-max clustering principle. The relaxed version of the ...
  • Time series segmentation for context recognition in mobile devices

    Publication Year: 2001, Page(s):203 - 210
    Cited by:  Papers (48)  |  Patents (18)

    Recognizing the context of use is important in making mobile devices as simple to use as possible. Finding out what the user's situation is can help the device and underlying service in providing an adaptive and personalized user interface. The device can infer parts of the context of the user from sensor data: the mobile device can include sensors for acceleration, noise level, luminosity, humidi...
  • Interestingness, peculiarity, and multi-database mining

    Publication Year: 2001, Page(s):566 - 573
    Cited by:  Papers (5)

    In order to discover new, surprising, interesting patterns hidden in data, peculiarity oriented mining and multidatabase mining are required. In the paper, we introduce peculiarity rules as a new class of rules, which can be discovered from a relatively low number of peculiar data by searching the relevance among the peculiar data. We give a formal interpretation and comparison of three classes of...
  • Neural analysis of mobile radio access network

    Publication Year: 2001, Page(s):457 - 464
    Cited by:  Papers (10)  |  Patents (1)

    The self-organizing map (SOM) is an efficient tool for visualization and clustering of multidimensional data. It transforms the input vectors onto a two-dimensional grid of prototype vectors and orders them. The ordered prototype vectors are easier to visualize and explore than the original data. Mobile networks produce a huge amount of spatiotemporal data. The data consists of parameters of base stat...
  • Bayesian data mining on the Web with B-Course

    Publication Year: 2001, Page(s):626 - 629
    Cited by:  Papers (3)

    B-Course is a free Web based Bayesian data mining service. This service allows the users to analyze their own data for multivariate probabilistic dependencies represented as Bayesian network models. In addition to this, B-Course also offers facilities for inferring certain types of causal dependencies from the data. The software is especially suitable for educational purposes as the tutorial style...
  • Using rough sets theory and database operations to construct a good ensemble of classifiers for data mining applications

    Publication Year: 2001, Page(s):233 - 240
    Cited by:  Papers (7)

    The article presents a novel approach to constructing a good ensemble of classifiers using rough set theory and database operations. Ensembles of classifiers are formulated precisely within the framework of rough set theory and constructed very efficiently by using set-oriented database operations. Our method first computes a set of reducts which include all the indispensable attributes required f...
  • Interestingness preprocessing

    Publication Year: 2001, Page(s):489 - 496
    Cited by:  Papers (4)

    As the size of databases increases, the number of rules mined from them also increases, often to an extent that overwhelms users. To address this problem, an important part of the knowledge discovery in databases (KDD) process is dedicated to determining which of these patterns is interesting. In this paper, we define the interestingness pre-processing (IPP) step and introduce a new framework for ...
  • Measuring real-time predictive models

    Publication Year: 2001, Page(s):649 - 650
    Cited by:  Papers (1)  |  Patents (3)

    In this paper we examine the problem of comparing real-time predictive models and propose a number of measures for selecting the best model, based on a combination of accuracy, timeliness, and cost. We apply the measure to the real-time attrition problem.
  • Analyzing the interestingness of association rules from the temporal dimension

    Publication Year: 2001, Page(s):377 - 384
    Cited by:  Papers (5)

    Rule discovery is one of the central tasks of data mining. Existing research has produced many algorithms for the purpose. These algorithms, however, often generate too many rules. In the past few years, rule interestingness techniques were proposed to help the user find interesting rules. These techniques typically employ the dataset as a whole to mine rules, and then filter and/or rank the disco...
  • Mining California vital statistics data

    Publication Year: 2001, Page(s):671 - 672
    Cited by:  Patents (1)

    Vital statistics data offer a fertile ground for data mining. The authors discuss the results of a data mining project on the causes of death aspect of the vital statistics data in the state of California. A data mining tool called Cubist is used to build predictive models out of two million cases over a nine-year period. The objective of our study is to discover knowledge that can be used to gain...
  • Heuristic optimization for decentralized frequent itemset counting

    Publication Year: 2001, Page(s):613 - 614

    The choices for mining of decentralized data are numerous, and we have developed techniques to enumerate and optimize decentralized frequent itemset counting. We introduce our heuristic approach to improve the performance of such techniques developed in ways similar to query processing in database systems. We also describe empirical results that validate our heuristic techniques.
  • Comparisons of classification methods for screening potential compounds

    Publication Year: 2001, Page(s):11 - 18

    We compare a number of data mining and statistical methods on the drug design problem of modeling molecular structure-activity relationships. The relationships can be used to identify active compounds based on their chemical structures from a large inventory of chemical compounds. The data set of this application has a highly skewed class distribution, in which only 2% of the compounds are conside...
  • Significance tests for patterns in continuous data

    Publication Year: 2001, Page(s):67 - 74

    The authors consider the question of uncertainty of detected patterns in data mining. In particular, we develop statistical tests for patterns found in continuous data, indicating the significance of these patterns in terms of the probability that they have occurred by chance. We examine the performance of these tests on patterns detected in several large data sets, including a data set describing...
  • Creating ensembles of classifiers

    Publication Year: 2001, Page(s):580 - 581
    Cited by:  Papers (10)

    Ensembles of classifiers offer promise in increasing overall classification accuracy. The availability of extremely large datasets has opened avenues for application of distributed and/or parallel learning to efficiently learn models of them. In this paper, distributed learning is done by training classifiers on disjoint subsets of the data. We examine a random partitioning method to create disjoi...