Proceedings 2001 IEEE International Conference on Data Mining

Nov. 29 - Dec. 2, 2001

Entries 1-25 of 111
  • Proceedings 2001 IEEE International Conference on Data Mining

    Publication Year: 2001
    Freely Available from IEEE
  • AINE: an immunological approach to data mining

    Publication Year: 2001, Page(s):297 - 304
    Cited by:  Papers (15)
  • Author index

    Publication Year: 2001, Page(s):675 - 677
    Freely Available from IEEE
  • Mining the Web with active hidden Markov models

    Publication Year: 2001, Page(s):645 - 646
    Cited by:  Papers (6)

    Given the enormous amounts of information available only in unstructured or semi-structured textual documents, tools for information extraction (IE) have become very important. IE tools identify the relevant information in such documents and convert it into a structured format such as a database or an XML document. While the first IE algorithms were hand-crafted sets of rules, researchers soon t...
  • A clustering method for very large mixed data sets

    Publication Year: 2001, Page(s):643 - 644
    Cited by:  Papers (2)

    In developed countries, especially over the last decade, there has been explosive growth in the capability to generate, collect and use very large data sets. The objects in these data sets can be described simultaneously by quantitative and qualitative attributes. At present, algorithms able to process either very large data sets (in metric spaces) or mixed (qualitative and quantitative) inco...
  • Incremental learning with support vector machines

    Publication Year: 2001, Page(s):641 - 642
    Cited by:  Papers (38)  |  Patents (8)

    Support vector machines (SVMs) have become a popular tool for machine learning with large amounts of high-dimensional data. In this paper, an approach to incremental learning with support vector machines is presented that improves on the existing approach of Syed et al. (1999). An insight into the interpretability of support vectors is also given.
  • The representative basis for association rules

    Publication Year: 2001, Page(s):639 - 640
    Cited by:  Papers (1)

    We define the concept of the representative basis for interesting association rules, together with a purely qualitative inference system. The representative basis is unique and minimal with respect to the inference system. On the representative basis, the inference system is correct and complete. Experimental results show that the number of rules in the representative basis is significantly reduc...
  • Inexact field learning: an approach to induce high quality rules from low quality data

    Publication Year: 2001, Page(s):586 - 588

    To avoid problems caused by low-quality data, the paper introduces an inexact field learning approach that derives rules by working on the fields of attributes with respect to classes, rather than on individual point values of attributes. The experimental results show that field learning achieved a higher prediction accuracy rate on new, unseen test cases, which is particularly true whe...
  • Metric rule generation with septic shock patient data

    Publication Year: 2001, Page(s):637 - 638
    Cited by:  Papers (8)

    The article presents an application of metric rule generation in the domain of medical research. We consider intensive care unit patients who develop septic shock during their stay at the hospital. To analyse the patient data, rule generation is embedded in a medical data mining cycle. For rule generation, we improve an architecture based on a growing trapezoidal basis function network.
  • Efficient splitting rules based on the probabilities of pre-assigned intervals

    Publication Year: 2001, Page(s):584 - 585

    The paper describes novel classification methods for finding an optimal tree. Unlike current splitting rules, which are obtained by searching all threshold values, the paper proposes splitting rules based on the probabilities of pre-assigned intervals. In experiments, we demonstrate that our methods properly classify image objects using the new splitting rules.
  • A fast algorithm to cluster high dimensional basket data

    Publication Year: 2001, Page(s):633 - 636
    Cited by:  Papers (4)  |  Patents (2)

    Clustering is a data mining problem that has received significant attention from the database community. Data set size, dimensionality and sparsity have been identified as aspects that make clustering more difficult. The article introduces a fast algorithm to cluster large binary data sets where data points have high dimensionality and most of their coordinates are zero. This is the case with basket...
  • Association rules enhanced classification of underwater acoustic signal

    Publication Year: 2001, Page(s):582 - 583
    Cited by:  Papers (2)

    The classification of underwater acoustic signals is an important field of pattern recognition. Inspired by the experience of training human sonar experts, we propose a two-phase training algorithm that exploits association rules to reveal understandable intrinsic rules that contribute to correct classification in known misclassification data sets. Preliminary experimental results demon...
  • Hierarchical text classification and evaluation

    Publication Year: 2001, Page(s):521 - 528
    Cited by:  Papers (23)  |  Patents (8)

    Hierarchical classification refers to the assignment of one or more suitable categories from a hierarchical category space to a document. While previous work in hierarchical classification focused on virtual category trees, in which documents are assigned only to the leaf categories, we propose a top-down level-based classification method that can classify documents to both leaf and internal categorie...
  • An experimental comparison of supervised and unsupervised approaches to text summarization

    Publication Year: 2001, Page(s):630 - 632
    Cited by:  Papers (1)

    The paper presents a direct comparison of supervised and unsupervised approaches to text summarization. As a representative supervised method, we use the C4.5 decision tree algorithm, extended with the minimum description length principle (MDL), and compare it against several unsupervised methods. It is found that a particular unsupervised method based on an extension of the K-means clustering alg...
  • FARM: a framework for exploring mining spaces with multiple attributes

    Publication Year: 2001, Page(s):449 - 456
    Cited by:  Papers (4)

    Mining for frequent itemsets typically involves a preprocessing step in which data with multiple attributes are grouped into transactions, and items are defined based on attribute values. We have observed that such fixed-attribute mining can severely constrain the patterns that are discovered. Herein, we introduce mining spaces, a new framework for mining multi-attribute data that includes the dis...
  • Applications of data mining in hydrology

    Publication Year: 2001, Page(s):617 - 620
    Cited by:  Papers (2)

    Long-range streamflow forecasting plays an invaluable role in water resource planning and management. The potential applicability and limitations of the time series forecasting approach using a neural network with the multiresolution learning paradigm (NNMLP) are investigated. The long-range streamflows predicted by the NNMLP are compared with the observations. The results show that the tim...
  • Creating ensembles of classifiers

    Publication Year: 2001, Page(s):580 - 581
    Cited by:  Papers (10)

    Ensembles of classifiers offer promise in increasing overall classification accuracy. The availability of extremely large datasets has opened avenues for application of distributed and/or parallel learning to efficiently learn models of them. In this paper, distributed learning is done by training classifiers on disjoint subsets of the data. We examine a random partitioning method to create disjoi...
  • Document clustering and cluster topic extraction in multilingual corpora

    Publication Year: 2001, Page(s):513 - 520
    Cited by:  Papers (1)

    A statistics-based approach for clustering documents and for extracting cluster topics is described. Relevant (meaningful) expressions (REs) automatically extracted from corpora are used as clustering base features. These features are transformed and their number is strongly reduced in order to obtain a small set of document classification features. This is achieved on the basis of principal componen...
  • The EQ framework for learning equivalence classes of Bayesian networks

    Publication Year: 2001, Page(s):417 - 424
    Cited by:  Papers (12)

    This paper proposes a theoretical and algorithmic framework for the analysis and design of efficient learning algorithms that explore the space of equivalence classes of Bayesian network structures. This framework is composed of a generic learning model that uses essential graphs and more general partially directed graphs in order to represent the equivalence classes evaluated during sear...
  • Using rough sets theory and database operations to construct a good ensemble of classifiers for data mining applications

    Publication Year: 2001, Page(s):233 - 240
    Cited by:  Papers (7)

    The article presents a novel approach to constructing a good ensemble of classifiers using rough set theory and database operations. Ensembles of classifiers are formulated precisely within the framework of rough set theory and constructed very efficiently by using set-oriented database operations. Our method first computes a set of reducts that include all the indispensable attributes required f...
  • Preparations for semantics-based XML mining

    Publication Year: 2001, Page(s):345 - 352
    Cited by:  Papers (4)  |  Patents (2)

    XML allows users to define elements using arbitrary words and organize them in a nested structure. These features of XML offer both challenges and opportunities in information retrieval, document management, and data mining. In this paper, we propose a new methodology for preparing XML documents for quantitative determination of similarity between XML documents by taking into account XML semantics...
  • Who links to whom: mining linkage between Web sites

    Publication Year: 2001, Page(s):51 - 58
    Cited by:  Papers (30)  |  Patents (8)

    Previous studies of the Web graph structure have focused on the graph structure at the level of individual pages. In reality, the Web is a hierarchically nested graph, with domains, hosts and Web sites introducing intermediate levels of affiliation and administrative control. To better understand the growth of the Web, we need to understand its macro-structure, in terms of the linkage between Web ...
  • Bayesian data mining on the Web with B-Course

    Publication Year: 2001, Page(s):626 - 629
    Cited by:  Papers (3)

    B-Course is a free Web-based Bayesian data mining service. This service allows users to analyze their own data for multivariate probabilistic dependencies represented as Bayesian network models. B-Course also offers facilities for inferring certain types of causal dependencies from the data. The software is especially suitable for educational purposes as the tutorial style...
  • H-mine: hyper-structure mining of frequent patterns in large databases

    Publication Year: 2001, Page(s):441 - 448
    Cited by:  Papers (43)

    Methods for efficient mining of frequent patterns have been studied extensively by many researchers. However, the previously proposed methods still encounter some performance bottlenecks when mining databases with different data characteristics, such as dense vs. sparse, long vs. short patterns, memory-based vs. disk-based, etc. In this study, we propose a simple and novel hyper-linked data struct...
  • Evolutionary structure learning algorithm for Bayesian network and Penalized Mutual Information metric

    Publication Year: 2001, Page(s):615 - 616
    Cited by:  Papers (1)

    The paper formulates the problem of learning Bayesian network structures from data as determining the structure that best approximates the probability distribution indicated by the data. A new metric, the Penalized Mutual Information metric, is proposed, and an evolutionary algorithm is designed to search for the best structure among alternatives. The experimental results show that this approach is re...