Data Mining, 2001. ICDM 2001, Proceedings IEEE International Conference on

Date: Nov. 29, 2001 - Dec. 2, 2001

Displaying Results 1 - 25 of 111
  • Proceedings 2001 IEEE International Conference on Data Mining

    Publication Year: 2001
    Freely Available from IEEE
  • AINE: an immunological approach to data mining

    Publication Year: 2001, Page(s):297 - 304
    Cited by:  Papers (14)
  • Author index

    Publication Year: 2001, Page(s):675 - 677
    Freely Available from IEEE
  • The EQ framework for learning equivalence classes of Bayesian networks

    Publication Year: 2001, Page(s):417 - 424
    Cited by:  Papers (11)

    This paper proposes a theoretical and an algorithmic framework for the analysis and the design of efficient learning algorithms which explore the space of equivalence classes of Bayesian network structures. This framework is composed of a generic learning model which uses essential graphs and more general partially directed graphs in order to represent the equivalence classes evaluated during sear...

  • On mining general temporal association rules in a publication database

    Publication Year: 2001, Page(s):337 - 344
    Cited by:  Papers (5)  |  Patents (2)

    In this paper, we explore a new problem of mining general temporal association rules in publication databases. In essence, a publication database is a set of transactions where each transaction T is a set of items, each containing an individual exhibition period. The current model of association rule mining is not able to handle a publication database due to the following fundamental problems: (1)...

  • Mining mutually dependent patterns

    Publication Year: 2001, Page(s):409 - 416
    Cited by:  Papers (2)

    In some domains, such as isolating problems in computer networks and discovering stock market irregularities, there is more interest in patterns consisting of infrequent, but highly correlated items rather than patterns that occur frequently (as defined by minsup, the minimum support level). We describe the m-pattern, a new pattern that is defined in terms of minp, the minimum probability of mutua...

  • Subject classification in the Oxford English Dictionary

    Publication Year: 2001, Page(s):329 - 336
    Cited by:  Papers (1)

    The Oxford English Dictionary is a valuable source of lexical information and a rich testing ground for mining highly structured text. Each entry is organized into a hierarchy of senses, which include definitions, labels and cited quotations. Subject labels distinguish the subject classification of a sense; for example, they signal how a word may be used in anthropology, music or computing. Unfortu...

  • Anchor text mining for translation of Web queries

    Publication Year: 2001, Page(s):401 - 408
    Cited by:  Patents (14)

    The paper presents an approach to automatically extracting translations of Web query terms through mining of Web anchor texts and link structures. One of the existing difficulties in cross-language information retrieval (CLIR) and Web search is the lack of the appropriate translations of new terminology and proper names. Such a difficult problem can be effectively alleviated by our proposed approa...

  • Statistical considerations in learning from data

    Publication Year: 2001, Page(s):321 - 328

    In this paper, we focus on statistics. Classical statistics and Bayesian statistics are both employed in data mining. Both have advantages but both also have severe limitations in this context. We point out some of these limitations as well as some of the advantages. The fact that we may need to take account of evidence both internal and external to the data set presents a difficulty for classical...

  • The representative basis for association rules

    Publication Year: 2001, Page(s):639 - 640
    Cited by:  Papers (1)

    We define the concept of the representative basis for interesting association rules, and an inference system which is purely qualitative. The representative basis is unique, and minimal with respect to the inference system. On the representative basis, the inference system is correct and complete. Experimental results show that the number of rules in the representative basis is significantly reduc...

  • Classification with degree of membership: a fuzzy approach

    Publication Year: 2001, Page(s):35 - 42
    Cited by:  Papers (5)

    Classification is an important topic in data mining research. It is concerned with the prediction of the values of some attribute in a database based on other attributes. To tackle this problem, most of the existing data mining algorithms adopt either a decision tree based approach or an approach that requires users to provide some user-specified thresholds to guide the search for interesting rule...

  • Preprocessing opportunities in optimal numerical range partitioning

    Publication Year: 2001, Page(s):115 - 122

    We show that only segment borders have to be taken into account as cut point candidates when searching for the optimal multisplit of a numerical value range with respect to convex attribute evaluation functions. Segment borders can be found efficiently in a linear-time preprocessing step. With training set error, which is not strictly convex, the data can be preprocessed into an even smaller numbe...

  • LPMiner: an algorithm for finding frequent itemsets using length-decreasing support constraint

    Publication Year: 2001, Page(s):505 - 512
    Cited by:  Papers (20)  |  Patents (1)

    Over the years, a variety of algorithms for finding frequent item sets in very large transaction databases has been developed. The key feature in most of these algorithms is that they use a constant support constraint to control the inherently exponential complexity of the problem. In general, item sets that contain only a few items tend to be interesting if they have a high support, whereas long ...

  • Closing the loop: heuristics for autonomous discovery

    Publication Year: 2001, Page(s):393 - 400
    Cited by:  Papers (2)

    Autonomous discovery systems will be able to peruse very large databases more thoroughly than people can. In a companion paper by G.R. Livingston et al. (see ibid., p.385-92, 2001), we describe a general framework for autonomous systems. We present and evaluate heuristics for use in this framework. Although these heuristics were designed for a prototype system, we believe they provide good initial...

  • Mining frequent closed itemsets with the frequent pattern list

    Publication Year: 2001, Page(s):653 - 654
    Cited by:  Papers (3)

    The mining of a complete set of frequent itemsets will lead to a huge number of itemsets. Fortunately, this problem can be reduced to the mining of frequent closed itemsets (FCIs), which results in a much smaller number of itemsets. The approaches to mining frequent closed itemsets can be categorized into two groups: those with candidate generation and those without. In this paper, we propose an a...

  • Frequent subgraph discovery

    Publication Year: 2001, Page(s):313 - 320
    Cited by:  Papers (162)  |  Patents (4)

    As data mining techniques are being increasingly applied to non-traditional domains, existing approaches for finding frequent itemsets cannot be used as they cannot model the requirement of these domains. An alternate way of modeling the objects in these data sets is to use graphs. Within that model, the problem of finding frequent patterns becomes that of discovering subgraphs that occur frequent...

  • Applications of data mining in hydrology

    Publication Year: 2001, Page(s):617 - 620
    Cited by:  Papers (1)

    Long-term range streamflow forecast plays an invaluable role in water resource planning and management. The potential applicability and limitations of the time series forecasting approach using neural network with the multiresolution learning paradigm (NNMLP) are investigated. The predicted long-term range streamflows using the NNMLP are compared with the observations. The results show that the tim...

  • Metric rule generation with septic shock patient data

    Publication Year: 2001, Page(s):637 - 638
    Cited by:  Papers (7)

    The article presents an application of metric rule generation in the domain of medical research. We consider intensive care unit patients developing a septic shock during their stay at the hospital. To analyse the patient data, rule generation is embedded in a medical data mining cycle. For rule generation, we improve an architecture based on a growing trapezoidal basis function network.

  • A scalable algorithm for clustering sequential data

    Publication Year: 2001, Page(s):179 - 186
    Cited by:  Papers (22)

    In recent years, we have seen an enormous growth in the amount of available commercial and scientific data. Data from domains such as protein sequences, retail transactions, intrusion detection, and Web-logs have an inherent sequential nature. Clustering of such data sets is useful for various purposes. For example, clustering of sequences from commercial data sets may help marketers identify diffe...

  • Evaluating boosting algorithms to classify rare classes: comparison and improvements

    Publication Year: 2001, Page(s):257 - 264
    Cited by:  Papers (40)

    Classification of rare events has many important data mining applications. Boosting is a promising meta-technique that improves the classification performance of any weak classifier. So far, no systematic study has been conducted to evaluate how boosting performs for the task of mining rare classes. The authors evaluate three existing categories of boosting algorithms from the single viewpoint of ...

  • Mining constrained association rules to predict heart disease

    Publication Year: 2001, Page(s):433 - 440
    Cited by:  Papers (23)

    This work describes our experiences in discovering association rules in medical data to predict heart disease. We focus on two aspects of this work: mapping medical data to a transaction format suitable for mining association rules, and identifying useful constraints. Based on these aspects we introduce an improved algorithm to discover constrained association rules. We present an experimental sec...

  • Mining image features for efficient query processing

    Publication Year: 2001, Page(s):353 - 360
    Cited by:  Papers (1)

    The number of features required to depict an image can be very large. Using all features simultaneously to measure image similarity and to learn image query-concepts can suffer from the problem of dimensionality curse, which degrades both search accuracy and search speed. Regarding search accuracy, the presence of irrelevant features with respect to a query can contaminate similarity measurement, ...

  • A clustering method for very large mixed data sets

    Publication Year: 2001, Page(s):643 - 644
    Cited by:  Papers (1)

    In developed countries, especially over the last decade, there has been an explosive growth in the capability to generate, collect and use very large data sets. The objects of these data sets could be simultaneously described by quantitative and qualitative attributes. At present, algorithms able to process either very large data sets (in metric spaces) or mixed (qualitative and quantitative) inco...

  • Using rule sets to maximize ROC performance

    Publication Year: 2001, Page(s):131 - 138
    Cited by:  Papers (25)

    Rules are commonly used for classification because they are modular, intelligible and easy to learn. Existing work in classification rule learning assumes the goal is to produce categorical classifications to maximize classification accuracy. Recent work in machine learning has pointed out the limitations of classification accuracy: when class distributions are skewed or error costs are unequal, an...

  • Automatic topic identification using webpage clustering

    Publication Year: 2001, Page(s):195 - 202
    Cited by:  Papers (8)  |  Patents (6)

    Grouping Web pages into distinct topics is one way of organizing the large amount of retrieved information on the Web. In this paper, we report that, based on a similarity metric, which incorporates textual information, hyperlink structure and co-citation relations, an unsupervised clustering method can automatically and effectively identify relevant topics, as shown in experiments on several retr...
