ICDM 2001: Proceedings of the 2001 IEEE International Conference on Data Mining

Date: Nov. 29 - Dec. 2, 2001

Displaying Results 1 - 25 of 111
  • Proceedings 2001 IEEE International Conference on Data Mining

    Publication Year: 2001
  • AINE: an immunological approach to data mining

    Publication Year: 2001 , Page(s): 297 - 304
    Cited by:  Papers (11)
  • Author index

    Publication Year: 2001 , Page(s): 675 - 677
  • Mining the Web with active hidden Markov models

    Publication Year: 2001 , Page(s): 645 - 646
    Cited by:  Papers (3)

    Given the enormous amounts of information available only in unstructured or semi-structured textual documents, tools for information extraction (IE) have become enormously important. IE tools identify the relevant information in such documents and convert it into a structured format such as a database or an XML document. While the first IE algorithms were hand-crafted sets of rules, researchers soon t...
  • A simple KNN algorithm for text categorization

    Publication Year: 2001 , Page(s): 647 - 648
    Cited by:  Papers (9)

    Text categorization (also called text classification) is the process of identifying the class to which a text document belongs. This paper proposes to use a simple non-weighted features KNN algorithm for text categorization. We propose to use a feature selection method that finds the relevant features for the learning task at hand using feature interaction (based on word interdependencies). This w...
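
    A minimal sketch of the basic KNN idea, assuming whitespace tokenization, binary (unweighted) word-presence features, cosine similarity, and a plain majority vote among the k nearest training documents. The function names, the toy training set, and k = 3 are illustrative assumptions, and the paper's feature-interaction-based feature selection is not shown.

```python
import math
from collections import Counter


def tokens(text):
    # Non-weighted features: the set of lower-cased words, ignoring counts.
    return set(text.lower().split())


def cosine(a, b):
    # Cosine similarity of two binary vectors stored as word sets.
    if not a or not b:
        return 0.0
    return len(a & b) / math.sqrt(len(a) * len(b))


def knn_classify(train, doc, k=3):
    """Return the majority label among the k training documents most similar to doc.

    train is a list of (text, label) pairs.
    """
    query = tokens(doc)
    ranked = sorted(train, key=lambda pair: cosine(query, tokens(pair[0])), reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical toy training set.
train = [
    ("the striker scored a late goal", "sports"),
    ("the goalkeeper saved a penalty in the match", "sports"),
    ("fans cheered as the team won the match", "sports"),
    ("the central bank raised interest rates", "finance"),
    ("stock prices fell as interest rates rose", "finance"),
    ("investors sold bonds after the bank report", "finance"),
]
print(knn_classify(train, "the team scored a goal in the match"))  # sports
```

    Because the features are unweighted, common words such as "the" count as much as topical ones; a feature selection step like the one the abstract describes is what would prune such words.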
  • Learning automatic acquisition of subcategorization frames using Bayesian inference and support vector machines

    Publication Year: 2001 , Page(s): 623 - 625
    Cited by:  Patents (1)

    Learning Bayesian belief networks (BBN) from corpora and support vector machines (SVM) have been applied to the automatic acquisition of verb subcategorization frames for Modern Greek. We are incorporating minimal linguistic resources, i.e. basic morphological tagging and phrase chunking, to demonstrate that verb subcategorization, which is of great significance for developing robust natural langu...
  • The representative basis for association rules

    Publication Year: 2001 , Page(s): 639 - 640
    Cited by:  Papers (1)

    We define the concept of the representative basis for interesting association rules, and an inference system which is purely qualitative. The representative basis is unique, and minimal with respect to the inference system. On the representative basis, the inference system is correct and complete. Experimental results show that the number of rules in the representative basis is significantly reduc...
  • Incremental learning with support vector machines

    Publication Year: 2001 , Page(s): 641 - 642
    Cited by:  Papers (25)  |  Patents (8)

    Support vector machines (SVMs) have become a popular tool for machine learning with large amounts of high dimensional data. In this paper an approach for incremental learning with support vector machines is presented that improves on the existing approach of Syed et al. (1999). An insight into the interpretability of support vectors is also given.
  • Theory and applications of attribute decomposition

    Publication Year: 2001 , Page(s): 473 - 480
    Cited by:  Papers (1)

    This paper examines the attribute decomposition approach with simple Bayesian combination for dealing with classification problems that contain a high number of attributes and a moderate number of records. According to the attribute decomposition approach, the set of input attributes is automatically decomposed into several subsets. A classification model is built for each subset, then all the models...
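
    The combination step can be sketched as follows, assuming the attribute subsets are conditionally independent given the class (the usual naive-Bayes-style argument). Under that assumption, P(y | x) is proportional to the product of the per-subset posteriors P(y | x_Sk) divided by P(y)^(K-1), where K is the number of subsets. The function and its dictionary inputs are hypothetical; how the paper searches for the decomposition itself is cut off above and is not reproduced.

```python
def combine_subset_posteriors(posteriors, prior):
    """Combine per-subset posteriors P(y | x_Sk), k = 1..K, into P(y | x).

    Assumes the K attribute subsets are conditionally independent given the class, so
    P(y | x) is proportional to prod_k P(y | x_Sk) / P(y) ** (K - 1).
    posteriors: list of dicts label -> probability; prior: dict label -> P(y) > 0.
    """
    k = len(posteriors)
    scores = {}
    for label, p_y in prior.items():
        score = p_y ** (1 - k)  # the 1 / P(y)^(K-1) correction
        for post in posteriors:
            score *= post[label]
        scores[label] = score
    total = sum(scores.values())
    return {label: s / total for label, s in scores.items()}


# Two hypothetical subset models that both lean toward class "a".
print(combine_subset_posteriors([{"a": 0.8, "b": 0.2}, {"a": 0.6, "b": 0.4}],
                                {"a": 0.5, "b": 0.5}))
```

    With a uniform prior the unnormalized scores are 0.8 * 0.6 / 0.5 = 0.96 and 0.2 * 0.4 / 0.5 = 0.16, so the combined posterior for "a" is 0.96 / 1.12, about 0.857, stronger than either subset model alone.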
  • Distance measures for effective clustering of ARIMA time-series

    Publication Year: 2001 , Page(s): 273 - 280
    Cited by:  Papers (29)  |  Patents (1)

    Much environmental and socioeconomic time-series data can be adequately modeled using autoregressive integrated moving average (ARIMA) models. We call such time series "ARIMA time series". We propose the use of the linear predictive coding (LPC) cepstrum for clustering ARIMA time series, by using the Euclidean distance between the LPC cepstra of two time series as their dissimilarity measure. We d...
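
    For an AR(p) model x_t = phi_1 x_{t-1} + ... + phi_p x_{t-p} + e_t, the LPC cepstrum can be obtained from the AR coefficients with a standard recursion (given in the docstring below), and the Euclidean distance between truncated cepstra gives the dissimilarity described above. This sketch assumes the AR coefficients have already been fitted (for an ARIMA series, after differencing) and truncates the cepstrum at a hypothetical `n_coeffs` terms; it is not the paper's full procedure.

```python
import math


def lpc_cepstrum(phi, n_coeffs):
    """LPC cepstrum c_1..c_n of the AR(p) model x_t = sum_{i=1}^{p} phi_i x_{t-i} + e_t.

    c_1 = phi_1
    c_n = phi_n + sum_{m=1}^{n-1} (1 - m/n) phi_m c_{n-m}    for 1 < n <= p
    c_n = sum_{m=1}^{p} (1 - m/n) phi_m c_{n-m}              for n > p
    """
    p = len(phi)
    c = [0.0] * (n_coeffs + 1)  # c[0] is unused so indices match the formulas
    for n in range(1, n_coeffs + 1):
        acc = phi[n - 1] if n <= p else 0.0
        for m in range(1, min(n - 1, p) + 1):
            acc += (1 - m / n) * phi[m - 1] * c[n - m]
        c[n] = acc
    return c[1:]


def cepstral_distance(phi_a, phi_b, n_coeffs=10):
    # Euclidean distance between the truncated LPC cepstra of two fitted AR models.
    ca = lpc_cepstrum(phi_a, n_coeffs)
    cb = lpc_cepstrum(phi_b, n_coeffs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(ca, cb)))


print(lpc_cepstrum([0.5], 3))
print(cepstral_distance([0.5], [0.4, 0.2]))
```

    As a sanity check, for an AR(1) model the recursion reduces to c_n = phi_1^n / n, so phi_1 = 0.5 gives 0.5, 0.125, 0.0417, and so on.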
  • Web cartography for online site promotion: an algorithm for clustering Web resources

    Publication Year: 2001 , Page(s): 529 - 535

    Presents a Web cartography approach to be used in the context of online site promotion. The overall objective is to provide users with handy maps offering information about candidate sites for the creation of hyperlinks that enable a large flow of targeted visitors. Two main types of data must be considered: texts and hyperlinks. We propose to exploit the latter to construct a relevant corpus on w...
  • Classification with degree of membership: a fuzzy approach

    Publication Year: 2001 , Page(s): 35 - 42
    Cited by:  Papers (5)

    Classification is an important topic in data mining research. It is concerned with the prediction of the values of some attribute in a database based on other attributes. To tackle this problem, most of the existing data mining algorithms adopt either a decision tree based approach or an approach that requires users to provide some user-specified thresholds to guide the search for interesting rule...
  • Indiscernibility degree of objects for evaluating simplicity of knowledge in the clustering procedure

    Publication Year: 2001 , Page(s): 211 - 217
    Cited by:  Papers (2)

    The paper presents a novel, rough set-based clustering method that enables the evaluation of classification knowledge simplicity during the clustering procedure. The method iteratively refines equivalence relations so that they become a simpler set of relations that give adequate coarse classification to the objects. At each step of the iteration, the importance of the equivalence relation is ...
  • FIExPat: flexible extraction of sequential patterns

    Publication Year: 2001 , Page(s): 481 - 488
    Cited by:  Papers (2)

    This paper addresses sequential data mining, a sub-area of data mining where the data to be analyzed is organized in sequences. In many problem domains a natural ordering exists over data. Examples of sequential databases (SDBs) include: (a) collections of temporal data sequences, such as chronological series of daily stock indices or multimedia data (sound, music, video, etc.); and (b) macromolec...
  • Mining mutually dependent patterns

    Publication Year: 2001 , Page(s): 409 - 416
    Cited by:  Papers (2)

    In some domains, such as isolating problems in computer networks and discovering stock market irregularities, there is more interest in patterns consisting of infrequent, but highly correlated items rather than patterns that occur frequently (as defined by minsup, the minimum support level). We describe the m-pattern, a new pattern that is defined in terms of minp, the minimum probability of mutua...
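
    The exact m-pattern definition is cut off above, so this sketch rests on an assumption: that an itemset E is mutually dependent when, for every item i in E, the conditional probability P(E | {i}) = support(E) / support({i}) is at least minp. The function names and the event log are hypothetical. The example shows why such a test can surface rare but tightly coupled items that a minimum-support filter would discard.

```python
def support(transactions, itemset):
    # Number of transactions that contain every item of itemset.
    return sum(1 for t in transactions if itemset <= t)


def is_m_pattern(transactions, itemset, minp):
    """Assumed test: P(itemset | {i}) >= minp for every item i in itemset."""
    s = support(transactions, itemset)
    if s == 0:
        return False
    # support({i}) >= s > 0, so the divisions are safe.
    return all(s / support(transactions, {i}) >= minp for i in itemset)


# Hypothetical event log: a and b are rare but always occur together;
# c is frequent and only loosely associated with d.
log = [{"a", "b"}, {"a", "b"}, {"a", "b", "c"}, {"c", "d"}, {"c"},
       {"c"}, {"c"}, {"c"}, {"d"}, {"c"}]
print(is_m_pattern(log, {"a", "b"}, minp=0.8))  # True
print(is_m_pattern(log, {"c", "d"}, minp=0.8))  # False
```

    Here {a, b} appears in only 3 of 10 transactions, so a frequent-pattern miner with minsup = 0.5 would never report it, yet each of its items predicts the other perfectly.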
  • Mining decision trees from data streams in a mobile environment

    Publication Year: 2001 , Page(s): 281 - 288
    Cited by:  Papers (1)

    This paper presents a novel Fourier analysis-based technique to aggregate, communicate and visualize decision trees in a mobile environment. A Fourier representation of a decision tree has several properties that are particularly useful for mining continuous data streams from small mobile computing devices. This paper presents algorithms to compute the Fourier spectrum of a decision tree an...
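
    The paper computes the spectrum with its own algorithms, which are truncated above. As a hedged illustration of what the spectrum is, the sketch below enumerates every input of a small, hypothetical decision tree over boolean features and computes the Walsh (Fourier) coefficients w_j = 2^-n * sum_x f(x) * (-1)^(j . x) by brute force. This takes time exponential in n and is only meant for toy examples. It also shows that every coefficient involving a feature the tree never tests is zero, which is one reason the representation can be compact.

```python
from itertools import product


def tree(x):
    # Hypothetical depth-2 decision tree over boolean features x[0], x[1], x[2];
    # it never tests x[2].
    if x[0] == 1:
        return 1 if x[1] == 1 else 0
    return 0


def fourier_spectrum(f, n):
    """Brute-force Walsh coefficients w_j = 2^-n * sum_x f(x) * (-1)^(j . x)."""
    points = list(product((0, 1), repeat=n))
    spectrum = {}
    for j in points:
        total = sum(f(x) * (-1) ** sum(a * b for a, b in zip(j, x)) for x in points)
        spectrum[j] = total / 2 ** n
    return spectrum


def evaluate(spectrum, x):
    # Reconstruct f(x) = sum_j w_j * (-1)^(j . x) from the spectrum.
    return sum(w * (-1) ** sum(a * b for a, b in zip(j, x)) for j, w in spectrum.items())


spec = fourier_spectrum(tree, 3)
print({j: w for j, w in spec.items() if w != 0})
```

    Only four of the eight coefficients are non-zero, and evaluating the spectrum at any input returns the tree's own prediction, so the spectrum is an exact, shareable encoding of the tree.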
  • Using boosting to simplify classification models

    Publication Year: 2001 , Page(s): 558 - 565

    Ensemble classification techniques such as bagging, boosting and arcing algorithms have been shown to lead to reduced classification errors on unseen cases and seem immune to the problem of overfitting. Several explanations for the reduction in generalisation error have been presented, with recent authors defining and applying diagnostics such as "edge" and "margin". These measures provide insight...
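
    The margin diagnostic mentioned above can be computed per example as in this sketch. It assumes a weighted-vote ensemble and the common normalization: the weight on the true label minus the largest weight on any other label, divided by the total weight. The function is hypothetical and shows only the diagnostic, not the paper's method for simplifying the ensemble.

```python
from collections import defaultdict


def voting_margin(predictions, weights, true_label):
    """Margin of a weighted-vote ensemble on one example, normalized to [-1, 1].

    Positive means the ensemble classifies the example correctly; values near 1
    mean the members agree strongly.
    """
    tally = defaultdict(float)
    for label, w in zip(predictions, weights):
        tally[label] += w
    total = sum(weights)
    correct = tally.get(true_label, 0.0)
    best_wrong = max((v for label, v in tally.items() if label != true_label), default=0.0)
    return (correct - best_wrong) / total


# Three hypothetical base classifiers with boosting-style weights.
print(voting_margin(["a", "a", "b"], [0.5, 0.3, 0.2], "a"))
```

    The call above gives about 0.6: the correct label receives 0.8 of the weight and the only competitor receives 0.2. Collecting this value over a training set gives the margin distribution that such diagnostics examine.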
  • Discovering representative episodal association rules from event sequences using frequent closed episode sets and event constraints

    Publication Year: 2001 , Page(s): 603 - 606
    Cited by:  Papers (3)

    Discovering association rules from time-series data is an important data mining problem. The number of potential rules grows quickly as the number of items in the antecedent grows. It is therefore difficult for an expert to analyze the rules and identify the useful ones. An approach for generating representative association rules for transactions that uses only a subset of the set of frequent itemsets ...
  • Statistical considerations in learning from data

    Publication Year: 2001 , Page(s): 321 - 328

    In this paper, we focus on statistics. Classical statistics and Bayesian statistics are both employed in data mining. Both have advantages but both also have severe limitations in this context. We point out some of these limitations as well as some of the advantages. The fact that we may need to take account of evidence both internal and external to the data set presents a difficulty for classical...
  • Using artificial anomalies to detect unknown and known network intrusions

    Publication Year: 2001 , Page(s): 123 - 130
    Cited by:  Papers (14)

    Intrusion detection systems (IDSs) must be capable of detecting new and unknown attacks, or anomalies. We study the problem of building detection models for both pure anomaly detection and combined misuse and anomaly detection (i.e., detection of both known and unknown intrusions). We propose an algorithm to generate artificial anomalies to coerce the inductive learner into discovering an accurate...
  • Classification through maximizing density

    Publication Year: 2001 , Page(s): 655 - 656
    Cited by:  Papers (1)

    This paper presents a novel method for classification, which makes use of models built by the lattice machine (LM). The LM approximates data resulting in, as a model of data, a set of hyper tuples that are equilabelled, supported and maximal. The method presented uses the LM model of data to classify new data with a view to maximising the density of the model. Experiments show that this method, wh...
  • Provably fast training algorithms for support vector machines

    Publication Year: 2001 , Page(s): 43 - 50
    Cited by:  Papers (2)

    Support vector machines are a family of data analysis algorithms based on convex quadratic programming. We focus on their use for classification: in that case, the SVM algorithms work by maximizing the margin of a classifying hyperplane in a feature space. The feature space is handled by means of kernels if the problems are formulated in dual form. Random sampling techniques successfully used for ...
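
    To make "maximizing the margin" concrete: for a linear hyperplane w . x + b = 0 and labels y_i in {-1, +1}, the geometric margin is min_i y_i (w . x_i + b) / ||w||. The sketch below only evaluates that quantity for two hand-picked hyperplanes on hypothetical points in the input space (no kernel); it is not a training algorithm and does not reproduce the paper's random-sampling method.

```python
import math


def geometric_margin(w, b, points, labels):
    """Smallest signed distance y_i * (w . x_i + b) / ||w|| over the training set.

    A positive value means the hyperplane w . x + b = 0 separates the classes;
    a hard-margin SVM chooses (w, b) to make this value as large as possible.
    """
    norm = math.sqrt(sum(wi * wi for wi in w))
    return min(y * (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm
               for x, y in zip(points, labels))


points = [(0, 0), (1, 1), (2, 2), (3, 3)]
labels = [-1, -1, 1, 1]
print(geometric_margin((1, 1), -3.0, points, labels))  # hyperplane midway between the classes
print(geometric_margin((1, 1), -2.5, points, labels))  # shifted toward the negative class
```

    The first hyperplane sits halfway between the closest points of the two classes and has margin 1 / sqrt(2), about 0.707; shifting it toward one class halves the margin, which is exactly what SVM training avoids.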
  • Metric rule generation with septic shock patient data

    Publication Year: 2001 , Page(s): 637 - 638
    Cited by:  Papers (7)

    The article presents an application of metric rule generation in the domain of medical research. We consider intensive care unit patients developing a septic shock during their stay at the hospital. To analyse the patient data, rule generation is embedded in a medical data mining cycle. For rule generation, we improve an architecture based on a growing trapezoidal basis function network.
  • Concise representation of frequent patterns based on disjunction-free generators

    Publication Year: 2001 , Page(s): 305 - 312
    Cited by:  Papers (4)

    Many data mining problems require the discovery of frequent patterns in order to be solved. Frequent itemsets are useful in the discovery of association rules, episode rules, sequential patterns and clusters. The number of frequent itemsets is usually huge. Therefore, it is important to work out concise representations of frequent itemsets. We describe three basic lossless representations of frequ...
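
    The three representations described in the paper are cut off above, so this sketch illustrates the general idea with a different, well-known lossless representation: closed itemsets. It enumerates the frequent itemsets by brute force, keeps only those with no superset of equal support, and then recovers the support of every frequent itemset from the closed ones alone. It is not the disjunction-free generator representation proposed in the paper, and the transactions are hypothetical.

```python
from itertools import combinations


def frequent_itemsets(transactions, minsup):
    """Brute-force enumeration of itemsets with support >= minsup (absolute count)."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            s = sum(1 for t in transactions if set(combo) <= t)
            if s >= minsup:
                result[frozenset(combo)] = s
                found = True
        if not found:
            break  # every subset of a frequent itemset is frequent, so no larger one can be
    return result


def closed_itemsets(frequent):
    # Keep itemsets that have no proper superset with the same support.
    return {x: s for x, s in frequent.items()
            if not any(x < y and s == sy for y, sy in frequent.items())}


def support_from_closed(closed, itemset):
    # The support of a frequent itemset equals the largest support of a closed superset.
    return max(s for x, s in closed.items() if itemset <= x)


transactions = [{"a", "b", "c"}, {"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b"}]
freq = frequent_itemsets(transactions, minsup=2)
closed = closed_itemsets(freq)
print(len(freq), len(closed))  # 7 5
```

    With minsup = 2 all seven non-empty itemsets are frequent, but only five are closed; {c} and {b, c} are dropped because {a, c} and {a, b, c} have the same supports, and both values can be read back with `support_from_closed`.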
  • Mining coverage-based fuzzy rules by evolutional computation

    Publication Year: 2001 , Page(s): 218 - 224
    Cited by:  Papers (4)

    The authors propose a novel mining approach based on the genetic process and an evaluation mechanism to automatically construct an effective fuzzy rule base. The proposed approach consists of three phases: fuzzy-rule generating, fuzzy-rule encoding and fuzzy-rule evolution. In the fuzzy-rule generating phase, a number of fuzzy rules are randomly generated. In the fuzzy-rule encoding phase, all the...