Third IEEE International Conference on Data Mining

19-22 Nov. 2003

  • The hybrid Poisson aspect model for personalized shopping recommendation

    Publication Year: 2003, Page(s):545 - 548
    Predicting an individual customer's likelihood of purchasing a specific item forms the basis of many marketing activities, such as personalized shopping recommendation. Collaborative filtering and association rule mining can be applied to this problem, but in retail supermarkets, the problem becomes particularly challenging because of the sparsity and skewness of transaction data. We present HyPAM...
  • Ontologies improve text document clustering

    Publication Year: 2003, Page(s):541 - 544
    Cited by:  Papers (90)  |  Patents (1)
    Text document clustering plays an important role in providing intuitive navigation and browsing mechanisms by organizing large sets of documents into a small number of meaningful clusters. The bag of words representation used for these clustering methods is often unsatisfactory as it ignores relationships between important terms that do not cooccur literally. In order to deal with the problem, we ...
  • Improving home automation by discovering regularly occurring device usage patterns

    Publication Year: 2003, Page(s):537 - 540
    Cited by:  Papers (21)  |  Patents (7)
    The data stream captured by recording inhabitant-device interactions in an environment can be mined to discover significant patterns, which an intelligent agent could use to automate device interactions. However, this knowledge discovery problem is complicated by several challenges, such as excessive noise in the data, data that does not naturally exist as transactions, a need to operate in real t...
  • Model-based clustering with soft balancing

    Publication Year: 2003, Page(s):459 - 466
    Cited by:  Papers (8)
    Balanced clustering algorithms can be useful in a variety of applications and have recently attracted increasing research interest. Most recent work, however, addressed only hard balancing by constraining each cluster to have equal or a certain minimum number of data objects. We provide a soft balancing strategy built upon a soft mixture-of-models clustering framework. This strategy constrains the...
  • Direct interesting rule generation

    Publication Year: 2003, Page(s):155 - 162
    Cited by:  Papers (3)
    An association rule generation algorithm usually generates too many rules, including a lot of uninteresting ones. Many interestingness criteria have been proposed to prune those uninteresting rules. However, they work in a post-pruning process and hence do not improve the rule generation efficiency. We discuss properties of the informative rule set and conclude that the informative rule set includes all intere...
  • Comparing pure parallel ensemble creation techniques against bagging

    Publication Year: 2003, Page(s):533 - 536
    Cited by:  Papers (8)  |  Patents (6)
    We experimentally evaluate randomization-based approaches to creating an ensemble of decision-tree classifiers. Unlike methods related to boosting, all of the eight approaches considered here create each classifier in an ensemble independently of the other classifiers. Experiments were performed on 28 publicly available datasets, using C4.5 release 8 as the base classifier. While each of the other...
  • Regression clustering

    Publication Year: 2003, Page(s):451 - 458
    Cited by:  Papers (7)  |  Patents (1)
    Complex distribution in real-world data is often modeled by a mixture of simpler distributions. Clustering is one of the tools to reveal the structure of this mixture. The same is true of datasets with chosen response variables that people run regression on. Without separating the clusters with very different response properties, the residue error of the regression is large. Input variable sel...
  • An algebra for inductive query evaluation

    Publication Year: 2003, Page(s):147 - 154
    Cited by:  Papers (5)
    Inductive queries are queries that generate pattern sets. We study properties of Boolean inductive queries, i.e. queries that are Boolean expressions over monotonic and antimonotonic constraints. More specifically, we introduce and study algebraic operations on the answer sets of such queries and show how these can be used for constructing and optimizing query plans. Special attention is devoted t...
  • The rough set approach to association rule mining

    Publication Year: 2003, Page(s):529 - 532
    Cited by:  Papers (15)
    In transaction processing, an association is said to exist between two sets of items when a transaction containing one set is likely to also contain the other. In information retrieval, an association between two sets of keywords occurs when they cooccur in a document. Similarly, in data mining, an association occurs when one attribute set occurs together with another. As the number of such associ...
  • A user-driven and quality-oriented visualization for mining association rules

    Publication Year: 2003, Page(s):493 - 496
    Cited by:  Papers (7)  |  Patents (2)
    On account of the enormous amounts of rules that can be produced by data mining algorithms, knowledge validation is one of the most problematic steps in an association rule discovery process. In order to find relevant knowledge for decision-making, the user needs to really rummage through the rules. Visualization can be very beneficial to support him/her in this task by improving the intelligibili...
  • ExAMiner: optimized level-wise frequent pattern mining with monotone constraints

    Publication Year: 2003, Page(s):11 - 18
    Cited by:  Papers (12)
    The key point is that, in frequent pattern mining, the most appropriate way of exploiting monotone constraints in conjunction with frequency is to use them in order to reduce the problem input together with the search space. Following this intuition, we introduce ExAMiner, a level-wise algorithm which exploits the real synergy of antimonotone and monotone constraints: the total benefit is greater ...
  • A K-NN associated fuzzy evidential reasoning classifier with adaptive neighbor selection

    Publication Year: 2003, Page(s):709 - 712
    Cited by:  Papers (3)
    We present a fuzzy evidential reasoning algorithm in light of the Dempster-Shafer evidence theory and the K-nearest neighbor algorithm for pattern classification. Given an input pattern to be classified, each of its K nearest neighbors is viewed as an evidence source, in terms of a fuzzy evidence structure. The distance between the input pattern and each of its K nearest neighbors is used for mass...
  • OP-cluster: clustering by tendency in high dimensional space

    Publication Year: 2003, Page(s):187 - 194
    Cited by:  Papers (49)
    Clustering is the process of grouping a set of objects into classes of similar objects. Because the hidden patterns in the data sets are unknown, the definition of similarity is very subtle. Until recently, similarity measures were typically based on distances, e.g. Euclidean distance and cosine distance. We propose a flexible yet powerful clustering model, namely OP-cluster (Order Preserving ...
  • CBC: clustering based text classification requiring minimal labeled data

    Publication Year: 2003, Page(s):443 - 450
    Cited by:  Papers (15)  |  Patents (3)
    Semisupervised learning methods construct classifiers using both labeled and unlabeled training data samples. While unlabeled data samples can help to improve the accuracy of trained models to a certain extent, existing methods still face difficulties when labeled data is not sufficient and biased against the underlying data distribution. We present a clustering based classification (CBC) approach. ...
  • Localized prediction of continuous target variables using hierarchical clustering

    Publication Year: 2003, Page(s):139 - 146
    Cited by:  Papers (4)
    We propose a novel technique for the efficient prediction of multiple continuous target variables from high-dimensional and heterogeneous data sets using a hierarchical clustering approach. The proposed approach consists of three phases applied recursively: partitioning, localization and prediction. In the partitioning step, similar target variables are grouped together by a clustering algorithm. ...
  • Fast PNN-based clustering using k-nearest neighbor graph

    Publication Year: 2003, Page(s):525 - 528
    Cited by:  Papers (6)
    Search for nearest neighbor is the main source of computation in most clustering algorithms. We propose the use of nearest neighbor graph for reducing the number of candidates. The number of distance calculations per search can be reduced from O(N) to O(k) or where N is the number of clusters, and k is the number of neighbors in the graph. We apply the proposed scheme within agglomerative clusteri...
  • Mining relevant text from unlabelled documents

    Publication Year: 2003, Page(s):489 - 492
    Cited by:  Papers (2)
    Automatic classification of documents is an important area of research with many applications in the fields of document searching, forensics and others. Methods to perform classification of text rely on the existence of a sample of documents whose class labels are known. However, in many situations, obtaining this sample may not be an easy (or even possible) task. We focus on the classification of...
  • Efficient multidimensional quantitative hypotheses generation

    Publication Year: 2003, Page(s):3 - 10
    Finding local interrelations (hypotheses) among attributes within very large databases of high dimensionality is an acute problem for many databases and data mining applications. These include dependency modeling, clustering large databases, correlation and link analysis. Traditional statistical methods are concerned with the corroboration of (a set of) hypotheses on a given body of data. Testing...
  • Segmenting customer transactions using a pattern-based clustering approach

    Publication Year: 2003, Page(s):411 - 418
    Cited by:  Papers (8)
    Grouping customer transactions into categories helps understand customers better. The marketing literature has concentrated on identifying important segmentation variables (e.g. customer loyalty) and on using clustering and mixture models for segmentation. The data mining literature has provided various clustering algorithms for segmentation. We investigate using "pattern-based" clustering approac...
  • Complex spatial relationships

    Publication Year: 2003, Page(s):227 - 234
    Cited by:  Papers (13)
    We describe the need for mining complex relationships in spatial data. Complex relationships are defined as those involving two or more of: multifeature colocation, self-colocation, one-to-many relationships, self-exclusion and multifeature exclusion. We demonstrate that even in the mining of simple relationships, knowledge of complex relationships is necessary to accurately calculate the signific...
  • Detecting patterns of change using enhanced parallel coordinates visualization

    Publication Year: 2003, Page(s):747 - 750
    Cited by:  Papers (2)
    Analyzing data to find trends, correlations, and stable patterns is an important problem for many industrial applications. We propose a new technique based on parallel coordinates visualization. Previous work on the parallel coordinates method has shown that it is effective only when variables that are correlated and/or show similar patterns are displayed adjacently. Although current parallel coord...
  • A feature selection framework for text filtering

    Publication Year: 2003, Page(s):705 - 708
    Cited by:  Papers (9)
    We present a new framework for local feature selection in text filtering. In this framework, a feature set is constructed per category by first selecting a set of terms highly indicative of membership (positive set) and another set of terms highly indicative of nonmembership (negative set), and then combining these two sets. This feature selection framework not only unifies several standard featur...
  • Center-based indexing for nearest neighbors search

    Publication Year: 2003, Page(s):681 - 684
    We address the problem of indexing data for the k nearest neighbors (k-nn) search. We present a tree-based top-down indexing method that uses an iterative k-means algorithm for tree node splitting and combines three different search pruning criteria from BST, GHT and GNAT into one. The experiments show that the presented indexing tree accelerates the k-nn searching up to several thousand times in...
  • Building text classifiers using positive and unlabeled examples

    Publication Year: 2003, Page(s):179 - 186
    Cited by:  Papers (97)
    We study the problem of building text classifiers using positive and unlabeled examples. The key feature of this problem is that there is no negative example for learning. Recently, a few techniques for solving this problem were proposed in the literature. These techniques are based on the same idea, which builds a classifier in two steps. Each existing technique uses a different method for each s...
  • Cost-sensitive learning by cost-proportionate example weighting

    Publication Year: 2003, Page(s):435 - 442
    Cited by:  Papers (111)  |  Patents (6)
    We propose and evaluate a family of methods for converting classifier learning algorithms and classification theory into cost-sensitive algorithms and theory. The proposed conversion is based on cost-proportionate weighting of the training examples, which can be realized either by feeding the weights to the classification algorithm (as often done in boosting), or by careful subsampling. We give so...