Knowledge and Data Engineering, IEEE Transactions on

Issue 10 • Date Oct. 2006

  • [Front cover]

    Page(s): c1
    Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    Freely Available from IEEE
  • Segmenting Customers from Population to Individuals: Does 1-to-1 Keep Your Customers Forever?

    Page(s): 1297 - 1311

    There have been various claims made in the marketing community about the benefits of 1-to-1 marketing versus traditional customer segmentation approaches and how much they can improve understanding of customer behavior. However, few rigorous studies exist that systematically compare these approaches. In this paper, we conducted such a study and compared the predictive performance of aggregate, segmentation, and 1-to-1 marketing approaches across a broad range of experimental settings, such as multiple segmentation levels, multiple real-world marketing data sets, multiple dependent variables, different types of classifiers, different segmentation techniques, and different predictive measures. Our experiments show that both 1-to-1 and segmentation approaches significantly outperform aggregate modeling. Reaffirming anecdotal evidence of the benefits of 1-to-1 marketing, our experiments show that the 1-to-1 approach also dominates the segmentation approach for the frequently transacting customers. However, our experiments also show that segmentation models taken at the best granularity levels dominate 1-to-1 models when modeling customers with little transactional data using effective clustering methods. In addition, the peak performance of segmentation models is reached at the finest granularity levels, skewed towards the 1-to-1 case. This finding adds support for the microsegmentation approach and suggests that 1-to-1 marketing may not always be the best solution.

  • Feature Reduction via Generalized Uncorrelated Linear Discriminant Analysis

    Page(s): 1312 - 1322

    High-dimensional data appear in many applications of data mining, machine learning, and bioinformatics. Feature reduction is commonly applied as a preprocessing step to overcome the curse of dimensionality. Uncorrelated linear discriminant analysis (ULDA) was recently proposed for feature reduction. The features extracted via ULDA were shown to be statistically uncorrelated, which is desirable for many applications. In this paper, an algorithm called ULDA/QR is proposed to simplify the previous implementation of ULDA. Then, the ULDA/GSVD algorithm is proposed, based on a novel optimization criterion, to address the singularity problem which occurs in undersampled problems, where the data dimension is larger than the sample size. The criterion used is the regularized version of the one in ULDA/QR. Surprisingly, our theoretical result shows that the solution to ULDA/GSVD is independent of the value of the regularization parameter. Experimental results on various types of data sets are reported to show the effectiveness of the proposed algorithm and to compare it with other commonly used feature reduction algorithms.

  • A Joinless Approach for Mining Spatial Colocation Patterns

    Page(s): 1323 - 1337

    Spatial colocations represent the subsets of features which are frequently located together in geographic space. Colocation pattern discovery presents challenges since spatial objects are embedded in a continuous space, whereas classical data is often discrete. A large fraction of the computation time is devoted to identifying the instances of colocation patterns. We propose a novel joinless approach for efficient colocation pattern mining. The joinless colocation mining algorithm uses an instance-lookup scheme instead of an expensive spatial or instance join operation for identifying colocation instances. We prove the joinless algorithm is correct and complete in finding colocation rules. We also describe a partial join approach for spatial data which are clustered in neighborhood areas. We provide the algebraic cost models to characterize the performance dominance zones of the joinless method and the partial join method with a current join-based colocation mining method, and compare their computational complexities. In the experimental evaluation, using synthetic and real-world data sets, our methods performed more efficiently than the join-based method and scaled better on dense data.

  • Multilabel Neural Networks with Applications to Functional Genomics and Text Categorization

    Page(s): 1338 - 1351

    In multilabel learning, each instance in the training set is associated with a set of labels and the task is to output a label set whose size is unknown a priori for each unseen instance. In this paper, this problem is addressed by proposing a neural network algorithm named BP-MLL, i.e., backpropagation for multilabel learning. It is derived from the popular backpropagation algorithm by employing a novel error function that captures the characteristics of multilabel learning, i.e., the labels belonging to an instance should be ranked higher than those not belonging to that instance. Applications to two real-world multilabel learning problems, i.e., functional genomics and text categorization, show that the performance of BP-MLL is superior to that of some well-established multilabel learning algorithms.

  • Achieving Communication Efficiency through Push-Pull Partitioning of Semantic Spaces to Disseminate Dynamic Information

    Page(s): 1352 - 1367

    Many database applications that need to disseminate dynamic information from a server to various clients can suffer from heavy communication costs. Data caching at a client can help mitigate these costs, particularly when individual PUSH-PULL decisions are made for the different semantic regions in the data space. The server is responsible for notifying the client about updates in the PUSH regions. The client needs to contact the server for queries that ask for data in the PULL regions. We use the term "data gerrymandering" for the idea of partitioning the data space into PUSH-PULL regions to minimize communication cost. In this paper, we present solutions to technical challenges in adopting this simple but powerful idea. We give a provably optimal-cost dynamic programming algorithm for gerrymandering on a single query attribute. We propose a family of efficient heuristics for gerrymandering on multiple query attributes. We handle the dynamic case in which the workloads of queries and updates evolve over time. We validate our methods through extensive experiments on real and synthetic data sets.

  • An Effective and Efficient Exact Match Retrieval Scheme for Symbolic Image Database Systems Based on Spatial Reasoning: A Logarithmic Search Time Approach

    Page(s): 1368 - 1381

    In this paper, a novel method of representing symbolic images in a symbolic image database (SID) invariant to image transformations that is useful for exact match retrieval is presented. The relative spatial relationships existing among the components present in an image are perceived with respect to the direction of reference and preserved by a set of triples. A distinct and unique key is computed for each distinct triple. The mean and standard deviation of the set of keys computed for a symbolic image are stored along with the total number of keys as the representatives of the corresponding image. The proposed exact match retrieval scheme is based on a modified binary search technique and, thus, requires O(log n) search time in the worst case, where n is the total number of symbolic images in the SID. Extensive experimentation on a large database of 22,630 symbolic images is conducted to corroborate the superiority of the model. The effectiveness of the proposed representation scheme is tested with standard testbed images.

  • A Multiresolution Terrain Model for Efficient Visualization Query Processing

    Page(s): 1382 - 1396

    Multiresolution triangular mesh (MTM) models are widely used to improve the performance of large terrain visualization by replacing the original model with a simplified one. MTM models, which consist of both original and simplified data, are commonly stored in spatial database systems due to their size. The relatively slow access speed of disks makes data retrieval the bottleneck of such terrain visualization systems. Existing spatial access methods proposed to address this problem rely on main-memory MTM models, which leads to significant overhead during query processing. In this paper, we approach the problem from a new perspective and propose a novel MTM called direct mesh that is designed specifically for secondary storage. It supports available indexing methods natively and requires no modification to MTM structure. Experimental results, which are based on two real-world data sets, show an average performance improvement of 5-10 times over the existing methods.

  • Fuzzy Sets Defined on a Hierarchical Domain

    Page(s): 1397 - 1410

    This paper presents a new type of fuzzy set, called "hierarchical fuzzy sets," that applies when the considered domain of values is not "flat," but contains values that are more specific than others according to the "kind of" relation. We study the properties of such fuzzy sets, which can be defined in a short way on a part of the hierarchy, or exhaustively (by their "closure") on the whole hierarchy. We show that hierarchical fuzzy sets form equivalence classes with regard to their closures and that each class has a particular representative called the "minimal fuzzy set." We propose a use of this minimal fuzzy set for query enlargement purposes and, thus, present a methodology for hierarchical fuzzy set generalization. We finally present an experimental evaluation of the theoretical results described in the paper, in a practical application.

  • A Survey of Web Information Extraction Systems

    Page(s): 1411 - 1428

    The Internet presents a huge amount of useful information which is usually formatted for its users, which makes it difficult to extract relevant data from various sources. Therefore, the availability of robust, flexible information extraction (IE) systems that transform the Web pages into program-friendly structures such as a relational database will become a great necessity. Although many approaches for data extraction from Web pages have been developed, there has been limited effort to compare such tools. Unfortunately, in only a few cases can the results generated by distinct tools be directly compared since the addressed extraction tasks are different. This paper surveys the major Web data extraction approaches and compares them in three dimensions: the task domain, the automation degree, and the techniques used. The criteria of the first dimension explain why an IE system fails to handle some Web sites of particular structures. The criteria of the second dimension classify IE systems based on the techniques used. The criteria of the third dimension measure the degree of automation for IE systems. We believe these criteria provide qualitative measures to evaluate various IE approaches.

  • On Weight Design of Maximum Weighted Likelihood and an Extended EM Algorithm

    Page(s): 1429 - 1434

    The recent maximum weighted likelihood (MWL) has provided a general learning paradigm for density-mixture model selection and learning, in which weight design, however, is a key issue. This paper therefore explores such a design, through which a heuristic extended expectation-maximization (X-EM) algorithm is presented. Unlike the EM algorithm, the X-EM algorithm is able to perform model selection by fading the redundant components out from a density mixture while estimating the model parameters appropriately. The numerical simulations demonstrate the efficacy of our algorithm.

  • Class Noise Handling for Effective Cost-Sensitive Learning by Cost-Guided Iterative Classification Filtering

    Page(s): 1435 - 1440

    Recent research in machine learning, data mining, and related areas has produced a wide variety of algorithms for cost-sensitive (CS) classification, where instead of maximizing the classification accuracy, minimizing the misclassification cost becomes the objective. These methods often assume that their input is quality data without conflicting or erroneous values, or that the noise impact is trivial, which is seldom the case in real-world environments. In this paper, we propose a cost-guided iterative classification filter (CICF) to identify noise for effective CS learning. Unlike existing efforts, which put equal weight on handling noise in all classes, CICF puts more emphasis on expensive classes, which makes it attractive in dealing with data sets with a large cost-ratio. Experimental results and comparative studies indicate that the existence of noise may seriously corrupt the performance of the underlying CS learners and that, by adopting the proposed CICF algorithm, we can significantly reduce the misclassification cost of a CS classifier in noisy environments.

  • TKDE Information for authors

    Page(s): c3
    Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Knowledge and Data Engineering (TKDE) informs researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.

Meet Our Editors

Editor-in-Chief
Jian Pei
Simon Fraser University