
IEEE Transactions on Knowledge and Data Engineering

Issue 6 • June 2008


  • [Front cover]

    Publication Year: 2008, Page(s): c1
    PDF (97 KB)
    Freely Available from IEEE
  • [Inside front cover]

    Publication Year: 2008, Page(s): c2
    PDF (81 KB)
    Freely Available from IEEE
  • C-TREND: Temporal Cluster Graphs for Identifying and Visualizing Trends in Multiattribute Transactional Data

    Publication Year: 2008, Page(s): 721 - 735
    Cited by: Papers (5)
    PDF (2349 KB) | HTML

    Organizations and firms are capturing increasingly more data about their customers, suppliers, competitors, and business environment. Most of this data is multiattribute (multidimensional) and temporal in nature. Data mining and business intelligence techniques are often used to discover patterns in such data; however, mining temporal relationships typically is a complex task. We propose a new data analysis and visualization technique for representing trends in multiattribute temporal data using a clustering-based approach. We introduce Cluster-based Temporal Representation of EveNt Data (C-TREND), a system that implements the temporal cluster graph construct, which maps multiattribute temporal data to a two-dimensional directed graph that identifies trends in dominant data types over time. In this paper, we present our temporal clustering-based technique, discuss its algorithmic implementation and performance, demonstrate applications of the technique by analyzing data on wireless networking technologies and baseball batting statistics, and introduce a set of metrics for further analysis of discovered trends.

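The entry above describes clustering records within each time period and mapping the result to a two-dimensional directed graph of trends. The following is a minimal, hypothetical sketch of that general idea (per-period k-means plus similarity-based edges between adjacent periods); the function names, the use of k-means, and the linking threshold are assumptions for illustration, not C-TREND's actual algorithm.

```python
# Hedged sketch of a "temporal cluster graph": cluster records within each
# time period, then connect clusters in consecutive periods whose centroids
# are close. Names, thresholds, and the use of k-means are assumptions.
import numpy as np
from sklearn.cluster import KMeans

def temporal_cluster_graph(periods, k=3, link_threshold=1.0):
    """periods: list of (period_label, 2D numpy array of records)."""
    nodes, edges = [], []
    prev = None  # (period_label, centroids) of the previous period
    for label, X in periods:
        centroids = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X).cluster_centers_
        nodes.append((label, centroids))
        if prev is not None:
            prev_label, prev_centroids = prev
            # Draw a directed edge when clusters in adjacent periods are similar.
            for i, c_prev in enumerate(prev_centroids):
                for j, c_cur in enumerate(centroids):
                    if np.linalg.norm(c_prev - c_cur) < link_threshold:
                        edges.append(((prev_label, i), (label, j)))
        prev = (label, centroids)
    return nodes, edges

# Toy usage: three periods of synthetic 2-attribute records.
rng = np.random.default_rng(0)
data = [(year, rng.normal(loc=year - 2005, size=(50, 2))) for year in (2005, 2006, 2007)]
print(temporal_cluster_graph(data, k=2, link_threshold=2.0)[1])
```
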
  • Algorithms for Storytelling

    Publication Year: 2008, Page(s): 736 - 751
    Cited by: Papers (5)
    PDF (3996 KB) | HTML

    We formulate a new data mining problem called storytelling as a generalization of redescription mining. In traditional redescription mining, we are given a set of objects and a collection of subsets defined over these objects. The goal is to view the set system as a vocabulary and identify two expressions in this vocabulary that induce the same set of objects. Storytelling, on the other hand, aims to explicitly relate object sets that are disjoint (and, hence, maximally dissimilar) by finding a chain of (approximate) redescriptions between the sets. This problem finds applications in bioinformatics, for instance, where the biologist is trying to relate a set of genes expressed in one experiment to another set, implicated in a different pathway. We outline an efficient storytelling implementation that embeds the CARTwheels redescription mining algorithm in an A* search procedure, using the former to supply next move operators on search branches to the latter. This approach is practical and effective for mining large data sets and, at the same time, exploits the structure of partitions imposed by the given vocabulary. Three application case studies are presented: a study of word overlaps in large English dictionaries, exploring connections between gene sets in a bioinformatics data set, and relating publications in the PubMed index of abstracts.

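The storytelling problem above looks for a chain of approximate redescriptions connecting two disjoint object sets. Purely as an illustrative stand-in for the CARTwheels-plus-A* machinery, the sketch below finds a chain through a vocabulary of sets in which consecutive sets must overlap by at least a Jaccard threshold, using plain breadth-first search; all names and thresholds are assumptions.

```python
# Hedged sketch: find a "story" (chain of sets) linking a start set to a target
# set, where consecutive sets must overlap (Jaccard similarity >= theta).
# This is a simple BFS stand-in, not the paper's CARTwheels + A* algorithm.
from collections import deque

def jaccard(a, b):
    return len(a & b) / len(a | b) if (a | b) else 0.0

def storytelling_chain(start, target, vocabulary, theta=0.3):
    """vocabulary: list of frozensets that may serve as intermediate steps."""
    start, target = frozenset(start), frozenset(target)
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        current = path[-1]
        if jaccard(current, target) >= theta:
            return path + [target]
        for candidate in vocabulary:
            if candidate not in seen and jaccard(current, candidate) >= theta:
                seen.add(candidate)
                queue.append(path + [candidate])
    return None  # no chain found under this threshold

# Toy usage with word sets: the start and target sets are disjoint.
vocab = [frozenset(s) for s in ({"a", "b", "c"}, {"c", "d"}, {"d", "e"}, {"e", "f"})]
print(storytelling_chain({"a", "b"}, {"f", "g"}, vocab, theta=0.25))
```
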
  • An Efficient Clustering Scheme to Exploit Hierarchical Data in Network Traffic Analysis

    Publication Year: 2008, Page(s): 752 - 767
    Cited by: Papers (9)
    PDF (2153 KB) | HTML

    There is significant interest in the data mining and network management communities in improving existing techniques for clustering multivariate network traffic flow records so that underlying traffic patterns can be inferred quickly. In this paper, we investigate the use of clustering techniques to identify interesting traffic patterns from network traffic data in an efficient manner. We develop a framework to deal with mixed-type attributes including numerical, categorical, and hierarchical attributes for a one-pass hierarchical clustering algorithm. We demonstrate the improved accuracy and efficiency of our approach in comparison to previous work on clustering network traffic.

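The framework above hinges on a dissimilarity measure that handles numerical, categorical, and hierarchical attributes together. The sketch below shows one plausible mixed-attribute dissimilarity for flow records, treating hierarchical attributes (such as an application taxonomy) by the depth of their common prefix; the field names, weights, and equal averaging are assumptions, not the paper's formulation.

```python
# Hedged sketch of a mixed-type dissimilarity for traffic flow records:
#   - numerical attributes: normalized absolute difference,
#   - categorical attributes: 0 if equal, else 1,
#   - hierarchical attributes: 1 - (shared path depth / max depth),
# averaged over all attributes. Field names and weights are assumptions.
def hierarchical_distance(path_a, path_b, max_depth):
    """Paths are tuples from root to leaf, e.g. ('tcp', 'web', 'http')."""
    shared = 0
    for a, b in zip(path_a, path_b):
        if a != b:
            break
        shared += 1
    return 1.0 - shared / max_depth

def flow_distance(x, y, numeric_ranges, max_depth=3):
    d_num = abs(x["bytes"] - y["bytes"]) / numeric_ranges["bytes"]
    d_cat = 0.0 if x["protocol"] == y["protocol"] else 1.0
    d_hier = hierarchical_distance(x["app_path"], y["app_path"], max_depth)
    return (d_num + d_cat + d_hier) / 3.0

# Toy usage on two flow records.
a = {"bytes": 1200, "protocol": "tcp", "app_path": ("tcp", "web", "http")}
b = {"bytes": 5200, "protocol": "tcp", "app_path": ("tcp", "web", "https")}
print(flow_distance(a, b, numeric_ranges={"bytes": 10000}))
```
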
  • Bounded Approximation: A New Criterion for Dimensionality Reduction Approximation in Similarity Search

    Publication Year: 2008, Page(s): 768 - 783
    Cited by: Papers (3)
    PDF (3197 KB) | HTML

    We examine the problem of efficient distance-based similarity search over high-dimensional data. We show that a promising approach to this problem is to reduce dimensions and allow fast approximation. Conventional reduction approaches, however, entail a significant shortcoming: The approximation volume extends across the dataspace, which causes overestimation of retrieval sets and impairs performance. This paper focuses on a new criterion for dimensionality reduction methods: bounded approximation. We show that this requirement can be accomplished by a novel nonlinear transformation scheme that extracts two important parameters from the data. We devise two approximation formulations, namely, rectangular and spherical range search, each corresponding to a closed volume around the original search sphere. We discuss in detail how we can derive tight bounds for the parameters and prove further results, as well as highlight insights into the problems and our proposed solutions. To demonstrate the benefits of the new criterion, we study the effects of (un)boundedness on approximation performance, including selectivity, error tolerance, and efficiency. Extensive experiments confirm the superiority of this technique over recent state-of-the-art schemes.

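For context, the conventional baseline this entry improves on is the filter-and-refine pattern for reduced-dimension range search: filter candidates in the reduced space with a radius that lower-bounds the original distance, then refine with exact distances. The sketch below is a generic, assumed illustration of that baseline (truncating coordinates as the contractive reduction), not the paper's bounded-approximation scheme.

```python
# Hedged sketch of generic filter-and-refine range search after dimensionality
# reduction: project to the first m coordinates (a contractive map under the
# Euclidean norm), filter with the same radius, then refine in full dimension.
import numpy as np

def range_search_filter_refine(data, query, radius, m=2):
    """data: (n, d) array; query: (d,) array; returns indices within radius."""
    reduced_data, reduced_query = data[:, :m], query[:m]
    # Filter: dropping coordinates can only shrink Euclidean distance, so no
    # true answer is lost (candidate sets may be overestimated).
    reduced_dist = np.linalg.norm(reduced_data - reduced_query, axis=1)
    candidates = np.where(reduced_dist <= radius)[0]
    # Refine: exact distance in the original space.
    exact = np.linalg.norm(data[candidates] - query, axis=1)
    return candidates[exact <= radius]

# Toy usage on random 16-dimensional points.
rng = np.random.default_rng(1)
points = rng.random((1000, 16))
q = rng.random(16)
print(range_search_filter_refine(points, q, radius=1.5, m=4))
```
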
  • Hardware-Enhanced Association Rule Mining with Hashing and Pipelining

    Publication Year: 2008, Page(s): 784 - 795
    Cited by: Papers (11)
    PDF (2233 KB) | HTML

    Generally speaking, to implement Apriori-based association rule mining in hardware, one has to load candidate itemsets and a database into the hardware. Since the capacity of the hardware architecture is fixed, if the number of candidate itemsets or the number of items in the database is larger than the hardware capacity, the items are loaded into the hardware separately. The time complexity of those steps that need to load candidate itemsets or database items into the hardware is in proportion to the number of candidate itemsets multiplied by the number of items in the database. Too many candidate itemsets and a large database would create a performance bottleneck. In this paper, we propose a HAsh-based and PiPelIned (abbreviated as HAPPI) architecture for hardware-enhanced association rule mining. We apply the pipeline methodology in the HAPPI architecture to compare itemsets with the database and collect useful information for reducing the number of candidate itemsets and items in the database simultaneously. When the database is fed into the hardware, candidate itemsets are compared with the items in the database to find frequent itemsets. At the same time, trimming information is collected from each transaction. In addition, itemsets are generated from transactions and hashed into a hash table. The useful trimming information and the hash table enable us to reduce the number of items in the database and the number of candidate itemsets. Therefore, we can effectively reduce the frequency of loading the database into the hardware. As such, HAPPI solves the bottleneck problem in Apriori-based hardware schemes. We also derive some properties to investigate the performance of this hardware implementation. As shown by the experimental results, HAPPI significantly outperforms the previous hardware approach and the software algorithm in terms of execution time.

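HAPPI combines candidate comparison with hash-based filtering and transaction trimming in hardware. Purely as a software illustration of the hash-filtering idea (in the spirit of DHP-style pruning, not the paper's hardware pipeline), the sketch below hashes the 2-itemsets of each transaction into buckets and discards candidate pairs whose bucket count cannot reach the minimum support; the function names and bucket count are assumptions.

```python
# Hedged sketch of hash-based candidate pruning for Apriori-style mining:
# while counting 1-itemsets, hash every 2-itemset of each transaction into a
# small table; a candidate pair can be frequent only if its bucket count
# reaches min_support. This mirrors DHP-style filtering in software only.
from itertools import combinations
from collections import Counter

def hash_filtered_pairs(transactions, min_support, n_buckets=101):
    item_count, bucket_count = Counter(), Counter()
    for t in transactions:
        item_count.update(t)
        for pair in combinations(sorted(t), 2):
            bucket_count[hash(pair) % n_buckets] += 1
    frequent_items = {i for i, c in item_count.items() if c >= min_support}
    # Keep only candidate pairs of frequent items whose bucket is dense enough.
    return [
        pair
        for pair in combinations(sorted(frequent_items), 2)
        if bucket_count[hash(pair) % n_buckets] >= min_support
    ]

# Toy usage on a tiny transaction database.
db = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "d"}]
print(hash_filtered_pairs(db, min_support=2))
```
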
  • Truth Discovery with Multiple Conflicting Information Providers on the Web

    Publication Year: 2008, Page(s): 796 - 808
    Cited by: Papers (13)
    PDF (3322 KB) | HTML

    The World Wide Web has become the most important information source for most of us. Unfortunately, there is no guarantee of the correctness of information on the Web. Moreover, different websites often provide conflicting information on a subject, such as different specifications for the same product. In this paper, we propose a new problem, called Veracity, i.e., conformity to truth, which studies how to find true facts from a large amount of conflicting information on many subjects that is provided by various websites. We design a general framework for the Veracity problem and invent an algorithm, called TRUTHFINDER, which utilizes the relationships between websites and their information, i.e., a website is trustworthy if it provides many pieces of true information, and a piece of information is likely to be true if it is provided by many trustworthy websites. An iterative method is used to infer the trustworthiness of websites and the correctness of information from each other. Our experiments show that TRUTHFINDER successfully finds true facts among conflicting information and identifies trustworthy websites better than the popular search engines.

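TRUTHFINDER iterates between website trustworthiness and fact confidence. The sketch below is a heavily simplified, assumed rendering of that mutual-reinforcement loop (a site's trust is the mean confidence of its claims; a claim's confidence grows with the trust of the sites asserting it); the paper's exact update rules, dampening factors, and handling of implications between facts are omitted.

```python
# Hedged sketch of iterative truth discovery: alternate between
#   conf(fact)  = 1 - prod(1 - trust(site)) over sites providing the fact, and
#   trust(site) = mean confidence of the facts it provides.
# These update formulas are simplified assumptions, not TRUTHFINDER's rules.
def truth_discovery(claims, iterations=10):
    """claims: dict mapping site -> set of facts it asserts."""
    trust = {site: 0.5 for site in claims}          # initial trust
    facts = {f for fs in claims.values() for f in fs}
    confidence = {f: 0.5 for f in facts}
    for _ in range(iterations):
        for f in facts:
            p_false = 1.0
            for site, fs in claims.items():
                if f in fs:
                    p_false *= 1.0 - trust[site]
            confidence[f] = 1.0 - p_false
        for site, fs in claims.items():
            trust[site] = sum(confidence[f] for f in fs) / len(fs)
    return trust, confidence

# Toy usage: two sites agree on fact "x", one site alone claims "y".
claims = {"site_a": {"x"}, "site_b": {"x"}, "site_c": {"y"}}
trust, conf = truth_discovery(claims)
print(conf)  # "x" should end up with higher confidence than "y"
```
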
  • Probabilistic Group Nearest Neighbor Queries in Uncertain Databases

    Publication Year: 2008, Page(s): 809 - 824
    Cited by: Papers (14)
    PDF (3643 KB) | HTML

    The importance of query processing over uncertain data has recently arisen due to its wide usage in many real-world applications. In the context of uncertain databases, previous works have studied many query types such as nearest neighbor query, range query, top-k query, skyline query, and similarity join. In this paper, we focus on another important query, namely, probabilistic group nearest neighbor (PGNN) query, in the uncertain database, which also has many applications. Specifically, given a set, Q, of query points, a PGNN query retrieves data objects that minimize the aggregate distance (e.g., sum, min, and max) to query set Q. Due to the inherent uncertainty of data objects, previous techniques to answer group nearest neighbor (GNN) query cannot be directly applied to our PGNN problem. Motivated by this, we propose effective pruning methods, namely, spatial pruning and probabilistic pruning, to reduce the PGNN search space, which can be seamlessly integrated into our PGNN query procedure. Extensive experiments have demonstrated the efficiency and effectiveness of our proposed approach, in terms of the wall clock time and the speed-up ratio against linear scan.

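A group nearest neighbor query aggregates distances from a set of query points to each object. As an assumed, simplified illustration of the uncertain case, the sketch below represents each uncertain object by a small sample of possible locations, scores it by the expected sum of distances to the query set, and applies a trivial bounding-box lower bound as a spatial-pruning step; the paper's actual pruning rules and probability guarantees are more involved.

```python
# Hedged sketch of a group nearest neighbor query over uncertain objects:
# each object is a set of sampled locations; score = expected sum of distances
# to all query points; a bounding-box lower bound prunes hopeless objects.
# The sampling model, the sum aggregate, and the pruning rule are assumptions.
import numpy as np

def expected_group_distance(samples, queries):
    # Mean over samples of the summed distance to every query point.
    d = np.linalg.norm(samples[:, None, :] - queries[None, :, :], axis=2)
    return d.sum(axis=1).mean()

def group_distance_lower_bound(samples, queries):
    # Distance from each query point to the object's bounding box.
    lo, hi = samples.min(axis=0), samples.max(axis=0)
    clamped = np.clip(queries, lo, hi)
    return np.linalg.norm(queries - clamped, axis=1).sum()

def pgnn(objects, queries, k=1):
    best, threshold = [], np.inf
    for name, samples in objects:
        if group_distance_lower_bound(samples, queries) > threshold:
            continue  # spatial pruning: cannot beat the current k-th best
        score = expected_group_distance(samples, queries)
        best = sorted(best + [(score, name)])[:k]
        if len(best) == k:
            threshold = best[-1][0]
    return best

# Toy usage: five uncertain objects, each a cloud of 20 sampled points.
rng = np.random.default_rng(2)
objs = [(f"o{i}", rng.normal(loc=i, size=(20, 2))) for i in range(5)]
Q = np.array([[0.0, 0.0], [1.0, 1.0]])
print(pgnn(objs, Q, k=2))
```
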
  • A Signature-Based Indexing Method for Efficient Content-Based Retrieval of Relative Temporal Patterns

    Publication Year: 2008, Page(s): 825 - 835
    PDF (1471 KB) | HTML

    A number of algorithms have been proposed for the discovery of temporal patterns. However, since the number of generated patterns can be large, selecting which patterns to analyze can be nontrivial. There is thus a need for algorithms and tools that can assist in the selection of discovered patterns so that subsequent analysis can be performed in an efficient and, ideally, interactive manner. In this paper, we propose a signature-based indexing method to optimize the storage and retrieval of a large collection of relative temporal patterns.

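Signature-based indexing filters a large pattern collection before exact matching. The sketch below shows a generic bitmap-signature filter for retrieving stored patterns that contain all elements of a query: each pattern is summarized by a hashed bit signature, retrieval checks signatures first, and only the survivors are verified. The signature width and hashing scheme are assumptions, and the sketch ignores the relative temporal relationships themselves.

```python
# Hedged sketch of signature-based filtering for content-based retrieval:
# a pattern's signature is the OR of per-element hash bits; a stored pattern
# can contain the query only if its signature covers the query's signature.
# Signature width and hashing are illustrative choices, not the paper's.
def signature(items, width=32):
    sig = 0
    for item in items:
        sig |= 1 << (hash(item) % width)
    return sig

def build_index(patterns, width=32):
    return [(signature(p, width), p) for p in patterns]

def retrieve(index, query_items, width=32):
    q_sig = signature(query_items, width)
    # Filter step: candidate signatures must cover every query bit.
    candidates = [p for sig, p in index if sig & q_sig == q_sig]
    # Refine step: verify actual containment to remove false positives.
    return [p for p in candidates if set(query_items) <= set(p)]

# Toy usage over symbolic "relative temporal pattern" elements.
stored = [("A<B", "B<C"), ("A<B",), ("B<C", "C<D"), ("A<B", "B<C", "C<D")]
index = build_index(stored)
print(retrieve(index, ("A<B", "B<C")))
```
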
  • Evolutionary Optimization of File Assignment for a Large-Scale Video-on-Demand System

    Publication Year: 2008, Page(s): 836 - 850
    Cited by: Papers (2)
    PDF (3414 KB) | HTML

    We present a genetic algorithm for tackling a file assignment problem for a large-scale video-on-demand system. The file assignment problem is to find the optimal replication and allocation of movie files to disks so that the request blocking probability is minimized subject to capacity constraints. We adopt a divide-and-conquer strategy, where the entire solution space of file assignments is divided into subspaces. Each subspace is an exclusive set of solutions sharing a common file replication instance. This allows us to utilize a greedy file allocation method for finding a good-quality heuristic solution within each subspace. We further design two performance indices to measure the quality of the heuristic solution on (1) its assignment of multicopy movies and (2) its assignment of single-copy movies. We demonstrate that these techniques, together with ad hoc population handling methods, enable genetic algorithms to operate in a significantly reduced search space and achieve good-quality file assignments in a computationally efficient way.

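The divide-and-conquer structure above fixes a file replication instance per subspace and evaluates it with a greedy allocation. The sketch below is an assumed, heavily simplified rendering of that structure: a chromosome is a vector of per-movie copy counts under a total-capacity constraint, a greedy allocator spreads copies over the least-loaded disks, and the fitness is a crude load-imbalance proxy standing in for the paper's blocking-probability model. All names, operators, and parameters are assumptions.

```python
# Hedged sketch: chromosome = copy counts per movie (a replication instance);
# greedy allocation places copies on the least-loaded disks; fitness is the
# maximum disk load (to be minimized), a proxy for blocking probability.
import random

def greedy_allocate(copies, popularity, n_disks):
    load = [0.0] * n_disks
    for movie, c in enumerate(copies):
        share = popularity[movie] / c            # demand carried by each copy
        for _ in range(c):
            disk = min(range(n_disks), key=lambda d: load[d])
            load[disk] += share
    return max(load)                             # imbalance proxy to minimize

def random_chromosome(n_movies, total_copies):
    copies = [1] * n_movies                      # every movie keeps one copy
    for _ in range(total_copies - n_movies):
        copies[random.randrange(n_movies)] += 1
    return copies

def evolve(popularity, n_disks, total_copies, pop_size=20, generations=30):
    n_movies = len(popularity)
    population = [random_chromosome(n_movies, total_copies) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=lambda c: greedy_allocate(c, popularity, n_disks))
        survivors = population[: pop_size // 2]
        children = []
        for parent in survivors:
            child = parent[:]
            # Mutation that preserves the total number of copies.
            i, j = random.randrange(n_movies), random.randrange(n_movies)
            if child[i] > 1:
                child[i] -= 1
                child[j] += 1
            children.append(child)
        population = survivors + children
    return population[0]

random.seed(0)
print(evolve(popularity=[0.4, 0.3, 0.2, 0.1], n_disks=3, total_copies=8))
```
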
  • Grid Service Discovery with Rough Sets

    Publication Year: 2008, Page(s): 851 - 862
    Cited by: Papers (10)
    PDF (2209 KB) | HTML

    The computational grid is rapidly evolving into a service-oriented computing infrastructure that facilitates resource sharing and large-scale problem solving over the Internet. Service discovery becomes an issue of vital importance in utilizing grid facilities. This paper presents ROSSE, a Rough sets-based search engine for grid service discovery. Building on rough set theory, ROSSE is novel in its capability to deal with the uncertainty of properties when matching services. In this way, ROSSE can discover the services that are most relevant to a service query from a functional point of view. Since functionally matched services may have distinct nonfunctional properties related to the quality of service (QoS), ROSSE introduces a QoS model to further filter matched services with their QoS values to maximize user satisfaction in service discovery. ROSSE is evaluated in terms of accuracy and efficiency in the discovery of computing services.

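ROSSE handles uncertain (possibly irrelevant) properties when matching service advertisements against a query. As an assumed, simplified illustration, the sketch below lets each service advertise definitely supported and possibly supported properties and scores a query by weighting the two kinds differently, loosely echoing lower and upper approximations in rough set theory; it is not the paper's indiscernibility-based matching, and the data model and weights are assumptions.

```python
# Hedged sketch of matching under property uncertainty: a query property
# counts fully if definitely supported and partially if only possibly
# supported. Weights, field layout, and scoring are illustrative assumptions.
def match_score(query_props, definite, possible, uncertain_weight=0.5):
    definite_hits = len(query_props & definite)
    possible_hits = len(query_props & (possible - definite))
    return (definite_hits + uncertain_weight * possible_hits) / len(query_props)

def discover(query_props, services, min_score=0.5):
    query_props = set(query_props)
    ranked = sorted(
        ((match_score(query_props, d, p), name) for name, d, p in services),
        reverse=True,
    )
    return [(name, score) for score, name in ranked if score >= min_score]

# Toy usage: three computing services with definite and possible properties.
services = [
    ("grid_a", {"cpu", "mpi"}, {"gpu"}),
    ("grid_b", {"cpu"}, {"mpi", "gpu"}),
    ("grid_c", {"storage"}, set()),
]
print(discover({"cpu", "mpi", "gpu"}, services))
```
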
  • Join the IEEE Computer Society [advertisement]

    Publication Year: 2008, Page(s): 863
    PDF (63 KB)
    Freely Available from IEEE
  • Build Your Career in Computing [advertisement]

    Publication Year: 2008, Page(s): 864
    PDF (83 KB)
    Freely Available from IEEE
  • TKDE Information for authors

    Publication Year: 2008, Page(s): c3
    PDF (81 KB)
    Freely Available from IEEE
  • [Back cover]

    Publication Year: 2008, Page(s): c4
    PDF (97 KB)
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Knowledge and Data Engineering (TKDE) informs researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.


Meet Our Editors

Editor-in-Chief
Jian Pei
Simon Fraser University

Associate Editor-in-Chief
Xuemin Lin
University of New South Wales

Associate Editor-in-Chief
Lei Chen
Hong Kong University of Science and Technology