By Topic

Knowledge and Data Engineering, IEEE Transactions on

Issue 7 • Date July 2012

Filter Results

Displaying Results 1 - 18 of 18
  • [Front cover]

    Page(s): c1
    Save to Project icon | Request Permissions | PDF file iconPDF (123 KB)  
    Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    Save to Project icon | Request Permissions | PDF file iconPDF (203 KB)  
    Freely Available from IEEE
  • A Decision-Theoretic Framework for Numerical Attribute Value Reconciliation

    Page(s): 1153 - 1169
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1499 KB)  

    One of the major challenges of data integration is to resolve conflicting numerical attribute values caused by data heterogeneity. In addressing this problem, existing approaches proposed in prior literature often ignore such data inconsistencies or resolve them in an ad hoc manner. In this study, we propose a decision-theoretical framework that resolves numerical value conflicts in a systematic manner. The framework takes into consideration the consequences of incorrect numerical values and selects the value that minimizes the expected cost of errors for all data application problems under consideration. Experimental results show that significant savings can be achieved by adopting the proposed framework instead of ad hoc approaches. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • A Look-Ahead Approach to Secure Multiparty Protocols

    Page(s): 1170 - 1185
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1701 KB)  

    Secure multiparty protocols have been proposed to enable noncolluding parties to cooperate without a trusted server. Even though such protocols prevent information disclosure other than the objective function, they are quite costly in computation and communication. The high overhead motivates parties to estimate the utility that can be achieved as a result of the protocol beforehand. In this paper, we propose a look-ahead approach, specifically for secure multiparty protocols to achieve distributed k-anonymity, which helps parties to decide if the utility benefit from the protocol is within an acceptable range before initiating the protocol. The look-ahead operation is highly localized and its accuracy depends on the amount of information the parties are willing to share. Experimental results show the effectiveness of the proposed methods. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Combining Tag and Value Similarity for Data Extraction and Alignment

    Page(s): 1186 - 1200
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (2112 KB)  

    Web databases generate query result pages based on a user's query. Automatically extracting the data from these query result pages is very important for many applications, such as data integration, which need to cooperate with multiple web databases. We present a novel data extraction and alignment method called CTVS that combines both tag and value similarity. CTVS automatically extracts data from query result pages by first identifying and segmenting the query result records (QRRs) in the query result pages and then aligning the segmented QRRs into a table, in which the data values from the same attribute are put into the same column. Specifically, we propose new techniques to handle the case when the QRRs are not contiguous, which may be due to the presence of auxiliary information, such as a comment, recommendation or advertisement, and for handling any nested structure that may exist in the QRRs. We also design a new record alignment algorithm that aligns the attributes in a record, first pairwise and then holistically, by combining the tag and data value similarity information. Experimental results show that CTVS achieves high precision and outperforms existing state-of-the-art data extraction methods. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Continuous Detour Queries in Spatial Networks

    Page(s): 1201 - 1215
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (3255 KB)  

    We study the problem of finding the shortest route between two locations that includes a stopover of a given type. An example scenario of this problem is given as follows: “On the way to Bob's place, Alice searches for a nearby take-away Italian restaurant to buy a pizza.” Assuming that Alice is interested in minimizing the total trip distance, this scenario can be modeled as a query where the current Alice's location (start) and Bob's place (destination) function as query points. Based on these two query points, we find the minimum detour object (MDO), i.e., a stopover that minimizes the sum of the distances: 1) from the start to the stopover, and 2) from the stopover to the destination. In a realistic location-based application environment, a user can be indecisive about committing to a particular detour option. The user may wish to browse multiple (k) MDOs before making a decision. Furthermore, when a user moves, the kMDO results at one location may become obsolete. We propose a method for continuous detour query (CDQ) processing based on incremental construction of a shortest path tree. We conducted experimental studies to compare the performance of our proposed method against two methods derived from existing k-nearest neighbor querying techniques using real road-network data sets. Experimental results show that our proposed method significantly outperforms the two competitive techniques. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Dense Subgraph Extraction with Application to Community Detection

    Page(s): 1216 - 1230
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (2312 KB)  

    This paper presents a method for identifying a set of dense subgraphs of a given sparse graph. Within the main applications of this “dense subgraph problem,” the dense subgraphs are interpreted as communities, as in, e.g., social networks. The problem of identifying dense subgraphs helps analyze graph structures and complex networks and it is known to be challenging. It bears some similarities with the problem of reordering/blocking matrices in sparse matrix techniques. We exploit this link and adapt the idea of recognizing matrix column similarities, in order to compute a partial clustering of the vertices in a graph, where each cluster represents a dense subgraph. In contrast to existing subgraph extraction techniques which are based on a complete clustering of the graph nodes, the proposed algorithm takes into account the fact that not every participating node in the network needs to belong to a community. Another advantage is that the method does not require to specify the number of clusters; this number is usually not known in advance and is difficult to estimate. The computational process is very efficient, and the effectiveness of the proposed method is demonstrated in a few real-life examples. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Dynamic In-Page Logging for B⁺-tree Index

    Page(s): 1231 - 1243
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1566 KB)  

    Unlike database tables, B+-tree indexes are hierarchical and their structures change over time by node splitting operations, which may propagate changes from one node to another. The node splitting operation is difficult for the basic In-Page Logging (IPL) scheme to deal with, because it involves more than one node that may be stored separately in different flash blocks. In this paper, we propose Dynamic IPL B+-tree (d-IPL B+-tree in short) as a variant of the IPL scheme tailored for flash-based B+-tree indexes. The d-IPL B+-tree addresses the problem of frequent log overflow by allocating a log area in a flash block dynamically. It also avoids a page evaporation problem, imposed by the contemporary NAND flash chips, by introducing ghost nodes to d-IPL B+-tree. This simple but elegant design of the d-IPL B+-tree provides significant performance improvement over existing approaches. For a random insertion workload, the d-IPL B+-tree outperformed a B+-tree with the plain IPL scheme by more than a factor of two in terms of page write and block erase operations. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Efficient Computation of Range Aggregates against Uncertain Location-Based Queries

    Page(s): 1244 - 1258
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1731 KB)  

    In many applications, including location-based services, queries may not be precise. In this paper, we study the problem of efficiently computing range aggregates in a multidimensional space when the query location is uncertain. Specifically, for a query point Q whose location is uncertain and a set S of points in a multidimensional space, we want to calculate the aggregate (e.g., count, average and sum) over the subset S' of S such that for each p ϵ S', Q has at least probability θ within the distance γ to p. We propose novel, efficient techniques to solve the problem following the filtering-and-verification paradigm. In particular, two novel filtering techniques are proposed to effectively and efficiently remove data points from verification. Our comprehensive experiments based on both real and synthetic data demonstrate the efficiency and scalability of our techniques. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Energy-Efficient Reverse Skyline Query Processing over Wireless Sensor Networks

    Page(s): 1259 - 1275
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1830 KB)  

    Reverse skyline query plays an important role in many sensing applications, such as environmental monitoring, habitat monitoring, and battlefield monitoring. Due to the limited power supplies of wireless sensor nodes, the existing centralized approaches, which do not consider energy efficiency, cannot be directly applied to the distributed sensor environment. In this paper, we investigate how to process reverse skyline queries energy efficiently in wireless sensor networks. Initially, we theoretically analyzed the properties of reverse skyline query and proposed a skyband-based approach to tackle the problem of reverse skyline query answering over wireless sensor networks. Then, an energy-efficient approach is proposed to minimize the communication cost among sensor nodes of evaluating range reverse skyline query. Moreover, optimization mechanisms to improve the performance of multiple reverse skylines are also discussed. Extensive experiments on both real-world data and synthetic data have demonstrated the efficiency and effectiveness of our proposed approaches with various experimental settings. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Evaluating Path Queries over Frequently Updated Route Collections

    Page(s): 1276 - 1290
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (2379 KB)  

    The recent advances in the infrastructure of Geographic Information Systems (GIS), and the proliferation of GPS technology, have resulted in the abundance of geodata in the form of sequences of points of interest (POIs), waypoints, etc. We refer to sets of such sequences as route collections. In this work, we consider path queries on frequently updated route collections: given a route collection and two points ns and nt, a path query returns a path, i.e., a sequence of points, that connects ns to nt. We introduce two path query evaluation paradigms that enjoy the benefits of search algorithms (i.e., fast index maintenance) while utilizing transitivity information to terminate the search sooner. Efficient indexing schemes and appropriate updating procedures are introduced. An extensive experimental evaluation verifies the advantages of our methods compared to conventional graph-based search. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Model-Based Method for Projective Clustering

    Page(s): 1291 - 1305
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1402 KB)  

    Clustering high-dimensional data is a major challenge due to the curse of dimensionality. To solve this problem, projective clustering has been defined as an extension to traditional clustering that attempts to find projected clusters in subsets of the dimensions of a data space. In this paper, a probability model is first proposed to describe projected clusters in high-dimensional data space. Then, we present a model-based algorithm for fuzzy projective clustering that discovers clusters with overlapping boundaries in various projected subspaces. The suitability of the proposal is demonstrated in an empirical study done with synthetic data set and some widely used real-world data set. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Optimizing the Calculation of Conditional Probability Tables in Hybrid Bayesian Networks Using Binary Factorization

    Page(s): 1306 - 1312
    Multimedia
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (631 KB)  

    Reducing the computational complexity of inference in Bayesian Networks (BNs) is a key challenge. Current algorithms for inference convert a BN to a junction tree structure made up of clusters of the BN nodes and the resulting complexity is time exponential in the size of a cluster. The need to reduce the complexity is especially acute where the BN contains continuous nodes. We propose a new method for optimizing the calculation of Conditional Probability Tables (CPTs) involving continuous nodes, approximated in Hybrid Bayesian Networks (HBNs), using an approximation algorithm called dynamic discretization. We present an optimized solution to this problem involving binary factorization of the arithmetical expressions declared to generate the CPTs for continuous nodes for deterministic functions and statistical distributions. The proposed algorithm is implemented and tested in a commercial Hybrid Bayesian Network software package and the results of the empirical evaluation show significant performance improvement over unfactorized models. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Saturn: Range Queries, Load Balancing and Fault Tolerance in DHT Data Systems

    Page(s): 1313 - 1327
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (1906 KB)  

    In this paper, we present Saturn, an overlay architecture for large-scale data networks maintained over Distributed Hash Tables (DHTs) that efficiently processes range queries and ensures access load balancing and fault-tolerance. Placing consecutive data values in neighboring peers is desirable in DHTs since it accelerates range query processing; however, such a placement is highly susceptible to load imbalances. At the same time, DHTs may be susceptible to node departures/failures and high data availability and fault tolerance are significant issues. Saturn deals effectively with these problems through the introduction of a novel multiple ring, order-preserving architecture. The use of a novel order-preserving hash function ensures fast range query processing. Replication across and within data rings (termed vertical and horizontal replication) forms the foundation over which our mechanisms are developed, ensuring query load balancing and fault tolerance, respectively. Our detailed experimentation study shows strong gains in range query processing efficiency, access load balancing, and fault tolerance, with low replication overheads. The significance of Saturn is not only that it effectively tackles all three issues together - i.e., supporting range queries, ensuring load balancing, and providing fault tolerance over DHTs - but also that it can be applied on top of any order-preserving DHT enabling it to dynamically handle replication and, thus, to trade off replication costs for fair load distribution and fault tolerance. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • Segmentation and Sampling of Moving Object Trajectories Based on Representativeness

    Page(s): 1328 - 1343
    Save to Project icon | Request Permissions | Click to expandQuick Abstract | PDF file iconPDF (3373 KB)  

    Moving Object Databases (MOD), although ubiquitous, still call for methods that will be able to understand, search, analyze, and browse their spatiotemporal content. In this paper, we propose a method for trajectory segmentation and sampling based on the representativeness of the (sub)trajectories in the MOD. In order to find the most representative subtrajectories, the following methodology is proposed. First, a novel global voting algorithm is performed, based on local density and trajectory similarity information. This method is applied for each segment of the trajectory, forming a local trajectory descriptor that represents line segment representativeness. The sequence of this descriptor over a trajectory gives the voting signal of the trajectory, where high values correspond to the most representative parts. Then, a novel segmentation algorithm is applied on this signal that automatically estimates the number of partitions and the partition borders, identifying homogenous partitions concerning their representativeness. Finally, a sampling method over the resulting segments yields the most representative subtrajectories in the MOD. Our experimental results in synthetic and real MOD verify the effectiveness of the proposed scheme, also in comparison with other sampling techniques. View full abstract»

    Full text access may be available. Click article title to sign in or learn about subscription options.
  • IEEE Computer Society OnlinePlus Coming Soon to TKDE

    Page(s): 1344
    Save to Project icon | Request Permissions | PDF file iconPDF (199 KB)  
    Freely Available from IEEE
  • [Inside back cover]

    Page(s): c3
    Save to Project icon | Request Permissions | PDF file iconPDF (203 KB)  
    Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    Save to Project icon | Request Permissions | PDF file iconPDF (123 KB)  
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Knowledge and Data Engineering (TKDE) informs researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.

Full Aims & Scope

Meet Our Editors

Editor-in-Chief
Jian Pei
Simon Fraser University