IEEE Transactions on Knowledge and Data Engineering

Issue 12 • Dec. 2009

  • [Front cover]

    Publication Year: 2009 , Page(s): c1
    PDF (213 KB)
    Freely Available from IEEE
  • [Inside front cover]

    Publication Year: 2009 , Page(s): c2
    PDF (173 KB)
    Freely Available from IEEE
  • A Model-Based Approach for Discrete Data Clustering and Feature Weighting Using MAP and Stochastic Complexity

    Publication Year: 2009 , Page(s): 1649 - 1664
    Cited by:  Papers (5)
    PDF (3838 KB) | HTML

    In this paper, we consider the problem of unsupervised discrete feature selection/weighting. Discrete data are an important component of many data mining, machine learning, image processing, and computer vision applications, yet much of the published work on unsupervised feature selection has concentrated on continuous data. We propose a probabilistic approach that assigns relevance weights to discrete features, which are treated as random variables modeled by finite discrete mixtures. The choice of finite mixture models is justified by their flexibility, which has led to their widespread application in different domains. For learning the model, we consider both Bayesian and information-theoretic approaches through stochastic complexity. Experimental results illustrate the feasibility and merits of our approach on a difficult problem: clustering and recognizing visual concepts in different image data sets. The proposed approach is also successfully applied to text clustering.
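
    As a concrete, heavily simplified illustration of clustering discrete data with finite mixtures, the sketch below runs plain EM for a mixture of multinomials over a count matrix. It is a generic baseline, not the paper's method: the MAP priors, per-feature relevance weights, and stochastic-complexity model selection described in the abstract are omitted, and the function name and parameters are invented for this example.

        # Minimal EM for a multinomial mixture over discrete count data
        # (generic baseline; the paper's feature weights, MAP priors, and
        # stochastic-complexity machinery are not reproduced here).
        import numpy as np

        def multinomial_mixture_em(X, k, n_iter=100, seed=0, eps=1e-10):
            """X: (n_samples, n_features) nonnegative count matrix."""
            rng = np.random.default_rng(seed)
            n, d = X.shape
            pi = np.full(k, 1.0 / k)                   # mixing weights
            theta = rng.dirichlet(np.ones(d), size=k)  # feature probabilities
            for _ in range(n_iter):
                # E-step: responsibilities from multinomial log-likelihoods
                log_r = np.log(pi + eps) + X @ np.log(theta + eps).T
                log_r -= log_r.max(axis=1, keepdims=True)
                r = np.exp(log_r)
                r /= r.sum(axis=1, keepdims=True)
                # M-step: reestimate mixing weights and feature probabilities
                pi = r.mean(axis=0)
                theta = r.T @ X + eps
                theta /= theta.sum(axis=1, keepdims=True)
            return pi, theta, r.argmax(axis=1)         # parameters and labels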

  • Clustering with Local and Global Regularization

    Publication Year: 2009 , Page(s): 1665 - 1678
    Cited by:  Papers (3)
    PDF (4702 KB) | HTML

    Clustering is a long-standing research topic in data mining and machine learning. Most traditional clustering methods can be categorized as either local or global. In this paper, we propose a novel clustering method that exploits both the local and the global information in a data set. The method, Clustering with Local and Global Regularization (CLGR), minimizes a cost function that properly trades off the local and global costs. We show that this optimization problem can be solved by the eigenvalue decomposition of a sparse symmetric matrix, which can be done efficiently using iterative methods. Finally, experimental results on several data sets demonstrate the effectiveness of our method.
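
    The abstract's key computational claim is that the CLGR objective reduces to the eigen-decomposition of a sparse symmetric matrix. The sketch below shows that solution pattern with a stand-in matrix, a kNN-graph Laplacian plus a global ridge term; the exact CLGR matrix construction follows the paper, and clgr_like, mu, and n_neighbors are assumptions of this example.

        # Spectral-style solve: build a sparse symmetric matrix combining a
        # local (graph Laplacian) and a global (ridge) term, take its
        # smallest eigenvectors, and cluster them. A stand-in for CLGR only.
        import scipy.sparse as sp
        from scipy.sparse.csgraph import laplacian
        from scipy.sparse.linalg import eigsh
        from sklearn.cluster import KMeans
        from sklearn.neighbors import kneighbors_graph

        def clgr_like(X, n_clusters, n_neighbors=10, mu=0.1):
            W = kneighbors_graph(X, n_neighbors, mode='connectivity')
            W = 0.5 * (W + W.T)                    # symmetric affinity graph
            L = laplacian(W, normed=True)          # local smoothness term
            M = L + mu * sp.identity(X.shape[0])   # plus a global regularizer
            vals, vecs = eigsh(M, k=n_clusters, which='SM')
            return KMeans(n_clusters, n_init=10).fit_predict(vecs)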

  • Continuous K-Means Monitoring with Low Reporting Cost in Sensor Networks

    Publication Year: 2009 , Page(s): 1679 - 1691
    Cited by:  Papers (9)
    PDF (1259 KB) | HTML

    In this paper, we study an interesting problem: continuously monitoring the k-means clustering of sensor readings in a large sensor network. Given a set of sensors whose readings evolve over time, we want to maintain the k-means of the readings continuously. The optimization goal is to reduce the reporting cost in the network, that is, to let as few sensors as possible report their current readings to the data center during maintenance. To tackle the problem, we propose the reading reporting tree, a hierarchical data collection and analysis framework, and develop several cost-effective reporting methods built on it for continuous k-means monitoring. First, a uniform sampling method using a reading reporting tree can achieve a good-quality approximation of the k-means. Second, we propose a reporting threshold method that can guarantee the approximation quality. Last, we explore a lazy approach that substantially reduces the intermediate computation. We conduct a systematic simulation evaluation using synthetic data sets to examine the characteristics of the proposed methods.
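
    The reporting-threshold idea fits in a few lines: a sensor re-reports only when its reading drifts more than some delta from the last value it sent, so the data center's cached readings, and hence the k-means maintained over them, stay within a bounded error. The class below is an invented, minimal reading of that idea; the reading reporting tree and the lazy recomputation strategy are not modeled.

        class ThresholdReporter:
            """One sensor's reporting logic under a fixed threshold delta."""

            def __init__(self, delta):
                self.delta = delta
                self.last_reported = None

            def observe(self, reading):
                """Return the reading if it must be reported, else None."""
                if (self.last_reported is None
                        or abs(reading - self.last_reported) > self.delta):
                    self.last_reported = reading
                    return reading
                return None  # the data center keeps using the cached value

        # With delta=0.5, the stream 10.0, 10.2, 10.9, 11.1 triggers only
        # two reports (10.0 and 10.9), saving two transmissions.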

  • Discovering Transitional Patterns and Their Significant Milestones in Transaction Databases

    Publication Year: 2009 , Page(s): 1692 - 1707
    Cited by:  Papers (1)
    PDF (4493 KB) | HTML

    A transaction database usually consists of a set of time-stamped transactions. Mining frequent patterns in transaction databases has been studied extensively in data mining research, but most existing frequent pattern mining algorithms (such as Apriori and FP-growth) ignore the time stamps associated with the transactions. In this paper, we extend the frequent pattern mining framework to take the time stamp of each transaction into account and discover patterns whose frequency changes dramatically over time. We define a new type of pattern, called transitional patterns, to capture the dynamic behavior of frequent patterns in a transaction database. Transitional patterns comprise positive and negative transitional patterns, whose frequencies increase or decrease dramatically at some time points of the database. We introduce the concept of significant milestones for a transitional pattern: the time points at which its frequency changes most significantly. Moreover, we develop an algorithm to mine from a transaction database the set of transitional patterns along with their significant milestones. Our experimental studies on real-world databases illustrate that mining positive and negative transitional patterns is highly promising as a practical and useful approach for discovering novel and interesting knowledge from large databases.
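
    A toy version of milestone discovery, under invented names, splits the time-stamped transactions at each candidate time point and scores the split by the change in the pattern's relative frequency before and after it; the paper's actual transitional-pattern measure and milestone definition differ in their details.

        def milestone(timestamps, contains_pattern, min_span=10):
            """timestamps: sorted; contains_pattern: parallel bools for one pattern."""
            n = len(timestamps)
            total = sum(contains_pattern)
            best_t, best_change = None, 0.0
            before = 0  # occurrences of the pattern strictly before the split
            for i in range(1, n):
                before += contains_pattern[i - 1]
                if i < min_span or n - i < min_span:
                    continue  # ignore splits supported by too few transactions
                change = abs((total - before) / (n - i) - before / i)
                if change > best_change:
                    best_change, best_t = change, timestamps[i]
            return best_t, best_change  # most significant milestone and its jump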

  • Efficient Tree Structures for High Utility Pattern Mining in Incremental Databases

    Publication Year: 2009 , Page(s): 1708 - 1721
    Cited by:  Papers (11)
    PDF (3288 KB) | HTML

    High utility pattern (HUP) mining has recently become one of the most important research issues in data mining, owing to its ability to account for the nonbinary frequency of items in transactions and a different profit value for each item. Incremental and interactive data mining, in turn, reuse previous data structures and mining results to avoid unnecessary recomputation when a database is updated or when the minimum threshold is changed. In this paper, we propose three novel tree structures for efficient incremental and interactive HUP mining. The first, the Incremental HUP Lexicographic Tree (IHUPL-Tree), arranges items in lexicographic order and can capture incremental data without any restructuring operation. The second, the IHUP Transaction Frequency Tree (IHUPTF-Tree), obtains a compact size by arranging items in descending order of transaction frequency. To reduce mining time, the third, the IHUP Transaction-Weighted Utilization Tree (IHUPTWU-Tree), orders items by descending TWU value. Extensive performance analyses show that our tree structures are very efficient and scalable for incremental and interactive HUP mining.
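
    The property the abstract highlights for the IHUPL-Tree can be seen in a minimal prefix tree: because items are kept in a fixed lexicographic order, transactions from an incremental update are inserted without restructuring. Utility bookkeeping, the frequency/TWU orderings of the other two trees, and the mining phase are omitted; the class names are invented.

        class Node:
            def __init__(self, item):
                self.item, self.count, self.children = item, 0, {}

        class IHUPLTreeSketch:
            """Prefix tree over lexicographically ordered items."""

            def __init__(self):
                self.root = Node(None)

            def insert(self, transaction):
                node = self.root
                for item in sorted(transaction):  # fixed lexicographic order
                    node = node.children.setdefault(item, Node(item))
                    node.count += 1  # shared prefixes keep the tree compact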

  • Join of Multiple Data Streams in Sensor Networks

    Publication Year: 2009 , Page(s): 1722 - 1736
    Cited by:  Papers (2)
    PDF (1852 KB) | HTML

    Sensor networks are multihop wireless networks of resource-constrained sensor nodes used to realize high-level collaborative sensing tasks. To query or access the data they generate, a sensor network can be viewed as a distributed database. In this paper, we develop algorithms for communication-efficient implementation of joins of multiple (two or more) data streams in a sensor network. Distributed implementation of joins is particularly challenging in sensor networks because of their unique characteristics: limited memory and battery energy on individual nodes, arbitrary and dynamic network topology, multihop communication, and unreliable infrastructure. One of our proposed approaches, the perpendicular approach (PA), is load balanced and, in fact, incurs near-optimal communication cost for the special case of binary joins in grid networks under the assumption of uniform tuple generation across the network. We compare the performance of our approaches through extensive simulations on the ns2 simulator and show that PA substantially prolongs the network lifetime compared to the alternatives, especially for joins involving spatial constraints.
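
    The operator being distributed here is an ordinary stream join. Purely as a point of reference, below is a plain centralized symmetric hash join of two tuple streams; the paper's actual contribution, communication-efficient in-network placement such as the perpendicular approach, is not modeled in this sketch.

        from collections import defaultdict

        def symmetric_hash_join(events, key_a, key_b):
            """events: iterable of ('A', tuple) / ('B', tuple) in arrival order."""
            ht_a, ht_b = defaultdict(list), defaultdict(list)
            for side, tup in events:
                if side == 'A':
                    k = key_a(tup)
                    ht_a[k].append(tup)          # remember for future B-tuples
                    for other in ht_b[k]:
                        yield tup, other
                else:
                    k = key_b(tup)
                    ht_b[k].append(tup)          # remember for future A-tuples
                    for other in ht_a[k]:
                        yield other, tup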

  • Locating XML Documents in a Peer-to-Peer Network Using Distributed Hash Tables

    Publication Year: 2009 , Page(s): 1737 - 1752
    Cited by:  Papers (4)
    PDF (2000 KB) | HTML

    One of the key challenges in a peer-to-peer (P2P) network is to efficiently locate relevant data sources across a large number of participating peers. With the increasing popularity of the Extensible Markup Language (XML) as a standard for information interchange on the Internet, XML is commonly used as an underlying data model for P2P applications to deal with heterogeneous data and enhance the expressiveness of queries. In this paper, we address the problem of efficiently locating relevant XML documents in a P2P network, where a user poses queries in a language such as XPath. We have developed a new system called psiX that runs on top of an existing distributed hashing framework. Under psiX, each XML document is mapped to an algebraic signature that captures the structural summary of the document, and each XML query pattern is mapped to a signature as well. The query's signature is used to locate relevant document signatures. Our signature scheme supports holistic processing of query patterns, without breaking them into multiple path queries that are processed individually. The participating peers collectively maintain a collection of distributed hierarchical indexes for the document signatures, and value indexes are built to handle numeric and textual values in XML documents and to process queries with value predicates. Our experimental study on PlanetLab demonstrates that psiX provides an efficient location service in a P2P network for a wide variety of XML documents.
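
    A rough stand-in for the signature filtering idea: summarize each document by the set of hashed root-to-node element paths, and consider a document relevant to a query only if its signature covers the query's. psiX's algebraic signatures, holistic pattern matching, and distributed hierarchical indexes are substantially richer; the helper names below are invented.

        import hashlib
        import xml.etree.ElementTree as ET

        def path_signature(xml_text):
            """Set of hashed root-to-node element paths of a document."""
            sig = set()

            def walk(node, prefix):
                path = prefix + '/' + node.tag
                sig.add(hashlib.md5(path.encode()).hexdigest()[:8])
                for child in node:
                    walk(child, path)

            walk(ET.fromstring(xml_text), '')
            return sig

        def may_be_relevant(doc_sig, query_sig):
            # Necessary (not sufficient) condition: the document's structure
            # covers every structural element required by the query.
            return query_sig <= doc_sig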

  • Pooling for Combination of Multilevel Forecasts

    Publication Year: 2009 , Page(s): 1753 - 1766
    PDF (1990 KB) | HTML

    In this paper, we provide a theoretical analysis of the effects of different forecast diversification methods on the structure of the forecast error covariance matrices and on the decomposed forecast error components, based on the bias-variance-Bayes error decomposition of James and Hastie. We express the "diversity" of different forecasts in relation to the different error components and propose a measure to quantify it. We illustrate and discuss typical inhomogeneities that frequently occur in forecast error covariance matrices, and show that previously proposed pooling based only on error variances cannot fully exploit the complementary information present in a set of diverse forecasts to be combined. If covariance values could be reliably calculated, they could be taken into account during pooling. We study the difficult case in which covariance information cannot be measured properly and propose a novel simplified representation of the covariance matrix that is based only on knowledge about the forecast generation process. Finally, we propose a new pooling approach that avoids inhomogeneities in the forecast error covariance matrix by exploiting the information contained in this simplified representation, and compare it with the error-variance-based pooling approach introduced by Aiolfi and Timmermann. Applying our approach repeatedly yields multistep and multilevel forecast combination structures, which have generated significantly improved forecasts in our previous extensive experimental work, a summary of which is also provided.
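
    For orientation, the baseline that the abstract compares against, error-variance-based pooling in the style of Aiolfi and Timmermann, can be sketched as follows: rank forecasters by past error variance, split them into pools, average within each pool, then combine the pool forecasts. The function below is a simplified, invented rendering; the paper's approach replaces the ranking criterion with its simplified covariance representation.

        import numpy as np

        def variance_pooled_forecast(past_errors, current_forecasts, n_pools=3):
            """past_errors: (T, m) residuals; current_forecasts: (m,) array."""
            order = np.argsort(past_errors.var(axis=0))   # best models first
            pools = np.array_split(order, n_pools)        # variance-based pools
            pool_means = [current_forecasts[p].mean() for p in pools]
            return float(np.mean(pool_means))             # combine pool forecasts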

  • Privacy-Preserving Tuple Matching in Distributed Databases

    Publication Year: 2009 , Page(s): 1767 - 1782
    Cited by:  Papers (3)
    PDF (1366 KB)

    We address the problems of privacy-preserving duplicate tuple matching (PPDTM) and privacy-preserving threshold attributes matching (PPTAM) in a database horizontally partitioned among N parties, where each party holds a private share of the database's tuples and all tuples have the same set of attributes. In PPDTM, each party determines whether its tuples have any duplicates in the other parties' private databases. In PPTAM, each party determines whether every attribute value of each of its tuples appears at least a threshold number of times in the attribute unions. We propose protocols for both problems using an additively homomorphic cryptosystem based on the subgroup membership assumption, e.g., Paillier's and ElGamal's schemes. An analysis of the total numbers of modular exponentiations, modular multiplications, and communication bits shows that, by trading a higher communication cost for a reduced computation cost (which dominates the total cost), our PPDTM protocol for the semihonest model is superior in total cost to the solution derivable from existing techniques. Our PPTAM protocol is superior in both computation and communication costs. The efficiency improvements come mainly from perturbing with random numbers rather than the random polynomials of existing techniques, without enabling attacks by polynomial interpolation. We also give detailed constructions of the required zero-knowledge proofs and extend both protocols to the malicious model; such extensions were previously unknown.
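
    The additively homomorphic building block behind such protocols can be illustrated with Paillier encryption: one party learns whether two values match without learning the other's value. The snippet uses the third-party python-paillier package ('phe') and shows only this two-party equality test, not the N-party PPDTM/PPTAM protocols or their zero-knowledge proofs.

        import secrets
        from phe import paillier

        pub, priv = paillier.generate_paillier_keypair(n_length=2048)

        # Party A encrypts (a hash of) its tuple and sends the ciphertext.
        a = 123456
        enc_a = pub.encrypt(a)

        # Party B, holding b, returns Enc(r * (a - b)) for a random r != 0:
        # this is Enc(0) iff a == b, and a randomly masked value otherwise.
        b = 123456
        r = secrets.randbelow(2**64) + 1
        enc_masked = (enc_a - b) * r

        # Party A decrypts: zero means the tuples match; a nonzero result
        # reveals nothing about b beyond the mismatch, thanks to the mask r.
        print("match" if priv.decrypt(enc_masked) == 0 else "no match")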

  • Tuning On-Air Signatures for Balancing Performance and Confidentiality

    Publication Year: 2009 , Page(s): 1783 - 1797
    Cited by:  Papers (2)
    PDF (2881 KB) | HTML

    In this paper, we investigate the trade-off between performance and confidentiality in signature-based air indexing schemes for wireless data broadcast. Two metrics, the false drop probability and the false guess probability, are defined to quantify the filtering efficiency and the confidentiality loss of a signature scheme. Our analysis reveals that the two probabilities follow a similar trend as the tuning parameters of a signature scheme change, so a low false drop probability and a high false guess probability cannot be achieved simultaneously. To balance performance and confidentiality, we provide analytical guidance on parameter settings for the signature schemes to meet different system requirements. In addition, we propose a jump pointer technique and an XOR signature scheme to further improve performance and confidentiality. A comprehensive simulation has been conducted to validate our findings.
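
    A superimposed-coding signature of the kind used for on-air indexing fits in a few lines: each attribute value sets a few bits of a fixed-width bit vector, a query matches a frame whenever the query's bits are covered, and a "false drop" is a match on a frame that does not actually contain the value. The parameter names and defaults below are illustrative, not the paper's.

        import hashlib

        def signature(values, width=64, bits_per_value=3):
            """Superimpose a few hash-selected bits per value into one word."""
            sig = 0
            for v in values:
                digest = hashlib.sha1(str(v).encode()).digest()
                for i in range(bits_per_value):
                    sig |= 1 << (digest[i] % width)
            return sig

        def maybe_contains(frame_sig, query_sig):
            # True matches always pass; spurious passes are false drops.
            return frame_sig & query_sig == query_sig

    The trade-off analyzed in the paper shows up directly in width and bits_per_value: longer, sparser signatures filter better (fewer false drops) but also let an eavesdropper guess frame contents more reliably.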

  • Local Kernel Regression Score for Selecting Features of High-Dimensional Data

    Publication Year: 2009 , Page(s): 1798 - 1802
    Cited by:  Papers (4)
    PDF (1193 KB) | HTML

    In general, irrelevant features of high-dimensional data degrade the performance of an inference system, e.g., a clustering algorithm or a classifier. In this paper, we therefore present a Local Kernel Regression (LKR) scoring approach that evaluates the relevancy of features by their capability to preserve the local configuration within a small patch of data. A score index applicable to both supervised and unsupervised learning is developed within the local kernel regression framework to identify the relevant features. Experimental results show the efficacy of the proposed approach in comparison with existing methods.
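
    One plausible, simplified reading of a local-regression relevance score, under invented names and parameters: predict each sample from its neighbors using Nadaraya-Watson kernel weights and score each feature by how small its reconstruction error is, i.e., how well the feature respects the local configuration. The paper's LKR score differs in its exact formulation.

        import numpy as np
        from sklearn.neighbors import NearestNeighbors

        def lkr_like_scores(X, k=10, bandwidth=1.0):
            """Return one relevance score per feature (higher = more relevant)."""
            nn = NearestNeighbors(n_neighbors=k + 1).fit(X)
            dist, idx = nn.kneighbors(X)
            dist, idx = dist[:, 1:], idx[:, 1:]        # drop self-neighbors
            w = np.exp(-dist**2 / (2 * bandwidth**2))  # Gaussian kernel weights
            w /= w.sum(axis=1, keepdims=True)
            pred = np.einsum('ij,ijk->ik', w, X[idx])  # weighted neighbor mean
            return -((X - pred) ** 2).mean(axis=0)     # negative patch error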

  • Manufacturing-Oriented Discrete Process Modeling Approach Using the Predicate Logic

    Publication Year: 2009 , Page(s): 1803 - 1806
    PDF (861 KB) | HTML

    Part machining is a discrete manufacturing process. To evaluate such a manufacturing process, an intelligent modeling method based on first-order predicate logic is proposed. First, a basic predicate formula is defined according to the machining method, and its predicate and variables are described in detail, completing the process representation. Second, to construct the process model, a modeling element comprising three nodes is introduced; its components are discussed in turn, as is the mapping between modeling elements and predicates. After the modeling predicate formula is defined, five basic inference rules are established, from which the manufacturing process model is constructed. Third, on the basis of the process model, a process simulation is carried out to evaluate manufacturing performance measures such as production efficiency, machining equipment utilization, and production bottlenecks. Finally, a case study illustrates the modeling method.
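
    The flavor of the approach, facts as ground predicates plus inference rules applied to a fixed point, can be sketched as follows. The predicate vocabulary (Machined, Ready, Start) and the single rule are invented for illustration; the paper defines its own predicate formulas and five inference rules.

        # Facts are ground predicates represented as tuples.
        facts = {("Machined", "part1", "turning"),
                 ("Ready", "part1", "milling")}

        def rule_start_next_op(facts):
            """If a part finished turning and is ready for milling, start milling."""
            new = set()
            for f in facts:
                if f[0] == "Machined" and ("Ready", f[1], "milling") in facts:
                    new.add(("Start", f[1], "milling"))
            return new

        # Naive forward chaining to a fixed point over the rule set.
        while True:
            derived = rule_start_next_op(facts) - facts
            if not derived:
                break
            facts |= derived
        print(facts)  # now also contains ("Start", "part1", "milling")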

  • IEEE Computer Society 2010 New Student Member Package

    Publication Year: 2009 , Page(s): 1807
    PDF (151 KB)
    Freely Available from IEEE
  • IEEE Computer Society Computing Now [advertisement]

    Publication Year: 2009 , Page(s): 1808
    PDF (84 KB)
    Freely Available from IEEE
  • TKDE Information for authors

    Publication Year: 2009 , Page(s): c3
    PDF (173 KB)
    Freely Available from IEEE
  • [Back cover]

    Publication Year: 2009 , Page(s): c4
    PDF (213 KB)
    Freely Available from IEEE

Aims & Scope

IEEE Transactions on Knowledge and Data Engineering (TKDE) informs researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.


Meet Our Editors

Editor-in-Chief
Jian Pei
Simon Fraser University