
IEEE Transactions on Knowledge and Data Engineering

Issue 7 • July 2008

Displaying Results 1 - 17 of 17
  • [Front cover]

    Publication Year: 2008 , Page(s): c1
  • [Inside front cover]

    Publication Year: 2008 , Page(s): c2
  • Editorial: TKDE Editorial Board Changes

    Publication Year: 2008 , Page(s): 865 - 867
  • A Niching Memetic Algorithm for Simultaneous Clustering and Feature Selection

    Publication Year: 2008 , Page(s): 868 - 879
    Cited by:  Papers (11)

    Clustering is inherently a difficult task, and is made even more difficult when the selection of relevant features is also an issue. In this paper we propose an approach for simultaneous clustering and feature selection using a niching memetic algorithm. Our approach (which we call NMA_CFS) makes feature selection an integral part of the global clustering search procedure and attempts to overcome the problem of identifying less promising locally optimal solutions in both clustering and feature selection, without making any a priori assumption about the number of clusters. Within the NMA_CFS procedure, a variable composite representation is devised to encode both feature selection and cluster centers with different numbers of clusters. Further, local search operations are introduced to refine feature selection and cluster centers encoded in the chromosomes. Finally, a niching method is integrated to preserve the population diversity and prevent premature convergence. In an experimental evaluation we demonstrate the effectiveness of the proposed approach and compare it with other related approaches, using both synthetic and real data.

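    The variable composite representation described in the abstract can be pictured as a chromosome that carries both a feature mask and a variable-length list of cluster centers. The sketch below is a hypothetical illustration, not the paper's actual encoding; all names and parameter choices are assumptions.

    ```python
    import random

    # Hypothetical sketch of a variable composite representation: a chromosome
    # pairs a binary feature-selection mask with a variable number of cluster
    # centers, so both the feature subset and the cluster count can evolve.
    def random_chromosome(n_features, max_clusters, rng):
        mask = [rng.random() < 0.5 for _ in range(n_features)]
        if not any(mask):                        # keep at least one feature selected
            mask[rng.randrange(n_features)] = True
        k = rng.randint(2, max_clusters)         # cluster count is evolved, not fixed
        centers = [[rng.random() for _ in range(n_features)] for _ in range(k)]
        return {"mask": mask, "centers": centers}

    rng = random.Random(0)
    chrom = random_chromosome(n_features=4, max_clusters=5, rng=rng)
    ```

    A memetic algorithm would then apply crossover, mutation, and local search to populations of such chromosomes.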
  • Cluster Kernels: Resource-Aware Kernel Density Estimators over Streaming Data

    Publication Year: 2008 , Page(s): 880 - 893
    Cited by:  Papers (7)

    A variety of real-world applications rely heavily on an adequate analysis of transient data streams. Due to the rigid processing requirements of data streams, common analysis techniques known from data mining are not directly applicable. A fundamental building block of many data mining and analysis approaches is density estimation, which provides a well-defined estimate of a continuous data distribution, a fact that makes its adaptation to data streams desirable. A convenient method for density estimation utilizes kernels. The computational complexity of kernel density estimation, however, renders its application to data streams impossible. In this paper, we tackle this problem and propose our Cluster Kernel approach, which provides continuously computed kernel density estimators over streaming data. Not only do Cluster Kernels meet the rigid processing requirements of data streams, they also allocate only a constant amount of memory, with the opportunity to adapt it dynamically to changing system resources. For this purpose, we develop an intelligent merge scheme for Cluster Kernels and utilize continuously collected local statistics to resample already processed data. We focus on Cluster Kernels for one-dimensional data streams, but also address the multidimensional case. We validate the efficacy of Cluster Kernels for a variety of real-world data streams in an extensive experimental study.

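    The constant-memory idea behind the abstract — keep a bounded set of kernels and merge when the budget is exceeded — can be sketched as follows. This is a simplified stand-in, not the paper's merge scheme; the budget M and the nearest-pair merge rule are assumptions.

    ```python
    # Memory-bounded kernel list sketch: when the budget M is exceeded, the two
    # closest kernels (by mean) are merged into one whose weight and mean
    # preserve the pair's totals.
    def insert_point(kernels, x, M):
        kernels.append({"mean": x, "weight": 1.0})
        if len(kernels) > M:
            kernels.sort(key=lambda k: k["mean"])
            # index of the pair with the smallest gap between adjacent means
            i = min(range(len(kernels) - 1),
                    key=lambda j: kernels[j + 1]["mean"] - kernels[j]["mean"])
            a, b = kernels[i], kernels[i + 1]
            w = a["weight"] + b["weight"]
            merged = {"mean": (a["mean"] * a["weight"] + b["mean"] * b["weight"]) / w,
                      "weight": w}
            kernels[i:i + 2] = [merged]   # replace the pair by the merged kernel
        return kernels

    ks = []
    for x in [0.1, 0.11, 5.0, 5.2, 9.0]:
        ks = insert_point(ks, x, M=3)
    ```

    Each stored kernel would contribute a weighted kernel function to the final density estimate.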
  • Ranked Reverse Nearest Neighbor Search

    Publication Year: 2008 , Page(s): 894 - 910
    Cited by:  Papers (8)

    Given a set of data points P and a query point q in a multidimensional space, a reverse nearest neighbor (RNN) query finds the data points in P whose nearest neighbor is q. A reverse k-nearest neighbor (RkNN) query (where k ≥ 1) generalizes the RNN query to find the data points whose k nearest neighbors include q. Under RkNN query semantics, q is said to have influence on all those answer data points. The degree of q's influence on a data point p (∈ P) is denoted by κp, where q is the κp-th NN of p. We introduce a new variant of the RNN query, namely, the ranked reverse nearest neighbor (RRNN) query, that retrieves the t data points most influenced by q, i.e., the t data points having the smallest κ values with respect to q. To answer this RRNN query efficiently, we propose two novel algorithms, κ-counting and κ-browsing, that are applicable to both monochromatic and bichromatic scenarios and are able to deliver results progressively. Through an extensive performance evaluation, we validate that the two proposed RRNN algorithms are superior to solutions derived from algorithms designed for the RkNN query.

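    A naive baseline makes the RRNN semantics concrete: compute κp for every point by ranking q among p's neighbours, then return the t smallest. The paper's κ-counting and κ-browsing algorithms avoid exactly this exhaustive work; the sketch below only fixes the query semantics.

    ```python
    # Brute-force RRNN(t): for each p, kappa_p is the 1-based rank of q in the
    # distance ordering of p's neighbours (other points plus q).
    def rrnn(points, q, t):
        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5
        kappas = []
        for p in points:
            others = [o for o in points if o is not p] + [q]
            others.sort(key=lambda o: dist(p, o))
            kappas.append((others.index(q) + 1, p))   # (kappa_p, p)
        kappas.sort(key=lambda kp: kp[0])
        return [p for _, p in kappas[:t]]

    pts = [(0.0, 0.0), (1.0, 0.0), (10.0, 10.0)]
    top = rrnn(pts, q=(0.5, 0.0), t=2)
    ```

    Here the two points nearest the query have κ = 1, so they are the two most influenced.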
  • Simultaneous Pattern and Data Clustering for Pattern Cluster Analysis

    Publication Year: 2008 , Page(s): 911 - 923
    Cited by:  Papers (9)

    In data mining and knowledge discovery, pattern discovery extracts previously unknown regularities in the data and is a useful tool for categorical data analysis. However, the number of patterns discovered is often overwhelming. It is difficult and time-consuming to 1) interpret the discovered patterns and 2) use them to further analyze the data set. To overcome these problems, this paper proposes a new method that clusters patterns and their associated data simultaneously. When patterns are clustered, the data containing the patterns are also clustered; and the relation between patterns and data is made explicit. Such an explicit relation allows the user on the one hand to further analyze each pattern cluster via its associated data cluster, and on the other hand to interpret why a data cluster is formed via its corresponding pattern cluster. Since the effectiveness of clustering mainly depends on the distance measure, several distance measures between patterns and their associated data are proposed. Their relationships to the existing common ones are discussed. Once pattern clusters and their associated data clusters are obtained, each of them can be further analyzed individually. To evaluate the effectiveness of the proposed approach, experimental results on synthetic and real data are reported.

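    One way to make the pattern-data relation concrete is to measure the distance between two patterns by the overlap of the records that contain them. The Jaccard-style measure below is an illustrative assumption, not necessarily one of the measures proposed in the paper.

    ```python
    # Distance between two itemset patterns, defined through their data:
    # 1 minus the Jaccard similarity of the record sets that contain them.
    def covers(pattern, record):
        return pattern <= record            # every item of the pattern appears

    def pattern_distance(p1, p2, records):
        s1 = {i for i, r in enumerate(records) if covers(p1, r)}
        s2 = {i for i, r in enumerate(records) if covers(p2, r)}
        union = s1 | s2
        return 1.0 - (len(s1 & s2) / len(union) if union else 1.0)

    data = [{"a", "b", "c"}, {"a", "b"}, {"c", "d"}]
    d_close = pattern_distance({"a"}, {"b"}, data)   # same supporting records
    d_far = pattern_distance({"a"}, {"d"}, data)     # disjoint supporting records
    ```

    Clustering patterns under such a measure automatically groups the records that support them, which is the explicit relation the abstract describes.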
  • Analyzing and Managing Role-Based Access Control Policies

    Publication Year: 2008 , Page(s): 924 - 939
    Cited by:  Papers (12)

    Today, more and more security-relevant data is stored on computer systems, and security-critical business processes are mapped to their digital counterparts. This situation applies to various domains, such as the health care industry, digital government, and financial service institutes, which require that different security requirements be fulfilled. Authorisation constraints can help the policy architect design and express higher-level organisational rules. Although the importance of authorisation constraints has been addressed in the literature, there does not exist a systematic way to verify and validate them. In this paper, we specify both non-temporal and history-based authorisation constraints in the Object Constraint Language (OCL) and first-order linear temporal logic (LTL). Based upon these specifications, we attempt to formally verify role-based access control policies with the help of a theorem prover and to validate policies with the USE system, a validation tool for OCL constraints. We also describe an authorisation engine that supports the enforcement of authorisation constraints.

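    A static separation-of-duty rule of the kind such authorisation constraints express can be written as a simple check. Here it is sketched in Python rather than OCL or LTL, and the users and role names are hypothetical.

    ```python
    # Static separation of duty: no user may hold two roles declared conflicting.
    ASSIGNMENTS = {
        "alice": {"clerk"},
        "bob": {"clerk", "auditor"},   # violates the constraint below
    }
    CONFLICTING = {frozenset({"clerk", "auditor"})}

    def violations(assignments, conflicting):
        bad = []
        for user, roles in assignments.items():
            for pair in conflicting:
                if pair <= roles:      # user holds every role in a conflicting pair
                    bad.append(user)
        return bad

    offenders = violations(ASSIGNMENTS, CONFLICTING)
    ```

    An authorisation engine would run such checks at assignment time; history-based constraints additionally inspect the sequence of past accesses, which is where temporal logic comes in.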
  • Integrating Data Warehouses with Web Data: A Survey

    Publication Year: 2008 , Page(s): 940 - 955
    Cited by:  Papers (19)

    This paper surveys the most relevant research on combining Data Warehouse (DW) and Web data. It studies the XML technologies currently being used to integrate, store, query, and retrieve Web data, and their application to DWs. The paper reviews different distributed DW architectures and the use of XML languages as an integration tool in these systems. It also introduces the problem of dealing with semi-structured data in a DW. It studies Web data repositories, the design of multidimensional databases for XML data sources, and the XML extensions of On-Line Analytical Processing techniques. The paper addresses the application of information retrieval technology in a DW to exploit text-rich document collections. The authors hope that the paper will help readers to discover the main limitations and opportunities offered by the combination of the DW and Web fields, as well as to identify open research lines.

  • Chaotic Time Series Prediction Using a Neuro-Fuzzy System with Time-Delay Coordinates

    Publication Year: 2008 , Page(s): 956 - 964
    Cited by:  Papers (10)

    This paper presents an investigation into the use of the delay-coordinate embedding technique in a multi-input multi-output adaptive-network-based fuzzy inference system (MANFIS) for chaotic time series prediction. The inputs to the MANFIS are embedded-phase-space (EPS) vectors preprocessed from the time series under test, while the output time series is extracted from the output EPS vectors of the MANFIS. A moving root-mean-square error is used to monitor the error over the prediction horizon and to tune the membership functions in the MANFIS. With the inclusion of the EPS preprocessing step, the prediction performance of the MANFIS is improved significantly. The proposed method has been tested with one periodic function and two chaotic functions, including the Mackey-Glass chaotic time series and the Duffing forced-oscillation system. The prediction performances with and without EPS preprocessing are statistically compared using the t-test method. The results show that EPS preprocessing can help improve the prediction performance of a MANFIS significantly.

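    Delay-coordinate (EPS) preprocessing as described above can be sketched as follows; the embedding dimension d and lag tau are illustrative choices, and the paper's preprocessing may differ in detail.

    ```python
    # Delay-coordinate embedding: each input vector stacks d samples of the
    # series separated by a lag of tau steps, oldest sample first.
    def embed(series, d, tau):
        vectors = []
        for i in range((d - 1) * tau, len(series)):
            vectors.append([series[i - j * tau] for j in range(d - 1, -1, -1)])
        return vectors

    x = list(range(10))            # stand-in for a sampled chaotic series
    eps = embed(x, d=3, tau=2)     # vectors like [x[i-4], x[i-2], x[i]]
    ```

    These vectors, rather than raw scalar samples, would be fed to the fuzzy inference system.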
  • Molecular Verification of Rule-Based Systems Based on DNA Computation

    Publication Year: 2008 , Page(s): 965 - 975
    Cited by:  Papers (7)

    Various graphic techniques have been developed to analyze structural errors in rule-based systems that utilize inference (propositional) logic rules. Four typical errors in rule-based systems are: redundancy (numerous rule sets resulting in the same conclusion); circularity (a rule leading back to itself); incompleteness (dead ends, or a rule set conclusion leading to unreachable goals); and inconsistency (rules conflicting with each other). This study presents a new DNA-based computing algorithm, based mainly upon Adleman's DNA operations, that can be used to detect such errors. There are three phases to this molecular solution: rule-to-DNA transformation design, solution space generation, and rule verification. We first encode individual rules using relatively short DNA strands, and then generate all possible rule paths by the directed joining of such short strands to form longer strands. We then run the verification algorithm to detect errors. The potential of applying the proposed DNA computation algorithm to rule verification is promising, given the operational time complexity of O(n·q), in which n denotes the number of fact clauses in the rule base and q is the number of rules in the longest inference chain.

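    For comparison, one of the four error types (circularity) can be detected conventionally with a graph cycle search over the rule base. The sketch below shows only what is being detected, not the DNA encoding; the rule base is hypothetical.

    ```python
    # Circularity check: a rule base is circular if some rule's conclusion can
    # lead, through a chain of rules, back to its own antecedent.
    def has_cycle(rules):
        # rules: list of (antecedent, consequent) fact pairs
        graph = {}
        for a, c in rules:
            graph.setdefault(a, []).append(c)
        def reachable(start, target, seen):
            for nxt in graph.get(start, []):
                if nxt == target or (nxt not in seen
                                     and reachable(nxt, target, seen | {nxt})):
                    return True
            return False
        return any(reachable(c, a, {c}) for a, c in rules)

    circular = has_cycle([("A", "B"), ("B", "C"), ("C", "A")])
    acyclic = has_cycle([("A", "B"), ("B", "C")])
    ```

    The DNA approach encodes the same rule paths as joined strands and reads the errors out of the generated solution space instead of traversing a graph.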
  • Meshing Streaming Updates with Persistent Data in an Active Data Warehouse

    Publication Year: 2008 , Page(s): 976 - 991
    Cited by:  Papers (19)

    Active data warehousing has emerged as an alternative to conventional warehousing practices in order to meet the high demand of applications for up-to-date information. In a nutshell, an active warehouse is refreshed online and thus achieves a higher consistency between the stored information and the latest data updates. The need for online warehouse refreshment introduces several challenges in the implementation of data warehouse transformations, with respect to their execution time and their overhead to the warehouse processes. In this paper, we focus on a frequently encountered operation in this context, namely, the join of a fast stream S of source updates with a disk-based relation R, under the constraint of limited memory. This operation lies at the core of several common transformations, such as surrogate key assignment, duplicate detection, or identification of newly inserted tuples. We propose a specialized join algorithm, termed mesh join (MESHJOIN), which compensates for the difference in the access cost of the two join inputs by 1) relying entirely on fast sequential scans of R and 2) sharing the I/O cost of accessing R across multiple tuples of S. We detail the MESHJOIN algorithm and develop a systematic cost model that enables the tuning of MESHJOIN for two objectives: maximizing throughput under a specific memory budget or minimizing memory consumption for a specific throughput. We present an experimental study that validates the performance of MESHJOIN on synthetic and real-life data. Our results verify the scalability of MESHJOIN to fast streams and large relations and demonstrate its numerous advantages over existing join algorithms.

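    The core idea — amortise one sequential scan of R across a window of buffered stream tuples — can be sketched in toy form. The real MESHJOIN interleaves page-sized reads of R with the stream; the window size and key layout here are illustrative assumptions.

    ```python
    # Toy mesh-join: buffer `window` stream tuples, then pay for one full
    # sequential scan of R shared across the whole buffer.
    def mesh_join(stream, R, window):
        out = []
        buf = []
        for tup in stream:
            buf.append(tup)
            if len(buf) == window:            # one scan of R per full window
                for r in R:
                    out.extend((s, r) for s in buf if s[0] == r[0])
                buf.clear()
        for r in R:                           # drain the final partial window
            out.extend((s, r) for s in buf if s[0] == r[0])
        return out

    R = [("k1", "dim1"), ("k2", "dim2")]      # disk-based relation (join key first)
    stream = [("k1", 1), ("k2", 2), ("k1", 3)]
    joined = mesh_join(stream, R, window=2)
    ```

    Growing the window trades memory for fewer scans of R per stream tuple, which is exactly the throughput/memory trade-off the paper's cost model tunes.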
  • Streaming Time Series Summarization Using User-Defined Amnesic Functions

    Publication Year: 2008 , Page(s): 992 - 1006
    Cited by:  Papers (5)

    The past decade has seen a wealth of research on time series representations. The vast majority of research has concentrated on representations that are calculated in batch mode and represent each value with approximately equal fidelity. However, the increasing deployment of mobile devices and real time sensors has brought home the need for representations that can be incrementally updated, and can approximate the data with fidelity proportional to its age. The latter property allows us to answer queries about the recent past with greater precision, since in many domains recent information is more useful than older information. We call such representations amnesic. While there has been previous work on amnesic representations, the class of amnesic functions possible was dictated by the representation itself. In this work, we introduce a novel representation of time series that can represent arbitrary, user-specified amnesic functions. We propose online algorithms for our representation, and discuss their properties. Finally, we perform an extensive empirical evaluation on 40 datasets, and show that our approach can efficiently maintain a high quality amnesic approximation.

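    A piecewise-constant flavour of an amnesic summary makes the "fidelity proportional to age" idea concrete: older segments are merged first, so recent values keep full resolution while older ones coarsen. This is one simple amnesic function, not the paper's general user-defined representation.

    ```python
    # Amnesic piecewise-constant summary: keep at most `budget` segments,
    # always merging the two oldest segments (weighted average of values).
    def amnesic_summary(series, budget):
        segments = [(i, i + 1, float(v)) for i, v in enumerate(series)]
        while len(segments) > budget:
            (s0, e0, v0), (s1, e1, v1) = segments[0], segments[1]
            w0, w1 = e0 - s0, e1 - s1
            merged = (s0, e1, (v0 * w0 + v1 * w1) / (w0 + w1))
            segments[0:2] = [merged]          # the oldest span widens and blurs
        return segments

    segs = amnesic_summary([2, 2, 2, 2, 5, 7], budget=3)
    ```

    After summarization the four oldest values share one averaged segment, while the two most recent values are still stored exactly.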
  • Build Your Career in Computing [advertisement]

    Publication Year: 2008 , Page(s): 1007
  • Join the IEEE Computer Society today! [advertisement]

    Publication Year: 2008 , Page(s): 1008
  • TKDE Information for authors

    Publication Year: 2008 , Page(s): c3
  • [Back cover]

    Publication Year: 2008 , Page(s): c4

Aims & Scope

IEEE Transactions on Knowledge and Data Engineering (TKDE) informs researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.


Meet Our Editors

Editor-in-Chief
Jian Pei
Simon Fraser University