IEEE Transactions on Knowledge and Data Engineering

Issue 5 • May 2005

  • [Front cover]

    Publication Year: 2005 , Page(s): c1
  • [Inside front cover]

    Publication Year: 2005 , Page(s): c2
  • On change diagnosis in evolving data streams

    Publication Year: 2005 , Page(s): 587 - 600
    Cited by:  Papers (4)

    In recent years, the progress in hardware technology has made it possible for organizations to store and record large streams of transactional data. This results in databases which grow without limit at a rapid rate. This data can often show important changes in trends over time. In such cases, it is useful to understand, visualize, and diagnose the evolution of these trends. In this paper, we introduce the concept of velocity density estimation, a technique used to understand, visualize, and determine trends in the evolution of fast data streams. We show how to use velocity density estimation in order to create both temporal velocity profiles and spatial velocity profiles at periodic instants in time. These profiles are then used in order to predict three kinds of data evolution: dissolution, coagulation, and shift. Methods are proposed to visualize the changing data trends in a single online scan of the data stream, with a computational requirement that is linear in the number of data points. The visualization techniques can also be used to provide online animations which show the changes in the data characteristics while they occur. In addition, batch processing techniques are proposed in order to quantify the level of change across different combinations of dimensions. This quantification is then used in order to determine dimensional combinations with significant evolution. The techniques discussed in this paper can be easily extended to spatiotemporal data, changes in data snapshots at fixed instances in time, or any other data which has a temporal component during its evolution.
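
    A minimal illustrative sketch (editorial, not the authors' code) of the underlying idea: estimate the data density over a grid with a Gaussian kernel for two consecutive stream windows and take the rate of change per unit time, so that strongly positive values hint at coagulation and strongly negative values at dissolution. The window arrays, grid, bandwidth, and dt are assumed inputs.

    ```python
    import numpy as np

    def kernel_density(points, grid, bandwidth):
        """Gaussian kernel density estimate at each grid location."""
        diffs = grid[:, None, :] - points[None, :, :]            # shape (g, n, d)
        sq = (diffs ** 2).sum(axis=2) / (2 * bandwidth ** 2)
        return np.exp(-sq).mean(axis=1)                          # shape (g,)

    def velocity_density(prev_window, curr_window, grid, bandwidth, dt):
        """Rate of change of the estimated density per unit time between two
        consecutive stream windows (positive ~ coagulation, negative ~ dissolution)."""
        f_prev = kernel_density(prev_window, grid, bandwidth)
        f_curr = kernel_density(curr_window, grid, bandwidth)
        return (f_curr - f_prev) / dt
    ```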

  • A fuzzy-set-based Reconstructed Phase Space method for identification of temporal patterns in complex time series

    Publication Year: 2005 , Page(s): 601 - 613
    Cited by:  Papers (15)

    The new time series data mining framework proposed in this paper applies Reconstructed Phase Space (RPS) to identify temporal patterns that are characteristic and predictive of significant events in a complex time series. The new framework utilizes the fuzzy set and the Gaussian-shaped membership function to define temporal patterns in the time-delay embedding phase space. The resulting objective function represents not only the overall value of the event function, but also the weight of the vector in the temporal pattern cluster to which it contributes. Also, the new objective function is continuously differentiable, so gradient-based optimization such as the quasi-Newton method can be applied to search for the optimal temporal patterns with a much faster convergence rate. The computational stability is significantly improved over the genetic algorithm originally used in our earlier framework. A new simple but effective two-step optimization strategy is proposed which further improves the search performance. Another significant contribution is the use of the mutual information and false nearest neighbors methods to estimate the time delay and the phase space dimension. We also implemented two experimental applications to demonstrate the effectiveness of the new framework with comparisons to the original framework and to the neural network prediction approach.
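
    A minimal sketch (editorial, not from the paper) of the two building blocks named above: a time-delay (reconstructed phase space) embedding of a scalar series, and a Gaussian-shaped fuzzy membership of each embedded vector in a temporal-pattern cluster. The pattern centre and spread are hypothetical parameters.

    ```python
    import numpy as np

    def delay_embed(series, dim, delay):
        """Delay vectors [x(t), x(t+delay), ..., x(t+(dim-1)*delay)] for each valid t."""
        n = len(series) - (dim - 1) * delay
        return np.array([[series[t + j * delay] for j in range(dim)] for t in range(n)])

    def gaussian_membership(vectors, centre, spread):
        """Fuzzy membership of each delay vector in a pattern cluster centred at `centre`."""
        d2 = ((vectors - centre) ** 2).sum(axis=1)
        return np.exp(-d2 / (2 * spread ** 2))
    ```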

  • WISDOM: Web intrapage informative structure mining based on document object model

    Publication Year: 2005 , Page(s): 614 - 627
    Cited by:  Papers (14)  |  Patents (5)

    To increase the commercial value and accessibility of pages, most content sites tend to publish their pages with intrasite redundant information, such as navigation panels, advertisements, and copyright announcements. Such redundant information increases the index size of general search engines and causes page topics to drift. In this paper, we study the problem of mining intrapage informative structure in news Web sites in order to find and eliminate redundant information. Note that an intrapage informative structure is a subset of the original Web page and is composed of a set of fine-grained and informative blocks. The intrapage informative structures of pages in a news Web site contain only anchors linking to news pages or bodies of news articles. We propose an intrapage informative structure mining system called WISDOM (Web intrapage informative structure mining based on the document object model) which applies Information Theory to DOM tree knowledge in order to build the structure. WISDOM splits a DOM tree into many small subtrees and applies a top-down informative block searching algorithm to select a set of candidate informative blocks. The structure is built by expanding the set using the proposed merging methods. Experiments on several real news Web sites show high precision and recall rates, which validate WISDOM's practical applicability.
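
    A rough sketch of the flavour of top-down informative-block selection (editorial; WISDOM's actual scoring and merging steps are more involved): score each DOM subtree by the entropy of its term distribution and keep subtrees whose score clears a hypothetical threshold, otherwise recurse into their children.

    ```python
    import math
    from collections import Counter

    def entropy(terms):
        """Shannon entropy of a bag of terms."""
        counts = Counter(terms)
        total = sum(counts.values())
        if total == 0:
            return 0.0
        return -sum((c / total) * math.log2(c / total) for c in counts.values())

    def candidate_blocks(node, threshold, children, terms):
        """Top-down search: a node becomes a candidate informative block when the
        entropy of its terms is high enough; otherwise its children are tried."""
        if entropy(terms(node)) >= threshold:
            return [node]
        found = []
        for child in children(node):
            found.extend(candidate_blocks(child, threshold, children, terms))
        return found
    ```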

  • Dual clustering: integrating data clustering over optimization and constraint domains

    Publication Year: 2005 , Page(s): 628 - 637
    Cited by:  Papers (14)

    Spatial clustering has attracted a lot of research attention due to its various applications. In most conventional clustering problems, the similarity measurement mainly takes the geometric attributes into consideration. However, in many real applications, the nongeometric attributes are what users are concerned about. In conventional spatial clustering, the input data set is partitioned into several compact regions, and data points which are similar to one another in their nongeometric attributes may be scattered over different regions, thus making the corresponding objective difficult to achieve. To remedy this, we propose and explore in this paper a new clustering problem on two domains, called dual clustering, where one domain refers to the optimization domain and the other refers to the constraint domain. Attributes on the optimization domain are those involved in the optimization of the objective function, while those on the constraint domain specify the application-dependent constraints. Our goal is to optimize the objective function in the optimization domain while satisfying the constraint specified in the constraint domain. We devise an efficient and effective algorithm, named Interlaced Clustering-Classification, abbreviated as ICC, to solve this problem. The proposed ICC algorithm combines the information in both domains and iteratively performs a clustering algorithm on the optimization domain and a classification algorithm on the constraint domain to reach the target clustering effectively. The time and space complexities of the ICC algorithm are formally analyzed. Several experiments are conducted to provide insights into the dual clustering problem and the proposed algorithm.
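
    A minimal sketch of the interlacing idea, assuming standard scikit-learn components rather than the authors' actual clustering and classification choices: alternate a clustering step on the optimization-domain attributes with a nearest-neighbour classification step on the constraint-domain attributes.

    ```python
    import numpy as np
    from sklearn.cluster import KMeans
    from sklearn.neighbors import KNeighborsClassifier

    def interlaced_clustering(opt_attrs, con_attrs, k, n_iter=10):
        labels = KMeans(n_clusters=k, n_init=10).fit_predict(opt_attrs)
        for _ in range(n_iter):
            # Classification step: smooth the labels in the constraint (e.g. spatial) domain.
            clf = KNeighborsClassifier(n_neighbors=5).fit(con_attrs, labels)
            labels = clf.predict(con_attrs)
            # Clustering step: re-centre the groups in the optimization domain.
            centres = np.array([opt_attrs[labels == c].mean(axis=0) if (labels == c).any()
                                else opt_attrs.mean(axis=0) for c in range(k)])
            dists = ((opt_attrs[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
            labels = dists.argmin(axis=1)
        return labels
    ```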

  • Effectively mining and using coverage and overlap statistics for data integration

    Publication Year: 2005 , Page(s): 638 - 651
    Cited by:  Papers (3)

    Recent work in data integration has shown the importance of statistical information about the coverage and overlap of sources for efficient query processing. Despite this recognition, there are no effective approaches for learning the needed statistics. The key challenge in learning such statistics is keeping the number of needed statistics low enough to make the storage and learning costs manageable. In this paper, we present a set of connected techniques that estimate the coverage and overlap statistics while keeping the needed statistics tightly under control. Our approach uses a hierarchical classification of the queries and threshold-based variants of familiar data mining techniques to dynamically decide the level of resolution at which to learn the statistics. We describe the details of our method and present experimental results demonstrating the efficiency of the learning algorithms and the effectiveness of the learned statistics, both on controlled data sources and in the context of BibFinder with autonomous online sources.
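
    A minimal sketch (editorial, with a hypothetical log format) of the kind of thresholded statistics gathering described above: average each source's coverage per query class, and keep statistics only for classes observed often enough, leaving sparser classes to be handled at a coarser level of the query hierarchy.

    ```python
    from collections import defaultdict

    def learn_coverage(query_log, min_observations):
        """query_log: iterable of (query_class, source, coverage) records, where
        coverage is the fraction of a query's answers that the source returned."""
        count = defaultdict(int)
        total = defaultdict(float)
        per_class = defaultdict(int)
        for q_class, source, coverage in query_log:
            count[(q_class, source)] += 1
            total[(q_class, source)] += coverage
            per_class[q_class] += 1
        # Keep statistics only for sufficiently observed query classes; callers would
        # fall back to an ancestor class in the hierarchy otherwise.
        return {key: total[key] / count[key]
                for key in count if per_class[key[0]] >= min_observations}
    ```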

  • TFP: an efficient algorithm for mining top-k frequent closed itemsets

    Publication Year: 2005 , Page(s): 652 - 663
    Cited by:  Papers (38)

    Frequent itemset mining has been studied extensively in the literature. Most previous studies require the specification of a min_support threshold and aim at mining a complete set of frequent itemsets satisfying min_support. However, in practice, it is difficult for users to provide an appropriate min_support threshold. In addition, a complete set of frequent itemsets is much less compact than a set of frequent closed itemsets. In this paper, we propose an alternative mining task: mining top-k frequent closed itemsets of length no less than min_l, where k is the desired number of frequent closed itemsets to be mined, and min_l is the minimal length of each itemset. An efficient algorithm, called TFP, is developed for mining such itemsets without min_support. Starting at min_support = 0 and by making use of the length constraint and the properties of top-k frequent closed itemsets, min_support can be raised effectively and the FP-Tree can be pruned dynamically both during and after the construction of the tree using our two proposed methods: the closed node count and descendant_sum. Moreover, mining is further sped up by employing a top-down and bottom-up combined FP-Tree traversing strategy, a set of search space pruning methods, a fast 2-level hash-indexed result tree, and a novel closed itemset verification scheme. Our extensive performance study shows that TFP has high performance and linear scalability in terms of the database size.
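
    A minimal sketch of the support-raising idea only (editorial; the FP-Tree construction, pruning, and closedness checks are the real substance of TFP): keep the k best supports seen so far in a min-heap and raise min_support to the smallest of them, so the remaining search can be pruned.

    ```python
    import heapq

    class TopKSupport:
        def __init__(self, k):
            self.k = k
            self.heap = []            # min-heap holding the k largest supports seen
            self.min_support = 0      # starts at 0, raised as results accumulate

        def offer(self, support):
            """Report the support of a newly found qualifying closed itemset and
            return the (possibly raised) pruning threshold."""
            if len(self.heap) < self.k:
                heapq.heappush(self.heap, support)
            elif support > self.heap[0]:
                heapq.heapreplace(self.heap, support)
            if len(self.heap) == self.k:
                self.min_support = self.heap[0]
            return self.min_support
    ```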

  • Video data mining: semantic indexing and event detection from the association perspective

    Publication Year: 2005 , Page(s): 665 - 677
    Cited by:  Papers (36)  |  Patents (2)

    Advances in the media and entertainment industries, including streaming audio and digital TV, present new challenges for managing and accessing large audio-visual collections. Current content management systems support retrieval using low-level features, such as motion, color, and texture. However, low-level features often have little meaning for naive users, who much prefer to identify content using high-level semantics or concepts. This creates a gap between systems and their users that must be bridged for these systems to be used effectively. To this end, in this paper, we first present a knowledge-based video indexing and content management framework for domain-specific videos (using basketball video as an example). We then provide a solution for exploring video knowledge by mining associations from video data. Explicit definitions and evaluation measures (e.g., temporal support and confidence) for video associations are proposed by integrating the distinct features of video data. Our approach uses video processing techniques to find visual and audio cues (e.g., court field, camera motion activities, and applause), introduces multilevel sequential association mining to explore associations among the audio and visual cues, classifies the associations by assigning each of them a class label, and uses their appearances in the video to construct video indices. Our experimental results demonstrate the performance of the proposed approach.
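
    A rough sketch of a temporal-support style measure (editorial; the paper's definitions integrate video-specific features and differ in detail): over a stream of timestamped cue events, count how often an ordered cue pattern completes within a maximum time span, normalised by the number of possible starting cues.

    ```python
    def temporal_support(events, pattern, max_span):
        """events: list of (time, cue) pairs sorted by time; pattern: ordered cue labels."""
        starts = [i for i, (_, c) in enumerate(events) if c == pattern[0]]
        hits = 0
        for i in starts:
            t0, pos = events[i][0], 1
            for t, c in events[i + 1:]:
                if pos == len(pattern) or t - t0 > max_span:
                    break
                if c == pattern[pos]:
                    pos += 1
            if pos == len(pattern):
                hits += 1
        return hits / max(1, len(starts))
    ```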

  • ε-SSVR: a smooth support vector machine for ε-insensitive regression

    Publication Year: 2005 , Page(s): 678 - 685
    Cited by:  Papers (34)

    A new smoothing strategy for solving ε-support vector regression (ε-SVR), tolerating a small error in fitting a given data set linearly or nonlinearly, is proposed in this paper. Conventionally, ε-SVR is formulated as a constrained minimization problem, namely, a convex quadratic programming problem. We apply the smoothing techniques that have been used for solving the support vector machine for classification to replace the ε-insensitive loss function by an accurate smooth approximation. This allows us to solve ε-SVR as an unconstrained minimization problem directly. We term this reformulated problem ε-smooth support vector regression (ε-SSVR). We also prescribe a Newton-Armijo algorithm, which has been shown to converge globally and quadratically, to solve our ε-SSVR. In order to handle the case of nonlinear regression with a massive data set, we also introduce the reduced kernel technique in this paper to avoid the computational difficulties of dealing with a huge and fully dense kernel matrix. Numerical results and comparisons are given to demonstrate the effectiveness and speed of the algorithm.
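
    A minimal sketch of the smoothing idea, assuming the plus-function smoothing commonly used for smooth support vector machines, p(x, α) = x + (1/α)·log(1 + e^(−αx)) (the paper's exact objective and notation may differ): the ε-insensitive loss max(|r| − ε, 0) is written as a sum of two plus functions and each is replaced by its smooth surrogate, making the objective differentiable and amenable to Newton-type methods.

    ```python
    import numpy as np

    def smooth_plus(x, alpha):
        """Smooth, differentiable approximation of max(x, 0):
        x + (1/alpha) * log(1 + exp(-alpha * x)), computed stably."""
        return x + np.logaddexp(0.0, -alpha * x) / alpha

    def smooth_eps_insensitive(residual, eps, alpha):
        """Smooth surrogate of the eps-insensitive loss max(|residual| - eps, 0)."""
        return smooth_plus(residual - eps, alpha) + smooth_plus(-residual - eps, alpha)
    ```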

  • Toward an agent-based and context-oriented approach for Web services composition

    Publication Year: 2005 , Page(s): 686 - 697
    Cited by:  Papers (81)

    This paper presents an agent-based and context-oriented approach that supports the composition of Web services. A Web service is an accessible application that other applications and humans can discover and invoke to satisfy multiple needs. To reduce the complexity of composing Web services, two concepts are put forward, namely, software agent and context. A software agent is an autonomous entity that acts on behalf of users, and the context is any relevant information that characterizes a situation. During the composition process, software agents engage in conversations with their peers to agree on the Web services that participate in this process. Conversations between agents take into account the execution context of the Web services. The security of the computing resources on which the Web services are executed constitutes another core component of the agent-based and context-oriented approach presented in this paper.

  • Techniques for efficient road-network-based tracking of moving objects

    Publication Year: 2005 , Page(s): 698 - 712
    Cited by:  Papers (52)

    With the continued advances in wireless communications, geo-positioning, and consumer electronics, an infrastructure is emerging that enables location-based services which rely on the tracking of the continuously changing positions of entire populations of service users, termed moving objects. This scenario is characterized by large volumes of updates, which makes location update technologies important. A setting is assumed in which a central database stores a representation of each moving object's current position. This position is to be maintained so that it deviates from the user's real position by at most a given threshold. To do so, each moving object stores locally the central representation of its position. Then, an object updates the database whenever the deviation between its actual position (as obtained from a GPS device) and the database position exceeds the threshold. The main issue considered is how to represent the location of a moving object in a database so that tracking can be done with as few updates as possible. The paper proposes to use the road network within which the objects are assumed to move for predicting their future positions. The paper presents algorithms that modify an initial road-network representation so that it works better as a basis for predicting an object's position; it proposes to use known movement patterns of the object, in the form of routes; and it proposes to use acceleration profiles together with the routes. Using real GPS data and a corresponding real road network, the paper offers empirical evaluations and comparisons that include three existing approaches and all the proposed approaches.
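
    A minimal sketch of the shared-prediction update policy (editorial; the paper's route-based and acceleration-profile predictions are richer): client and server both predict the object's position as constant-speed travel along a known route polyline, and the client issues an update only when its GPS position deviates from that shared prediction by more than the agreed threshold.

    ```python
    import math

    def predicted_point(route, start_offset, speed, t0, t):
        """Position after travelling start_offset + speed*(t - t0) metres along the route polyline."""
        dist = start_offset + speed * (t - t0)
        for (x1, y1), (x2, y2) in zip(route, route[1:]):
            seg = math.hypot(x2 - x1, y2 - y1)
            if seg and dist <= seg:
                f = dist / seg
                return (x1 + f * (x2 - x1), y1 + f * (y2 - y1))
            dist -= seg
        return route[-1]

    def needs_update(gps_pos, predicted, threshold):
        """True when the deviation from the shared prediction exceeds the threshold."""
        return math.hypot(gps_pos[0] - predicted[0], gps_pos[1] - predicted[1]) > threshold
    ```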

  • Ontological evaluation of enterprise systems interoperability using ebXML

    Publication Year: 2005 , Page(s): 713 - 725
    Cited by:  Papers (18)

    Enterprise systems interoperability (ESI) is currently an important topic for business. This situation is evidenced, at least in part, by the number and extent of potential candidate protocols for such process interoperation, viz., ebXML, BPML, BPEL, and WSCI. Wide-ranging support for each of these candidate standards already exists. However, despite broad acceptance, a sound theoretical evaluation of these approaches has not yet been provided. We use the Bunge-Wand-Weber (BWW) models, in particular the representation model, to provide the basis for such a theoretical evaluation. We, and other researchers, have shown the usefulness of the representation model for analyzing, evaluating, and engineering techniques in the areas of traditional and structured systems analysis, object-oriented modeling, and process modeling. In this work, we address the question: what are the potential semantic weaknesses of using ebXML alone for process interoperation between enterprise systems? We find that users lack important implementation information because of representational deficiencies; that, due to ontological redundancy, the complexity of the specification is unnecessarily increased; and that users of the specification have to bring in extra-model knowledge to understand constructs in the specification due to instances of ontological excess.

  • A Web surfer model incorporating topic continuity

    Publication Year: 2005 , Page(s): 726 - 729
    Cited by:  Papers (2)

    This paper describes a surfer model which incorporates information about topic continuity derived from the surfer's history. Unlike earlier models, it therefore captures the interrelationship between the categorization (context) and ranking of Web documents simultaneously. The model is mathematically formulated, and a scalable and convergent iterative procedure is provided for its implementation. Its characteristic features, as obtained from the joint probability matrix, and their significance in Web intelligence are discussed. Experiments performed on Web pages obtained from WebBase confirm the superiority of the model.
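
    A very rough sketch in the spirit of such a model (editorial; the paper's formulation of the joint probability matrix differs): iterate a joint distribution over (page, topic) states in which a surfer mostly keeps the current topic mix when following links and partly adopts the target page's own topic profile.

    ```python
    import numpy as np

    def joint_surfer(link_matrix, page_topic, stay_prob=0.8, damping=0.85, iters=100):
        """link_matrix: (n, n) row-normalised adjacency; page_topic: (n, k) topic
        relevance of each page with rows summing to 1."""
        n, k = page_topic.shape
        joint = np.full((n, k), 1.0 / (n * k))          # P(page, topic)
        for _ in range(iters):
            page_mass = joint.sum(axis=1)               # P(page)
            visits = damping * (link_matrix.T @ page_mass) + (1 - damping) / n
            topic_mix = joint.sum(axis=0)               # P(topic), carries topic continuity
            joint = visits[:, None] * (stay_prob * topic_mix[None, :]
                                       + (1 - stay_prob) * page_topic)
            joint /= joint.sum()
        return joint
    ```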

  • [Advertisement]

    Publication Year: 2005 , Page(s): 730
  • TKDE Information for authors

    Publication Year: 2005 , Page(s): c3
  • [Back cover]

    Publication Year: 2005 , Page(s): c4

Aims & Scope

IEEE Transactions on Knowledge and Data Engineering (TKDE) informs researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.


Meet Our Editors

Editor-in-Chief
Jian Pei
Simon Fraser University