
IEEE Transactions on Knowledge and Data Engineering

Issue 2 • March-April 2000


Displaying Results 1 - 13 of 13
  • An implementation of logical analysis of data

    Page(s): 292 - 306

    Describes a new, logic-based methodology for analyzing observations. The key features of this "logical analysis of data" (LAD) methodology are the discovery of minimal sets of features that are necessary for explaining all observations and the detection of hidden patterns in the data that are capable of distinguishing observations describing "positive" outcome events from "negative" outcome events. Combinations of such patterns are used for developing general classification procedures. An implementation of this methodology is described in this paper, along with the results of numerical experiments demonstrating the classification performance of LAD in comparison with the reported results of other procedures. In the final section, we describe three pilot studies on applications of LAD to oil exploration, psychometric testing and the analysis of developments in the Chinese transitional economy. These pilot studies demonstrate not only the classification power of LAD but also its flexibility and capability to provide solutions to various case-dependent problems.
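
    As a rough illustration of the classification step sketched in this abstract, the fragment below shows how patterns (conjunctions of conditions on binary features) might vote on an observation. The pattern format, the feature names and the unweighted vote are simplifying assumptions made here, not the paper's actual implementation.

        # Hypothetical sketch of LAD-style classification with fixed patterns.
        # A pattern is a conjunction of literals over binary features,
        # e.g. {"f1": 1, "f3": 0} requires f1 == 1 and f3 == 0.

        def covers(pattern, observation):
            """True if every literal of the pattern holds in the observation."""
            return all(observation.get(feat) == val for feat, val in pattern.items())

        def lad_classify(observation, pos_patterns, neg_patterns):
            """Classify by a simple, unweighted vote of the covering patterns."""
            pos_votes = sum(covers(p, observation) for p in pos_patterns)
            neg_votes = sum(covers(p, observation) for p in neg_patterns)
            if pos_votes == neg_votes:
                return None                       # unclassified / tie
            return "positive" if pos_votes > neg_votes else "negative"

        # Example with made-up patterns and a made-up observation:
        pos_patterns = [{"f1": 1, "f3": 0}, {"f2": 1}]
        neg_patterns = [{"f1": 0, "f2": 0}]
        print(lad_classify({"f1": 1, "f2": 0, "f3": 0}, pos_patterns, neg_patterns))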

  • An efficient evaluation of a fuzzy equi-join using fuzzy equality indicators

    Page(s): 225 - 237

    Proposes a new measure of fuzzy equality (FE) comparison based on the similarity of possibility distributions. We define a type of fuzzy equi-join based on the new FE comparison and allow threshold values to be associated with predicates of the join condition. A sort-merge join algorithm based on a partial order of intervals is used to evaluate the fuzzy equi-join. In order for the evaluation to be efficient, we identify various mappings, called FE indicators, that determine appropriate intervals for fuzzy data with different characteristics. Experimental results from our preliminary simulation of the algorithm show a significant improvement of efficiency when FE indicators are used with the sort-merge join algorithm.
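
    As a toy illustration of these ideas, the sketch below compares triangular fuzzy numbers by the possibility that they are equal, uses their supports as crude indicator intervals to skip clearly unequal pairs, and joins with a nested-loop pass rather than the paper's sort-merge algorithm. The triangular representation, the FE measure, the interval choice and the threshold are assumptions made for illustration only.

        # Hypothetical sketch: fuzzy equality via possibility of triangular fuzzy
        # numbers, plus a crude "indicator interval" used to skip clearly unequal
        # pairs.  The paper's FE measure and FE indicators are more refined.

        def tri_membership(x, a, b, c):
            """Membership of x in the triangular fuzzy number (a, b, c), a < b < c."""
            if x <= a or x >= c:
                return 0.0
            return (x - a) / (b - a) if x <= b else (c - x) / (c - b)

        def possibility_equal(t1, t2, steps=200):
            """Possibility that two triangular fuzzy numbers are equal:
            sup_x min(mu1(x), mu2(x)), approximated on a grid."""
            lo, hi = min(t1[0], t2[0]), max(t1[2], t2[2])
            return max(min(tri_membership(lo + i * (hi - lo) / steps, *t1),
                           tri_membership(lo + i * (hi - lo) / steps, *t2))
                       for i in range(steps + 1))

        def interval(t):
            """Toy indicator interval: the support of the fuzzy number."""
            return (t[0], t[2])

        def fuzzy_equi_join(r, s, threshold=0.5):
            """r, s: lists of triangular fuzzy join values.  Join values that are
            'fuzzily equal' above the threshold; disjoint supports are skipped."""
            result = []
            for t1 in r:
                for t2 in s:
                    (a1, c1), (a2, c2) = interval(t1), interval(t2)
                    if c1 < a2 or c2 < a1:        # supports disjoint: cannot match
                        continue
                    if possibility_equal(t1, t2) >= threshold:
                        result.append((t1, t2))
            return result

        print(possibility_equal((1.0, 2.0, 3.0), (2.5, 3.5, 4.5)))   # ~0.25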

  • Optimization and evaluation of disjunctive queries

    Page(s): 238 - 260

    It is striking that the optimization of disjunctive queries, i.e. those which contain at least one OR-connective in the query predicate, has been vastly neglected in the literature, as well as in commercial systems. In this paper, we propose a novel technique, called bypass processing, for evaluating such disjunctive queries. The bypass processing technique is based on new selection and join operators that produce two output streams: the TRUE-stream with tuples satisfying the selection (join) predicate and the FALSE-stream with tuples not satisfying the corresponding predicate. Splitting the tuple streams in this way enables us to “bypass” costly predicates whenever the “fate” of the corresponding tuple (stream) can be determined without evaluating this predicate. In the paper, we show how to systematically generate bypass evaluation plans utilizing a bottom-up building-block approach. We show that our evaluation technique allows us to incorporate the standard SQL semantics of null values. For this, we devise two different approaches: one is based on explicitly incorporating three-valued logic into the evaluation plans; the other relies on two-valued logic by “moving” all negations to atomic conditions of the selection predicate. We describe how to extend an iterator-based query engine to support bypass evaluation with little extra overhead. This query engine was used to quantitatively evaluate the bypass evaluation plans against traditional evaluation techniques utilizing a CNF- or DNF-based query predicate.
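
    The TRUE-/FALSE-stream idea can be illustrated with a minimal sketch: for a disjunction "cheap OR costly", tuples that already satisfy the cheap predicate bypass the costly one. The operator and predicate names below are illustrative; the paper's plan generation and bypass join operators are not shown.

        # Hypothetical sketch of bypass selection for "cheap(t) OR costly(t)".
        # Tuples satisfying the cheap predicate bypass the costly one entirely.

        def bypass_select(tuples, predicate):
            """Split a tuple stream into a TRUE-stream and a FALSE-stream."""
            true_stream, false_stream = [], []
            for t in tuples:
                (true_stream if predicate(t) else false_stream).append(t)
            return true_stream, false_stream

        def evaluate_or(tuples, cheap, costly):
            """Evaluate 'cheap OR costly': only the FALSE-stream of the cheap
            selection is routed through the costly predicate."""
            true_cheap, false_cheap = bypass_select(tuples, cheap)
            true_costly, _ = bypass_select(false_cheap, costly)
            return true_cheap + true_costly

        # Example with made-up predicates:
        rows = [{"price": 5}, {"price": 50}, {"price": 500}]
        cheap = lambda t: t["price"] < 10
        costly = lambda t: t["price"] > 100      # stand-in for an expensive predicate
        print(evaluate_or(rows, cheap, costly))  # [{'price': 5}, {'price': 500}]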

  • Optimal data placement on disks: a comprehensive solution for different technologies

    Page(s): 324 - 330

    The problem of optimally placing data on disks (ODP) to maximize disk-access performance has long been recognized as important. Solutions to this problem have been reported for some widely available disk technologies, such as magnetic CAV and optical CLV disks. However, important new technologies, such as multizoned magnetic disks, have recently been introduced, and for these no formal solution to the ODP problem has been reported. In this paper, we first identify the fundamental characteristics of disk-device technologies which influence the solution to the ODP problem. We develop a comprehensive solution to the problem that covers all currently available disk technologies. We show how our comprehensive solution can be reduced to the solutions for existing disk technologies, thus contributing a solution to the ODP problem for multizoned disks. Our analytical solution has been validated through simulations and through its reduction to the known solutions for particular disks. Finally, we study how the solution for multizoned disks is affected by the disk and data characteristics.
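
    For context, the sketch below shows the classical "organ-pipe" arrangement often cited for conventional magnetic disks: the most frequently accessed items are placed on the middle cylinders and cooler items alternate outward to reduce expected seek distance. This is only the well-known baseline arrangement, not the paper's comprehensive solution or its treatment of multizoned disks.

        # Hypothetical sketch of the classical "organ-pipe" data placement for a
        # conventional magnetic disk: hottest items on the middle cylinders,
        # cooler items alternating outward on either side.

        def organ_pipe_placement(items):
            """items: list of (name, access_probability).
            Returns names in cylinder order, hottest near the middle."""
            ordered = sorted(items, key=lambda x: x[1], reverse=True)
            layout = []
            for i, (name, _) in enumerate(ordered):
                if i % 2 == 0:
                    layout.append(name)        # place on one side of the middle
                else:
                    layout.insert(0, name)     # place on the other side
            return layout

        files = [("a", 0.4), ("b", 0.3), ("c", 0.2), ("d", 0.1)]
        print(organ_pipe_placement(files))     # ['d', 'b', 'a', 'c']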

  • Design and analysis of an integrated checkpointing and recovery scheme for distributed applications

    Page(s): 174 - 186

    An integrated checkpointing and recovery scheme which exploits the low latency and high coverage characteristics of a concurrent error detection scheme is presented. Message dependency, which is the main source of multistep rollback in distributed systems, is minimized by using a new message validation technique derived from the notion of concurrent error detection. The concept of a new global state matrix is introduced to track error checking and message dependency in a distributed system and assist in the recovery. The analytical model, algorithms and data structures to support an easy implementation of the new scheme are presented. The completeness and correctness of the algorithms are proved. A number of scenarios and illustrations that give the details of the analytical model are presented. The benefits of the integrated checkpointing scheme are quantified by means of simulation using an object-oriented test framework.
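
    The role of a dependency-tracking structure in deciding which processes must roll back can be caricatured as below. The boolean matrix, its update rule and the rollback rule are simplifications assumed here; the paper's global state matrix also tracks error checking and is integrated with the checkpointing and recovery algorithms.

        # Hypothetical sketch of a message-dependency matrix for rollback decisions.
        # dep[i][j] == True means process i's current state depends on process j.

        class DependencyTracker:
            def __init__(self, n):
                self.n = n
                self.dep = [[i == j for j in range(n)] for i in range(n)]

            def deliver(self, sender, receiver):
                """Receiving a message makes the receiver depend on everything the
                sender depends on (simplified: the sender's current row is used)."""
                for j in range(self.n):
                    self.dep[receiver][j] = self.dep[receiver][j] or self.dep[sender][j]

            def rollback_set(self, failed):
                """Processes whose state depends on the failed process must roll back."""
                return [i for i in range(self.n) if self.dep[i][failed]]

        t = DependencyTracker(3)
        t.deliver(sender=0, receiver=1)
        t.deliver(sender=1, receiver=2)
        print(t.rollback_set(failed=0))   # [0, 1, 2]: the failed process plus its dependents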

  • Consistent schema version removal: an optimization technique for object-oriented views

    Page(s): 261 - 280

    Powerful solutions enabling interoperability must allow applications to evolve and the requirements of shared databases to change, while minimizing the impact of such changes on other integrated applications. Several approaches have been proposed to make interoperability possible using object-oriented techniques. These approaches may generate a large number of schema versions over time, resulting in an excessive build-up of classes and underlying object instances, not all of which are necessarily still in use. This degrades system performance due to view maintenance and storage overhead costs. In this paper, we address the problem of removing obsolete view schemas. We characterize four potential problems of schema consistency that could be caused by the removal of a single derived class. We demonstrate that schema version removal is sensitive to the order in which individual classes are processed, and present a formal dependency model that captures all dependencies between classes as logic clauses and manipulates them to make decisions on class deletions and non-deletions while guaranteeing the consistency of the schema. We have also developed a dependency graph (DG) representation of the formal model and proven it consistent. We then present a cost model for evaluating alternative removal patterns on a DG to assure selection of the optimal solution. The proposed techniques have been implemented in our Schema View Removal (SVR) tool. Finally, we report experimental findings from applying our techniques for consistent schema version removal to the MultiView/TSE (Transparent Schema Evolution) system.
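
    A much-simplified version of the deletion decision can be sketched as follows: a candidate class may be dropped only if no class that must remain depends on it, directly or transitively. The graph encoding and the rule below are assumptions for illustration, not the paper's clause-based dependency model or its cost model.

        # Hypothetical sketch: decide deletions of derived classes given a
        # dependency graph.  depends_on[c] is the set of classes c is derived from.

        def removable(candidates, depends_on, all_classes):
            """Return the subset of candidate classes that can be deleted without
            breaking a class that must remain (directly or transitively)."""
            keep = set(all_classes) - set(candidates)
            changed = True
            while changed:
                changed = False
                for c in list(candidates):
                    if any(c in depends_on.get(k, set()) for k in keep):
                        keep.add(c)                              # a kept class needs c
                        candidates = [x for x in candidates if x != c]
                        changed = True
            return candidates

        classes = ["Base", "V1", "V2"]
        depends_on = {"V1": {"Base"}, "V2": {"V1"}}
        # V1 cannot be removed alone because V2 (kept) depends on it:
        print(removable(["V1"], depends_on, classes))         # []
        # Removing both V1 and V2 together is consistent:
        print(removable(["V1", "V2"], depends_on, classes))   # ['V1', 'V2']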

  • Efficient subgraph isomorphism detection: a decomposition approach

    Page(s): 307 - 323

    Graphs are a powerful and universal data structure useful in various subfields of science and engineering. In this paper, we propose a new algorithm for subgraph isomorphism detection from a set of a priori known model graphs to an input graph that is given online. The new approach is based on a compact representation of the model graphs that is computed offline. Subgraphs that appear more than once within the same or within different model graphs are represented only once, thus reducing the computational effort to detect them in an input graph. In the extreme case where all model graphs are highly similar, the run-time of the new algorithm becomes independent of the number of model graphs. Both a theoretical complexity analysis and practical experiments characterizing the performance of the new approach are given.
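
    The benefit of representing shared subgraphs only once can be caricatured by memoizing, per shared part, the embeddings found in the input graph, so that every model graph containing that part reuses them. The brute-force matcher and the flat part lists below are placeholders assumed here; the paper's decomposition and combination steps are considerably more sophisticated.

        # Hypothetical sketch: brute-force matching of small "parts" into an input
        # graph, with memoization so a part shared by several model graphs is
        # matched against the input only once.  Edges are undirected vertex pairs.

        from itertools import permutations

        def embeddings(part_vertices, part_edges, input_vertices, input_edges):
            """All injective vertex mappings that preserve the part's edges."""
            found = []
            for image in permutations(input_vertices, len(part_vertices)):
                m = dict(zip(part_vertices, image))
                if all((m[u], m[v]) in input_edges or (m[v], m[u]) in input_edges
                       for (u, v) in part_edges):
                    found.append(m)
            return found

        def match_models(models, shared_parts, input_vertices, input_edges):
            """models: {model_name: [part_id, ...]}; shared_parts: {part_id: (V, E)}.
            Each shared part is matched once and the result is reused."""
            cache, report = {}, {}
            for name, part_ids in models.items():
                per_part = []
                for pid in part_ids:
                    if pid not in cache:
                        v, e = shared_parts[pid]
                        cache[pid] = embeddings(v, e, input_vertices, input_edges)
                    per_part.append(cache[pid])
                # A real algorithm would combine compatible part embeddings here;
                # this sketch only reports how many embeddings each part has.
                report[name] = [len(p) for p in per_part]
            return report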

  • Hierarchical error detection in a software implemented fault tolerance (SIFT) environment

    Page(s): 203 - 224

    Proposes a hierarchical error detection framework for a software-implemented fault tolerance (SIFT) layer of a distributed system. A four-level error detection hierarchy is proposed in the context of Chameleon, a software environment for providing adaptive fault tolerance in an environment of commercial off-the-shelf (COTS) system components and software. The design and implementation of a software-based distributed signature monitoring scheme, which is central to the proposed four-level hierarchy, is described. Both intra-level and inter-level optimizations that minimize the overhead of detection and are capable of adapting to runtime requirements are proposed. The paper presents results from a prototype implementation of two levels of the error detection hierarchy and results of a detailed simulation of the overall environment. The results indicate a substantial increase in availability due to the detection framework and help in understanding the tradeoffs between overhead and coverage for different combinations of techniques.
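
    As a small illustration of software-based signature monitoring, the sketch below folds the ids of executed blocks into a running signature and compares it with the value precomputed for the expected control path. The hash, the block ids and the single-level check are assumptions made here; the paper's scheme is distributed and spans four detection levels.

        # Hypothetical sketch of software signature monitoring.  Each basic block
        # contributes its id to a running signature; a checker compares the final
        # signature against the value precomputed for the expected control path.

        def fold(signature, block_id):
            """Fold a block id into the running signature (toy 32-bit hash)."""
            return ((signature * 31) ^ block_id) & 0xFFFFFFFF

        def run_signature(block_ids):
            sig = 0
            for b in block_ids:
                sig = fold(sig, b)
            return sig

        expected_path = [1, 2, 4, 7]
        expected_sig = run_signature(expected_path)

        observed_path = [1, 2, 5, 7]                # control-flow error: block 5, not 4
        if run_signature(observed_path) != expected_sig:
            print("signature mismatch: control-flow error detected")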

  • An architecture for survivable coordination in large distributed systems

    Page(s): 187 - 202

    Coordination among processes in a distributed system can be rendered very complex in a large-scale system where messages may be delayed or lost and where processes may participate only transiently or behave arbitrarily, e.g. after suffering a security breach. In this paper, we propose a scalable architecture to support coordination in such extreme conditions. Our architecture consists of a collection of persistent data servers that implement simple shared data abstractions for clients, without trusting the clients or even the servers themselves. We show that, by interacting with these untrusted servers, clients can solve distributed consensus, a powerful and fundamental coordination primitive. Our architecture is very practical, and we describe the implementation of its main components in a system called Fleet.
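
    The flavour of coordinating through servers that are not individually trusted can be conveyed by a toy replicated register: a client writes to a quorum of servers and, on a read, accepts only a value reported by enough of them. The quorum sizes, timestamps and failure assumptions below are simplifications assumed here, not Fleet's actual protocols.

        # Hypothetical sketch of a timestamped read/write register replicated on
        # 3f+1 servers, tolerating up to f faulty servers.  Real protocols (and
        # Fleet itself) involve far more machinery; this only shows the quorum idea.

        class Server:
            def __init__(self):
                self.ts, self.value = 0, None
            def write(self, ts, value):
                if ts > self.ts:
                    self.ts, self.value = ts, value
            def read(self):
                return self.ts, self.value

        class Client:
            def __init__(self, servers, f):
                self.servers, self.f = servers, f
                self.quorum = 2 * f + 1           # assumed quorum size for this sketch
            def write(self, value):
                ts = max(s.read()[0] for s in self.servers) + 1
                for s in self.servers[:self.quorum]:   # a real client picks any quorum
                    s.write(ts, value)
            def read(self):
                replies = [s.read() for s in self.servers[:self.quorum]]
                # Accept the highest-timestamped value reported by at least f+1 servers.
                for ts, value in sorted(replies, key=lambda r: r[0], reverse=True):
                    if sum(r == (ts, value) for r in replies) >= self.f + 1:
                        return value
                return None

        servers = [Server() for _ in range(4)]    # 3f+1 with f = 1
        c = Client(servers, f=1)
        c.write("leader = p3")
        print(c.read())                           # 'leader = p3'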

  • The PSTR/SNS scheme for real-time fault tolerance via active object replication and network surveillance

    Page(s): 145 - 159

    The TMO (Time-triggered Message-triggered Object) scheme was formulated as a major extension of the conventional object structuring schemes with the idealistic goal of facilitating general-form design and timeliness-guaranteed design of complex real-time application systems. Recently, as a new scheme for realizing TMO-structured distributed and parallel computer systems that are capable of both hardware and software fault tolerance, we have formulated and demonstrated the PSTR (Primary-Shadow TMO Replication) scheme. An important new extension of the PSTR scheme discussed in this paper is an integration of the PSTR scheme and a network surveillance (NS) scheme. This extension results in a significant improvement in the fault coverage and recovery time bound achieved. The NS scheme adopted is a recently developed scheme that is effective in a wide range of point-to-point networks, and it is called the SNS (Supervisor-based Network Surveillance) scheme. The integration of the PSTR scheme and the SNS scheme is called the PSTR/SNS scheme. The recovery time bound of the PSTR/SNS scheme is analyzed on the basis of an implementation model that can be easily adapted to various commercial operating system kernels.

  • A generalized definition of rough approximations based on similarity

    Page(s): 331 - 336

    This paper proposes new definitions of lower and upper approximations, which are basic concepts of the rough set theory. These definitions follow naturally from the concept of ambiguity introduced in this paper. The new definitions are compared to the classical definitions and are shown to be more general, in the sense that they are the only ones which can be used for any type of indiscernibility or similarity relation.
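
    For reference, the classical (Pawlak) definitions that the paper generalizes are, for an equivalence (indiscernibility) relation R over a universe U and a set X ⊆ U:

        \underline{R}(X) = \{\, x \in U : [x]_R \subseteq X \,\}
        \overline{R}(X)  = \{\, x \in U : [x]_R \cap X \neq \emptyset \,\}

    Here [x]_R is the equivalence class of x under R. The paper's new definitions, based on its notion of ambiguity, are the generalization of these approximations to arbitrary similarity relations, where classes of this form no longer behave as in the equivalence case.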

  • Justification for inclusion dependency normal form

    Page(s): 281 - 291

    Functional dependencies (FDs) and inclusion dependencies (INDs) are the most fundamental integrity constraints that arise in practice in relational databases. In this paper, we address the issue of normalization in the presence of FDs and INDs and, in particular, the semantic justification for an inclusion dependency normal form (IDNF), which combines the Boyce-Codd normal form with the restriction on the INDs that they be noncircular and key-based. We motivate and formalize three goals of database design in the presence of FDs and INDs: noninteraction between FDs and INDs, elimination of redundancy and update anomalies, and preservation of entity integrity. We show that, as for FDs, in the presence of INDs being free of redundancy is equivalent to being free of update anomalies. Then, for each of these properties, we derive equivalent syntactic conditions on the database design. Individually, each of these syntactic conditions is weaker than IDNF, and the restriction that an FD is not embedded in the right-hand side of an IND is common to three of the conditions. However, we also show that, for these three goals of database design to be satisfied simultaneously, IDNF is both a necessary and a sufficient condition.
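
    A small illustration (with relation and attribute names invented here, not taken from the paper): consider the schema

        EMP(emp_id, dept_name)    key: {emp_id}      FD: emp_id -> dept_name
        DEPT(dept_name, budget)   key: {dept_name}
        IND: EMP[dept_name] ⊆ DEPT[dept_name]

    Each relation satisfies the Boyce-Codd normal form, the single IND is key-based (its right-hand side is a key of DEPT), and the set of INDs is trivially noncircular, so this design meets the structural requirements of IDNF as described above.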

  • The cost of recovery in message logging protocols

    Page(s): 160 - 173

    Past research in message logging has focused on studying the relative overhead imposed by pessimistic, optimistic and causal protocols during failure-free executions. In this paper, we give the first experimental evaluation of the performance of these protocols during recovery. Our results suggest that applications face a complex tradeoff when choosing a message logging protocol for fault tolerance. On the one hand, optimistic protocols can provide fast failure-free execution and good performance during recovery, but are complex to implement and can create orphan processes. On the other hand, orphan-free protocols either risk being slow during recovery (e.g. sender-based pessimistic and causal protocols) or incur a substantial overhead during failure-free execution (e.g. receiver-based pessimistic protocols). To address this tradeoff, we propose hybrid logging protocols, which are a new class of orphan-free protocols. We show that hybrid protocols perform within 2% of causal logging during failure-free execution and within 2% of receiver-based logging during recovery.
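
    The failure-free trade-off mentioned above can be caricatured by contrasting a receiver-based pessimistic logger, which writes each message to stable storage before delivering it, with an optimistic logger that buffers messages and flushes them later. The classes and the "stable storage" stand-in below are illustrative assumptions, not the paper's protocols or measurements; the hybrid protocols it proposes are not shown.

        # Hypothetical sketch: receiver-based pessimistic logging forces each message
        # to stable storage before it is delivered; an optimistic logger defers the
        # write, so delivery is cheaper but messages received since the last flush
        # can be lost on a crash.

        class PessimisticReceiverLog:
            def __init__(self):
                self.stable_log = []              # stands in for stable storage
            def deliver(self, msg, handler):
                self.stable_log.append(msg)       # synchronous logging before delivery
                handler(msg)

        class OptimisticReceiverLog:
            def __init__(self):
                self.stable_log, self.volatile = [], []
            def deliver(self, msg, handler):
                self.volatile.append(msg)         # logging is deferred
                handler(msg)
            def flush(self):                      # called periodically / asynchronously
                self.stable_log.extend(self.volatile)
                self.volatile.clear()

        handled = []
        opt = OptimisticReceiverLog()
        opt.deliver("m1", handled.append)
        # If the process crashed here, "m1" was delivered but never logged,
        # which is exactly what can create orphan processes elsewhere.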


Aims & Scope

IEEE Transactions on Knowledge and Data Engineering (TKDE) informs researchers, developers, managers, strategic planners, users, and others interested in state-of-the-art and state-of-the-practice activities in the knowledge and data engineering area.


Meet Our Editors

Editor-in-Chief
Jian Pei
Simon Fraser University