Proceedings of the 14th International Conference on Data Engineering, 1998

Date: 23-27 Feb. 1998

Displaying Results 1 - 25 of 76
  • Proceedings 14th International Conference on Data Engineering

    Publication Year: 1998
  • WWW and the Internet - Did We Miss the Boat?

    Publication Year: 1998 , Page(s): 74
  • Author index

    Publication Year: 1998 , Page(s): 603 - 605
  • Asynchronous version advancement in a distributed three version database

    Publication Year: 1998 , Page(s): 424 - 435
    Cited by:  Papers (2)

    We present an efficient protocol for multi-version concurrency control in distributed databases. The protocol creates no more than three versions of any data item, while guaranteeing that: update transactions never interfere with read-only transactions; the version advancement mechanism is completely asynchronous with respect to (both update and read-only) user transactions; and read-only transactions do not acquire locks and do not write control information into the data items being read. This is an improvement over existing multi-versioning schemes for distributed databases, which either require a potentially unlimited number of versions or require coordination between version advancement and user transactions. Our protocol can also be applied in a centralized system, where the improvement over existing techniques is a reduction in the number of versions from four to three. The proposed protocol is valuable in large applications that currently shut off access to the system while managing version advancement manually, but now need to automate this process and provide continuous access to the data.

  • A tightly-coupled architecture for data mining

    Publication Year: 1998 , Page(s): 316 - 323
    Cited by:  Papers (2)

    Current approaches to data mining are based on a decoupled architecture, in which data are first extracted from a database and then processed by a specialized data mining engine. This paper proposes instead a tightly-coupled architecture, in which data mining is integrated within a classical SQL server. The basis of this work is a SQL-like operator called MINE RULE. We show how the various syntactic features of the operator can be managed by either a SQL engine or a classical data mining engine; our main objective is to identify the border between typical relational processing, executed by the relational server, and data mining processing, executed by a specialized component. The resulting architecture exhibits portability at the SQL level and integration of the inputs and outputs of the data mining operator with the database, and it provides guidelines for promoting the integration of other data mining techniques and systems with SQL servers.

  • Design and performance of an assertional concurrency control system

    Publication Year: 1998 , Page(s): 436 - 445
    Cited by:  Papers (2)  |  Patents (1)

    Serializability has been widely accepted as the correctness criterion for databases subject to concurrent access. Serializable execution is generally implemented using a two-phase locking algorithm that locks items in the database to delay transactions that are in danger of performing in a nonserializable fashion. Such delays are unacceptable in high-performance database systems and in systems supporting long-running transactions. A number of models have been proposed in which transactions are decomposed into smaller, atomic, interleavable steps. A shortcoming of much of this work is that little guidance is provided as to how transactions should be decomposed and which interleavings preserve correct execution. We previously proposed a new correctness criterion, weaker than serializability, that guarantees that each transaction satisfies its specification (A. Bernstein and P. Lewis, 1996). Based on that correctness criterion, we have designed and implemented a new concurrency control system. Experiments using the new concurrency control demonstrate a significant improvement in performance when lock contention is high.

  • A distribution-based clustering algorithm for mining in large spatial databases

    Publication Year: 1998 , Page(s): 324 - 331
    Cited by:  Papers (15)  |  Patents (3)

    The problem of detecting clusters of points belonging to a spatial point process arises in many applications. In this paper, we introduce the new clustering algorithm DBCLASD (Distribution-Based Clustering of LArge Spatial Databases) to discover clusters of this type. Experimental results demonstrate that DBCLASD, unlike partitioning algorithms such as CLARANS (Clustering Large Applications based on RANdomized Search), discovers clusters of arbitrary shape. Furthermore, DBCLASD does not require any input parameters, in contrast to the clustering algorithm DBSCAN (Density-Based Spatial Clustering of Applications with Noise), which requires two input parameters that may be difficult to provide for large databases. In terms of efficiency, DBCLASD lies between CLARANS and DBSCAN, closer to DBSCAN. Thus, the efficiency of DBCLASD on large spatial databases is very attractive, considering its nonparametric nature and its good quality for clusters of arbitrary shape.

  • Encoded bitmap indexing for data warehouses

    Publication Year: 1998 , Page(s): 220 - 230
    Cited by:  Papers (11)  |  Patents (13)

    Complex query types, huge data volumes, and very high read/update ratios make the indexing techniques designed and tuned for traditional database systems unsuitable for data warehouses (DWs). We propose encoded bitmap indexing for DWs, which improves the performance of known bitmap indexing for large-cardinality domains. We present a performance analysis and theorems that identify the properties of good encodings. We compare encoded bitmap indexing with related techniques such as bit slicing and projection-, dynamic-, and range-based indexing.

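Below is a minimal sketch of the encoded-bitmap idea summarized above: assign each domain value a binary code and keep one bitmap per code bit, rather than one bitmap per distinct value. The column data, function names, and exact encoding are illustrative assumptions, not taken from the paper.

```python
import math

def build_encoded_bitmaps(column):
    """Build an encoded bitmap index: one bitmap per bit of the value code,
    rather than one bitmap per distinct value (as in simple bitmap indexing)."""
    domain = sorted(set(column))
    bits = max(1, math.ceil(math.log2(len(domain))))
    code = {v: i for i, v in enumerate(domain)}          # value -> integer code
    bitmaps = [[0] * len(column) for _ in range(bits)]   # one bitmap per code bit
    for row, value in enumerate(column):
        c = code[value]
        for b in range(bits):
            bitmaps[b][row] = (c >> b) & 1
    return code, bitmaps

def lookup(value, code, bitmaps):
    """Rows where column == value: combine the bitmaps (or their complements)
    according to the bit pattern of the value's code."""
    c = code[value]
    rows = []
    for row in range(len(bitmaps[0])):
        if all(bitmaps[b][row] == ((c >> b) & 1) for b in range(len(bitmaps))):
            rows.append(row)
    return rows

# Example: a 6-value domain needs only 3 encoded bitmaps instead of 6 simple ones.
col = ["DE", "US", "FR", "US", "JP", "DE", "BR", "CN"]
code, bitmaps = build_encoded_bitmaps(col)
print(lookup("US", code, bitmaps))   # -> [1, 3]
```
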
  • Data logging: a method for efficient data updates in constantly active RAIDs

    Publication Year: 1998 , Page(s): 144 - 153
    Cited by:  Papers (4)  |  Patents (3)

    RAIDs (Redundant Arrays of Independent Disks) are sets of disks organized to achieve parallel I/O across multiple disks and to provide tolerance of disk failures. RAIDs offer these advantages at the cost of additional space and additional disk I/O for writes. Previous methods of reducing this I/O overhead suffered from problems such as requiring periods during which data is reorganized and unavailable, destroying the physical locality of data, or weakening the RAID's fault-tolerance properties. We propose a new method, called data logging, which reduces the I/O overhead without requiring periodic downtime for reorganization; instead, incremental maintenance can be performed concurrently with routine processing. This is particularly advantageous in applications requiring “24×7” uptime. Data logging preserves both the physical locality of data and RAID fault tolerance. The major cost of our method is a moderate amount of nonvolatile RAM. This paper describes our method, as well as two schemes for efficiently encoding the information that must be stored in nonvolatile RAM.

  • Failure handling and coordinated execution of concurrent workflows

    Publication Year: 1998 , Page(s): 334 - 341
    Cited by:  Papers (1)  |  Patents (2)

    Workflow management systems (WFMSs) coordinate the execution of applications distributed over networks. In WFMSs, data inconsistencies can arise due to: the interaction between steps of concurrent threads within a workflow (intra-workflow coordination); the interaction between steps of concurrent workflows (inter-workflow coordination); and the presence of failures. Since these problems have not received adequate attention, this paper focuses on developing the necessary concepts and infrastructure to handle them. First, to deal with inter- and intra-workflow coordination requirements, we identify a set of high-level building blocks. Second, to handle failures, we propose a novel and pragmatic approach called opportunistic compensation and re-execution, which allows a workflow designer to customize workflow recovery from both correctness and performance perspectives. Third, based on these concepts, we have designed a workflow specification language that expresses the new requirements for workflow executions and implemented a run-time system that manages workflow executions while satisfying those requirements. These ideas are geared towards improving the modeling and correctness properties offered by WFMSs and making them more robust and flexible.

  • Global integration of visual databases

    Publication Year: 1998 , Page(s): 542 - 549
    Cited by:  Patents (2)

    Different visual databases have been designed in various locations. The global integration of such databases can enable users to access data across the world in a transparent manner. In this paper, we investigate an approach to the design and creation of an integrated information system that supports global visual query access to various visual databases over the Internet. Specifically, a metaserver, comprising a hierarchical metadatabase, a metasearch agent, and a query manager, is designed to support this integration. The metadatabase houses abstracted data about the individual remote visual databases. To support visual content-based queries, the abstracted data in the metadatabase reflect the semantics of each visual database. The query manager extracts the feature contents from queries, and the metasearch agent processes the queries by matching their feature contents with the metadata. A list of relevant database sites is derived for efficient retrieval of the query in the selected databases. The performance of the system is refined based on user feedback. The proposed system is implemented in Java in a Web-based environment.

  • Mining for strong negative associations in a large database of customer transactions

    Publication Year: 1998 , Page(s): 494 - 502
    Cited by:  Papers (32)  |  Patents (4)

    Mining for association rules is considered an important data mining problem, and many variations of this problem have been described in the literature. We introduce the problem of mining for negative associations. A naive approach to finding negative associations leads to a very large number of rules with low interest measures. We address this problem by combining previously discovered positive associations with domain knowledge to constrain the search space, so that fewer but more interesting negative rules are mined. We describe an algorithm that efficiently finds all such negative associations and present experimental results.

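As a simplified illustration of how a candidate negative rule can be scored, the sketch below rates a rule X => not-Y by how far the observed co-occurrence of X and Y falls below what independence would predict. The scoring function and the toy transactions are assumptions for illustration only; the paper's algorithm additionally uses previously discovered positive associations and domain knowledge to constrain the search.

```python
def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def negative_interest(x, y, transactions):
    """How far the observed co-occurrence of x and y falls below the level
    expected under independence; large positive values suggest x => not-y."""
    expected = support(x, transactions) * support(y, transactions)
    observed = support(x | y, transactions)
    return expected - observed

# Toy transactions (illustrative): coffee buyers rarely buy tea.
transactions = [
    {"coffee", "milk"}, {"coffee", "sugar"}, {"coffee"},
    {"tea", "sugar"}, {"coffee", "milk", "sugar"}, {"tea"},
]
print(negative_interest({"coffee"}, {"tea"}, transactions))  # clearly above zero
```
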
  • Query processing in a video retrieval system

    Publication Year: 1998 , Page(s): 276 - 283
    Cited by:  Papers (4)

    A.P. Sistla et al. (1997) designed a similarity-based video retrieval system in which queries were specified in a language called the Hierarchical Temporal Language (HTL). In this paper, we present several extensions of HTL that allow queries to use the negation operator as well as other logical and temporal operators, such as disjunction. Efficient algorithms for processing queries in the extended language are also presented.

  • Junglee: integrating data of all shapes and sizes

    Publication Year: 1998
    Cited by:  Papers (1)  |  Patents (1)

    Junglee Corp. is engaged in the field of data integration. We develop general technology for integrating data, bridging many dimensions of heterogeneity, and have applied this technology to vertical application areas. This paper covers the technology and applications being developed at Junglee Corp.

  • Persistent applications using generalized redo recovery

    Publication Year: 1998 , Page(s): 154 - 163
    Cited by:  Papers (2)  |  Patents (11)

    We describe how to recover applications after system crashes using database recovery. Earlier efforts, based on frequent application checkpoints and/or logging the values read, are very expensive. We treat application state as a cached object and log application execution as operations in the recovery framework of D. Lomet and M. Tuttle (1995). Logging application execution does not require logging the application state. Further, logged application reads are mostly logical operations in which only the identity of the data source is logged. We describe a cache manager that handles the flush-order dependencies introduced by these log operations, and a recovery process that restores application state by replaying the application.

  • Back to the future: dynamic hierarchical clustering

    Publication Year: 1998 , Page(s): 578 - 587
    Cited by:  Papers (1)

    We describe a new method for dynamically clustering hierarchical data that maintains good clustering within disk pages in the presence of insertions and deletions. This simple but effective method, which we call Enc, encodes the insertion order of children with respect to their parents and concatenates the insertion numbers to form a compact key for the data. This compact key is stored only in the indexing structure and does not affect the logical database schema. Experimental results show that our Enc method is very efficient for hierarchical queries and performs reasonably well for random-access queries.

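The key-construction idea stated in the abstract admits a very small sketch: each node's compact key is its parent's key extended with the node's insertion number among that parent's children, so a subtree corresponds to a key prefix. The class and function names below are illustrative; the paper's actual key encoding and page-placement details may differ.

```python
class EncNode:
    """Node keyed by the concatenated insertion numbers along its parent chain."""
    def __init__(self, parent=None):
        self.parent = parent
        self.children = []
        if parent is None:
            self.key = ()                       # root has the empty key
        else:
            # insertion number of this child with respect to its parent
            self.key = parent.key + (len(parent.children),)
            parent.children.append(self)

def in_subtree(node_key, root_key):
    """A node belongs to a subtree iff its key has the subtree root's key as a prefix."""
    return node_key[:len(root_key)] == root_key

root = EncNode()
a, b = EncNode(root), EncNode(root)
a1, a2, b1 = EncNode(a), EncNode(a), EncNode(b)
print(a1.key, a2.key, b1.key)        # (0, 0) (0, 1) (1, 0)
print(in_subtree(a2.key, a.key))     # True: a2 lies under a
print(in_subtree(b1.key, a.key))     # False
```
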
  • Methodical restructuring of complex workflow activities

    Publication Year: 1998 , Page(s): 342 - 350
    Cited by:  Papers (4)  |  Patents (2)

    We describe a family of activity-split and activity-join operations with a notion of validity. The key idea behind these operations is to allow users to restructure ongoing activities in anticipation of uncertainty, so that significant performance loss due to unexpected unavailability or delay of shared resources can be avoided or reduced through the release of early-committed resources or the transfer of ownership of uncommitted resources. To guarantee the correctness of new activities generated by activity-split or activity-join operations, we define the notion of validity of activity restructuring operations and identify the cases where correctness is ensured and the cases where activity-split or activity-join operations are illegal due to the inconsistency they would incur.

  • Compressing relations and indexes

    Publication Year: 1998 , Page(s): 370 - 379
    Cited by:  Papers (6)  |  Patents (15)

    We propose a new compression algorithm that is tailored to database applications. It can be applied to a collection of records and is especially effective for records with many low-to-medium-cardinality fields and numeric fields. In addition, the technique supports very fast decompression. Promising application domains include decision support systems (DSS), since fact tables, which are by far the largest tables in these applications, contain many low- and medium-cardinality fields and typically no text fields. Further, our decompression rates are faster than typical disk throughputs for sequential scans, whereas gzip is slower; this is important in DSS applications, which often scan large ranges of records. An important distinguishing characteristic of our algorithm, in contrast to previously proposed compression algorithms, is that we can decompress individual tuples (even individual fields), rather than a full page (or an entire relation) at a time. Also, all the information needed for tuple decompression resides on the same page as the tuple. This means that a page can be stored in the buffer pool and used in compressed form, simplifying the job of the buffer manager and improving memory utilization. Our compression algorithm also improves index structures such as B-trees and R-trees significantly by reducing the number of leaf pages and compressing index entries, which greatly increases the fan-out. We can also use lossy compression on the internal nodes of an index.

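Below is a minimal sketch, in the spirit of the abstract's tuple-level decompressibility, of page-local dictionary compression for one low-cardinality field: the dictionary travels with the page, so any single tuple can be decoded without touching other pages. The layout, field names, and dictionary scheme are assumptions for illustration, not the paper's actual algorithm.

```python
def compress_page(tuples, field):
    """Dictionary-encode one low-cardinality field; the dictionary is stored
    on the page itself, so any single tuple can be decoded without other pages."""
    dictionary = sorted({t[field] for t in tuples})
    index = {v: i for i, v in enumerate(dictionary)}
    rows = [dict(t, **{field: index[t[field]]}) for t in tuples]
    return {"dict": dictionary, "field": field, "rows": rows}

def decompress_tuple(page, row_number):
    """Decode a single tuple using only information stored on its page."""
    row = dict(page["rows"][row_number])
    row[page["field"]] = page["dict"][row[page["field"]]]
    return row

page = compress_page(
    [{"cust": 1, "state": "IL"}, {"cust": 2, "state": "NY"},
     {"cust": 3, "state": "IL"}],
    field="state",
)
print(decompress_tuple(page, 2))   # {'cust': 3, 'state': 'IL'}
```
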
  • Efficient discovery of functional and approximate dependencies using partitions

    Publication Year: 1998 , Page(s): 392 - 401
    Cited by:  Papers (10)  |  Patents (5)

    Discovery of functional dependencies from relations has been identified as an important database analysis technique. We present a new approach for finding functional dependencies in large databases, based on partitioning the set of rows with respect to their attribute values. The use of partitions makes the discovery of approximate functional dependencies easy and efficient, and erroneous or exceptional rows can be identified easily. Experiments show that the new algorithm is efficient in practice: for benchmark databases, the running times improve on previously published results by several orders of magnitude, and the algorithm is applicable to much larger datasets than previous methods.

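A small sketch of the partition test that underlies this kind of approach: under the usual formulation, a functional dependency X -> A holds exactly when partitioning the rows by X together with A produces no more equivalence classes than partitioning by X alone. The relation and attribute names below are made up for illustration.

```python
def partition(rows, attributes):
    """Group row indices into equivalence classes by their values on `attributes`."""
    classes = {}
    for i, row in enumerate(rows):
        classes.setdefault(tuple(row[a] for a in attributes), []).append(i)
    return list(classes.values())

def fd_holds(rows, lhs, rhs):
    """X -> A holds iff partitioning by X plus A creates no new equivalence classes."""
    return len(partition(rows, lhs)) == len(partition(rows, lhs + [rhs]))

rows = [
    {"emp": 1, "dept": "toys",  "city": "Oslo"},
    {"emp": 2, "dept": "toys",  "city": "Oslo"},
    {"emp": 3, "dept": "books", "city": "Bergen"},
]
print(fd_holds(rows, ["dept"], "city"))   # True:  dept -> city
print(fd_holds(rows, ["city"], "emp"))    # False: city does not determine emp
```
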
  • Processing incremental multidimensional range queries in a direct manipulation visual query environment

    Publication Year: 1998 , Page(s): 458 - 465
    Cited by:  Papers (1)  |  Patents (2)

    We have developed a MultiMedia Visual Information Seeking (MMVIS) environment designed to support an integrated approach to direct-manipulation temporal querying and browsing of temporal relationship results. Here we address the optimization of queries specified via our visual query interface. Queries in MMVIS are incrementally specified and continuously refined multidimensional range queries. We present our k-Array index structure and its bucket-based counterpart, the k-Bucket, as new indexes optimized for processing these direct-manipulation queries. In an experimental evaluation comparing our k-Array and k-Bucket solutions to alternative techniques from the literature, we show that the k-Bucket generally performs as well as or better than the other techniques and is the best overall approach for such environments.

  • The Alps at your fingertips: virtual reality and geoinformation systems

    Publication Year: 1998 , Page(s): 550 - 557
    Cited by:  Papers (2)  |  Patents (1)

    We advocate a desktop virtual reality (VR) interface to a geographic information system (GIS). The navigational capability to explore large topographic scenes is a powerful metaphor and a natural way of interacting with a GIS. VR systems succeed in providing visual realism and real-time navigation and interaction, but fail to cope with very large amounts of data and to provide the general functionality of information systems. We suggest a way to overcome these problems and describe a prototype system, called ViRGIS (Virtual Reality GIS), that integrates two system platforms: a client running the VR component interacts via a (local or wide area) network with a server running an object-oriented database containing the geographic data. To access data efficiently, we describe how to integrate a geometric index into the database and how to perform the operations requested during a real-time trip through the virtual world.

  • Ending the ROLAP/MOLAP debate: usage based aggregation and flexible HOLAP

    Publication Year: 1998

    Summary form only given, as follows. Over the past few years, OLAP vendors have engaged in a debate regarding relational versus multidimensional data stores. This debate has obscured the more significant problems facing today's OLAP customers: managing the exponential growth generated by multidimensional pre-aggregation, and architectural support for a wide array of OLAP data models. Microsoft discusses several aspects of its upcoming OLAP Server product, placing special emphasis on these areas. Solutions for managing voluminous pre-aggregates are discussed in the context of understanding the dynamics of the data explosion problem, together with a partial aggregation scheme that is adjusted according to user query needs. Flexible Hybrid OLAP is discussed as a compelling solution to a wide array of user needs and data requirements, with a focus on the many different meanings associated with Hybrid OLAP and the strengths and weaknesses of each.

  • The effect of buffering on the performance of R-trees

    Publication Year: 1998 , Page(s): 164 - 171
    Cited by:  Papers (12)

    Past R-tree studies have focused on the number of nodes visited as a metric of query performance. Since database systems usually include a buffering mechanism, we propose that the number of disk accesses is a more realistic measure of performance. We develop a buffer model to analyze the number of disk accesses required for spatial queries using R-trees. The model can be used to evaluate the quality of R-tree update operations, such as various node-splitting and tree-restructuring policies, as measured by query performance on the resulting tree. We use our model to study the performance of three well-known R-tree packing algorithms and show that ignoring buffer behavior and using the number of nodes accessed as a performance metric can lead to incorrect conclusions, not only quantitatively but also qualitatively. In addition, we consider the question of how many levels of the R-tree should be pinned in the buffer.

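The point that node visits and disk accesses diverge once a buffer is present can be seen with a tiny LRU buffer simulation, sketched below. The buffer policy, buffer size, and access trace are assumptions for illustration; the paper develops an analytical buffer model rather than a simulation.

```python
from collections import OrderedDict

def disk_accesses(node_trace, buffer_pages):
    """Count disk reads for a sequence of R-tree node visits under an LRU buffer."""
    buffer = OrderedDict()      # node id -> None, kept in LRU order
    reads = 0
    for node in node_trace:
        if node in buffer:
            buffer.move_to_end(node)          # buffer hit: no disk access
        else:
            reads += 1                        # miss: fetch node from disk
            buffer[node] = None
            if len(buffer) > buffer_pages:
                buffer.popitem(last=False)    # evict least recently used node
    return reads

# Two queries that both visit the root and an overlapping set of nodes:
trace = ["root", "n1", "n4", "root", "n1", "n5"]
print(len(trace))                 # 6 nodes visited
print(disk_accesses(trace, 4))    # only 4 disk accesses with a 4-page buffer
```
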
  • Cost and imprecision in modeling the position of moving objects

    Publication Year: 1998 , Page(s): 588 - 596
    Cited by:  Papers (23)

    Consider a database that represents the location of moving objects, such as taxicabs (a typical query: “retrieve the cabs that are currently within 1 mile of 33 Michigan Ave., Chicago”) or objects on a battlefield. Existing database management systems (DBMSs) are not well equipped to handle continuously changing data, such as the position of moving objects, since data is assumed to be constant unless it is explicitly modified. In this paper, we address position-update policies and imprecision. Assuming that the actual position of a moving object m deviates from the position computed by the DBMS, when should m update its position in the database in order to eliminate the deviation? Furthermore, how can the DBMS provide a bound on the error (i.e., the deviation) when it replies to a query such as “what is the current position of m?” We propose a cost-based approach to update policies that answers both questions, and we develop several update policies and analyze them theoretically and experimentally.

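As a simplified illustration of the questions above, the sketch below implements a threshold-based (dead-reckoning style) update policy: the object reports a new position only when its actual position deviates from the DBMS's extrapolated position by more than a bound, and that bound is exactly the imprecision the DBMS can attach to its answers. The motion model, threshold, and class names are illustrative; the paper develops cost-based policies that trade update cost against imprecision.

```python
class MovingObject:
    """Object whose database record is a (position, speed) pair plus an error bound."""
    def __init__(self, position, speed, bound):
        self.db_position = position   # last position anchored in the DBMS
        self.db_speed = speed         # speed the DBMS uses to extrapolate
        self.bound = bound            # maximum deviation before an update is sent
        self.updates = 0

    def predicted(self, t):
        """Position the DBMS computes at time t from the last update."""
        return self.db_position + self.db_speed * t

    def observe(self, t, actual_position):
        """Send an update only when the deviation exceeds the bound."""
        if abs(actual_position - self.predicted(t)) > self.bound:
            self.db_position = actual_position - self.db_speed * t  # re-anchor prediction
            self.updates += 1

obj = MovingObject(position=0.0, speed=1.0, bound=0.5)
for t, pos in enumerate([0.0, 1.2, 2.1, 3.9, 5.2]):   # actual trajectory samples
    obj.observe(t, pos)
print(obj.updates)   # updates issued only when the deviation exceeded 0.5
```
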
  • Red Brick Vista™: aggregate computation and management

    Publication Year: 1998 , Page(s): 174 - 177
    Cited by:  Papers (2)  |  Patents (19)

    Aggregate query processing in large data warehouses is computationally intensive. Precomputation is an approach that can be used to speed up aggregate queries. However, to make precomputation a truly viable solution to the aggregate query processing problem, it is important to identify the best set of aggregates to precompute and to use these precomputed aggregates effectively. The Red Brick aggregate computation and management system (Red Brick Vista) provides a complete, server-integrated solution to these problems.
