Data Engineering, 1994. Proceedings of the 10th International Conference

Date: 14-18 Feb. 1994

Displaying Results 1 - 25 of 56
  • Proceedings of 1994 IEEE 10th International Conference on Data Engineering

  • Exploiting uniqueness in query optimization

    Page(s): 68 - 79

    Consider an SQL query that specifies duplicate elimination via a DISTINCT clause. Because duplicate elimination often requires an expensive sort of the query result, it is often worthwhile to identify unnecessary DISTINCT clauses and avoid the sort altogether. We prove a necessary and sufficient condition for deciding if a query requires duplicate elimination. The condition exploits knowledge about keys, table constraints, and query predicates. Because the condition cannot always be tested efficiently, we offer a practical algorithm that tests a simpler, sufficient condition. We consider applications of this condition for various types of queries, and show that we can exploit this condition in both relational and nonrelational database systems.
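
    The flavor of such a test can be seen in a minimal Python sketch (a hypothetical helper, not the paper's algorithm): for a single-table query, DISTINCT is redundant whenever the projected columns contain a candidate key.

        def distinct_is_redundant(projected_cols, keys):
            """keys: collection of candidate keys, each a set of column names."""
            out = set(projected_cols)
            return any(key <= out for key in keys)

        # A query projecting (empno, name) from a table keyed on {empno}
        # cannot produce duplicates, so its DISTINCT clause can be dropped.
        print(distinct_is_redundant({"empno", "name"}, [{"empno"}]))  # True
        print(distinct_is_redundant({"name"}, [{"empno"}]))           # False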

  • Performance analysis of RAID5 disk arrays with a vacationing server model for rebuild mode operation

    Page(s): 111 - 119

    We analyze the performance of RAID5 disk arrays in normal, degraded, and rebuild modes. The analysis, which is shown to be highly accurate through validation against simulation results, achieves its accuracy by (1) modeling detailed disk characteristics; (2) developing a simple approximation to compute the mean response time for fork-join requests arising in degraded mode operation; and (3) using a vacationing server model with multiple vacation types for rebuild mode analysis. According to this model, vacations (rebuild reads) are started when the server (disk) becomes idle and are repeated until the arrival of an external disk request. Type-one vacations correspond to reading the first track, which requires a seek; type-two vacations correspond to reading successive tracks, which require no seeks. The analytic solution is used to quantify the effect of different rebuild options, such as read redirection and the split-seek option.
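
    A toy rendition of the vacationing server idea (the timing constants and arrival stream are invented; the paper develops an analytic model, not a simulation): rebuild reads start when the disk goes idle, the first paying a seek, and continue until an external request arrives.

        import random

        random.seed(1)
        SEEK, TRACK_READ, SERVICE = 8.0, 4.0, 12.0     # ms; assumed constants
        arrivals = sorted(random.uniform(0, 2000) for _ in range(40))

        clock, tracks_rebuilt = 0.0, 0
        for t in arrivals:
            if t > clock:                    # disk idle: take vacations
                clock += SEEK + TRACK_READ   # type-one vacation (needs a seek)
                tracks_rebuilt += 1
                while t > clock:             # type-two vacations (no seek)
                    clock += TRACK_READ
                    tracks_rebuilt += 1
            clock = max(clock, t) + SERVICE  # then serve the external request
        print(f"rebuilt {tracks_rebuilt} tracks by t = {clock:.0f} ms")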

  • Index structures for information filtering under the vector space model

    Page(s): 337 - 347

    The authors study what data structures and algorithms can be used to efficiently perform large-scale information filtering under the vector space model, a retrieval model established as being effective. They apply the idea of the standard inverted index to index user profiles. They devise an alternative to the standard inverted index, in which, instead of indexing every term in a profile, they select only the significant ones to index. They evaluate their performance and show that the indexing methods require orders of magnitude fewer I/Os to process a document than when no index is used. They also show that the proposed alternative performs better in terms of I/O and CPU processing time in many cases.
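
    A minimal sketch of the idea (the layout and weights are assumed, not the authors' exact structures): profiles are posted into an inverted index by term, optionally keeping only each profile's most significant terms, and a document is matched by accumulating partial scores.

        from collections import defaultdict

        def build_index(profiles, top_k=None):
            """profiles: {pid: {term: weight}}; keep top_k terms per profile."""
            index = defaultdict(list)
            for pid, terms in profiles.items():
                kept = sorted(terms.items(), key=lambda t: -t[1])
                for term, w in (kept[:top_k] if top_k else kept):
                    index[term].append((pid, w))
            return index

        def filter_document(index, doc_terms):
            scores = defaultdict(float)          # pid -> partial dot product
            for term, dw in doc_terms.items():
                for pid, pw in index.get(term, ()):
                    scores[pid] += pw * dw
            return scores

        idx = build_index({"p1": {"db": 0.9, "sql": 0.4}, "p2": {"ir": 0.8}}, top_k=1)
        print(dict(filter_document(idx, {"db": 0.5, "ir": 0.2})))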

  • Efficient organization of large multidimensional arrays

    Page(s): 328 - 336

    Large multidimensional arrays are widely used in scientific and engineering database applications. The authors present methods of organizing arrays to make their access on secondary and tertiary memory devices fast and efficient. They have developed four techniques for doing this: (1) storing the array in multidimensional “chunks” to minimize the number of blocks fetched, (2) reordering the chunked array to minimize seek distance between accessed blocks, (3) maintaining redundant copies of the array, each organized for a different chunk size and ordering, and (4) partitioning the array onto platters of a tertiary memory device so as to minimize the number of platter switches. Measurements on real data obtained from global change scientists show that accesses on arrays organized using these techniques are often an order of magnitude faster than on the unoptimized data.
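
    A back-of-envelope sketch of technique (1), with made-up array and chunk shapes: a range query needs to fetch only the chunks it intersects.

        from itertools import product

        def chunks_touched(query_lo, query_hi, chunk_shape):
            """Half-open query box [lo, hi) over an array chunked as chunk_shape."""
            ranges = [range(lo // c, -(-hi // c))      # ceiling division
                      for lo, hi, c in zip(query_lo, query_hi, chunk_shape)]
            return list(product(*ranges))

        # A 100x100 sub-box of a large 2-D array with 50x50 chunks touches 4
        # chunks, versus 100 separate row fragments under row-major storage.
        print(len(chunks_touched((0, 0), (100, 100), (50, 50))))  # 4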

  • Data management in delayed conferencing


    Abstract only given, as follows. Video conferencing has become an alternative way to get people to communicate with each other without having to travel long distances. Though it has proven to be very useful on many occasions, in order to have a successful video conference, (1) it requires preliminary scheduling to have all involved parties present at the same time, (2) all parties need to be fully prepared and give quick responses to minimise expensive “dead space”, and (3) there never seems to be enough communication bandwidth. Of course, as is common in most conference settings, people inevitably try to rush for conclusions near the end. With the above in mind, we are developing new data management techniques to conduct video conferences in which all parties need not be present at the same time, people can communicate at their own pace, and network bandwidth is utilized more effectively. We briefly describe the basic ideas behind such a system and show a prototype system.

  • Parallel approaches to database management


    Abstract only given, as follows. A variety of parallel approaches have been used to support database processing across a spectrum of machine architectures. We begin by describing areas where parallelism is potentially important in dealing with very large databases, including loading, query/update, and database administration. We then discuss hardware tradeoffs, including multicomputers versus multiprocessors, distributed versus centralized memory, and specialised versus general-purpose architectures. At the software level, we cover a number of approaches, including running multiple transactions in parallel, decomposing queries into parallel subqueries, executing low-level query operations in parallel, running multiple instances of the DBMS, and partitioning data over disks. We characterise the impact of these approaches on performance, scalability, and ease of use, for both decision support and transaction processing. Finally, the approaches taken in several commercial DBMSs are described, as well as extensions such as the Kendall Square Query Decomposer.

  • On a more realistic lock contention model and its analysis

    Page(s): 2 - 9

    Most performance modeling studies of lock contention in transaction processing systems are deficient in that they postulate a homogeneous database access model. The non-homogeneous database access model described in this paper allows multiple transaction classes with different access patterns to the database regions. The performance of the system from the viewpoint of lock contention is analyzed in the context of the standard two-phase locking concurrency control method with the general waiting policy. The approximate analysis is based on mean values of parameters and derives expressions for the probability of lock conflict (usually leading to transaction blocking) and the mean blocking time. The latter requires estimating the distribution of the effective wait-depth encountered by blocked transactions and the mean waiting time associated with different blocking levels. The analysis is validated against simulation results and shown to be more accurate than analytic solutions that consider only two levels of transaction blocking. Previously proposed metrics for load control have limited applicability for the model under consideration.
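
    A hedged sketch in the spirit of such a mean-value analysis (the formula and all numbers are illustrative, not the paper's): a class's lock conflict probability is estimated from its access mix over regions and the mean number of locks held in each region.

        regions = {"hot": 1_000, "cold": 100_000}        # lockable items per region
        locks_held = {"hot": 40, "cold": 60}             # mean locks held by others
        access = {"update": {"hot": 0.8, "cold": 0.2},   # per-class access mix
                  "report": {"hot": 0.1, "cold": 0.9}}

        for cls, mix in access.items():
            # P(conflict) ~ sum_r P(request hits r) * locks_held[r] / size[r]
            p = sum(mix[r] * locks_held[r] / regions[r] for r in regions)
            print(f"{cls}: P(lock conflict) ~ {p:.4f}")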

  • The TP-Index: a dynamic and efficient indexing mechanism for temporal databases

    Page(s): 274 - 281

    To support temporal operators efficiently, indexing based on temporal attributes must be supported. The authors propose a dynamic and efficient index scheme called the time polygon (TP-index) for temporal databases. In the scheme, temporal data are mapped into a two-dimensional temporal space, where the data can be clustered based on time. The data space is then partitioned into time polygons where each polygon corresponds to a data page. The time polygon directory can be organized as a hierarchical index. The index handles long duration temporal data elegantly and efficiently. The performance analysis indicates that the time polygon index is efficient both in storage utilization and query search.
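
    A sketch of the underlying mapping, with the polygon partitioning omitted: a valid-time interval [start, end] becomes the 2-D point (start, end), and a query such as "alive at time t" becomes a simple region predicate over those points.

        def alive_at(points, t):
            """points: interval endpoints mapped to 2-D as (start, end)."""
            return [(s, e) for s, e in points if s <= t <= e]

        versions = [(1, 4), (2, 9), (5, 7), (8, 12)]
        print(alive_at(versions, 6))   # [(2, 9), (5, 7)]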

  • Polymorphic reuse mechanisms for object-oriented database specifications

    Page(s): 180 - 189

    A polymorphic approach to the incremental design and reuse of object-oriented methods and query specifications is presented. Using this approach, the effort required for manually reprogramming methods and queries due to schema modifications can be avoided or minimized. The salient features of our approach are the use of propagation patterns and a mechanism for propagation pattern refinement. Propagation patterns can be employed as an interesting specification formalism for modeling operational requirements in object-oriented database systems. They encourage the reuse of operational specifications across structural modifications of an object-oriented schema. Propagation pattern refinement is suited for the specification of reusable operational modules, and for keeping propagation patterns reusable as operational requirements change.

  • Cooperative problem solving using database conversations

    Page(s): 134 - 143

    Cooperative problem solving is a joint style of producing and consuming data. Unfortunately, most database mechanisms developed so far are more appropriate for competitive usage than for a cooperative working style. They mostly adopt an operational point of view which binds data to applications. Data-oriented mechanisms like check-in/out avoid this binding but do not improve synchronization for concurrent usage of data. Conversations are an application-independent, tight framework for jointly modifying common data. The idea is to create transaction-spanning conversational working stages that organize different contributions instead of serializing accesses. To illustrate the conversation concept, an extended query language with conversational operations is presented.

  • A hybrid transitive closure algorithm for sequential and parallel processing

    Page(s): 498 - 505

    A new hybrid algorithm is proposed for well-formed path problems including the transitive closure problem. The CPU time for computation is O(ne), and a blocking technique is incorporated to reduce the disk I/O cost in a disk-resident environment. The new features of the algorithm are that only parent sets, instead of descendant sets, are loaded in from disk, and that the computation can be parallelized efficiently. Simulation results show that our algorithm is superior to other existing algorithms in sequential computation, and that linear speedup is achieved in parallel computation.
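
    An illustrative reduction of the parent-set idea (not the paper's hybrid algorithm, which also blocks disk I/O): processing nodes in topological order, each node's ancestor set is assembled purely from its parents' sets.

        def ancestors(parents, topo_order):
            """parents: {node: set of parent nodes}; graph assumed acyclic."""
            anc = {}
            for v in topo_order:                  # parents appear before v
                anc[v] = set()
                for p in parents.get(v, ()):
                    anc[v] |= anc[p] | {p}
            return anc

        g = {"b": {"a"}, "c": {"a", "b"}, "d": {"c"}}
        print(ancestors(g, ["a", "b", "c", "d"])["d"])  # {'a', 'b', 'c'}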

  • Analysis of common subexpression exploitation models in multiple-query processing

    Page(s): 488 - 497

    In multiple-query processing, a subexpression that appears in more than one query is called a common subexpression (CSE). A CSE needs to be evaluated only once to produce a temporary result that can then be used to evaluate all the queries containing the CSE. Therefore, the cost of evaluating the CSE is amortized over the queries requiring its evaluation. Two queries, posed simultaneously to the optimizer, may however contain subexpressions that are not equivalent but are nevertheless related by implication (the extension of one is a proper subset of the other) or intersection (the intersection of the two extensions is a proper subset of both extensions). In order to exploit the opportunity for cost amortization offered by the two latter relationships, the optimizer must rewrite the two queries in such a way that a CSE is induced. This paper compares, empirically and analytically, the performance of the various query execution models that are implied by different approaches to query rewriting.
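
    A toy, SQL-free illustration of the amortization argument (relations as Python sets; the rewriting machinery for implication and intersection is omitted): the CSE is evaluated once into a temporary and reused by both queries.

        def select(rel, pred):
            return {t for t in rel if pred(t)}

        orders = {(1, "EU", 500), (2, "US", 80), (3, "EU", 90), (4, "EU", 700)}

        cse = select(orders, lambda t: t[1] == "EU")   # shared subexpression
        q1 = select(cse, lambda t: t[2] > 100)         # EU orders over 100
        q2 = select(cse, lambda t: t[2] <= 100)        # EU orders up to 100
        print(sorted(q1), sorted(q2))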

  • A query sampling method for estimating local cost parameters in a multidatabase system

    Page(s): 144 - 153

    In a multidatabase system (MDBS), some query optimization information related to local database systems may not be available at the global level because of local autonomy. To perform global query optimization, a method is required to derive the necessary local information. This paper presents a new method that employs a query sampling technique to estimate the cost parameters of an autonomous local database system. We introduce a classification for grouping local queries and suggest a cost estimation formula for the queries in each class. We present a procedure to draw a sample of queries from each class and use the observed costs of sample queries to determine the cost parameters by multiple regression. Experimental results indicate that the method is quite promising for estimating the cost of local queries in an MDBS.
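
    A hedged sketch of the approach with an invented two-parameter cost model: observe the costs of sample queries in a class, then fit the class's cost parameters by least-squares regression.

        import numpy as np

        # (input cardinality, result cardinality, observed cost) per sample query
        samples = np.array([[1000, 100, 0.8], [5000, 40, 2.9],
                            [20000, 900, 11.5], [800, 700, 0.9]])
        X = np.column_stack([np.ones(len(samples)), samples[:, 0], samples[:, 1]])
        coef, *_ = np.linalg.lstsq(X, samples[:, 2], rcond=None)
        c0, c1, c2 = coef
        print(f"estimated cost(N, R) = {c0:.3f} + {c1:.6f}*N + {c2:.6f}*R")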

  • Semantics-based multilevel transaction management in federated systems

    Page(s): 452 - 461

    A federated database management system (FDBMS) is a special type of distributed database system that enables existing local databases, in a heterogeneous environment, to maintain a high degree of autonomy. One of the key problems in this setting is the coexistence of local transactions and global transactions, where the latter access and manipulate data of multiple local databases. In modeling FDBMS transaction executions, the authors propose a more realistic model than the traditional read/write model; in their model a local database exports high-level operations which are the only operations distributed global transactions can execute to access data in the shared local databases. Such restrictions are not unusual in practice as, for example, no airline or bank would ever permit foreign users to execute ad hoc queries against their databases for fear of compromising autonomy. The proposed architecture can be elegantly modeled using the multilevel nested transaction model, for which a sound theoretical foundation exists to prove concurrent executions correct. A multilevel scheduler that is able to exploit the semantics of exported operations can significantly increase concurrency by ignoring pseudo conflicts. A practical scheduling mechanism for FDBMSs is described that offers the potential for greater performance and more flexibility than previous approaches based on the read/write model.

  • Data placement and buffer management for concurrent mergesorts with parallel prefetching

    Page(s): 418 - 427

    Various data placement policies are studied for the merge phase of concurrent mergesorts using parallel prefetching, where the initial sorted runs (input) of a merge and its final sorted run (output) are stored on multiple disks but each run resides only on a single disk. Since the merge phase involves only sequential references, parallel prefetching can be attractive in reducing the average response time for concurrent merges. However, without careful buffer control, severe thrashing may develop under certain run placement policies, reducing the benefits of prefetching. The authors examine through detailed simulations three different run placement policies. The results show that even though buffer thrashing can be almost avoided by placing the output run of a job on the same disk with at least one of its input runs, this thrashing-avoiding run placement policy can be substantially outperformed by other policies that use buffer thrashing control. With buffer thrashing control, the best performance is achieved by a run placement policy that uses a proper subset of disks dedicated to writing the output runs while the rest of the disks are used for prefetching the input runs in parallel.
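
    A sketch of the flavor of the best-performing policy described above (disk counts and run assignments are arbitrary examples): a fixed subset of disks is dedicated to output runs, and each merge's input runs are spread over the remaining disks.

        NUM_DISKS, WRITE_DISKS = 8, {6, 7}
        read_disks = [d for d in range(NUM_DISKS) if d not in WRITE_DISKS]

        def place_merge(job_id, num_input_runs):
            """Round-robin input runs over read disks; pick a write disk."""
            inputs = [read_disks[(job_id + i) % len(read_disks)]
                      for i in range(num_input_runs)]
            output = sorted(WRITE_DISKS)[job_id % len(WRITE_DISKS)]
            return inputs, output

        print(place_merge(0, 4))   # ([0, 1, 2, 3], 6)
        print(place_merge(1, 4))   # ([1, 2, 3, 4], 7)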

  • Fast ranking in limited space

    Page(s): 428 - 437

    Ranking techniques have long been suggested as alternatives to conventional Boolean methods for searching document collections. The cost of computing a ranking is, however, greater than the cost of performing a Boolean search, in terms of both memory space and processing time. The authors consider the resources required by the cosine method of ranking, and show that, with a careful application of indexing and selection techniques, both the space and the time required by ranking can be substantially reduced. The methods described in the paper have been used to build a retrieval system with which it is possible to process ranked queries of 40 terms in about 5% of the space required by previous implementations; in as little as 25% of the time; and without measurable degradation in retrieval effectiveness.
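
    A minimal rendition of cosine ranking with an inverted file and per-document accumulators (weights are assumed term frequencies; the paper's compression and selection techniques are omitted).

        from collections import defaultdict
        from math import sqrt

        index = {"data": [(0, 2), (1, 1)],      # term -> [(doc id, term freq)]
                 "engineering": [(0, 1)],
                 "ranking": [(1, 3)]}
        doc_norm = {0: sqrt(2**2 + 1**2), 1: sqrt(1**2 + 3**2)}

        def rank(query_terms):
            acc = defaultdict(float)
            for term in query_terms:
                for doc, tf in index.get(term, ()):
                    acc[doc] += tf              # accumulate the dot product
            return sorted(((s / doc_norm[d], d) for d, s in acc.items()),
                          reverse=True)

        print(rank(["data", "ranking"]))        # doc 1 scores highest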

  • Declustering techniques for parallelizing temporal access structures

    Page(s): 232 - 242

    This paper addresses the issues of declustering temporal index and access structures for a single processor, multiple independent disk architecture. The temporal index is the Monotonic B+-Tree, which uses the time index temporal access structure. We devise a new algorithm, called multi-level round robin, for assigning tree nodes to multiple disks. The multi-level round robin declustering technique takes advantage of the append-only nature of temporal databases to achieve uniform load distribution, decrease response time, and increase the fanout of the tree by eliminating the need to store disk numbers within the tree nodes. We propose two declustering techniques for the time index access structures: one considers only time proximity while declustering, whereas the other considers both time proximity and data size. We investigate their performance over different types of temporal queries and show that various temporal queries have conflicting allocation criteria for the time index buckets. In addition, we devise two disk partition techniques for the time index buckets. The mutually exclusive technique partitions the disks into disjoint groups, whereas the shared disk technique allows the different types of buckets to share all disks.
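
    A sketch of the general flavor of such a scheme (the paper's exact assignment rule may differ): a node's disk is computed from its level and position alone, so no disk number has to be stored inside the node.

        def disk_of(level, position, num_disks):
            """Round-robin within each level, offset by level so that a
            parent and its first child land on different disks."""
            return (position + level) % num_disks

        for level, width in enumerate([1, 4, 16]):   # a small 3-level tree
            print(level, [disk_of(level, p, 4) for p in range(width)])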

  • Specification and management of extended transactions in a programmable transaction environment

    Page(s): 462 - 473

    A Transaction Specification and Management Environment (TSME) is a transaction processing system toolkit that supports the definition and construction of application-specific extended transaction models (ETMs). The TSME provides a transaction specification language that allows a transaction model designer to create implementation-independent specifications of extended transactions. In addition, the TSME provides a programmable transaction management mechanism that assembles and configures a run-time environment to support specified ETMs. The authors discuss the TSME in the context of a distributed object management system (DOMS), and describe specifications of extended transactions and corresponding configurations of transaction management mechanisms.

  • X.500 Directory Schema management

    Page(s): 393 - 400

    The X.500 Directory Service provides a powerful mechanism for storing and retrieving information about objects in a distributed computing environment. This requires the functional components of the Directory Service to have knowledge of the structure and representation, or schema, of the information held within the directory. The management of the Directory Schema is a subject requiring further research and development. We identify the three major technical elements required for properly managing the Directory Schema. We then focus on one of these elements: propagation of the schema between functional components of the Directory Service. Three subproblems from within schema propagation are presented, along with several alternative solutions.

  • Performance evaluation of grid based multi-attribute record declustering methods

    Page(s): 356 - 365

    We focus on multi-attribute declustering methods which are based on some type of grid-based partitioning of the data space. Theoretical results are derived which show that no declustering method can be strictly optimal for range queries if the number of disks is greater than 5. A detailed performance evaluation is carried out to see how various declustering schemes perform under a wide range of query and database scenarios (both relative to each other and to the optimal). Parameters that are varied include shape and size of queries, database size, number of attributes and the number of disks. The results show that information about common queries on a relation is very important and ought to be used in deciding the declustering for it, and that this is especially crucial for small queries. Also, there is no clear winner, and as such parallel database systems must support a number of declustering methods.
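
    For illustration, one classic grid-based scheme of this kind (disk modulo; the paper evaluates several such methods against the optimal): a grid cell's disk is a function of its coordinates, so the cells of a range query spread across disks.

        def disk_modulo(cell, num_disks):
            return sum(cell) % num_disks

        # Disks touched by the 3x2 block of grid cells with corner (2, 1):
        cells = [(x, y) for x in range(2, 5) for y in range(1, 3)]
        print({disk_modulo(c, 4) for c in cells})   # {0, 1, 2, 3}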

  • Comparing and synthesizing integrity checking methods for deductive databases

    Page(s): 214 - 222

    We compare and synthesize different methods for integrity checking in deductive databases. First, we state simplified integrity checking for deductive databases independently of the particular strategy used by different methods found in the literature. In accordance with this statement, we classify integrity checking methods into two main groups: methods with a generation phase without fact access and methods with a generation phase with fact access. Then, we propose an implementation scheme (a metaprogram) where the differences and similarities among the methods can be pointed out. In this common implementation framework, we compare the methods; this comparison is based on the number of facts accessed by each of them during integrity checking. Finally, from the analysis of the results, we define a convergence method which synthesizes some different features from several methods.

  • Supporting high-bandwidth navigation in object-bases

    Page(s): 294 - 301

    Magritte is an attempt to construct a high-bandwidth front-end to an object-base containing meta-data about SCAD designs. SCAD is a small part of a family of visualization applications where the end-user concurrently manipulates large collections of active data. Such end-user interfaces require a different paradigm of interaction than the object-at-a-time interfaces of current databases. Proposals here can be divided into mechanisms for scene creation and those for scene integration. The former allow a user to create a single scene with ease. The latter help in desktop management by allowing scenes to be combined and correlated. The implementation experience points out a number of shortcomings in current database offerings that need to be solved so as to ease the design of high-bandwidth front-ends.

  • Resolving attribute incompatibility in database integration: an evidential reasoning approach

    Page(s): 154 - 163

    Resolving domain incompatibility among independently developed databases often involves uncertain information. DeMichiel (1989) showed that uncertain information can be generated by the mapping of conflicting attributes to a common domain, based on some domain knowledge. The authors show that uncertain information can also arise when the database integration process requires information that is not directly represented in the component databases but can be obtained through some summary of the data. They therefore propose an extended relational model based on the Dempster-Shafer theory of evidence (1976) to incorporate such uncertain knowledge about the source databases. They also develop a full set of extended relational operations over the extended relations. In particular, an extended union operation has been formalized to combine two extended relations using Dempster's rule of combination. The closure and boundedness properties of the proposed extended operations are formulated.
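
    Dempster's rule of combination itself is easy to state in code; the sketch below is a direct transcription of the standard rule over frozenset focal elements (the example masses are invented), not the paper's extended union operator.

        from collections import defaultdict
        from itertools import product

        def combine(m1, m2):
            """Dempster's rule: intersect focal elements, renormalize."""
            raw, conflict = defaultdict(float), 0.0
            for (a, x), (b, y) in product(m1.items(), m2.items()):
                inter = a & b
                if inter:
                    raw[inter] += x * y
                else:
                    conflict += x * y
            return {s: v / (1 - conflict) for s, v in raw.items()}

        m1 = {frozenset({"single"}): 0.6, frozenset({"single", "married"}): 0.4}
        m2 = {frozenset({"married"}): 0.5, frozenset({"single", "married"}): 0.5}
        print(combine(m1, m2))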

  • A multi-set extended relational algebra: a formal approach to a practical issue

    Page(s): 80 - 88

    The relational data model is based on sets of tuples, i.e. it does not allow duplicate tuples in a relation. Many database languages and systems do require multi-set semantics though, either because of functional requirements or because of the high costs of duplicate removal in database operations. Several proposals have been presented that discuss multi-set semantics. As these proposals tend to be either rather practical, lacking the formal background, or rather formal, lacking the connection to database practice, the gap between theory and practice has not yet been spanned. This paper proposes a complete extended relational algebra with multi-set semantics, having a clear formal background and a close connection to the standard relational algebra. It includes constructs that extend the algebra to a complete sequential database manipulation language that can either be used as a formal background to other multi-set languages like SQL, or as a database manipulation language on its own. The practical usability of the latter option has been demonstrated in the PRISMA/DB database project, where a variant of the language has been used as the primary database language.
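
    A minimal taste of multi-set semantics using Python Counters (illustrative only; the paper defines a complete algebra, not just these operators): projection keeps duplicates, and additive union sums multiplicities.

        from collections import Counter

        def project(rel, attrs):
            """Multi-set projection: duplicates in the result are kept."""
            return Counter(tuple(t[a] for a in attrs) for t in rel.elements())

        r = Counter({("ann", "db"): 2, ("bob", "db"): 1})   # (name, dept) tuples
        print(project(r, [1]))               # Counter({('db',): 3})
        print(r + Counter({("ann", "db"): 1}))              # additive union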
