9th International Database Engineering and Applications Symposium (IDEAS 2005)

Date 25-27 July 2005

Displaying Results 1 - 25 of 55
  • Proceedings. 9th International Database Engineering and Applications Symposium (IDEAS 2005)

    Freely Available from IEEE
  • 9th International Database Engineering & Application Symposium - Title Page

    Page(s): i - iii
    Freely Available from IEEE
  • 9th International Database Engineering & Application Symposium - Copyright Page

    Page(s): iv
    Freely Available from IEEE
  • 9th International Database Engineering & Application Symposium - Table of contents

    Page(s): v - viii
    Freely Available from IEEE
  • Foreword

    Page(s): ix
    Freely Available from IEEE
  • Preface

    Page(s): x
    Freely Available from IEEE
  • Program Committee Reviewers

    Page(s): xi
    Freely Available from IEEE
  • External reviewers

    Page(s): xii - xiii
    Freely Available from IEEE
  • Incremental methods for simple problems in time series: algorithms and experiments

    Page(s): 3 - 14

    A time series (or equivalently a data stream) consists of data arriving in time order. Single or multiple data streams arise in fields including physics, finance, medicine, and music, to name a few. Often the data comes from sensors (in physics and medicine, for example) whose data rates continue to improve dramatically as sensor technology improves and as the number of sensors increases. Fast algorithms therefore become ever more critical for distilling knowledge from the data. This paper presents our recent work on the incremental computation of various primitives: windowed correlation, matching pursuit, sparse space discovery, and elastic burst detection. The incremental idea reflects the fact that recent data is more important than older data. Our StatStream system implements these algorithms, permitting empirical studies on both simulated and real data.

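The incremental idea can be made concrete with a minimal sketch of one primitive, windowed correlation: maintaining running sums lets the Pearson correlation over a sliding window be updated in constant time per arriving pair. The class below is illustrative only and is not taken from StatStream:

```python
from collections import deque
import math

class WindowedCorrelation:
    """Incrementally maintain the Pearson correlation of two streams
    over a sliding window, using running sums (O(1) per update)."""

    def __init__(self, window):
        self.window = window
        self.xs, self.ys = deque(), deque()
        self.sx = self.sy = self.sxx = self.syy = self.sxy = 0.0

    def update(self, x, y):
        self.xs.append(x); self.ys.append(y)
        self.sx += x; self.sy += y
        self.sxx += x * x; self.syy += y * y; self.sxy += x * y
        if len(self.xs) > self.window:          # evict the oldest pair
            ox, oy = self.xs.popleft(), self.ys.popleft()
            self.sx -= ox; self.sy -= oy
            self.sxx -= ox * ox; self.syy -= oy * oy; self.sxy -= ox * oy

    def correlation(self):
        n = len(self.xs)
        num = n * self.sxy - self.sx * self.sy
        den = (math.sqrt(n * self.sxx - self.sx ** 2)
               * math.sqrt(n * self.syy - self.sy ** 2))
        return num / den if den else 0.0
```

Evicting the oldest pair while adding the newest is what makes the per-update cost independent of the window size.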
  • Detection for conflicts of dependencies in advanced transaction models

    Page(s): 17 - 26

    Transactional dependencies play an important role in coordinating the execution of subtransactions in advanced transaction models such as nested transactions and workflow transactions. The correct execution of an advanced transaction depends on satisfying all the dependencies specified by the application developer. Incorrect specification of transaction dependencies can lead to information integrity problems and unavailability of resources. One example of incorrect specification is the presence of conflicts: the satisfaction of constraints imposed by one dependency may violate the constraints imposed by another. Algorithms that can analyze and detect dependency conflicts are therefore necessary. Although there is a large body of research on advanced transactions, no previous work has addressed the analysis of dependency conflicts. In this work, we analyze different kinds of dependency conflicts and propose algorithms to detect and remove them in advanced transaction specifications. This gives the application developer assurance about the correctness of the dependency specification and the correct behavior of the underlying advanced transaction model.

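The paper's dependency formalism is richer than this, but one common class of conflict, a set of ordering constraints that cannot all be satisfied, can be sketched as cycle detection over a precedence graph. The representation below (a pair `(a, b)` meaning "a must commit before b") is an assumption for illustration, not the authors' formalism:

```python
from collections import defaultdict

def find_conflicts(dependencies):
    """Toy conflict check: each dependency (a, b) says 'a must commit
    before b'. A cycle in this precedence graph means the constraints
    cannot all be satisfied; return one such cycle, or None."""
    graph = defaultdict(list)
    for a, b in dependencies:
        graph[a].append(b)
    WHITE, GRAY, BLACK = 0, 1, 2
    color = defaultdict(int)
    stack = []

    def dfs(u):
        color[u] = GRAY
        stack.append(u)
        for v in graph[u]:
            if color[v] == GRAY:            # back edge closes a cycle
                return stack[stack.index(v):] + [v]
            if color[v] == WHITE:
                cycle = dfs(v)
                if cycle:
                    return cycle
        color[u] = BLACK
        stack.pop()
        return None

    for node in list(graph):
        if color[node] == WHITE:
            cycle = dfs(node)
            if cycle:
                return cycle
    return None
```

A conflict-free specification returns `None`; otherwise the returned list names one set of mutually unsatisfiable dependencies for the developer to repair.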
  • Building information systems by orchestrating open services

    Page(s): 27 - 36

    Service-oriented computing has gained considerable momentum as a new paradigm for building enterprise information systems. Notable efforts have been made recently by both researchers and industry to support the construction of service-based applications; nevertheless, several issues still need to be tackled, including service definition and adaptation, and service orchestration. This work proposes an approach for building and finely orchestrating open and adaptable services. An open service is represented by a workflow that coordinates calls to service provider methods, making the component activities, and the way they are synchronized, visible. Service adaptability refers to the possibility of modifying an open service; through adaptation operations a service can be customized to given user (application) requirements. To orchestrate services finely, they are associated with entry points. An entry point acts as a gateway for inserting and retrieving information about the progress of service execution. Defined services and orchestrations are verified to ensure correct behaviour of the resulting application. The paper details our approach for building and orchestrating services, and presents the associated architectural choices.

  • Persistent middle tier components without logging

    Page(s): 37 - 46

    Enterprise applications need to be highly available and scalable. In the past, this has required "stateless" applications, which essentially require the application to manage its state explicitly by storing it in transactional resource managers. Although "stateful" applications are more natural, and hence easier to write and get correct, having the system manage this state automatically has been considered too difficult and too costly. The Phoenix/App system showed how to manage state in stateful applications transparently, by logging interactions between components and guaranteeing "exactly once" execution of the application. We show that, by introducing some minor restrictions on Phoenix/App components, no logging need be done for middle-tier components, making it easy to provide both availability and scalability. Because there is no logging, the performance of failure-free application executions is excellent.

  • On the intersection of XPath expressions

    Page(s): 49 - 57

    XPath is a common language for selecting nodes in an XML document. XPath uses so-called path expressions, which describe a navigation path through semistructured data. In recent years, several characteristics of XPath have been studied; examples include the containment of two XPath expressions p and p' (p ⊆ p'). To the best of our knowledge, the intersection of two XPath expressions (p ∩ p') has not been treated yet. The intersection of p and p' is the set of all XML nodes that are selected by both p and p'. In the context of indexes in XML databases, the emptiness of the intersection of p and p' is a major issue when updating the index: to keep the index consistent with the indexed data, it must be detected whether an index defined on p is affected by a modifying database operation with the path expression p'. In this paper, we introduce the intersection problem for XPath and motivate its relevance. We present an efficient intersection algorithm, based on finite automata, for XPath expressions without the NOT operator. For expressions that contain the NOT operator, the intersection problem becomes NP-complete, leading to exponential computations in general. With an average-case simulation we show that the NP-completeness is no significant limitation for most real-world database operations.

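The automaton-based idea can be sketched for a tiny fragment of XPath (child `/` and descendant `//` steps with `*` wildcards, no predicates or NOT): each path compiles to an NFA over element labels, and the intersection is non-empty iff an accepting pair of states is reachable in the product automaton. This is a simplified illustration under those assumptions, not the paper's algorithm:

```python
def path_nfa(steps):
    """Compile a path (list of (axis, label), axis in {'/', '//'},
    label a tag name or '*') into an NFA: state i -> i+1 on the step
    label; a '//' step adds a self-loop that skips intermediate nodes."""
    trans = []
    for i, (axis, label) in enumerate(steps):
        if axis == '//':
            trans.append((i, '*', i))       # descend through any element
        trans.append((i, label, i + 1))
    return trans, len(steps)                # transitions, accepting state

def _matches(label, symbol):
    return label == '*' or label == symbol

def intersection_nonempty(p, q):
    """Product construction: BFS over state pairs; the intersection of
    the two path languages is non-empty iff (accept, accept) is reachable."""
    t1, f1 = path_nfa(p)
    t2, f2 = path_nfa(q)
    # Labels occurring in either path, plus one symbol standing for
    # "any other element name" (needed because '*' matches anything).
    alphabet = {l for _, l, _ in t1 + t2 if l != '*'} | {'#other'}
    seen, frontier = {(0, 0)}, [(0, 0)]
    while frontier:
        s1, s2 = frontier.pop()
        if (s1, s2) == (f1, f2):
            return True
        for sym in alphabet:
            for a, l1, b in t1:
                if a != s1 or not _matches(l1, sym):
                    continue
                for c, l2, d in t2:
                    if c == s2 and _matches(l2, sym) and (b, d) not in seen:
                        seen.add((b, d))
                        frontier.append((b, d))
    return False
```

For example, `/a//b` and `/a/*/b` intersect (both select `/a/x/b`), whereas `/a` and `/b` are disjoint.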
  • Querying with negation in data integration systems

    Page(s): 58 - 64

    Data integration is the problem of combining data residing at different sources and providing the user with a unified view of these data. It is characterized by an architecture based on a global schema, with a set of integrity constraints, and a set of sources. In this paper, we investigate how the Closed World Assumption on a source database can be coherently propagated to the global schema. The problem is directly connected to the fact that a global schema has a number (possibly infinite) of minimal models, caused by the incompleteness of the source databases with respect to the integrity constraints over the global schema. The aim of this preliminary work is to open a perspective on query languages with negation in the data integration framework.

  • Evaluation of queries on tree-structured data using dimension graphs

    Page(s): 65 - 74

    The recent proliferation of XML-based standards and technologies for managing data on the Web demonstrates the need for effective and efficient management of tree-structured data. Querying tree-structured data is a challenging issue due to the structural diversity within and across trees. In this paper, we show how to evaluate queries on tree-structured data, called value trees, where the formulation of a query does not depend on the structure of a particular value tree. Our approach exploits semantic information provided by dimension graphs, semantically rich constructs that abstract the structural information of the value trees. We show how dimension graphs can be used to efficiently query value trees in the presence of structural differences and irregularities. Value trees and their dimension graphs are represented as XML documents, and we present a method for transforming queries into XPath expressions to be evaluated on those documents. We also provide conditions for identifying strongly and weakly unsatisfiable queries. Finally, we conducted various experiments comparing our method with one that does not exploit dimension graphs. Our results demonstrate the superiority of our approach.

  • Data updating between the operational and analytical databases through dw-log algorithm

    Page(s): 77 - 82

    Data warehouse systems (DWS) make use of storage techniques for efficient end-user access and querying. DWS applications have implemented classic data synchronization operations that do not support immediate data updates. With the evolution of semantic data representation in the operational database environment, analysis in a DWS demands new forms of synchronization. Hence, there is a growing interest in DWS that can rapidly absorb operational database updates without compromising the operational query processes. Our research aims at characterizing the limits of synchronous and asynchronous algorithms for data updating in a DWS, and proposes an alternative way to propagate asynchronous transaction updates in a DWS: the dw-log algorithm. The dw-log algorithm's implementation is supported by a process algebra approach.

  • Utilizing indexes for approximate and on-line nearest neighbor queries

    Page(s): 83 - 88

    We explore using index structures for effective approximate and on-line nearest neighbor queries. While many index structures have been shown to suffer from the curse of dimensionality, we believe that indexes can still be useful in providing quick approximate solutions to nearest neighbor queries. Moreover, the information provided by the indexes yields bounds that can be invaluable for on-line nearest neighbor queries. This paper explores applying current R-tree based indexes to approximate and on-line nearest neighbor search with bounds. We experiment with various heuristics and compare the trade-off between accuracy and efficiency. Our results are compared to locality-sensitive hashing (LSH) and show the effectiveness of the proposed scheme. We also provide guidelines on how this can be useful in a practical sense.

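A minimal sketch of the underlying idea, assuming a toy bounding-box hierarchy in place of a real R-tree: best-first search ordered by a MINDIST lower bound, with an epsilon parameter that trades accuracy for earlier termination. All class and function names here are illustrative:

```python
import heapq
import math

class Node:
    """A node in a toy bounding-box tree: a leaf holds points, an
    internal node holds children; each carries its bounding box."""
    def __init__(self, points=None, children=None):
        self.points = points or []
        self.children = children or []
        pts = self.all_points()
        xs, ys = zip(*pts)
        self.box = (min(xs), min(ys), max(xs), max(ys))

    def all_points(self):
        return self.points + [p for c in self.children for p in c.all_points()]

def mindist(q, box):
    """Lower bound on the distance from q to anything inside box."""
    dx = max(box[0] - q[0], 0.0, q[0] - box[2])
    dy = max(box[1] - q[1], 0.0, q[1] - box[3])
    return math.hypot(dx, dy)

def approx_nn(root, q, eps=0.0):
    """Best-first nearest-neighbor search; with eps > 0 it may stop
    early and return a point within (1+eps) of the true nearest."""
    best, best_d = None, float('inf')
    heap = [(mindist(q, root.box), id(root), root)]   # id() breaks ties
    while heap:
        d, _, node = heapq.heappop(heap)
        if d * (1 + eps) >= best_d:        # no remaining node can beat the bound
            break
        for p in node.points:
            pd = math.hypot(p[0] - q[0], p[1] - q[1])
            if pd < best_d:
                best, best_d = p, pd
        for c in node.children:
            heapq.heappush(heap, (mindist(q, c.box), id(c), c))
    return best, best_d
```

With `eps = 0` the search is exact; with `eps > 0` it may stop as soon as the current best is within a (1+eps) factor of every remaining lower bound, which is the kind of guarantee the abstract refers to.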
  • An incremental clustering scheme for duplicate detection in large databases

    Page(s): 89 - 95

    We propose an incremental algorithm for clustering duplicate tuples in large databases, which assigns any new tuple t to the cluster containing the database tuples most similar to t (and hence likely to refer to the same real-world entity as t). The core of the approach is a hash-based indexing technique that tends to assign highly similar objects to the same buckets. Empirical evaluation shows that the proposed method yields considerable efficiency improvements over a state-of-the-art index structure for proximity searches in metric spaces.

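The bucket idea can be sketched as follows, using q-gram sets and Jaccard similarity in place of the paper's actual hashing scheme; the threshold value and all names are illustrative assumptions:

```python
import re
from collections import defaultdict

def qgrams(s, q=3):
    """Normalized q-gram set of a string (lowercase, punctuation folded)."""
    s = re.sub(r'\W+', ' ', s.lower()).strip()
    return {s[i:i + q] for i in range(max(len(s) - q + 1, 1))}

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

class IncrementalDedup:
    """Assign each incoming tuple to the cluster of its most similar
    previously seen tuple, using q-gram buckets so that only tuples
    sharing a bucket are compared (never the whole database)."""

    def __init__(self, threshold=0.5):
        self.threshold = threshold
        self.buckets = defaultdict(list)   # q-gram -> [(grams, cluster_id)]
        self.next_cluster = 0

    def insert(self, tuple_text):
        grams = qgrams(tuple_text)
        best, best_sim = None, 0.0
        seen = set()
        for g in grams:                    # candidates share >= 1 bucket
            for cand_grams, cid in self.buckets[g]:
                key = id(cand_grams)
                if key in seen:
                    continue
                seen.add(key)
                sim = jaccard(grams, cand_grams)
                if sim > best_sim:
                    best, best_sim = cid, sim
        if best is None or best_sim < self.threshold:
            best = self.next_cluster       # start a new cluster
            self.next_cluster += 1
        for g in grams:
            self.buckets[g].append((grams, best))
        return best
```

Near-duplicate strings land in shared buckets and join the same cluster; unrelated strings share (almost) no buckets and are never compared in full.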
  • XML and relational data: towards a common model and algebra

    Page(s): 96 - 101

    In this paper we present a model for the management of relational, XML, and mixed data. The main high-level approaches to manipulating XML, i.e., SQL/XML, XQuery, and object/relational XML columns, can all be based on our common model and algebra. Our query algebra, though very simple, can represent queries not expressible by other proposals or by the current implementation of TAX. Moreover, we show that relational-style logical query rewriting can be extended to our algebraic expressions.

  • Automatically maintaining wrappers for Web sources

    Page(s): 105 - 114

    A substantial subset of Web data follows some kind of underlying structure. Nevertheless, HTML does not contain any schema or semantic information about the data it represents. A program able to provide software applications with a structured view of such semi-structured Web sources is usually called a wrapper. Wrappers accept a query against the source and return a set of structured results, enabling applications to access Web data much as they would information from databases. A significant problem in this approach arises because Web sources may undergo changes that invalidate the current wrappers. In this paper, we present novel heuristics and algorithms to address this problem. Our approach is based on collecting some query results during wrapper operation; when the source changes, these results are used to generate a set of labeled examples, which are then provided as input to a wrapper induction algorithm that regenerates the wrapper. We have tested our methods in several real-world Web data extraction domains, obtaining high accuracy in all steps of the process.

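A toy version of the regeneration step: given values already known from earlier query results (the "labeled examples") located in the changed page, induce common prefix/suffix delimiters and reuse them as the new wrapper. The paper's induction algorithm is far more robust; this sketch and its function names are illustrative only:

```python
def common_prefix(strings):
    """Longest common prefix of a non-empty list of strings."""
    p = strings[0]
    for s in strings[1:]:
        while not s.startswith(p):
            p = p[:-1]
    return p

def common_suffix(strings):
    return common_prefix([s[::-1] for s in strings])[::-1]

def induce_wrapper(page, examples):
    """Learn (prefix, suffix) delimiters surrounding the known values."""
    spans = []
    for value in examples:
        i = page.find(value)
        if i >= 0:
            spans.append((page[:i], page[i + len(value):]))
    if not spans:
        return None
    return (common_suffix([left for left, _ in spans]),
            common_prefix([right for _, right in spans]))

def extract(page, wrapper):
    """Apply the induced wrapper: return every string enclosed by the
    learned delimiters."""
    prefix, suffix = wrapper
    results, start = [], 0
    while True:
        i = page.find(prefix, start)
        if i < 0:
            break
        j = page.find(suffix, i + len(prefix))
        if j < 0:
            break
        results.append(page[i + len(prefix):j])
        start = j
    return results
```

Note that with only two examples the induced suffix can over-specialize (here it would miss a trailing list item), which is exactly why practical induction needs more examples and stronger heuristics.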
  • Semi-structured data management in the enterprise: a nimble, high-throughput, and scalable approach

    Page(s): 115 - 124

    In this paper we describe an approach and system for managing enterprise semi-structured data that is high-throughput, nimble, and scalable. We present the NETMARK system, which provides a "schemaless" way of managing semi-structured documents. We describe in particular detail the unique underlying data storage approach and the efficient query processing mechanisms built on this storage system. We present an extensive benchmark evaluation of the NETMARK system and compare it with related XML management systems. At the heart of the approach is a focus on the most common data management requirements in the enterprise, without burdening users and application developers with unnecessary complexity and formal schemas.

  • Pattern-based information integration in dynamic environments

    Page(s): 125 - 134

    The convenient availability of information is an essential factor in science and business. While Internet technology has made large amounts of data available to the general public, the data is largely provided in human-readable format only. New technologies are now making direct access to millions of structured or semi-structured databases possible, but maximum benefit can be gained only by integrating these data sources. Traditional approaches to information integration, which involve human development teams working in a controlled environment with a stable set of data sources, are not applicable due to the dynamic nature of such an environment; a higher degree of automation is required. We present the PALADIN project (Pattern-based Architecture for LArge-scale Dynamic INformation integration), which uses machine-understandable patterns to capture and apply expert experience in the integration planning process.

  • Agents and databases: friends or foes?

    Page(s): 137 - 147

    At first glance, agent technology seems like a hostile intruder into the database world. On the other hand, the two could easily complement each other, since agents carry out information processes whereas databases supply information to processes. Nonetheless, viewing agent technology from a database perspective seems to question some of the basic paradigms of database technology, particularly the premise of semantic consistency of a database. The paper argues that the ensuing uncertainty in distributed databases can be modelled by beliefs, and develops the basic concepts for adjusting peer-to-peer databases to the individual beliefs of single nodes and the collective beliefs of the entire distributed database.

  • RelaXML: bidirectional transfer between relational and XML data

    Page(s): 151 - 162

    In modern enterprises, almost all data is stored in relational databases. Additionally, most enterprises increasingly collaborate with other enterprises in long-running read-write workflows, primarily through XML-based data exchange technologies such as Web services. However, bidirectional XML data exchange is cumbersome and must often be hand-coded, at considerable expense. This paper remedies the situation by proposing RELAXML, an automatic and effective approach to bidirectional XML-based exchange of relational data. RELAXML supports re-use through multiple inheritance, and handles both export of relational data to XML documents and (re-)import of XML documents with a large degree of flexibility in terms of the SQL statements and XML document structures supported. Import and export are formally defined so as to avoid semantic problems, and algorithms to implement both are given. A performance study shows that the approach has a reasonable overhead compared to hand-coded programs.

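A much-simplified sketch of the export/import round trip, using SQLite and ElementTree; RELAXML's inheritance mechanism and formal import/export semantics are not modelled here, and all function names are illustrative:

```python
import sqlite3
import xml.etree.ElementTree as ET

def export_to_xml(conn, sql, root_tag, row_tag):
    """Export the result of an SQL query as an XML tree:
    one child element per row, one sub-element per column."""
    root = ET.Element(root_tag)
    cur = conn.execute(sql)
    cols = [d[0] for d in cur.description]
    for row in cur:
        row_el = ET.SubElement(root, row_tag)
        for col, val in zip(cols, row):
            ET.SubElement(row_el, col).text = '' if val is None else str(val)
    return root

def import_from_xml(conn, root, table):
    """Re-import: one INSERT per row element, column names taken from
    the sub-element tags (table/column names assumed trusted here)."""
    for row_el in root:
        cols = [child.tag for child in row_el]
        vals = [child.text for child in row_el]
        placeholders = ', '.join('?' * len(cols))
        conn.execute(
            f"INSERT INTO {table} ({', '.join(cols)}) VALUES ({placeholders})",
            vals)
```

Round-tripping a table through XML and back this way loses type information (everything becomes text), which hints at why the paper insists on formally defined import/export semantics.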
  • Rewriting-based optimization for XQuery transformational queries

    Page(s): 163 - 174

    The modern XML query language XQuery includes advanced facilities both to query and to transform XML data, and an XQuery optimizer should be able to optimize any query. For "querying" queries, almost all techniques inherited from SQL-oriented DBMSs may be applied. The transformation facilities, however, are XML-specific and have no counterparts in other query languages, so transformational queries need to be optimized with novel techniques. In this paper two kinds of such techniques are considered, namely pushing predicates down through XML element constructors, and projection of transformations. A subset of XQuery for which these techniques can be fully implemented is identified; this subset seems to be the most interesting from a practical viewpoint. Rewriting rules for this subset are proposed and their correctness is formally justified. For the rest of the language, we propose solutions that work for most common cases or discuss the problems we have encountered.
