
2001 International Database Engineering and Applications Symposium

Date: 16-18 July 2001


Displaying Results 1 - 25 of 41
  • Proceedings 2001 International Database Engineering and Applications Symposium

  • Author index

    Page(s): 367 - 368
  • Xyleme, a dynamic warehouse for XML data of the Web

    Page(s): 3 - 8

    The current development of the Web and the generalization of XML technology provide a major opportunity which can radically change the face of the Web. Xyleme intends to be a leader of this revolution by providing database services over the XML data of the Web. Originally, Xyleme was a research project functioning as an open, loosely coupled network of researchers. At the end of 2000, a prototype had been implemented. A start-up company, also called Xyleme, is now turning the prototype into a product. The authors summarize the main research efforts of the Xyleme team. They concern: a scalable architecture; the efficient storage of huge quantities of XML data (hundreds of millions of pages); XML query processing with full-text and structural indexing; data acquisition strategies to build the repository and keep it up-to-date; change control with services such as query subscription; and semantic data integration to free users from having to deal with many specific DTDs when expressing queries.

  • Interactive ROLAP on large datasets: a case study with UB-trees

    Page(s): 167 - 176

    Online analytical processing (OLAP) requires query response times within the range of a few seconds in order to allow for interactive drilling, slicing, or dicing through an OLAP cube. While small OLAP applications use multidimensional database systems, large OLAP applications like the SAP BW rely on relational (ROLAP) databases for efficient data storage and retrieval. ROLAP databases use specialized data models like star or snowflake schemata for data storage and create a large set of indexes or materialized views in order to answer queries efficiently. In our case study, we show the performance benefits of TransBase HyperCube, a commercial RDBMS, whose kernel fully integrates the UB-Tree, a multi-dimensional extension of the B-Tree. With this newly developed access structure, TransBase HyperCube enables interactive OLAP without the need for storing a large set of materialized views or creating a large set of indexes. We compare not only the query performance, but also consider index size and maintenance costs. For the case study we use a 42-million-record ROLAP database of GfK, the largest German market research company.
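
    For background on the access structure mentioned above: the UB-Tree orders multidimensional keys along a Z-order (bit-interleaving) space-filling curve and stores the resulting addresses in an ordinary B-Tree, which is what lets a single one-dimensional index serve several dimensions. A minimal sketch of the address computation (illustrative only; the function name and parameters are made up, and real UB-Tree implementations operate on whole Z-regions rather than single addresses):

    ```python
    def z_address(coords, bits=8):
        """Interleave the bits of the coordinates into a single Z-order address."""
        addr = 0
        for bit in range(bits - 1, -1, -1):      # most significant bit first
            for c in coords:                     # one bit from every dimension per step
                addr = (addr << 1) | ((c >> bit) & 1)
        return addr

    # Points that are close in (x, y) tend to receive nearby addresses, so a plain
    # B-Tree over z_address keeps multidimensional neighbours clustered on disk.
    print(z_address((3, 5)))   # -> 27 (binary 011011, the interleaved bits of 011 and 101)
    ```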

  • Exploiting upper and lower bounds in top-down query optimization

    Page(s): 20 - 33

    System R's bottom-up query optimizer architecture forms the basis of most current commercial database managers. The paper compares the performance of top-down and bottom-up optimizers, using the measure of the number of plans generated during optimization. Top-down optimizers are superior according to this measure because they can use upper and lower bounds to avoid generating groups of plans. Early during the optimization of a query, a top-down optimizer can derive upper bounds for the costs of the plans it generates. These bounds are not available to typical bottom-up optimizers since such optimizers generate and cost all subplans before considering larger containing plans. These upper bounds can be combined with lower bounds, based solely on logical properties of groups of logically equivalent subqueries, to eliminate entire groups of plans from consideration. We have implemented such a search strategy in a top-down optimizer called Columbia. Our performance results show that the use of these bounds is quite effective, while preserving the optimality of the resulting plans. In many circumstances this new search strategy is even more effective than heuristics such as considering only left-deep plans.
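
    As a rough illustration of the pruning idea described above (a hedged sketch, not Columbia's implementation: the join-ordering setting, the toy cost model, and the use of the result size as a logical lower bound are assumptions made only for this example), a top-down search can pass the best known cost down as a budget and skip any group whose lower bound already exceeds it:

    ```python
    from functools import reduce
    from itertools import combinations

    # Toy cost model: the cost of a plan is the sum of the sizes of its intermediate
    # results; the size of a join result depends only on which relations are joined,
    # so it is a *logical* lower bound on the cost of any plan for that group.
    CARD = {"A": 1000, "B": 50, "C": 200, "D": 10}   # hypothetical base cardinalities
    SEL = 0.01                                        # selectivity applied per join

    def result_size(rels):
        prod = reduce(lambda acc, r: acc * CARD[r], rels, 1.0)
        return prod * SEL ** (len(rels) - 1)

    def splits(rels):
        """All ways to split a group into two non-empty subgroups."""
        rels = sorted(rels)
        for k in range(1, len(rels)):
            for left in combinations(rels, k):
                yield frozenset(left), frozenset(rels) - frozenset(left)

    def optimize(rels, upper_bound, memo):
        """Best (cost, plan) for joining rels, or None if no plan beats upper_bound."""
        rels = frozenset(rels)
        if len(rels) == 1:
            return 0.0, next(iter(rels))
        if rels in memo and memo[rels][0] < upper_bound:
            return memo[rels]                         # group already optimized well enough
        if result_size(rels) >= upper_bound:          # group lower bound beats the budget:
            return None                               # prune the whole group of plans
        best = None
        for left, right in splits(rels):
            budget = upper_bound - result_size(rels)  # the output must be produced anyway
            l = optimize(left, budget, memo)
            if l is None:
                continue
            r = optimize(right, budget - l[0], memo)
            if r is None:
                continue
            cost = l[0] + r[0] + result_size(rels)
            if cost < upper_bound:
                best, upper_bound = (cost, (l[1], r[1])), cost
        if best is not None:
            memo[rels] = best
        return best

    print(optimize({"A", "B", "C", "D"}, float("inf"), {}))
    ```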

  • Query scheduling in multi query optimization

    Page(s): 11 - 19

    Complex queries are becoming commonplace, with the growing use of decision support systems. Decision support queries often have a lot of common sub-expressions within each query, and queries are often run as a batch. Multi-query optimization aims at exploiting common sub-expressions, to reduce the evaluation cost of queries, by computing them once and then caching them for future use, both within individual queries and across queries in a batch. In case cache space is limited, the total size of sub-expressions that are worth caching may exceed available cache space. Prior work in multi-query optimization involves choosing a set of common sub-expressions that fit in available cache space, and once computed, retaining their results across the execution of all queries in a batch. Such optimization algorithms do not consider the possibility of dynamically changing the cache contents. This may lead to sub-expressions occupying cache space even if they are not used by subsequent queries. The available cache space can be best utilized by evaluating the queries in an appropriate order and changing the cache contents as queries are executed. We present several algorithms that consider these factors, in order to reduce the cost of query evaluation.
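
    To make the effect of query ordering and dynamic cache contents concrete, here is a small hedged sketch (not one of the authors' algorithms; the queries, sub-expression sizes, costs, and the greedy admission/eviction policy are all invented for illustration):

    ```python
    # Each shared sub-expression has a materialised size and a recomputation cost;
    # each query lists the sub-expressions it needs.
    SUBEXPR = {"s1": (60, 100), "s2": (60, 100)}       # name: (size, cost to recompute)
    QUERIES = {"q1": {"s1"}, "q2": {"s2"}, "q3": {"s1"}, "q4": {"s2"}}
    CACHE_SPACE = 60                                    # only one result fits at a time

    def run_batch(order):
        """Total recomputation cost of running the queries in the given order."""
        cache, used, total = set(), 0, 0
        for i, q in enumerate(order):
            later = order[i + 1:]
            for s in sorted(QUERIES[q]):
                if s in cache:
                    continue                            # reuse the cached result for free
                total += SUBEXPR[s][1]                  # (re)compute the sub-expression
                if any(s in QUERIES[r] for r in later) and used + SUBEXPR[s][0] <= CACHE_SPACE:
                    cache.add(s)                        # keep it for upcoming queries
                    used += SUBEXPR[s][0]
            for s in list(cache):                       # drop results no later query needs
                if not any(s in QUERIES[r] for r in later):
                    cache.remove(s)
                    used -= SUBEXPR[s][0]
        return total

    print(run_batch(("q1", "q2", "q3", "q4")))   # 300: s2 cannot be admitted while s1 is cached
    print(run_batch(("q1", "q3", "q2", "q4")))   # 200: reordering lets both results be reused
    ```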

  • Exploitation of pre-sortedness for sorting in query processing: the TempTris-algorithm for UB-trees

    Page(s): 155 - 166

    Bulk loading is used to efficiently build a table or access structure if a large data set is available at index time, e.g., the spool process of a data warehouse or the creation of intermediate results during query processing. The authors introduce the TempTris algorithm that creates a multidimensional partitioning from a one-dimensionally sorted stream of tuples. In order to achieve that, TempTris exploits the fact that a one-dimensional order can be used as a partial multidimensional order for the creation of a multidimensional partitioning. In this way, TempTris avoids external sorting for the creation of a multidimensional index. In combination with the Tetris sort algorithm, TempTris can be used to create intermediate query processing results that can, without external sorting, be reused to generate various sort orders. As an example of this new processing technique we propose an efficient algorithm for computing an aggregation lattice. Thus, TempTris can also be used to speed up the processing of CUBE operators that frequently occur in OLAP applications.

  • A framework for understanding existing databases

    Page(s): 330 - 336

    The authors propose a framework for a broad class of data mining algorithms for understanding existing databases: functional and approximate dependency inference, minimal key inference, example relation generation and normal form tests. We point out that the common data-centric step of these algorithms is the discovery of agree sets. A set-oriented approach for discovering agree sets from database relations based on SQL queries is proposed. Experiments have been performed in order to compare the proposed approach with a data mining approach. We also present a novel way to extract approximate functional dependencies having minimal errors from agree sets.
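
    As a small illustration of the common step named above (plain Python over an invented toy relation, not the paper's SQL-based, set-oriented method): the agree set of a pair of tuples is the set of attributes on which the two tuples hold equal values, and a functional dependency X -> A can only hold if no agree set contains X without also containing A.

    ```python
    from itertools import combinations

    # Toy relation: one dict per row (hypothetical data).
    ROWS = [
        {"emp": "ann", "dept": "db", "city": "rome"},
        {"emp": "bob", "dept": "db", "city": "rome"},
        {"emp": "eve", "dept": "ai", "city": "rome"},
    ]

    def agree_sets(rows):
        """For every pair of tuples, the set of attributes on which they coincide."""
        result = set()
        for t1, t2 in combinations(rows, 2):
            result.add(frozenset(a for a in t1 if t1[a] == t2[a]))
        return result

    for ag in sorted(agree_sets(ROWS), key=len):
        print(sorted(ag))        # {'city'} and {'city', 'dept'} for the toy relation
    ```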

  • Metadata management for data warehousing: between vision and reality

    Page(s): 129 - 135

    Capturing, representing and processing metadata promises to facilitate the management, consistent use and understanding of data and thus better support the exploitation of the masses of information available online today. Despite the increasing interest in metadata management, its purpose, requirements and problems are still not clear. This is particularly true in the area of data warehousing. The reasons are multiple. Compared to the past, today's metadata management considers a significantly larger spectrum of information (including even certain pieces of programs). Moreover, metadata are produced by various tools and reside in different sources which need to be integrated in order to ensure consistency and provide uniform access, impact analysis and data tracking. Existing work has only partially covered some of these aspects. The paper summarizes the most important issues of metadata management for data warehousing, including the role of metadata and the solved and unsolved problems of the available solutions. The design of an appropriate information model, metadata integration and advanced user interaction facilities are crucial questions to be answered.

  • Integrity constraint management for design object versions in a concurrent engineering design environment

    Page(s): 255 - 261

    Concurrent engineering (CE) is a product development approach which requires a collaborative multidisciplinary team environment with effective communication. Due to the evolutionary nature of the engineering design process, it is necessary to provide support for version management. The version model should be provided with an integrity checking mechanism to assure the consistency of object versions with the design constraints. Managing integrity constraints is complex due to their evolving nature and raises a number of issues that need to be dealt with. The authors present a conceptual framework for providing an integrity mechanism for object versions. The framework is based on the concept of the Constraint Version Object (CVO).

  • Version propagation in federated database systems

    Page(s): 189 - 198

    Integrated engineering environments, based on federated database technology, are, among others, a means to control the integrity of and dependencies between product data created in many different engineering applications. Most of these applications support the management of versions of a product and its parts, continuing the engineers' tradition of keeping different versions of drawings and documents. Consequently, federations in engineering environments have to provide version management on their global layer as well. The paper discusses the concepts for a flexible and customisable realisation of the federated database management system's version propagation service. This service is responsible for making a new local version “visible” at the global layer of the federation and vice versa. It tries to identify properties like a new version's history and predecessor automatically. We show how these properties can (and sometimes must) be completed or updated if they are incomplete, inapplicable, or even contradictory. We conclude that the customisation overhead is inevitable for any general solution bridging highly heterogeneous versioning models.

  • Semantic integration of XML heterogeneous data sources

    Page(s): 199 - 208

    With the current explosion of data, retrieving and integrating information from various sources is a critical problem. The designer has to specify a mediated schema providing a homogeneous view of the sources. We report on an initial work toward automatically generating mappings between elements in the sources and the mediated schema. The information sources we are interested in are XML documents that conform to a document type definition (DTD). We describe the Xyleme project and present our approach, implemented in the SAMAG system, to automatically find mappings on the basis of semantic and structural criteria. Finally, we report the first results of an experiment where SAMAG has been applied to XML documents in the cultural domain.

  • Virtual integration of temporal and conflicting information

    Page(s): 243 - 248

    The paper presents a way of integrating conflicting temporal information from multiple information providers using a property-based resolution. The properties considered in the paper are time and the uncertainty that arises from conflicting information providers. The property-based resolution requires a flexible query mechanism, where answers are treated as bounds, taking into account both the tendency of things to occur and the possibility that they might occur. Finally, some attention is paid to a database environment with non-static members.

  • An adaptive and efficient clustering-based approach for content-based image retrieval in image databases

    Page(s): 356 - 365

    The authors present a novel content-based image retrieval (CBIR) approach for image databases based on cluster analysis. CBIR relies on the representation (metadata) of images' visual content. In order to produce such metadata, we propose an efficient and adaptive clustering algorithm to segment the images into regions of high similarity. This approach contrasts with those that use a single color histogram for the whole image (global methods) or local color histograms for a fixed number of image cells (partition-based methods). Our experimental results show that our clustering approach offers high retrieval effectiveness with low space overhead. For example, using a database of 20000 images, we obtained higher retrieval effectiveness than partition-based methods with about the same space overhead as global methods, which are typically regarded as the most compact in terms of storage.
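
    For contrast with the adaptive clustering approach described above, the two baseline families it is compared against can be sketched in a few lines (illustrative toy code; the 4x4 image of quantised colour indices and the 2x2 grid are made up for the example):

    ```python
    # A toy "image": a 2-D array of quantised colour indices (0..3).
    IMAGE = [
        [0, 0, 1, 1],
        [0, 0, 1, 1],
        [2, 2, 3, 3],
        [2, 2, 3, 3],
    ]
    COLOURS = 4

    def histogram(pixels):
        h = [0] * COLOURS
        for p in pixels:
            h[p] += 1
        return h

    def global_histogram(img):
        """Global method: one histogram for the whole image, spatial layout is lost."""
        return histogram([p for row in img for p in row])

    def partition_histograms(img, cells=2):
        """Partition-based method: one histogram per fixed grid cell (here 2x2)."""
        size = len(img) // cells
        hists = []
        for cy in range(cells):
            for cx in range(cells):
                block = [img[y][x]
                         for y in range(cy * size, (cy + 1) * size)
                         for x in range(cx * size, (cx + 1) * size)]
                hists.append(histogram(block))
        return hists

    print(global_histogram(IMAGE))      # [4, 4, 4, 4]: every colour looks the same globally
    print(partition_histograms(IMAGE))  # four per-cell histograms keep coarse layout
    ```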

  • Improving the processing of decision support queries: the case for a DSS optimizer

    Page(s): 177 - 186

    Many decision support applications are built upon data mining and OLAP tools and allow users to answer information requests based on a data warehouse that is managed by a powerful DBMS. We focus on tools that generate sequences of SQL statements in order to produce the requested information. Our thorough analysis revealed that many sequences of queries that are generated by commercial tools are not very efficient. An optimized system architecture is suggested for these applications. The main component is a DSS optimizer that accepts previously generated sequences of queries and remodels them according to a set of optimization strategies before they are executed by the underlying database system. The advantages of this extended architecture are discussed and a couple of appropriate optimization strategies are identified. Experimental results are given, showing that these strategies are appropriate to optimize query sequences of OLAP applications.

  • Web document searching using enhanced hyperlink semantics based on XML

    Page(s): 34 - 43

    We present a system that aims at increasing the flexibility and accuracy of information retrieval tasks on the World Wide Web. The system offers extended searching capabilities by enriching the information related to hyperlinks between documents. It offers document authors the ability to attach additional information to hyperlinks and also provides suggestions on the information to be attached. In an effort to increase the integrity of hyperlink information, a conversion module extracts, from the pages, metadata concerning the linked documents as well as the link itself. The hyperlink metadata is appended to the original document metadata and an XML document is created. Another module allows end users to query the XML document base, taking advantage of the enhanced hyperlink information. We present an overview of the system and the solutions it provides to problems found in similar approaches.

  • Modeling and management of spatio-temporal objects within temporal GIS application framework

    Page(s): 249 - 254

    The modeling and management of spatio-temporal thematic data within an object-oriented GIS application framework are presented. Object-oriented modeling concepts have been applied to the integration of spatial, thematic and temporal geographic information in the conceptual spatio-temporal object model, presented using standard UML class diagram notation. By its implementation in the object-oriented application domain and the (object-)relational database domain, a spatio-temporal object database kernel has been developed as a spatio-temporal object storage manager. Based on this kernel and the development of appropriate components on top of it for temporal GIS application functionality, a temporal GIS application framework has been developed. A description of its architecture and the functional components dedicated to the management of temporal aspects of geographic information is given. Desktop- and Internet-based temporal GIS frameworks are further refined from a generic framework.

  • XPathLog: a declarative, native XML data manipulation language

    Page(s): 123 - 128

    XPathLog is a logic-based language for manipulating and integrating XML data. It extends the XPath query language with Prolog-style variables. Due to the close relationship with XPath, the semantics of rules is easy to grasp. In contrast to other approaches, the XPath syntax and semantics are also used for a declarative specification of how the database should be updated: when used in rule heads, XPath filters are interpreted as specifications of elements and properties which should be added to the database. The formal semantics is defined with respect to a graph Herbrand structure which covers the XML tree data model. XPathLog has been implemented in LoPiX.

  • A study on content-based classification and retrieval of audio database

    Page(s): 339 - 345

    Nowadays, available audio corpora are rapidly increasing thanks to the fast-growing Internet and digitized libraries. How to effectively classify and retrieve such huge databases is a challenging task. Content-based techniques are studied to automatically classify audio into a hierarchy of classes. Based on a small set of features selected by the sequential forward selection (SFS) method from 87 extracted ones, four classifiers, namely nearest neighbor (NN), modified k-nearest neighbor (k-NN), Gaussian mixture model (GMM), and probabilistic neural network (PNN), are compared. Experiments were conducted on a common database and a more comprehensive database built by the authors. Finally, the PNN classifier combined with the Euclidean distance measure was chosen for audio retrieval, using query by example.
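
    Sequential forward selection, as named in the abstract, is a simple greedy loop; a minimal sketch follows (the feature names and the scoring function are hypothetical stand-ins, and a real run would score each candidate subset with a classifier such as the k-NN or GMM mentioned above):

    ```python
    def sfs(features, score, k):
        """Greedily add, one at a time, the feature that most improves the score."""
        selected = []
        while len(selected) < k:
            best = max((f for f in features if f not in selected),
                       key=lambda f: score(selected + [f]))
            selected.append(best)
        return selected

    # Toy scoring function: rewards one informative pair of (made-up) features and
    # slightly penalises subset size; a real system would use classification accuracy.
    def toy_score(subset):
        return len({"zero_crossing_rate", "spectral_centroid"} & set(subset)) - 0.01 * len(subset)

    features = ["zero_crossing_rate", "spectral_centroid", "rms_energy", "pitch"]
    print(sfs(features, toy_score, 2))   # ['zero_crossing_rate', 'spectral_centroid']
    ```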

  • An integrated graphical user interface for high performance distributed computing

    Page(s): 237 - 242

    It is very common that modern large-scale scientific applications employ multiple compute and storage resources in a heterogeneously distributed environment. Working effectively and efficiently in such an environment is one of the major concerns for designing metadata management systems. The authors present an integrated graphical user interface (GUI) that makes the entire environment virtually an easy-to-use control platform for managing complex programs and their large datasets. To hide the I/O latency when the user carries out interactive visualization, aggressive prefetching and caching techniques are employed in our GUI. The performance numbers show that the design of our Java GUI has achieved the goals of both high performance and ease of use.

  • A data preparation framework based on a multidatabase language

    Page(s): 219 - 228

    Integration and analysis of data from different sources have to deal with several problems resulting from potential heterogeneities. The activities addressing these problems are called data preparation and are supported by various available tools. However, these tools mostly operate in a batch-like manner, not supporting the iterative and explorative nature of the integration and analysis process. The authors present a framework for important data preparation tasks based on a multidatabase language. This language offers features for solving common integration and cleaning problems as part of query processing. Combining data preparation mechanisms and multidatabase query facilities permits applying and evaluating different integration and cleaning strategies without explicit loading and materialization of data. The paper introduces the language concepts and discusses their application to individual tasks of data preparation.

  • Design and implementation of bitmap indices for scientific data

    Page(s): 47 - 57

    Bitmap indices are efficient multi-dimensional index data structures for handling complex ad hoc queries in read-mostly environments. They have been implemented in several commercial database systems but are only well suited for discrete attribute values, which are very common in typical business applications. However, many scientific applications usually operate on floating-point numbers and cannot take advantage of the optimisation techniques offered by current database solutions. We thus present a novel algorithm called GenericRangeEval for processing one-sided range queries over floating-point values. In addition, we present a cost model for predicting the performance of bitmap indices for high-dimensional search spaces. We verify our analytical results by a detailed experimental study and show that the presented bitmap evaluation algorithm scales well even for high-dimensional search spaces while requiring only a fairly small index. Because of its simple arithmetic structure, the cost model can easily be integrated into a query optimiser for deciding whether a multi-dimensional query should be answered by means of a bitmap index or by sequentially scanning the data values without using an index at all.
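
    A hedged sketch of the general technique (not GenericRangeEval itself; the binning scheme, the use of Python ints as bit vectors, and the raw-value re-check for the boundary bin are simplifications chosen for illustration): floating-point values are assigned to bins, one bitmap is kept per bin, and a one-sided range query ORs the bitmaps of bins lying entirely below the threshold and re-checks only the boundary bin.

    ```python
    BIN_EDGES = [0.0, 0.25, 0.5, 0.75, 1.0]          # bin i covers [edges[i], edges[i+1])
    VALUES = [0.1, 0.9, 0.3, 0.72, 0.5, 0.05]        # toy column of floating-point values

    def build_index(values, edges):
        """One bitmap (a Python int used as a bit vector) per bin."""
        bitmaps = [0] * (len(edges) - 1)
        for row, v in enumerate(values):
            b = max(i for i in range(len(edges) - 1) if edges[i] <= v)
            bitmaps[b] |= 1 << row
        return bitmaps

    def less_than(index, edges, threshold, values):
        """Rows with value < threshold: OR whole bins below it, then check the edge bin."""
        result = 0
        for i in range(len(edges) - 1):
            if edges[i + 1] <= threshold:
                result |= index[i]                   # bin lies entirely below the threshold
            elif edges[i] < threshold:
                candidates = index[i]                # boundary bin: re-check the raw values
                for row in range(len(values)):
                    if candidates >> row & 1 and values[row] < threshold:
                        result |= 1 << row
        return [row for row in range(len(values)) if result >> row & 1]

    idx = build_index(VALUES, BIN_EDGES)
    print(less_than(idx, BIN_EDGES, 0.6, VALUES))    # [0, 2, 4, 5]
    ```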

  • Towards an architecture for real-time decision support systems: challenges and solutions

    Page(s): 303 - 311

    In large enterprises, huge volumes of data are generated and consumed, and substantial fractions of the data change rapidly. Business managers need up-to-date information to make timely and sound business decisions. Unfortunately, conventional decision support systems do not provide the low latencies needed for decision making in this rapidly changing environment. The paper introduces the notion of real-time decision support systems. It distills the requirements of such systems from two real-life IT outsourcing examples drawn from our extensive experience in developing and deploying such systems. We argue that real-time decision support systems are complex because they must combine elements of several different types of technologies: enterprise integration, real-time systems, workflow systems, knowledge management, and data warehousing and data mining. We then describe an approach to addressing these challenges. The approach is based on the message-brokering paradigm for enterprise integration and combines this paradigm with workflow management, knowledge management, and dynamic data warehousing and analysis. We conclude with lessons learnt from building systems based on this architectural approach, and discuss some hard research problems that arise.

  • Reducing inconsistency in integrating data from different sources

    Page(s): 209 - 218

    One of the main problems in integrating databases into a common repository is the possible inconsistency of the values stored in them, i.e., the very same term may have different values due to misspellings, permuted word order, spelling variants and so on. The authors present an automatic method for reducing the inconsistency found in existing databases and thus improving data quality. All the values that refer to the same term are clustered by measuring their degree of similarity. The clustered values can be assigned to a common value that, in principle, could be substituted for the original values. We evaluate four different similarity measures for clustering, with and without expansion of abbreviations. The method we propose may work well in practice, but it is time-consuming; in order to reduce this problem, we remove stop words to speed up the clustering.
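
    A minimal sketch of the clustering step described above (illustrative only: the sample values, the single difflib-based similarity measure, the threshold, and the pick-the-longest-member rule are assumptions, whereas the paper evaluates four measures with and without abbreviation expansion):

    ```python
    from difflib import SequenceMatcher

    VALUES = ["Internatl. Business Machines", "International Business Machines",
              "Intl Business Machines", "Oracle Corp.", "Oracle Corporation"]

    def similarity(a, b):
        """Normalised string similarity in [0, 1]."""
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()

    def cluster(values, threshold=0.7):
        """Greedy single-pass clustering: join a cluster if any member is similar enough."""
        clusters = []
        for v in values:
            for c in clusters:
                if any(similarity(v, member) >= threshold for member in c):
                    c.append(v)
                    break
            else:
                clusters.append([v])
        return clusters

    for c in cluster(VALUES):
        print(max(c, key=len), "<-", c)   # longest member as a candidate common value
    ```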

  • Evolution of database technology: hyperdatabases

    Page(s): 139 - 141

    Our vision is that hyperdatabases will become available, extending and evolving from database technology. Hyperdatabases move up to a higher level, closer to the applications. A hyperdatabase manages distributed objects and software components as well as workflows, in analogy to a database system that manages data and transactions. In short, hyperdatabases will provide “higher order data independence”, e.g., immunity of applications against changes in the implementation of components, and workload transparency. Such an evolution of database technology should preserve its pivotal role as infrastructure for the development of data-intensive, central and distributed applications. The hyperdatabase concept abstracts from the host of current infrastructures and middleware technology.
