Proceedings of the Seventh International Database Engineering and Applications Symposium (IDEAS 2003)

Date: 18 July 2003

Displaying results 1-25 of 51
  • Proceedings International Database Engineering and Applications Symposium


    The following topics are dealt with: database engineering; database applications; data modeling; database modeling; data mining; XML data management; advanced database techniques; imprecise and temporal databases; XML data querying; Web issues; database reliability; database stability; and database security.

  • Author index

    Page(s): 399 - 400
  • Modeling and efficient mining of intentional knowledge of outliers

    Page(s): 44 - 53

    In this paper, we study in a general setting the notion of outliered patterns as intentional knowledge of outliers, together with algorithms to mine those patterns. Our contributions are a model for defining outliered patterns with the help of categorical and behavioral similarities of outliers, and efficient algorithms for mining knowledge sets of distance-based outliers and outliered patterns. Our algorithms require only very limited domain knowledge and no classified information. We also present an empirical study showing the feasibility of our algorithms.

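    The paper's algorithms are not reproduced here, but the classical distance-based outlier notion they build on can be sketched in a few lines: a point is an outlier if fewer than a given fraction of the points lie within a given radius of it (both parameters below are illustrative).

```python
import numpy as np

def distance_based_outliers(points, radius, frac):
    """Flag points that have fewer than frac * n other points
    within `radius` (the classical DB-outlier definition)."""
    points = np.asarray(points, dtype=float)
    n = len(points)
    # pairwise Euclidean distances, O(n^2) for clarity
    dists = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    neighbor_counts = (dists <= radius).sum(axis=1) - 1  # exclude self
    return neighbor_counts < frac * n

data = np.concatenate([np.random.randn(100, 2), [[8.0, 8.0]]])
print(np.where(distance_based_outliers(data, radius=1.5, frac=0.05))[0])
```
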
  • Dynamic tuning of XML storage schema in VXMLR

    Page(s): 76 - 86

    This paper reports techniques for dynamic tuning of the XML storage schema in VXMLR, an XML management system built on an RDBMS. With two different tuning strategies, VXMLR can dynamically adjust its storage schema based on the latest query records to improve query processing efficiency. When a tuning event is triggered, VXMLR first derives from its query history the initial mapping rules that map the XML DTD to relational schemas; it then generates candidate storage schemas by vertically partitioning the relational tables or redundantly storing the data relevant to past queries; the benefit and cost of each candidate schema are estimated; and finally a cost-driven approach selects the final storage schema from the candidates under a given space constraint. Experimental results validate the practicality and effectiveness of the proposed techniques.

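    As a rough illustration of only the final, cost-driven selection step, a minimal sketch: keep the candidate schemas that fit the space budget and pick the one with the lowest estimated cost. The candidate names and numbers here are made-up placeholders, not VXMLR's actual estimator.

```python
candidates = [
    {"name": "baseline",  "est_cost": 10.0, "space": 1.0},  # initial mapping
    {"name": "vertical",  "est_cost": 6.5,  "space": 1.3},  # partitioned tables
    {"name": "redundant", "est_cost": 4.0,  "space": 2.4},  # redundant storage
]

def pick_schema(candidates, space_budget):
    # cost-driven selection under a space constraint
    feasible = [c for c in candidates if c["space"] <= space_budget]
    return min(feasible, key=lambda c: c["est_cost"])

print(pick_schema(candidates, space_budget=1.5)["name"])  # vertical
```
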
  • Extending XML-RL with update

    Page(s): 66 - 75

    With the extensive use of XML in Web applications, updating XML data is becoming an important issue, because the role of XML has expanded beyond traditional applications in which it serves as a means for data representation and exchange on the Web. This paper presents a novel declarative XML update language that extends the XML-RL query language. Compared with existing XML update languages, it has the following features. First, it is the only XML data manipulation language based on a higher-level data model; all other update languages adopt so-called graph-based or tree-based data models, so update requests can be expressed more intuitively and naturally in our language. Second, our language is designed to deal with both ordered and unordered data, whereas some existing languages cannot handle document order. Third, our language can express complex update requests at multiple levels of a hierarchy in a simple and fast way, while some existing languages must express such requests as nested updates, which are too complicated and unintuitive for end users to comprehend. Fourth, our language directly supports updating complex objects, which the other update languages do not. Lastly, most existing languages use rename to modify attribute and element names, treating them differently from value updates; our language modifies tag names, values, and objects in a unified way through three kinds of logical binding variables: object variables, value variables, and name variables. The power of our language is shown by various examples.

  • Query translation from XSLT to SQL

    Page(s): 87 - 96

    XML has been accepted as a universal format for data interchange and publication. It can be applied in applications where database data needs to be viewed in XML format, so that the data carries more semantics and is easily understood. In these applications, the user sees only XML data, not the underlying database; users may query the data with XML query languages such as XSLT, and the retrieved data is presented to them in XML format. We are interested in the connection between the data that the user sees and the data in the database; more specifically, in translating XSLT queries into SQL queries.

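    As a toy illustration of the general idea only (not the paper's translation algorithm), the sketch below maps a trivial XPath-style select path to SQL, assuming each element type is stored in a relational table of the same name.

```python
# hypothetical toy translation: /db/employee/name -> SELECT name FROM employee
def path_to_sql(path):
    steps = [s for s in path.strip("/").split("/") if s]
    table, column = steps[-2], steps[-1]  # parent element -> table, leaf -> column
    return f"SELECT {column} FROM {table}"

# e.g. the path selected by <xsl:value-of select="/db/employee/name"/>
print(path_to_sql("/db/employee/name"))  # SELECT name FROM employee
```
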
  • Implementing views for light-weight Web ontologies

    Page(s): 160 - 169

    The Semantic Web aims at easy integration and usage of content by building on a semi-structured data model in which data semantics are explicitly specified through ontologies. The use of ontologies in real-world applications such as community portals has shown that a new level of data independence is required for ontology-based applications; for example, information often needs to be customized to the needs of specific user communities. This paper extends previous work (2003) on this issue and presents a view language for the fundamental data models of the Semantic Web, viz. RDF and RDFS, together with its implementation. The basic novelty of the view language is the semantically appropriate classification of views into inheritance taxonomies based on query semantics. Additionally, the distinction RDF/S makes between unary predicates (classes) and binary predicates (properties) is maintained in the view language. So-called external ontologies allow the integration of multiple source databases, offer control over the publishing of data, and enable the generation of views spanning databases.

  • Enhancements on local outlier detection

    Page(s): 298 - 307

    Outliers, commonly referred to as exceptional cases, exist in many real-world databases, and detecting them is important for many applications. In this paper, we focus on the density-based notion that discovers local outliers by means of the local outlier factor (LOF) formulation. Three enhancement schemes over LOF are introduced, namely LOF', LOF'', and GridLOF. A thorough explanation and analysis demonstrates the abilities of LOF' to provide a simpler and more intuitive meaning of local outlier-ness, LOF'' to handle cases where LOF fails to work appropriately, and GridLOF to improve efficiency and accuracy.

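    For reference, these variants build on the standard LOF formulation (Breunig et al.), which, in common notation with $N_k(p)$ the k-nearest neighborhood of $p$, reads:

```latex
\[
\textit{reach-dist}_k(p, o) = \max\{\, k\textit{-dist}(o),\; d(p, o) \,\}
\]
\[
\mathrm{lrd}_k(p) = \Bigl( \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \textit{reach-dist}_k(p, o) \Bigr)^{-1},
\qquad
\mathrm{LOF}_k(p) = \frac{1}{|N_k(p)|} \sum_{o \in N_k(p)} \frac{\mathrm{lrd}_k(o)}{\mathrm{lrd}_k(p)}
\]
```

    A LOF value well above 1 marks p as lying in a sparser region than its neighbors, i.e. as a local outlier.
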
  • A CBIR-framework: using both syntactical and semantical information for image description

    Page(s): 385 - 390

    Content-based image retrieval systems can use classification or indexing based on syntactical and/or semantic features of images. We aim to provide a framework that can be instantiated for each specific application: one that combines syntactical and semantic information for image description. We believe that a model integrating syntactical and semantic descriptions, together with a similarity measure between images, is the core of such a framework. In this paper, we propose such an integrated model, with two example applications on which the expressiveness of our model has been tested.

  • Preferred repairs for inconsistent databases

    Page(s): 202 - 211

    The objective of this paper is to investigate the problems related to the extensional integration of information sources. In particular, we propose an approach for managing inconsistent databases, i.e. databases violating integrity constraints. Inconsistent data can be resolved by "repairing" the database, i.e. by providing a computational mechanism that yields consistent "scenarios" of the information, or by consistently answering queries posed against an inconsistent set of data. In this paper we consider preferences among repairs and possible answers by introducing a partial order over them on the basis of preference criteria. More specifically, preferences are expressed as polynomial functions that are applied to repairs and return real numbers. The goodness of a repair is measured by estimating how much it violates the desired conditions; a repair is preferred if it minimizes the value of the polynomial function expressing the preference criteria. The main contribution of this work is a logic-based approach for querying and repairing inconsistent databases that extends previous work by allowing preference criteria to be expressed and managed. The proposed approach can express the reliability of information sources and is also suitable for expressing decision and optimization problems. The introduction of preference criteria strongly reduces the number of feasible repairs and answers; for special classes of constraints and functions it yields a unique repair and answer.

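    A minimal sketch of the preference idea: each candidate repair is scored by a function (a linear stand-in for the paper's polynomial functions, with illustrative weights), and the repair minimizing the score is preferred.

```python
# each candidate repair is summarized by its numbers of tuple
# insertions and deletions needed to restore consistency
repairs = [
    {"insert": 1, "delete": 2},
    {"insert": 0, "delete": 1},
    {"insert": 3, "delete": 0},
]

def preference(r, w_ins=1.0, w_del=2.0):
    # illustrative linear scoring; lower is better
    return w_ins * r["insert"] + w_del * r["delete"]

print(min(repairs, key=preference))  # {'insert': 0, 'delete': 1}
```
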
  • Efficiently mining maximal frequent sets for discovering association rules

    Page(s): 104 - 110

    We present Metamorphosis, an algorithm for mining maximal frequent sets (MFS) using data transformations. Metamorphosis efficiently transforms the dataset into a maximum collapsible and compressible (MC2) format and employs a top-down strategy with a phased bottom-up search for mining MFS. Using the chess and connect benchmark datasets (from the University of California, Irvine), we demonstrate that at higher support levels our algorithm outperforms dGenMax (itself faster than other known algorithms) in mining MFS. Furthermore, we evaluate our algorithm for mining the top-K maximal frequent sets in the chess and connect datasets. Our algorithm is especially efficient when the maximal frequent sets are long.

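    Neither Metamorphosis nor dGenMax is reproduced here; the sketch below only illustrates what "maximal" means: of all frequent itemsets, keep those with no frequent proper superset.

```python
def maximal_frequent_sets(frequent):
    """Filter frequent itemsets down to the maximal ones."""
    fs = [frozenset(s) for s in frequent]
    return [s for s in fs if not any(s < t for t in fs)]

# {a} and {b} are subsumed by the frequent superset {a, b}
print(maximal_frequent_sets([{"a"}, {"b"}, {"a", "b"}, {"c"}]))
# e.g. [frozenset({'a', 'b'}), frozenset({'c'})]
```
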
  • High availability solutions for transactional database systems

    Page(s): 347 - 355

    In our increasingly wired world, there is a stringent need for the IT community to provide uninterrupted services for networks, servers, and databases. Considerable efforts, by both the industrial and academic communities, have been directed to this end. In this paper, we examine the requirements for high availability, the measures used to express it, and the approaches used to implement it for databases. We present a high availability solution for transaction-based applications, using off-the-shelf hardware and software components, and report our experience with this system.

  • Frequent itemsets mining for database auto-administration

    Page(s): 98 - 103

    With the wide development of databases in general and data warehouses in particular, it is important to reduce the tasks that a database administrator must perform manually. The aim of auto-administrative systems is to administer and adapt themselves automatically without loss (or even with a gain) in performance. The idea of using data mining techniques to extract useful administration knowledge from the data themselves has existed for some years, yet little research has been carried out. It nevertheless remains a very promising approach, notably in the field of data warehousing, where queries are very heterogeneous and cannot be interpreted easily. The aim of this study is to find a way to extract useful knowledge from the stored data themselves in order to automatically apply performance optimization techniques, in particular indexing techniques. We have designed a tool that extracts frequent itemsets from a given workload to compute an index configuration that helps optimize data access time. Our experiments showed that the index configurations generated by our tool yielded performance gains of 15% to 25% on a test database and a test data warehouse.

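    A minimal sketch of the underlying idea (not the authors' tool), assuming the workload is abstracted to the set of attributes each query touches: attribute combinations that co-occur frequently become candidate (composite) indexes.

```python
from collections import Counter
from itertools import combinations

def frequent_attribute_sets(workload, min_support):
    """Support-count the attribute combinations co-occurring in queries."""
    counts = Counter()
    for attrs in workload:  # one set of touched attributes per query
        for r in range(1, len(attrs) + 1):
            for combo in combinations(sorted(attrs), r):
                counts[combo] += 1
    n = len(workload)
    return {c: k / n for c, k in counts.items() if k / n >= min_support}

workload = [{"city", "age"}, {"city", "age"}, {"city"}, {"salary"}]
for itemset, supp in frequent_attribute_sets(workload, 0.5).items():
    print(f"index candidate {itemset} (support {supp:.2f})")
```
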
  • Database damage assessment using a matrix based approach: an intrusion response system

    Page(s): 336 - 341

    When an attacker or a malicious user updates a database, the resulting damage can spread to other parts of the database through valid users. A fast and accurate damage assessment must be performed as soon as such an attack is detected. In this paper, we discuss two approaches to damage assessment in an affected database. The first uses transaction dependency relationships to determine affected transactions; the second considers data dependency relationships to identify affected data items for later recovery. These relationships are stored in matrix form for faster manipulation.

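    A minimal sketch of the matrix-based propagation, under an assumed transaction-dependency matrix: starting from the detected malicious transaction, damage spreads along dependency edges until a fixpoint is reached.

```python
import numpy as np

def affected_transactions(dep, seed):
    """dep[i][j] = 1 if transaction j read data written by transaction i.
    Propagate damage from the malicious transaction `seed` to a fixpoint."""
    affected = np.zeros(dep.shape[0], dtype=bool)
    affected[seed] = True
    while True:
        spread = dep[affected].any(axis=0) & ~affected
        if not spread.any():
            return np.where(affected)[0]
        affected |= spread

# T1 -> T2 -> T3; T4 is independent
dep = np.array([[0, 1, 0, 0],
                [0, 0, 1, 0],
                [0, 0, 0, 0],
                [0, 0, 0, 0]])
print(affected_transactions(dep, seed=0))  # [0 1 2]
```
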
  • A portable interoperation module for workflow system

    Page(s): 403 - 406

    Interoperation between workflow management systems in different organizations has become indispensable. If the interfaces between systems follow a standardized specification, modules can easily be added to a workflow system. We therefore propose a workflow-engine-independent interoperability module for workflow systems using workflow interface 2. This approach makes the interoperation support module portable.

  • Automated EJB client code generation using database query rewriting

    Page(s): 308 - 317

    Enterprise JavaBeans (EJB) technology has been widely adopted in the software industry for developing Web information systems. However, most EJB applications are reengineered from legacy database applications, which means that legacy SQL statements need to be translated into EJB client code. Since many methods in Enterprise Beans can be regarded as view definitions over the underlying database, EJB client code generation can be mapped to the problem of query rewriting using views. This paper addresses the automatic generation of EJB client code using query rewriting.

  • Effective schema-based XML query optimization techniques

    Page(s): 230 - 235

    The use of path expressions is a common feature of most XML query languages, and many evaluation methods for path expression queries have been proposed recently. However, there has been little research on optimizing regular path expression queries. In this paper, two path expression optimization principles are proposed, named path shortening and path complementing. The path shortening principle reduces query cost by shortening path expressions using knowledge of the XML schema, while the path complementing principle substitutes equivalent lower-cost path expressions for user queries. Experimental results show that these two techniques can greatly improve the performance of path expression query processing.

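    A toy sketch of the path shortening idea only, assuming schema knowledge tells us each tag occurs at exactly one rooted path; the schema table below is hypothetical.

```python
# hypothetical schema knowledge: each element name -> its unique rooted path
schema_paths = {"c": "/site/regions/c", "item": "/site/regions/item"}

def shorten(query):
    """Rewrite a descendant-axis query //tag into its unique rooted path,
    which is cheaper to evaluate than scanning all descendants."""
    if query.startswith("//") and query.count("/") == 2:
        return schema_paths.get(query[2:], query)
    return query

print(shorten("//c"))  # /site/regions/c
```
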
  • A stream segregation algorithm for polyphonic music databases

    Page(s): 130 - 138

    Most existing algorithms for music information retrieval are based on string matching. However, some search results are perceptually insignificant in the sense that they cannot really be heard, because these algorithms neglect how people perceive music: listeners perceive music in groupings of musical notes called streams. Stream-crossing musical patterns are perceptually insignificant and should be pruned from the final results, so stream segregation should be added as a pre-processing or post-processing step in existing retrieval systems to improve the quality of the results. The key ideas are: (a) representation of music in the form of events; (b) formulation of inter-event and inter-cluster distance functions based on findings in auditory psychology; and (c) application of these distance functions in an adapted single-link clustering algorithm that does not require the number of clusters as input. Experiments on real music data verify the proposed method.

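    A minimal single-link clustering sketch over note events (onset, pitch): clusters are merged while the closest pair lies within a perceptual threshold. The distance function is an illustrative stand-in for the paper's psychology-derived inter-event distance.

```python
def event_distance(a, b):
    # illustrative weights: temporal proximity plus pitch proximity
    return abs(a[0] - b[0]) + 0.5 * abs(a[1] - b[1])

def single_link(events, threshold):
    clusters = [[e] for e in events]
    while len(clusters) > 1:
        best_pair, best = None, float("inf")
        for x in range(len(clusters)):
            for y in range(x + 1, len(clusters)):
                d = min(event_distance(a, b)
                        for a in clusters[x] for b in clusters[y])
                if d < best:
                    best_pair, best = (x, y), d
        if best > threshold:  # no merge within the perceptual limit
            break
        i, j = best_pair
        clusters[i] += clusters.pop(j)
    return clusters

notes = [(0, 60), (1, 62), (0, 72), (1, 74)]  # two interleaved voices
print(single_link(notes, threshold=2.0))      # segregated into two streams
```
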
  • Implementation issues of Bio-AXS: an object-oriented framework for integrating biological data and applications

    Page(s): 409 - 413

    Bio-AXS is an object-oriented framework tool that aims at integrating genomic databases and related applications. This approach meets the flexibility, reusability, and extensibility requirements expected in this domain. We present an overview of Bio-AXS implementation issues that shows how the tool may be used effectively in practice.

  • A model for schema integration in heterogeneous databases

    Page(s): 2 - 11

    Schema integration is the process by which schemata from heterogeneous databases are conceptually integrated into a single cohesive schema. In this work we propose a modeling framework for schema integration that captures the inherent uncertainty accompanying the integration process. The model uses a fuzzy framework to express a confidence measure associated with the outcome of a schema integration process. In this paper we provide a systematic analysis of the process properties and establish a criterion for evaluating the quality of matching algorithms, which map attributes among heterogeneous schemata.

  • Algorithms for balancing privacy and knowledge discovery in association rule mining

    Page(s): 54 - 63

    The discovery of association rules from large databases has proven beneficial for companies, since such rules can be very effective in revealing actionable knowledge that leads to strategic decisions. In tandem with this benefit, association rule mining can also pose a threat to privacy: from non-sensitive information or unclassified data, one may be able to infer sensitive information, including personal information, facts, or even patterns that are not supposed to be disclosed. This scenario reveals a pressing need for techniques that ensure privacy protection while preserving information accuracy and mining utility. In this paper, we introduce new algorithms for balancing privacy and knowledge discovery in association rule mining. We show that our algorithms require only two scans, regardless of the database size and the number of restrictive association rules that must be protected. Our performance study compares the effectiveness and scalability of the proposed algorithms and analyzes the fraction of association rules that are preserved after sanitizing a database. We also report the main results of our performance evaluation and discuss some open research issues.

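    One simple sanitization strategy, sketched under the assumption that a single restrictive itemset must be hidden (the paper's algorithms are not reproduced): push its support below the threshold by deleting one of its items from the surplus supporting transactions.

```python
def sanitize(transactions, restrictive, min_sup):
    """Lower the support of `restrictive` below min_sup by removing one
    of its items (the 'victim') from enough supporting transactions."""
    supporting = [t for t in transactions if restrictive <= t]
    allowed = int(min_sup * len(transactions)) - 1  # max support that stays hidden
    victim = next(iter(restrictive))
    for t in supporting[max(allowed, 0):]:
        t.discard(victim)
    return transactions

db = [{"a", "b", "c"}, {"a", "b"}, {"a", "b", "d"}, {"c"}]
print(sanitize(db, {"a", "b"}, min_sup=0.5))  # {a, b} now below threshold
```
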
  • An empirical study of commutativity in application code

    Page(s): 361 - 369

    A typical object database manages concurrency control by instance locking, based on the identification of instance operations as "read" or "write". An alternative theory shows that additional concurrency can be obtained based on operation commutativity: activities can be allowed to run concurrently as long as they commute, that is, the effect is the same in either order. In this paper, we study an extensive commercial application from the telecommunications domain and determine how much concurrency is actually available for commutativity theory to exploit. Our study identifies not only the operations that commute but also the reasons for their commutativity. We separate the commutative operations into three categories: those that commute because both are read operations, those that commute because they access different fields, and those that commute for semantic reasons. This analysis lets us compare the concurrency potential of commutative locking with the two other common locking protocols: instance locking and attribute locking.

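    A tiny illustration of the "different fields" category: two writers that touch different fields of the same object commute, because the final state is identical in either order, so a commutativity-based scheduler could run them concurrently.

```python
class Account:
    def __init__(self):
        self.balance, self.owner = 0, ""

def set_balance(acct, v): acct.balance = v
def set_owner(acct, v): acct.owner = v

a, b = Account(), Account()
set_balance(a, 100); set_owner(a, "kim")  # one order
set_owner(b, "kim"); set_balance(b, 100)  # the other order
# same final state either way: the two writes commute
print((a.balance, a.owner) == (b.balance, b.owner))  # True
```
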
  • Efficient subsequence matching for sequences databases under time warping

    Page(s): 139 - 148

    The technique of searching for similar patterns among time series data has been found to be very important in a wide range of scientific and business applications. Most research works use Euclidean distance as the similarity metric. However, dynamic time warping (DTW) is a more robust distance measure than Euclidean distance in many situations where sequences have different lengths or contain patterns that are out of phase in the time axis. Unfortunately, DTW does not satisfy the triangle inequality, so spatial indexing techniques cannot be applied directly. In this paper, we present a method that supports dynamic time warping for subsequence matching within a collection of sequences. Our method takes full advantage of the "sliding window" approach and can handle queries of arbitrary length.

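    For reference, the classic DTW recurrence the paper builds on can be sketched in a few lines; this is the textbook O(nm) formulation, not the paper's indexed subsequence-matching method.

```python
import numpy as np

def dtw(a, b):
    """Dynamic time warping distance via the standard recurrence:
    D[i][j] = cost(i, j) + min(D[i-1][j], D[i][j-1], D[i-1][j-1])."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = abs(a[i - 1] - b[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return D[n, m]

# same shape at different lengths: DTW distance is 0, unlike Euclidean
print(dtw([1, 2, 3, 4], [1, 2, 2, 3, 4]))  # 0.0
```
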
  • Refining Web authoritative resource by frequent structures

    Page(s): 250 - 255

    The Web is a rich collection of dynamic information that is useful in various disciplines. There has been much research on improving the quality of information searching on the Web, yet most of it is still inadequate to satisfy the diverse demands of users. In this paper, we exploit the hyperlinks of the Web and propose a new approach, called SFP, to improve the quality of search results obtained from search engines. The SFP algorithm evolves from frequent pattern mining, a common data mining technique for conventional databases. The essential idea of our approach is to mine the frequent structures of links from a given Web topology; using the SFP algorithm, we extract authoritative pages and communities from the complex Web topology. We demonstrate our approach with several experiments and show that the performance and functionality of SFP in managing search results are better than those of other known methods such as HITS.

  • Persistent applications via automatic recovery

    Page(s): 258 - 267

    Building highly available enterprise applications using Web-oriented middleware is hard. Runtime implementations frequently do not address application state persistence and fault tolerance, placing the burden of managing session state and, in particular, handling system failures on application programmers. This paper describes Phoenix/APP, a runtime service based on the notion of recovery guarantees, which transparently masks failures and automatically recovers component-based applications. This both increases application availability and simplifies application development. We demonstrate the feasibility of this approach by describing the design and implementation of Phoenix/APP in Microsoft's .NET runtime and present results on the cost of persisting and recovering component-based applications.
