Proceedings 18th International Conference on Data Engineering

Feb. 26 2002-March 1 2002

  • Proceedings 18th International Conference on Data Engineering

    Publication Year: 2002
    Freely Available from IEEE
  • Bioinformatics databases 1 [Advanced Technology Seminar 4]

    Publication Year: 2002, Page(s): 649

    Summary form only given. The tutorial is intended to introduce database folk to database issues which arise in bioinformatics, i.e., molecular biology, genetics, and biochemistry. We will commence with a very brief introduction to molecular biology and genetics and the requisite vocabulary. However, this is NOT intended to be a biology tutorial, so attendees would be well advised to read a biology...
  • Author index

    Publication Year: 2002, Page(s): 733 - 735
    Freely Available from IEEE
  • Lossy reduction for very high dimensional data

    Publication Year: 2002, Page(s): 663 - 672
    Cited by: Papers (1)

    We consider the use of data reduction techniques for the problem of approximate query answering. We focus on applications for which accurate answers to selective queries are required, and for which the data are very high dimensional (having hundreds of attributes). We present a new data reduction method for this type of application, called the RS kernel. We demonstrate the effectiveness of this me...
  • Data cleaning and XML: the DBLP experience

    Publication Year: 2002
    Cited by: Papers (1)

    With the increasing popularity of data-centric XML, data warehousing and mining applications are being developed for rapidly burgeoning XML data repositories. Data quality will no doubt be a critical factor for the success of such applications. Data cleaning, which refers to the processes used to improve data quality, has been well researched in the context of traditional databases. In earlier wor...
  • Multivariate time series prediction via temporal classification

    Publication Year: 2002
    Cited by: Papers (1)

    In this paper, we study a special form of time-series prediction, viz. the prediction of a dependent variable taking discrete values. Although in a real application this variable may take numeric values, the users are usually only interested in its value ranges, e.g. normal or abnormal, not its actual values. In this work, we extended two traditional classification techniques, namely the naive Bay...
  • Query estimation by adaptive sampling

    Publication Year: 2002, Page(s): 639 - 648
    Cited by: Papers (4) | Patents (5)

    The ability to provide accurate and efficient result estimations of user queries is very important for the query optimizer in database systems. In this paper, we show that the traditional estimation techniques with data reduction points of view do not produce satisfiable estimation results if the query patterns are dynamically changing. We further show that to reduce query estimation error, instea...
  • A framework towards efficient and effective sequence clustering

    Publication Year: 2002

    Analyzing sequence data (particularly in categorical domains) has become increasingly important, partially due to the significant advances in biology and other fields. Examples of sequence data include DNA sequences, unfolded protein sequences, text documents, Web usage data, system traces, etc. Previous work on mining sequence data has mainly focused on frequent pattern discovery. In this project...
  • Geometric-similarity retrieval in large image bases

    Publication Year: 2002, Page(s): 441 - 450
    Cited by: Papers (2)

    We propose a novel approach to shape-based image retrieval that builds upon a similarity criterion which is based on the average point set distance. Compared to traditional techniques, such as dimensionality reduction, our method exhibits better behavior in that it maintains the average topology of shapes independently of the number of points used to represent them and is more resilient to noise. ...
  • Processing reporting function views in a data warehouse environment

    Publication Year: 2002, Page(s): 176 - 185
    Cited by: Papers (4) | Patents (2)

    Reporting functions reflect a novel technique to formulate sequence-oriented queries in SQL. They extend the classical way of grouping and applying aggregation functions by additionally providing a column-based ordering, partitioning, and windowing mechanism. The application area of reporting functions ranges from simple ranking queries (TOP(n)-analyses) over cumulative (Year-To-Date-analyses) to ...
  • Extensible and similarity-based grouping for data integration

    Publication Year: 2002
    Cited by: Papers (2)

    The general concept of grouping and aggregation appears to be a fitting paradigm for various issues in data integration, but in its common form of equality-based grouping, a number of problems remain unsolved. We propose a generic approach to user-defined grouping as part of a SQL extension, allowing for more complex functions, for instance integration of data mining algorithms. Furthermore, we di...
  • OntoWebber: a novel approach for managing data on the Web

    Publication Year: 2002, Page(s): 488 - 489
    Cited by: Papers (1) | Patents (1)

    OntoWebber is a system for managing data on the Web with formally encoded semantics. It aims at solving the problems current technologies are confronted with, namely, the reusability of software components, flexibility in personalization, and ease of maintenance for data intensive Web sites. Based on a domain ontology and a site modeling ontology, site views on the underlying data can be construct...
  • NeT and CoT: inferring XML schemas from relational world

    Publication Year: 2002

    Two conversion algorithms, called NeT and CoT, to translate relational schemas to XML schemas using various semantic constraints are presented. We first present a language-independent formalism named XSchema so that our algorithms are able to generate output schema in various XML schema language proposals. The benefits of such a formalism are that it is both precise and concise. Based on the XSche...
  • Improving range query estimation on histograms

    Publication Year: 2002, Page(s): 628 - 638
    Cited by: Papers (7)

    Histograms are used to summarize the contents of relations for the estimation of query result sizes into a number of buckets. Several techniques (e.g., MaxDiff and V-Optimal) have been proposed in the past for determining bucket boundaries which provide better estimations. This paper proposes to use 32 bit information (4-level tree index) for each bucket for storing approximated cumulative frequen...
  • BestPeer: a self-configurable peer-to-peer system

    Publication Year: 2002
    Cited by: Papers (13) | Patents (1)

    We present BestPeer, a prototype P2P system that we have implemented at the National University of Singapore. BestPeer is a generic P2P system designed to serve as a platform on which P2P applications can be developed easily and efficiently. The network consists of two types of entities: a large number of computers (nodes), and a relatively fewer number of location independent global name lookup (...
  • The ATLaS system and its powerful database language based on simple extensions of SQL

    Publication Year: 2002, Page(s): 280 - 281

    A lack of power and extensibility in their query languages has seriously limited the generality of DBMSs and hampered their ability to support new applications domains, such as data mining. In this paper, we solve this problem by stream-oriented aggregate functions and generalized table functions which are definable by users in the SQL language itself, rather than in an external programming langua...
  • Keyword searching and browsing in databases using BANKS

    Publication Year: 2002, Page(s): 431 - 440
    Cited by: Papers (143) | Patents (22)

    With the growth of the Web, there has been a rapid increase in the number of users who need to access online databases without having a detailed knowledge of the schema or of query languages; even relatively simple query languages designed for non-experts are too complicated for them. We describe BANKS, a system which enables keyword-based search on relational databases, together with data and sch...
  • Indexing spatio-temporal data warehouses

    Publication Year: 2002, Page(s): 166 - 175
    Cited by: Papers (30) | Patents (1)

    Spatio-temporal databases store information about the positions of individual objects over time. In many applications, however, such as traffic supervision or mobile communication systems, only summarized data, like the average number of cars in an area for a specific period, or the number of phones serviced by a cell each day, is required. Although this information can be obtained from operationa...
  • How good are association-rule mining algorithms?

    Publication Year: 2002

    Addresses the question of how much space remains for performance improvement over current association rule mining algorithms. Our approach is to compare their performance against an "Oracle algorithm" that knows in advance the identities of all frequent item sets in the database and only needs to gather the actual supports of these item sets, in one scan over the database, to complete the mining p...
  • Efficiently ordering query plans for data integration

    Publication Year: 2002, Page(s): 393 - 402
    Cited by: Papers (4)

    The goal of a data integration system is to provide a uniform interface to a multitude of data sources. Given a user query formulated in this interface, the system translates it into a set of query plans. Each plan is a query formulated over the data sources, and specifies a way to access sources and combine data to answer the user query. In practice, when the number of sources is large, a data-in...
  • Advanced process-based component integration in Telcordia's Cable OSS

    Publication Year: 2002, Page(s): 485 - 487
    Cited by: Papers (1) | Patents (2)

    Operation support systems (OSSs) integrate software components and network elements to automate the provisioning and monitoring of telecommunications services. This paper illustrates Telcordia's Cable OSS and shows how customers may use this OSS to provision IP and telephone services over the cable infrastructure. Telcordia's Cable OSS is a process-based application, i.e. a collection of flows, sp...
  • Mixing querying and navigation in MIX

    Publication Year: 2002, Page(s): 245 - 254
    Cited by: Papers (3) | Patents (3)

    Web-based information systems provide to their users the ability to interleave querying and browsing during their information discovery efforts. The MIX system provides an API called QDOM (Querible Document Object Model) that supports the interleaved querying and browsing of virtual XML views, specified in an XQuery-like language. QDOM is based on the DOM standard. It allows the client application...
  • Exploiting local similarity for indexing paths in graph-structured data

    Publication Year: 2002, Page(s): 129 - 140
    Cited by: Papers (59) | Patents (2)

    XML and other semi-structured data may have partially specified or missing schema information, motivating the use of a structural summary which can be automatically computed from the data. These summaries also serve as indices for evaluating the complex path expressions common to XML and semi-structured query languages. However, to answer all path queries accurately, summaries must encode informat...
  • An efficient index structure for shift and scale invariant search of multi-attribute time sequences

    Publication Year: 2002
    Cited by: Papers (1)

    We consider the problem of shift and scale invariant search for multi-attribute time sequences. Our work fills a void in the existing literature for time sequence similarity since the existing techniques do not consider the general symmetric formulation of the problem. We define a new distance function for multi-attribute time sequences that is symmetric: the distance between two time sequences is...
  • A sampling-based estimator for top-k selection query

    Publication Year: 2002, Page(s): 617 - 627
    Cited by: Papers (7) | Patents (1)

    Top-k queries arise naturally in many database applications that require searching for records whose attribute values are close to those specified in a query. We study the problem of processing a top-k query by translating it into an approximate range query that can be efficiently processed by traditional relational DBMSs. We propose a sampling-based approach, along with various query mapping stra...