Proceedings 18th International Conference on Data Engineering

Feb. 26 2002-March 1 2002

Displaying Results 1 - 25 of 97
  • Proceedings 18th International Conference on Data Engineering

    Publication Year: 2002
    PDF (357 KB)
    Freely Available from IEEE
  • Bioinformatics databases 1 [Advanced Technology Seminar 4]

    Publication Year: 2002, Page(s): 649
    PDF (261 KB) | HTML

    Summary form only given. The tutorial is intended to introduce database folk to database issues which arise in bioinformatics, i.e., molecular biology, genetics, and biochemistry. We will commence with a very brief introduction to molecular biology and genetics and the requisite vocabulary. However, this is NOT intended to be a biology tutorial, so attendees would be well advised to read a biology...
  • Author index

    Publication Year: 2002, Page(s):733 - 735
    PDF (322 KB)
    Freely Available from IEEE
  • Attribute classification using feature analysis

    Publication Year: 2002
    Cited by:  Papers (8)
    PDF (307 KB) | HTML

    The basis of many systems that integrate data from multiple sources is a set of correspondences between source schemata and a target schema. Correspondences express a relationship between sets of source attributes, possibly from multiple sources, and a set of target attributes. Clio is an integration tool that assists users in defining value correspondences between attributes. In real life scenari...
  • Using Smodels (declarative logic programming) to verify correctness of certain active rules

    Publication Year: 2002
    PDF (200 KB) | HTML

    In this paper we show that the language of declarative logic programming (DLP) with answer sets and its extensions can be used to specify database evolution due to updates and active rules, and to verify correctness of active rules with respect to a specification described using temporal logic and aggregate operators. We classify the specification of active rules into four kinds of constraints whic...
  • Fast mining of massive tabular data via approximate distance computations

    Publication Year: 2002, Page(s):605 - 614
    Cited by:  Papers (4)  |  Patents (3)
    PDF (563 KB) | HTML

    Tabular data abound in many data stores: traditional relational databases store tables, and new applications also generate massive tabular datasets. We present methods for determining similar regions in massive tabular data. Our methods are for computing the "distance" between any two subregions of tabular data: they are approximate, but highly accurate as we prove mathematically, and they are fas...
  • Data mining meets performance evaluation: fast algorithms for modeling bursty traffic

    Publication Year: 2002, Page(s):507 - 516
    Cited by:  Papers (44)
    PDF (573 KB) | HTML

    Network, Web, and disk I/O traffic are usually bursty and self-similar and therefore cannot be modeled adequately with Poisson arrivals. However, we wish to model these types of traffic and generate realistic traces, because of obvious applications for disk scheduling, network management, and Web server design. Previous models (like fractional Brownian motion and FARIMA, etc.) tried to capture the...
  • An authorization system for temporal data

    Publication Year: 2002, Page(s):339 - 340
    PDF (412 KB) | HTML

    We present a system, called the Temporal Data Authorization Model (TDAM), for managing authorizations for temporal data. TDAM is capable of expressing access control policies based on the temporal characteristics of data. TDAM extends existing authorization models to allow the specification of temporal constraints on data, based on data validity, data capture time, and replication time, using eit...
  • Techniques for storing XML

    Publication Year: 2002, Page(s): 323
    Cited by:  Papers (1)  |  Patents (2)
    PDF (261 KB) | HTML

    First Page of the Article
  • Data cleaning and XML: the DBLP experience

    Publication Year: 2002
    Cited by:  Papers (1)
    PDF (237 KB) | HTML

    With the increasing popularity of data-centric XML, data warehousing and mining applications are being developed for rapidly burgeoning XML data repositories. Data quality will no doubt be a critical factor for the success of such applications. Data cleaning, which refers to the processes used to improve data quality, has been well researched in the context of traditional databases. In earlier wor...
  • Towards meaningful high-dimensional nearest neighbor search by human-computer interaction

    Publication Year: 2002, Page(s):593 - 604
    Cited by:  Papers (13)
    PDF (1248 KB) | HTML

    Nearest neighbor search is an important and widely used problem in a number of important application domains. In many of these domains, the dimensionality of the data representation is often very high. Recent theoretical results have shown that the concept of proximity or nearest neighbors may not be very meaningful for the high dimensional case. Therefore, it is often a complex problem to find go...
  • StreamCorder: fast trial-and-error analysis in scientific databases

    Publication Year: 2002, Page(s):500 - 501
    Cited by:  Patents (2)
    PDF (316 KB) | HTML

    We have implemented a client/server system for fast trial-and-error analysis: the StreamCorder. The server streams wavelet-encoded views to the clients, where they are cached, decoded and processed. Low-quality decoding is beneficial for slow network connections. Low-resolution decoding greatly accelerates decoding and analysis. Depending on the system resources, cached data and analysis requireme...
  • A graphical XML query language

    Publication Year: 2002
    PDF (366 KB) | HTML

    Informally presents the query language 𝒳𝒢L (eXtensible Graphical Language). The main features of the language are described by means of two queries on a document named "bib.xml" (a document describing the bibliographic details of a book).
  • SG-WRAP: a schema-guided wrapper generator

    Publication Year: 2002, Page(s):331 - 332
    Cited by:  Papers (3)
    PDF (292 KB) | HTML

    Although wrapper generation work has been reported in the literature, there seem to be no standard ways to evaluate the performance of such systems. We conducted a series of experiments to evaluate the usability, correctness and efficiency of SG-WRAP. The usability tests selected a number of users to use the system. The results indicated that, with minimal introduction of the system, DTD definition and ...
  • The BINGO! focused crawler: from bookmarks to archetypes

    Publication Year: 2002, Page(s):337 - 338
    Cited by:  Papers (2)
    PDF (277 KB) | HTML

    The BINGO! system implements an approach to focused crawling that aims to overcome the limitations of the initial training data. To this end, BINGO! identifies, among the crawled and positively classified documents of a topic, characteristic "archetypes" and uses them for periodically re-training the classifier; this way the crawler is dynamically adapted based on the most significant documents se...
  • DBXplorer: a system for keyword-based search over relational databases

    Publication Year: 2002, Page(s):5 - 16
    Cited by:  Papers (160)  |  Patents (26)
    PDF (552 KB) | HTML

    Internet search engines have popularized the keyword-based search paradigm. While traditional database management systems offer powerful query languages, they do not allow keyword-based search. In this paper, we discuss DBXplorer, a system that enables keyword-based searches in relational databases. DBXplorer has been implemented using a commercial relational database and Web server and allows use...
  • A publish and subscribe architecture for distributed metadata management

    Publication Year: 2002, Page(s):309 - 320
    Cited by:  Papers (4)  |  Patents (4)
    PDF (543 KB) | HTML

    The emergence of electronic marketplaces and other electronic services and applications on the Internet is creating a growing demand for the effective management of resources. Due to the nature of the Internet, such information changes rapidly. Furthermore, such information must be available for a large number of users and applications, and copies of pieces of information should be stored near tho...
  • A fast regular expression indexing engine

    Publication Year: 2002, Page(s):419 - 430
    Cited by:  Papers (2)  |  Patents (7)
    PDF (363 KB) | HTML

    In this paper, we describe the design, architecture, and lessons learned from the implementation of a fast regular-expression indexing engine, FREE. FREE uses a prebuilt index to identify the text data units which may contain a matching string and only examines these further. In this way, FREE shows orders of magnitude performance improvement in certain cases over standard regular expression matchi...
  • Multivariate time series prediction via temporal classification

    Publication Year: 2002
    Cited by:  Papers (1)
    PDF (196 KB) | HTML

    In this paper, we study a special form of time-series prediction, viz. the prediction of a dependent variable taking discrete values. Although in a real application this variable may take numeric values, the users are usually only interested in its value ranges, e.g. normal or abnormal, not its actual values. In this work, we extended two traditional classification techniques, namely the naive Bay...
  • OSSM: a segmentation approach to optimize frequency counting

    Publication Year: 2002, Page(s):583 - 592
    Cited by:  Papers (10)  |  Patents (1)
    PDF (424 KB) | HTML

    Computing the frequency of a pattern is one of the key operations in data mining algorithms. We describe a simple yet powerful way of speeding up any form of frequency counting satisfying the monotonicity condition. Our method, the optimized segment support map (OSSM), is a light-weight structure which partitions the collection of transactions into m segments, so as to reduce the number of candida...
  • Condensed cube: an effective approach to reducing data cube size

    Publication Year: 2002, Page(s):155 - 165
    Cited by:  Papers (40)
    PDF (377 KB) | HTML

    Pre-computed data cubes facilitate OLAP (on-line analytical processing). It is well-known that data cube computation is an expensive operation. While most algorithms have been devoted to optimizing memory management and reducing computation costs, less work has addressed a fundamental issue: the size of a data cube is huge when a large base relation with a large number of attributes is involved. I...
  • Mapping XML and relational schemas with Clio

    Publication Year: 2002, Page(s):498 - 499
    Cited by:  Papers (6)  |  Patents (10)
    PDF (260 KB) | HTML

    Merging and coalescing data from multiple and diverse sources into different data formats continues to be an important problem in modern information systems. Schema matching (the process of matching elements of a source schema with elements of a target schema) and schema mapping (the process of creating a query that maps between two disparate schemas) are at the heart of data integration systems. ...
  • How good are association-rule mining algorithms?

    Publication Year: 2002
    PDF (202 KB) | HTML

    Addresses the question of how much space remains for performance improvement over current association rule mining algorithms. Our approach is to compare their performance against an "Oracle algorithm" that knows in advance the identities of all frequent item sets in the database and only needs to gather the actual supports of these item sets, in one scan over the database, to complete the mining p...
  • Lossy reduction for very high dimensional data

    Publication Year: 2002, Page(s):663 - 672
    Cited by:  Papers (1)
    PDF (343 KB) | HTML

    We consider the use of data reduction techniques for the problem of approximate query answering. We focus on applications for which accurate answers to selective queries are required, and for which the data are very high dimensional (having hundreds of attributes). We present a new data reduction method for this type of application, called the RS kernel. We demonstrate the effectiveness of this me...
  • A framework towards efficient and effective sequence clustering

    Publication Year: 2002
    PDF (196 KB) | HTML

    Analyzing sequence data (particularly in categorical domains) has become increasingly important, partially due to the significant advances in biology and other fields. Examples of sequence data include DNA sequences, unfolded protein sequences, text documents, Web usage data, system traces, etc. Previous work on mining sequence data has mainly focused on frequent pattern discovery. In this project...