
Proceedings of the Thirteenth International Conference on Scientific and Statistical Database Management (SSDBM 2001)

Date: 18-20 July 2001


Displaying results 1-25 of 30
  • Proceedings Thirteenth International Conference on Scientific and Statistical Database Management. SSDBM 2001

  • Author index

    Page(s): 279
  • Ontology negotiation between scientific archives

    Page(s): 245 - 250

    Describes an approach to ontology negotiation between information agents. Ontologies are declarative (data-driven) expressions of an agent's “world”: the objects, operations, facts and rules that constitute the logical space within which an agent performs. Ontology negotiation enables agents to cooperate in performing a task even if they are based on different ontologies. The process allows agents to discover ontology conflicts and then, through incremental interpretation, clarification and explanation, establish a common basis for communicating with each other.

  • Semistructured probabilistic databases

    Page(s): 36 - 45

    The article describes a novel theoretical framework for the uniform storage and management of diverse probabilistic information. The semistructured data model has recently gained wide acceptance as a means of representing data that lacks a rigid structure or schema. In particular, the similarity between the semistructured data model and the data model underlying the eXtensible Markup Language (XML), the emerging open standard for data storage and transmission over the Internet, makes this approach attractive. We present a formal model for semistructured probabilistic objects and provide the theoretical foundations for storing and managing them. Previously (S. Hawkes and A. Dekhtyar, 2001), we started the process of translating this model into XML. We introduce the advising application, give formal definitions of semistructured probabilistic objects, and finally introduce the underlying algebra for semistructured probabilistic databases.
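
    A minimal Python sketch of what a semistructured probabilistic object (SPO) might look like, with one algebra-style operation; the attribute names and the exact shape of the object are illustrative assumptions, not the paper's formal definitions.

        class SPO:
            """A sketch of an SPO: a context (descriptive attributes), a set of
            random variables, and a probability table over their joint values."""
            def __init__(self, context, variables, table):
                self.context, self.variables, self.table = context, variables, table

            def select(self, var, value):
                # Selection: keep only the table rows where var takes the value.
                i = self.variables.index(var)
                rows = {v: p for v, p in self.table.items() if v[i] == value}
                return SPO(dict(self.context, **{var: value}), self.variables, rows)

        grades = SPO({"student": "s1", "course": "db"}, ("grade",),
                     {("A",): 0.5, ("B",): 0.3, ("C",): 0.2})
        print(grades.select("grade", "A").table)   # {('A',): 0.5}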

  • Specifying OLAP cubes on XML data

    Page(s): 101 - 112

    On-Line Analytical Processing (OLAP) enables analysts to gain insight into data through fast and interactive access to a variety of possible views on information, organized in a dimensional model. The demand for data integration is growing rapidly as more and more information sources appear in modern enterprises. In the data warehousing approach, selected information is extracted in advance and stored in a repository, an approach chosen for its high performance. However, in many situations a logical (rather than physical) integration of data is preferable. Previous Web-based data integration efforts have focused almost exclusively on the logical level of data models, creating a need for techniques that work at the conceptual level, and they have not addressed the special needs of OLAP tools, such as handling dimensions with hierarchies. Extensible Markup Language (XML) is fast becoming the new standard for data representation and exchange on the World Wide Web, and the rapid emergence of XML data on the Web, e.g., in business-to-business (B2B) e-commerce, is making it necessary for OLAP and other data analysis tools to handle XML data as well as traditional data formats. Based on a real-world case study, the paper presents an approach to the conceptual specification of OLAP databases over Web data. Unlike previous work, this approach takes special OLAP issues such as dimension hierarchies and correct aggregation of data into account. Additionally, an integration architecture that allows the logical integration of XML and relational data sources for use by OLAP tools is presented.

  • An extensible index for spatial databases

    Page(s): 49 - 58

    Emerging database applications require the use of new indexing structures beyond B-trees and R-trees. Examples are the k-D tree, the trie, the quadtree, and their variants. They are often proposed as supporting structures in data mining, GIS, and CAD/CAM applications. A common feature of all these indexes is that they recursively divide the space into partitions. A novel extensible index structure, termed SP-GiST, is presented that supports this class of data structures, namely the class of space-partitioning unbalanced trees. Simple method implementations are provided that demonstrate how SP-GiST can behave as a k-D tree, a trie, a quadtree, or any of their variants. Issues related to clustering tree nodes into pages as well as concurrency control for SP-GiST are addressed. A dynamic minimum-height clustering technique is applied to minimize disk accesses and to make using such trees in database systems possible and efficient. A prototype implementation of SP-GiST is presented, as well as performance studies of SP-GiST's various tuning parameters.
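
    To make the index class concrete, here is a minimal point quadtree in Python, one member of the family of space-partitioning unbalanced trees that SP-GiST generalizes; this illustrates the class of trees, not the SP-GiST extensibility interface itself.

        class QuadNode:
            def __init__(self, x, y):
                self.x, self.y = x, y
                self.children = [None] * 4   # NW, NE, SW, SE

            def quadrant(self, x, y):
                # Each node partitions the plane into four quadrants at its point.
                return (0 if y >= self.y else 2) + (0 if x < self.x else 1)

        def insert(node, x, y):
            if node is None:
                return QuadNode(x, y)
            q = node.quadrant(x, y)
            node.children[q] = insert(node.children[q], x, y)
            return node

        def search(node, x, y):
            if node is None:
                return False
            if (node.x, node.y) == (x, y):
                return True
            return search(node.children[node.quadrant(x, y)], x, y)

        root = None
        for p in [(5, 5), (2, 8), (8, 1), (6, 6)]:
            root = insert(root, *p)
        print(search(root, 8, 1), search(root, 3, 3))   # True False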

  • Using association rules to add or eliminate query constraints automatically

    Page(s): 124 - 133

    Much interesting work has been done on the use of semantic associations for optimizing query execution. Our objective is to study the use of association rules to add or eliminate constraints in the WHERE clause of a select query. In particular, we take advantage of the following heuristics presented by Siegel et al. (1992): i) if a selection on attribute A is implied by another selection condition on attribute B and A is not an index attribute, then the selection on A can be removed from the query; ii) if a relation R in the query has a restricted attribute A and an unrestricted cluster index attribute B, then look for a rule where the restriction on A implies a restriction on B. The contribution of our work is twofold. First, we present detailed algorithms that apply these heuristics, so our ideas are easy to implement. Second, we discuss the conditions under which it is worth applying these optimization techniques, and we show the extent to which they speed up query execution.
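
    A small sketch of heuristic (i) in Python; the rule table, the predicate encoding and the attribute names are invented for illustration.

        # Association rules mined from the data: antecedent predicate -> implied predicate.
        rules = {("city", "=", "Oslo"): ("country", "=", "Norway")}
        indexed = {"city"}   # attributes that carry an index

        def eliminate_implied(where):
            """where: list of (attr, op, value) predicates, implicitly AND-ed.
            Drop any predicate implied by another one, unless it is on an
            index attribute (heuristic i)."""
            implied = {rules[p] for p in where if p in rules}
            return [p for p in where if not (p in implied and p[0] not in indexed)]

        q = [("city", "=", "Oslo"), ("country", "=", "Norway")]
        print(eliminate_implied(q))   # [('city', '=', 'Oslo')]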

  • Efficient historical R-trees

    Page(s): 223 - 232

    The historical R-tree (HR-tree) is a spatio-temporal access method aimed at answering window queries about the past. The concept behind the method is to keep an R-tree for each timestamp in history, but to let consecutive trees share branches when the underlying objects do not change; new branches are created only to accommodate updates from the previous timestamp. Although existing implementations of HR-trees process timestamp (window) queries very efficiently, they are hardly applicable in practice due to excessive space requirements and poor interval query performance. This paper addresses these problems by proposing the HR+-tree, which occupies a small fraction of the space required for the corresponding HR-tree (about 20% under typical conditions) while improving interval query performance several times over. Our claims are supported by extensive experimental evaluation.
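
    The branch-sharing idea can be sketched by versioning a tree with path copying: a new timestamp copies only the path to the changed leaf and shares every untouched subtree with the previous version. The sketch below uses a plain binary search tree for brevity; real HR-trees version R-tree nodes.

        class Node:
            def __init__(self, key, left=None, right=None):
                self.key, self.left, self.right = key, left, right

        def insert_versioned(node, key):
            """Return a new root; unchanged subtrees are shared, not copied."""
            if node is None:
                return Node(key)
            if key < node.key:
                return Node(node.key, insert_versioned(node.left, key), node.right)
            return Node(node.key, node.left, insert_versioned(node.right, key))

        roots = {0: None}                      # one root per timestamp
        for t, key in enumerate([50, 30, 70], start=1):
            roots[t] = insert_versioned(roots[t - 1], key)

        # The left subtree was untouched at t=3, so it is shared with t=2.
        print(roots[3].left is roots[2].left)   # True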

  • An efficient query strategy for integrated remote sensing and inventory (spatial) databases

    Page(s): 115 - 123

    The integration of disparate heterogeneous spatial databases for extending queries is a challenging task. The authors present a novel framework, based on a k-nearest neighbor (kNN) algorithm, for integrating remote sensing imagery with Forest Inventory Analysis (FIA) sample point/plot data managed in a relational database system. We then demonstrate how queries to this system may be extended over any arbitrary region of interest in a Web-based geographical information system. To build the integrated database, spectral signatures are collected at FIA plot locations from the Landsat TM image. A plot-id image is produced by assigning each pixel to the closest FIA plot in multi-dimensional spectral space. The resulting image provides an interface to the Forest Inventory Analysis Data-Base (FIADB) and allows generalization of the estimates to any user-defined query window or region of interest (ROI). This methodology, along with geostatistical analysis, is integrated into a client/server Web-based geographical information system, which provides Internet users with an easy-to-use query interface to the FIADB and spatial databases.
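
    The core assignment step is a nearest-neighbor search in spectral space; a minimal sketch with made-up plot ids and signatures (k = 1 for brevity):

        def nearest_plot(pixel, plots):
            """plots: dict of plot_id -> spectral signature (tuple of band values)."""
            dist2 = lambda a, b: sum((x - y) ** 2 for x, y in zip(a, b))
            return min(plots, key=lambda pid: dist2(pixel, plots[pid]))

        plots = {"plot_17": (92, 110, 60), "plot_42": (40, 85, 130)}
        image = [(90, 108, 63), (45, 80, 125)]          # pixels' spectral values
        plot_id_image = [nearest_plot(px, plots) for px in image]
        print(plot_id_image)   # ['plot_17', 'plot_42']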

  • Cost-based unbalanced R-trees

    Page(s): 203 - 212

    Cost-based unbalanced R-trees (CUR-trees) are a cost-function-based data structure for spatial data. CUR-trees are constructed specifically to improve the evaluation of intersection queries, the most basic selection query on an R-tree. A CUR-tree is built taking into account a given distribution of queries and a cost model for their execution. Depending on the expected frequency of access, objects or subtrees are stored higher up in the tree. After each insertion into the tree, local reorganizations of a node and its children are evaluated for their expected query cost, and a reorganization is performed if it is beneficial. No strict balancing applies, allowing the tree to unfold solely according to the cost evaluation. We present our cost-based approach and describe the evaluation and reorganization operations based on the cost function. We present a cost model for in-memory access costs together with three different query models. In our experiments, we compare the performance of the CUR-tree to the R-tree and the R*-tree: the CUR-tree significantly improves intersection query performance without unacceptably increasing the cost of building the tree. The use of R-trees for in-memory data reflects the high (and growing) cost of bringing data from RAM into the CPU cache relative to the cost of other computations.

  • An XML-based distributed metadata server (DIMES) supporting Earth science metadata

    Page(s): 251 - 256

    With explosively increasing volumes of remote sensing, modelling and other Earth science data available, and the popularity of the Internet, scientists now face the challenge of publishing and finding interesting data sets effectively and efficiently. Metadata has been recognized as a key technology to ease the search and retrieval of Earth science data. In this paper, we discuss the DIMES (DIstributed MEtadata Server) prototype system. Designed to be flexible yet simple, DIMES uses XML to represent, store, retrieve and interoperate metadata in a distributed environment. DIMES accepts metadata in any well-formed XML format and thus assumes only the “tree” semantics of metadata entries. Additional domain knowledge can be represented as specific links through XML's ID/IDREF mechanism. DIMES provides a number of mechanisms, including a “nearest-neighbor search”, to navigate and to search metadata. Though started for the Earth science community, DIMES can easily be extended to serve scientific communities in other disciplines.
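
    The “tree plus links” view of metadata can be illustrated with Python's standard XML library: any well-formed document is a tree, and ID/IDREF pairs add domain-specific links across it. The element names below are invented, since DIMES accepts arbitrary well-formed XML.

        import xml.etree.ElementTree as ET

        doc = ET.fromstring("""
        <archive>
          <dataset id="trmm_3b42"><title>TRMM rainfall</title></dataset>
          <paper><cites idref="trmm_3b42"/></paper>
        </archive>
        """)

        # Index elements by ID, then resolve an IDREF link across the tree.
        by_id = {e.get("id"): e for e in doc.iter() if e.get("id")}
        cited = by_id[doc.find(".//cites").get("idref")]
        print(cited.find("title").text)   # TRMM rainfall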

  • Exploring the behavior of the spring ecosystem model using an object-oriented database system

    Page(s): 267 - 269

    Landscape and ecosystem models have typically been viewed as “black boxes” that, given a set of inputs, yield a set of outputs. This view does not easily lend itself to investigating why a model behaves as it does, especially when multiple models are coupled together to create a larger model. We hypothesize that, for this type of investigation, a visual multimedia tool is needed to gain insight into the temporal behavior of the model. To facilitate the exploration of the behavior of one particular well-known landscape model, we have coupled the model with an object-oriented database system and a data plotting and visualization package. Using this system, we have created animations of the model's output and used them to discover interesting properties of the model. In our demonstration, we display these animations and discuss how such a system can be used as an aid in exploring hypotheses about the model's behavior.

  • Rewrite rules for quantified subqueries in a federated database

    Page(s): 134 - 143

    Transforming queries for efficient execution is particularly important in federated database systems, since a more efficient execution plan can require far fewer data requests to be sent to the component databases. It is also important to do as much of the selection and processing as possible close to where the data are stored, making best use of the facilities provided by the federation's component database management systems. We address the problem of processing complex queries including quantifiers, which have to be executed against different databases in an expanding heterogeneous federation. This is done by transforming queries within a mediator, for global query improvement, and within wrappers, to make best use of the query processing capabilities of external databases. Our approach is based on pattern matching and query rewriting. We introduce a high-level language for expressing rewrite rules declaratively, and demonstrate the use and flexibility of such rules in improving query performance for existentially quantified subqueries. Extensions to this language that allow generic rewrite rules to be expressed are also presented. The value of performing final transformations within a wrapper for a given remote database is shown in several examples that use AMOS II, an SQL-like system.
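
    The flavor of pattern-based rewriting can be sketched on a toy query tree; the tuple encoding and the exists-to-semijoin rule below are illustrative assumptions, not the paper's rule language.

        def rewrite(node, rules):
            """Bottom-up traversal: apply the first matching rule at each subtree."""
            if isinstance(node, tuple):
                node = tuple(rewrite(c, rules) for c in node)
                for match, build in rules:
                    m = match(node)
                    if m is not None:
                        return build(*m)
            return node

        # Rule: ('filter', ('exists', inner_rel, pred), outer_rel)
        #   =>  ('semijoin', outer_rel, inner_rel, pred)
        def match_exists(n):
            if (len(n) == 3 and n[0] == 'filter'
                    and isinstance(n[1], tuple) and n[1][0] == 'exists'):
                return (n[2], n[1][1], n[1][2])
            return None

        rules = [(match_exists,
                  lambda outer, inner, pred: ('semijoin', outer, inner, pred))]
        q = ('filter', ('exists', 'orders', 'o.cust = c.id'), 'customers')
        print(rewrite(q, rules))
        # ('semijoin', 'customers', 'orders', 'o.cust = c.id')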

  • Content-based browsing of data from the Tropical Rainfall Measuring Mission (TRMM)

    Page(s): 270 - 273

    Through content-based browsing, the TSDIS Orbit Viewer can help scientists decide which files to order from the TRMM archive. The Orbit Viewer's Mission Index can locate large-scale rain events in six terabytes of data, and its TRMM Tracker can locate coincidences between the TRMM orbit and a user-defined surface track.

  • Two approaches to representing multiple overlapping classifications: a comparison [plant taxonomy]

    Page(s): 239 - 244

    One of the tasks of plant taxonomy is the creation of classifications of organisms that allow an understanding of the evolutionary relationships between them. In this paper, we describe two different data models that have been designed to support two aspects of taxonomic work: the storage of the information and the visualisation of that information. We show that these two models differ because of their constraints and aims, and we compare their abilities on a number of typical tasks that users perform. We also show that, although different and suited to different tasks, each of these models is well adapted to its purpose, and that tight integration is difficult.

  • Efficient disk allocation schemes for parallel retrieval of multidimensional grid data

    Page(s): 213 - 222

    Declustering schemes enable parallel data retrieval by placing data blocks across multiple disk devices. Various declustering schemes have been proposed for multidimensional data to reduce the response time of range queries. However, efficient schemes, which must be easy to compute and provide good performance, are known only for a restricted number of disks and dimensions. In this paper, we propose a novel technique for constructing efficient multidimensional declustering schemes for any number of disks and dimensions. Simulation results show that the new schemes outperform the best previously known non-exhaustive-search-based multidimensional declustering schemes.
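
    The paper's construction is not spelled out in the abstract; as a baseline, the classic Disk Modulo scheme of Du and Sobolewski shows what a multidimensional declustering function looks like: grid block (i1, ..., id) goes to disk (i1 + ... + id) mod M, so neighboring blocks land on different devices and range queries fan out across disks.

        def disk_modulo(coords, num_disks):
            # Classic Disk Modulo declustering: sum the grid coordinates mod M.
            return sum(coords) % num_disks

        M = 4
        for i in range(4):
            print([disk_modulo((i, j), M) for j in range(4)])
        # Every row and column of the 4x4 grid touches all four disks.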

  • Earth System Science Workbench: a data management infrastructure for earth science products

    Page(s): 180 - 189

    The Earth System Science Workbench (ESSW) is a non-intrusive data management infrastructure for researchers who are also data publishers. An implementation of ESSW to track the processing of locally received satellite imagery is presented, demonstrating the Workbench's transparent and robust support for archiving and publishing data products. ESSW features a Lab Notebook metadata service, an ND-WORM (No Duplicate-Write Once Read Many) storage service, and Web user interface tools. The Lab Notebook logs processes (experiments) and their relationships via a custom API to XML documents stored in a relational database. The ND-WORM provides a managed storage archive for the Lab Notebook by keeping unique file digests and namespace metadata, also in a relational database. ESSW Notebook tools allow project searching and ordering, and file and metadata management.

  • Tracing lineage of array data

    Page(s): 69 - 78

    Arrays are a common and important class of data. They can model digital images, digital video, scientific and experimental data, matrices, finite element grids, and many other types of data. Although array manipulations are diverse and domain-specific, they often exhibit structural regularities. The paper presents an algorithm called SUB-pushdown to compute data lineage in such array computations. The array manipulations are expressed in the Array Manipulation Language (AML) that was introduced previously (A.P. Marathe and K. Salem, 1997). SUB-pushdown has several useful features. First, the lineage computation is expressed as an AML query. Second, it is not necessary to evaluate the AML lineage query to compute the array data lineage. Third, SUB-pushdown never gives false-negative answers. SUB-pushdown has been implemented as part of the ArrayDB prototype array database system that we built (A.P. Marathe, 2001).
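
    The pushdown idea can be sketched in one dimension: when a query keeps only some cells (a SUBSAMPLE by a repeating 0/1 bit pattern, in the spirit of AML), the lineage of each output cell is obtained by composing index maps instead of re-evaluating the query. The encoding below is a simplification, not AML itself.

        def subsample_map(pattern, length):
            """Map output index -> input index for SUBSAMPLE with a repeating
            0/1 pattern (1 = keep the cell)."""
            kept = [i for i in range(length) if pattern[i % len(pattern)]]
            return dict(enumerate(kept))

        m1 = subsample_map([1, 0], 12)           # keeps inputs 0,2,4,6,8,10
        m2 = subsample_map([1, 1, 0], len(m1))   # then keeps outputs 0,1,3,4

        # Lineage of the composed query: compose the maps, no data touched.
        lineage = {out: m1[inp] for out, inp in m2.items()}
        print(lineage)   # {0: 0, 1: 2, 2: 6, 3: 8}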

  • Clustering algorithms and validity measures

    Page(s): 3 - 22

    Clustering aims at discovering groups and identifying interesting distributions and patterns in data sets. Researchers have studied clustering extensively, since it arises in many application domains in engineering and the social sciences. In recent years, the availability of huge transactional and experimental data sets and the attendant requirements of data mining have created a need for clustering algorithms that scale and can be applied in diverse domains. The paper surveys clustering methods and approaches available in the literature in a comparative way. It also presents the basic concepts, principles and assumptions upon which the clustering algorithms are based. Another important issue is the validity of the clustering schemes that result from applying the algorithms, which is also related to the inherent features of the data set under study. We review and compare the clustering validity measures available in the literature. Furthermore, we illustrate the issues that are under-addressed by recent algorithms and we indicate new research directions.
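
    As a self-contained example of the kind of validity measure the survey compares, the Dunn index divides the smallest inter-cluster distance by the largest intra-cluster diameter, so higher values indicate compact, well-separated clusterings.

        from itertools import combinations

        def dist(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

        def dunn(clusters):
            inter = min(dist(a, b)
                        for c1, c2 in combinations(clusters, 2)
                        for a in c1 for b in c2)
            intra = max(dist(a, b) for c in clusters for a, b in combinations(c, 2))
            return inter / intra

        tight = [[(0, 0), (0, 1)], [(10, 0), (10, 1)]]
        loose = [[(0, 0), (4, 4)], [(10, 0), (6, 4)]]
        print(dunn(tight) > dunn(loose))   # True: compact, well-separated wins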

  • Integrating distributed scientific data sources with MOCHA and XRoaster

    Page(s): 263 - 266

    MOCHA is a novel middleware system for integrating distributed data sources that we have developed at the University of Maryland. MOCHA is based on the idea that the code implementing user-defined types and functions should be deployed automatically to remote sites by the middleware system itself. To this end, we have developed an XML-based framework to specify metadata about data sites, data sets, and user-defined types and functions. XRoaster is a graphical tool that we have developed to help the user create all the XML metadata elements to be used in MOCHA.

  • Modeling statistical metadata

    Page(s): 25 - 35

    An object-oriented statistical metadata model is presented, which can be used in building information systems that provide metadata-guided statistical data processing features. The semantics of the model are analyzed and a set of operators (transformations) is proposed that allows for the automatic manipulation of both data and metadata at the same time. We discuss the mathematical properties of these transformations and, subsequently, as a case study, we demonstrate how a statistical office can use the presented framework to build a Web site offering ad hoc query capabilities to its data consumers.

  • Metacat: a schema-independent XML database system

    Page(s): 171 - 179

    The ecological sciences represent a challenging community from the perspective of scientific data management. Ecological data are collected by investigators who are spread out over a large geographic area and who use a wide variety of research protocols and data-handling techniques. The resulting heterogeneous data are stored in autonomous database systems that are dispersed throughout the ecological community. The Knowledge Network for Biocomplexity is seeking to address these issues through the use of structured metadata encoded in the Extensible Markup Language (XML). The main goal of this project has been to design and implement a schema-independent data storage system for XML, called Metacat. Metacat uses a hybrid XML storage approach with a commercial relational DBMS back-end, while still allowing any arbitrary XML document to be stored. This paper describes the Metacat XML data storage system and its relevance to scientific data management in the ecological sciences.

  • Evolutionary design and development of image meta-analysis environments based on object-relational database mediator technology

    Page(s): 190 - 200

    Discusses how emerging object-relational database mediator technology can be used to integrate academic freeware and commercial off-the-shelf software components to create a sequence of gradually more complex and powerful, yet always syntactically and semantically homogeneous, database-centred image meta-analysis environments. We show how this may be done by defining and utilising a use-case-based evolutionary design and development process. This process allows subsystems to be produced largely independently by several small specialist subprojects, turning the system integration work into a high-level domain modelling task.

  • 2D TSA-tree: a wavelet-based approach to improve the efficiency of multi-level spatial data mining

    Page(s): 59 - 68

    Due to the large amount of collected scientific data, it is becoming increasingly difficult for scientists to comprehend and interpret the available data. Moreover, typical queries on these data sets aim at identifying (or visualizing) trends and surprises in a selected sub-region at multiple levels of abstraction, rather than at retrieving information about a specific data point. The authors propose a versatile wavelet-based data structure, the 2D TSA-tree (Trend and Surprise Abstractions tree), to enable efficient multi-level trend detection on spatial data. We show how the 2D TSA-tree can be utilized efficiently for sub-region selections. Moreover, the 2D TSA-tree can be used to precompute the reconstruction error and retrieval time of a data subset in advance, allowing the user to trade accuracy for response time (or vice versa) at query time. Finally, when storage space is limited, our 2D Optimal TSA-tree saves space by storing only a specific optimal subset of the tree. To demonstrate the effectiveness of our proposed methods, we evaluated the 2D TSA-tree using real and synthetic data. Our results show that our method outperformed other methods (DFT and SVD) in terms of accuracy, complexity and scalability.
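
    The building block of such a structure is one level of a 2-D wavelet decomposition; the sketch below applies a one-level 2-D Haar transform, splitting an image into a low-frequency trend subband and three high-frequency detail (surprise) subbands. The tree then recurses on the trend subband, which is not shown.

        def haar2d_level(img):
            """img: 2^k x 2^k list of lists. Returns (trend, details)."""
            n = len(img) // 2
            sub = lambda f: [[f(img[2*i][2*j], img[2*i][2*j+1],
                                img[2*i+1][2*j], img[2*i+1][2*j+1])
                              for j in range(n)] for i in range(n)]
            trend = sub(lambda a, b, c, d: (a + b + c + d) / 4)   # coarse view
            det_h = sub(lambda a, b, c, d: (a + b - c - d) / 4)   # horizontal detail
            det_v = sub(lambda a, b, c, d: (a - b + c - d) / 4)   # vertical detail
            det_d = sub(lambda a, b, c, d: (a - b - c + d) / 4)   # diagonal detail
            return trend, (det_h, det_v, det_d)

        img = [[1, 1, 5, 5],
               [1, 1, 5, 5],
               [2, 2, 2, 2],
               [2, 2, 2, 2]]
        trend, details = haar2d_level(img)
        print(trend)   # [[1.0, 5.0], [2.0, 2.0]] -- the trend (coarse) subband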

  • A feasible method to find areas with constraints using hierarchical depth-first clustering

    Page(s): 257 - 262

    Addresses a reliable, feasible method to find geographical areas satisfying constraints using hierarchical depth-first clustering. The method involves multi-level hierarchical clustering with a depth-first strategy, guided by whether the area of each cluster satisfies the given constraints. The attributes used in the hierarchical clustering are the coordinates of the grid data points. The constraints are an average value range and a minimum size for an area with a small proportion of missing data points. Convex-hull and point-in-polygon algorithms are used to test constraint satisfaction. The method is implemented for an Earth science data set used in vegetation studies: the Normalized Difference Vegetation Index (NDVI).
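
    The geometric part of the constraint check can be sketched directly: wrap a cluster's grid points in a convex hull (Andrew's monotone chain) and compare the hull's shoelace-formula area with the minimum-size constraint; the points and threshold are sample values.

        def convex_hull(pts):
            pts = sorted(set(pts))
            if len(pts) <= 2:
                return pts
            cross = lambda o, a, b: (a[0]-o[0])*(b[1]-o[1]) - (a[1]-o[1])*(b[0]-o[0])
            def half(seq):
                h = []
                for p in seq:
                    while len(h) >= 2 and cross(h[-2], h[-1], p) <= 0:
                        h.pop()
                    h.append(p)
                return h
            lower, upper = half(pts), half(reversed(pts))
            return lower[:-1] + upper[:-1]

        def area(poly):
            # Shoelace formula over the hull vertices.
            return abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2)
                           in zip(poly, poly[1:] + poly[:1]))) / 2

        cluster = [(0, 0), (4, 0), (4, 3), (0, 3), (2, 1)]
        print(area(convex_hull(cluster)) >= 10)   # True: area 12 meets a 10-unit minimum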
