Proceedings of the IEEE International Forum on Research and Technology Advances in Digital Libraries (ADL '98)

22-24 April 1998


Results 1-25 of 35
  • Proceedings of the IEEE International Forum on Research and Technology Advances in Digital Libraries (ADL '98)

  • Author index

    Page(s): 327 - 328
  • Personal interface mechanism on digital library

    Page(s): 76 - 85

    A digital library makes it possible to retrieve and consult relevant information directly online, and many projects have developed rapidly. However, digital libraries have rarely been investigated from the standpoint of the operational interface or of personal/private manipulation facilities, i.e. an environment in which users can apply retrieved information directly to their everyday work or to their own requests effectively. This paper addresses personal working environments. A personal interface for a digital library is proposed, with respect to a working environment, a personal data library, constructed on top of existing digital libraries. The main idea is to introduce the concept of a notebook as user-specific data storage.

  • Applying data mining techniques for descriptive phrase extraction in digital document collections

    Page(s): 2 - 11

    Traditionally, texts have been analysed using various information retrieval-related methods, such as full-text analysis and natural language processing. However, only a few examples of data mining in text, particularly in full text, are available. In this paper, we show that general data mining methods are applicable to text analysis tasks such as descriptive phrase extraction. Moreover, we present a general framework for text mining. The framework follows the general knowledge discovery process, thus containing steps from preprocessing to utilization of the results. The data mining method that we apply is based on generalized episodes and episode rules. We give concrete examples of how to preprocess texts based on the intended use of the discovered results, and we introduce a weighting scheme that helps in pruning out redundant or non-descriptive phrases. We also present results from real-life data experiments.

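The preprocess-then-prune pipeline this abstract describes can be illustrated with a much simpler stand-in: plain frequent n-gram counting with a frequency threshold. This is a hedged toy, not the paper's method (generalized episodes also allow gaps between words), and the function names and sample text are illustrative.

```python
from collections import Counter

def extract_phrases(text, max_len=3, min_freq=2):
    """Toy descriptive-phrase extraction: count word n-grams and keep
    the frequent ones. A stand-in for episode-rule mining, which also
    allows gaps between the words of a phrase."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    counts = Counter()
    for n in range(2, max_len + 1):
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    # Prune: drop infrequent phrases (a cheap analogue of the paper's
    # weighting scheme for non-descriptive phrases).
    return {" ".join(p): c for p, c in counts.items() if c >= min_freq}

doc = ("data mining methods apply to text. "
       "general data mining methods are applicable. "
       "data mining methods help text analysis.")
phrases = extract_phrases(doc)
```

Even this naive version exhibits the shape of the framework: a preprocessing step (tokenization and case-folding) followed by discovery and pruning.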
  • Image processing in the Alexandria Digital Library project

    Page(s): 180 - 187

    The management of images, video and, in general, multimedia data is an important issue in the design of digital libraries. In particular, the following problems stand out: efficient storage, fast retrieval, and protection of intellectual property. We outline some of the recent advances in image processing in the context of the UCSB Alexandria Digital Library (ADL) project, whose goal is to create a database of spatially indexed data. Maps and satellite images are among the main data sets in this project. The focus of this overview is on image retrieval using texture and on digital watermarking. A texture thesaurus for browsing aerial photographs and a wavelet-based digital watermarking scheme are presented.

  • Making sense of scientific information on WorldWideWeb: WWW-TED (Thesaurus Evolutif et Dynamique)

    Page(s): 238 - 244

    This article considers the demand for an HTML document management tool that provides high-quality search capability for scientific research. It proposes guidelines for the implementation of WWW-TED, an evolving thesaurus tool for medium-sized collections of HTML pages.

  • An indexing model for structured documents to support queries on content, structure and attributes

    Page(s): 88 - 97

    The complex internal structure of documents can be described and captured by document representation standards such as SGML and SGML-related standards like HTML and XML. The hierarchical structure of documents and the attributes of documents, as well as attributes of document components at all levels of the document hierarchy, can be encoded with markup tags. In traditional text database systems, only queries on content are supported. The rich structural information contained in documents and the attributes of document components are not captured in these systems, and queries on structure and attributes are not supported. We propose a text model, a query language and an indexing scheme that can support queries on content, structure, and attributes of documents as well as attributes of text elements within documents. This model is schema-independent, and query evaluation time is at worst linear. We show that our indexing scheme can efficiently support a wide range of queries in a database for highly heterogeneous collections of structured documents. We provide query examples to show how all the information encoded in documents marked up according to the TEI Guidelines, an encoding standard adopted by the humanities disciplines, can be indexed and queried in our indexing model.

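The core idea of indexing content together with structure and attributes can be sketched as a map from root-to-element paths to the text and attributes found along them. This is not the authors' actual indexing scheme, and it uses XML rather than full SGML/TEI; the sample document is invented.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

def build_path_index(xml_text):
    """Map each root-to-element path to the (text, attributes) pairs
    found there, so a query can combine structure (the path), content
    (the text) and attributes in one lookup."""
    root = ET.fromstring(xml_text)
    index = defaultdict(list)

    def walk(elem, path):
        path = f"{path}/{elem.tag}"
        text = (elem.text or "").strip()
        index[path].append((text, dict(elem.attrib)))
        for child in elem:
            walk(child, path)

    walk(root, "")
    return index

doc = """<book lang="en">
  <chapter id="c1"><title>Indexing</title></chapter>
  <chapter id="c2"><title>Queries</title></chapter>
</book>"""
idx = build_path_index(doc)
titles = [t for t, _ in idx["/book/chapter/title"]]
```

A structure query ("all chapter titles"), a content query (filter `titles`), and an attribute query (filter on `id`) all resolve against the same index.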
  • Thesaurus federations: a framework for the flexible integration of heterogeneous, autonomous thesauri

    Page(s): 46 - 55

    Modern information systems such as the World Wide Web and digital libraries contain more data than ever before, are globally distributed, are easy to use, and therefore become accessible to huge, heterogeneous user groups. On the other hand, the potentially enormous amount of heterogeneous information requires powerful tools that allow the user to find relevant pieces of data. One such tool is the thesaurus, a proven means of providing a uniform and consistent vocabulary for the indexing and retrieval of information-bearing objects (IBOs). Modern multi-lingual and multi-subject information systems require more than the traditional single-language, narrow-focus thesauri, and the broad clientele of information systems demands thesauri that can be used by non-specialists. To achieve this goal, we introduce the framework of thesaurus federations, i.e. loose compounds of distributed, multi- or mono-lingual thesauri that go beyond the already-known concepts of multi-thesaurus systems. We classify multi-thesaurus systems into multi-thesaurus environments, thesaurus switching systems and thesaurus compounds. Our architecture is based on a mediation layer and wrappers for the integration of heterogeneous, distributed thesauri. We present a Java-based prototype system that enables integrated access to several thesauri through an SQL or HTML interface and a convenient thesaurus federation browser. This system has been used for the retrieval of metadata records managed by the Catalogue of Data Sources of the European Environment Agency (EEA).

  • A metadata architecture for digital libraries

    Page(s): 276 - 288

    From an architectural perspective, there is no essential distinction between data and metadata. Both can be represented in distributed active relationships (DARs), which are an extension of the Warwick framework (C. Lagoze et al., 1996). The DAR model is a powerful way to express relationships between networked resources and to allow such relationships to be dynamically downloadable and executable.

  • WebDB: a Web query system and its modeling, language, and implementation

    Page(s): 216 - 227

    The World-Wide Web can be viewed as a collection of multimedia documents in the form of Web pages connected through hyperlinks. Unlike most Web search engines, which primarily focus on information retrieval functionality, WebDB aims at supporting more comprehensive database-like query functionality, including selection, aggregation, sorting, summary, grouping, and projection on (1) document-level information, such as title, URL, length, keywords, types, and last modified date; (2) intra-document structures, such as tables, forms, and images; and (3) inter-document linkage information, such as destination URLs and anchors. With these three types of information, more comprehensive queries can be supported, such as “list all Web pages which link to the NEC Web sites containing a form and a keyword multimedia within link depth of 3 and group these Web pages by country”. WebIFQ, a novel visual query/browsing interface of WebDB, is presented to demonstrate the usability of the system.

  • A methodology for the enhancement of a hypertext version of a textbook by the automatic insertion of links in the subject index

    Page(s): 157 - 166

    This paper presents a methodology for the enhancement of a hypertext version of a textbook. The enhancement over the textual version of the textbook is achieved by automatically inserting links between text excerpts of the textbook and the items in the subject index produced by the author of the textbook. These links enable access to parts of the textbook that have not been specifically indexed by the author, but that are semantically related to items in the subject index. Such links are meant to improve the effectiveness of the use of the book in search-oriented tasks.

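A drastically simplified version of the link-insertion step might look as follows. It only links literal occurrences of index terms, whereas the paper's methodology also links semantically related, non-indexed passages; the index terms and anchor names are invented for illustration.

```python
import re

def insert_index_links(html_text, subject_index):
    """Wrap occurrences of subject-index terms in hyperlinks to the
    corresponding index entry. Longest terms are substituted first so
    that 'binary tree' wins over a hypothetical shorter term 'tree'."""
    for term in sorted(subject_index, key=len, reverse=True):
        anchor = subject_index[term]
        pattern = re.compile(r"\b(%s)\b" % re.escape(term), re.IGNORECASE)
        html_text = pattern.sub(
            lambda m, a=anchor: f'<a href="#{a}">{m.group(1)}</a>',
            html_text)
    return html_text

index = {"binary tree": "idx-btree", "recursion": "idx-rec"}
page = "A binary tree search uses recursion."
linked = insert_index_links(page, index)
```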
  • Advanced hypermedia indexing of documents in a deductive database system

    Page(s): 98 - 106

    This paper presents the hypermedia model SemaLink, which addresses orientation and maintenance problems in large information networks with a knowledge-based approach. Documents are connected to a knowledge base. By querying this knowledge base, users can effectively locate documents that serve as entry points for further knowledge-based hypermedia navigation. Rule-based virtual structures derive new hypermedia structures automatically, and data consistency is guaranteed by an object-oriented schema. As a generic model, SemaLink is suitable for a wide range of applications; to achieve this, it combines multimedia with deductive database systems.

  • An object-based information retrieval model: toward the structural construction of thesauri

    Page(s): 117 - 125

    We propose an information retrieval model in which the object-oriented paradigm is applied to the construction of thesauri and the interpretation of user queries. This model provides a mechanism to assist domain experts in constructing thesauri; it determines a considerable part of the relationship degrees between term objects by inheritance, and supplies domain experts with information available from the thesaurus being constructed. It also enables domain experts to construct a thesaurus incrementally, since the automatically determined degrees of relationship can be refined whenever a more sophisticated thesaurus is needed. This may minimize the domain expert's burden caused by the exhaustive specification of individual relationships. All the relationships between term objects of our thesaurus (called an object-based thesaurus) are represented at two levels: the concept level and the instance level. The former defines the relationships between concepts, whereas the latter specifies the relationships between instances. We also propose a new query evaluation mechanism that exploits the thesaurus when interpreting the intent of user queries.

  • Multilingual input system for the Web-an open multimedia approach of keyboard and handwriting recognition for Chinese and Japanese

    Page(s): 188 - 194

    The basic building block of a multilingual information retrieval system is the input system. Chinese and Japanese characters pose great challenges for the conventional 101-key alphabet-based keyboard, because they are radical-based and number in the thousands. This paper reviews the development of various approaches and then presents a framework and working demonstrations of Chinese and Japanese input methods implemented in Java, which allow open deployment over the Web to any platform. The demo includes both popular keyboard input methods and neural network handwriting recognition using a mouse or pen. This framework is able to accommodate future extension to other input media and languages of interest.

  • Browsing through high quality document images with DjVu

    Page(s): 309 - 318

    Presents a new image compression technique called “DjVu” that is specifically geared towards the compression of high-resolution, high-quality images of scanned documents in color. With DjVu, any screen connected to the Internet can access and display images of scanned pages while faithfully reproducing the font, color, drawings, pictures and paper texture. A typical magazine page in color at 300 dpi can be compressed down to between 40 and 60 KBytes, approximately 5 to 10 times better than JPEG for a similar level of subjective quality. Black-and-white documents are typically 15 to 30 KBytes at 300 dpi, or 4 to 8 times better than CCITT-G4. A real-time, memory-efficient version of the decoder has been implemented and is available as a plug-in for popular Web browsers.

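A quick back-of-envelope calculation shows what the quoted sizes imply relative to the uncompressed page. The page dimensions (US letter) and 24-bit color depth are assumptions for the sake of the estimate, not figures stated in the abstract.

```python
# Back-of-envelope check of the quoted DjVu figures, assuming a
# US-letter page (8.5 x 11 inches) scanned at 300 dpi in 24-bit color.
width_px = int(8.5 * 300)            # 2550 px
height_px = 11 * 300                 # 3300 px
raw_bytes = width_px * height_px * 3 # 3 bytes/pixel, roughly 24 MB raw
djvu_bytes = 50 * 1024               # midpoint of the 40-60 KB claim
ratio = raw_bytes / djvu_bytes       # roughly 500:1 versus the raw scan
```

This makes the abstract's comparison concrete: a roughly 500:1 reduction versus the raw scan, on top of the claimed 5-10x advantage over JPEG at similar subjective quality.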
  • Query optimization for structured documents based on knowledge on the document type definition

    Page(s): 196 - 205

    Declarative access mechanisms for structured document collections and for semi-structured data are becoming increasingly important. Using a rule-based approach to query optimization, we deploy knowledge of the Document Type Definition (DTD) to formulate transformation rules for query-algebra terms. Specifically, we look at rules that serve navigation along paths, either by cutting off these paths or by replacing them with access operations on indices, i.e. materialized views on paths. We show for both cases that we correctly apply and completely exploit knowledge of the DTD, and we briefly discuss performance results.

  • Electronic publishing, storage, dissemination and retrieval of a scientific journal through the Web

    Page(s): 137 - 146

    The initial part of the paper describes the general architecture of a prototype system developed in the context of the IRIDES project for the electronic publishing, storage, dissemination and retrieval of The Computer Journal, published by Oxford University Press. The core of the paper is devoted to one of the components of the overall system: AUCTOR, a system for the automatic authoring of HTML hypertext representing the content of the journal's articles. After a detailed description of the system, considerations on the automatically authored hypertext are given. AUCTOR in particular, and IRIDES in general, can be considered the initial components of a complete environment for collecting different collections of documents and services for the implementation of a modern digital library.

  • Multi-resolution cache management in digital virtual library

    Page(s): 66 - 75

    Envisions a particular kind of digital library application: giving users the experience of walking through a large virtual environment, viewing virtual objects from different distances and angles. To provide good performance for such applications, we need to address several research issues. First, we must be able to model virtual objects effectively; recently developed multi-resolution object modeling techniques are capable of simplifying object models and therefore reducing the time needed to render them. Second, given the limited bandwidth of the Internet, caching suitable objects of high affinity reduces the dependency on the network and thus can reduce response time. Third, the Internet often suffers from disconnection; a caching mechanism that allows objects to be cached with at least their minimum resolution can still provide a coarse view of the objects to the viewer. In this paper, we propose a multi-resolution caching mechanism and investigate its effectiveness in supporting virtual walkthrough applications in the Internet environment. The caching mechanism allows virtual objects from a remote database server to be cached in the local storage of a client at various degrees of resolution. We also quantify the performance of the mechanism via simulated experiments.

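The degrade-before-evict idea can be sketched as a toy cache in which each object's storage cost grows with its resolution level. This illustrates the general mechanism only, not the paper's actual replacement policy; the capacity model and object names are invented.

```python
class MultiResolutionCache:
    """Toy cache: each object is stored at some resolution level
    (1 = coarsest), and storage cost is assumed to grow linearly with
    the level. Under pressure, objects are degraded to their coarsest
    resolution before being evicted outright, so a coarse view can
    survive network disconnection."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = {}  # object id -> resolution level

    def used(self):
        return sum(self.items.values())

    def put(self, obj_id, level):
        self.items[obj_id] = level
        # First pass: degrade other objects to the coarsest level.
        for other in list(self.items):
            if self.used() <= self.capacity:
                break
            if other != obj_id and self.items[other] > 1:
                self.items[other] = 1
        # Second pass: evict coarse objects if still over capacity.
        for other in list(self.items):
            if self.used() <= self.capacity:
                break
            if other != obj_id:
                del self.items[other]

cache = MultiResolutionCache(capacity=5)
cache.put("statue", 4)
cache.put("column", 3)  # over capacity: "statue" is degraded, not evicted
```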
  • A three-level user interface to multimedia digital libraries with relaxation and restriction

    Page(s): 206 - 215

    This paper proposes a three-level user interface to document-rich digital libraries. Not only typical document metadata but also document contents and personal annotations can be used to select digital documents, and afterward a two-dimensional matrix can be used to relax or restrict the selections. Our approach to document retrieval, called the “three-level user interface”, combines a database query language with intelligent multimedia retrieval techniques. At the first level, queries are constructed from metadata about documents. At the second level, queries are constructed from document contents, based on the corresponding document type definition. At the third level, for sophisticated users, semantic approaches are used: users are asked for annotations or heuristics with their subjective meanings. At each level, a matrix-based query can be used either to restrict a query that returns too many selections or to relax one that returns too few. The contribution of this paper is a framework for multimedia document retrieval in which query conditions can be efficiently relaxed or restricted to yield an appropriate number of documents.

  • A distributed digital library architecture incorporating different index styles

    Page(s): 36 - 45

    The New Zealand Digital Library offers several collections of information over the World Wide Web. Although full-text indexing is the primary access mechanism, musical collections can also be accessed through a novel melody retrieval system. In offering this service over a three-year period, we have had to face many practical challenges in building, maintaining and administering diverse collections of different kinds of information, involving different search and retrieval systems with different user interfaces. This paper describes the design of the software we have built to support the service. Interface server programs provide a uniform interface between the search engine and the client, irrespective of the nature of the collection. Search engines that embody completely different index styles operate under a single distributed framework; as examples we describe MG (Managing Gigabytes), a full-text retrieval system, and the MR (Melody Retrieval) system. A flexible protocol for communicating between an interface server and a search engine is defined. The resulting architecture simplifies library administration and the creation of new collections by providing a unified framework under which vastly different user interfaces and search engines can co-exist in a distributed computing environment.

  • Discovering Web access patterns and trends by applying OLAP and data mining technology on Web logs

    Page(s): 19 - 29

    As a confluence of data mining and World Wide Web technologies, it is now possible to perform data mining on Web log records collected from Internet Web-page access histories. The behaviour of Web page readers is imprinted in the Web server log files, and analyzing and exploring regularities in this behaviour can improve system performance, enhance the quality and delivery of Internet information services to the end user, and identify populations of potential customers for electronic commerce. Thus, by observing people using collections of data, data mining can make a considerable contribution to the work of digital library designers. In a joint effort between the TeleLearning-NCE (Networks of Centres of Excellence) project on the Virtual University and the NCE-IRIS project on data mining, we have been developing a knowledge discovery tool, called WebLogMiner, for mining Web server log files. This paper presents the design of WebLogMiner, reports current progress and outlines future work in this direction.

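The first stage of any such tool, cleaning raw server logs and summarizing accesses, can be sketched as follows. The regular expression assumes the Common Log Format; WebLogMiner itself goes further and feeds a multidimensional data cube for OLAP and mining, which this toy omits. The sample log lines are invented.

```python
import re
from collections import Counter

# Minimal Web-log aggregation: parse Common Log Format lines and count
# successful GET requests per page.
LOG_RE = re.compile(
    r'\S+ \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+')

def page_hits(lines):
    hits = Counter()
    for line in lines:
        m = LOG_RE.match(line)
        if not m:
            continue  # drop malformed lines (the "cleaning" step)
        _, method, url, status = m.groups()
        if method == "GET" and status.startswith("2"):
            hits[url] += 1
    return hits

logs = [
    '1.2.3.4 - - [22/Apr/1998:10:00:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '1.2.3.4 - - [22/Apr/1998:10:00:05 +0000] "GET /news.html HTTP/1.0" 200 2300',
    '5.6.7.8 - - [22/Apr/1998:10:01:00 +0000] "GET /index.html HTTP/1.0" 200 1043',
    '5.6.7.8 - - [22/Apr/1998:10:01:09 +0000] "GET /missing.html HTTP/1.0" 404 321',
]
hits = page_hits(logs)
```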
  • Towards an archival Intermemory

    Page(s): 147 - 156

    We propose a self-organizing archival Intermemory: a noncommercial, subscriber-provided, distributed information storage service built on the existing Internet. Given an assumption of continued growth in the memory's total size, a subscriber's participation for only a finite time can nevertheless ensure archival preservation of the subscriber's data. Information disperses through the network over time, and memories become more difficult to erase as they age. The probability of losing an old memory given random node failures is vanishingly small, and an adversary would have to corrupt hundreds of thousands of nodes to destroy a very old memory. This paper presents a framework for the design of an Intermemory and considers certain aspects of the design in greater detail; in particular, addressing, space efficiency, and redundant coding are discussed.

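The redundant-coding aspect can be illustrated with the simplest possible scheme, a single XOR parity block, which already lets any one lost block be rebuilt from the survivors. This is only an illustration of the principle; a real Intermemory would use stronger erasure codes that tolerate many simultaneous node failures.

```python
from functools import reduce

def xor_blocks(blocks):
    """Bytewise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, col) for col in zip(*blocks))

# Data blocks, imagined as living on different nodes.
data = [b"archival", b"memories", b"dispersed"]

# Pad blocks to equal length, then compute a parity block for a
# fourth node.
size = max(len(b) for b in data)
padded = [b.ljust(size, b"\x00") for b in data]
parity = xor_blocks(padded)

# Simulate the failure of one node and rebuild its block from the rest:
# XOR of the surviving data blocks and the parity recovers the loss.
lost = padded[1]
recovered = xor_blocks([padded[0], padded[2], parity])
```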
  • Semantic relations in a medical digital library

    Page(s): 290 - 298

    Describes the Vesalius™ project, a multi-modal collection of anatomical resources under development at Columbia University. Our focus is on the need for navigational tools to effectively access the wealth of electronic information on anatomy, including life-like 3D images of anatomical entities that can be interactively viewed and browsed. We describe a key component which must be in place in order to develop a flexible and reusable digital library system, namely an anatomical knowledge base containing a 'nucleus' of anatomical information specifically designed to make it possible to develop a wide spectrum of curriculum applications that use and extend the information in the knowledge base. The unique contribution of our research lies in the dual focus on user needs and on the effective use of knowledge representation theory in order to develop a system that takes advantage of interactive 3D models and the wealth of other anatomical data that is now available.

  • KeyGraph: automatic indexing by co-occurrence graph based on building construction metaphor

    Page(s): 12 - 18

    Presents an algorithm for extracting keywords representing the asserted main point of a document, without relying on external resources such as natural-language processing tools or a document corpus. Our algorithm, KeyGraph, is based on the segmentation of a graph, representing the co-occurrence between terms in a document, into clusters. Each cluster corresponds to a concept on which the author's ideas are based, and the top-ranked terms are selected as keywords using a statistic based on each term's relationship to these clusters. This strategy comes from viewing a document as being constructed, like a building, to express new ideas based on traditional concepts. The experimental results show that the extracted terms match the author's main point quite accurately, even though KeyGraph does not use each term's average frequency in a corpus; i.e., KeyGraph is a content-sensitive, domain-independent indexing device.

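A drastically simplified cousin of this idea, ranking terms by how strongly they co-occur with other terms inside sentences, can be sketched as follows. KeyGraph additionally segments the co-occurrence graph into clusters and scores terms against those clusters, which this toy does not attempt; the sample sentences are invented.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_keywords(sentences, top_k=3):
    """Score each term by the number of distinct-term pairs it takes
    part in within sentences, then return the top-scoring terms.
    Corpus-free, like KeyGraph, but without the graph clustering."""
    weight = Counter()
    for sent in sentences:
        terms = sorted(set(sent.lower().split()))
        for a, b in combinations(terms, 2):
            weight[a] += 1
            weight[b] += 1
    return [t for t, _ in weight.most_common(top_k)]

doc = [
    "graph clusters represent concepts",
    "keywords link graph clusters",
    "keywords represent the main point",
]
top = cooccurrence_keywords(doc)
```

Note that even this crude co-occurrence weighting surfaces the recurring, well-connected terms without consulting any external corpus.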
  • Story segmentation and detection of commercials in broadcast news video

    Page(s): 168 - 179

    The Informedia Digital Library Project allows full content indexing and retrieval of text, audio and video material. Segmentation is an integral process in the Informedia digital video library. The success of the Informedia project hinges on two critical assumptions: that we can extract sufficiently accurate speech recognition transcripts from the broadcast audio, and that we can segment the broadcast into video paragraphs, or stories, that are useful for information retrieval. In previous papers we have shown that speech recognition is sufficient for information retrieval of pre-segmented video news stories. We now address the issue of segmentation and demonstrate that a fully automatic system can extract story boundaries using available audio, video and closed-captioning cues. The story segmentation step for the Informedia Digital Video Library splits full-length news broadcasts into individual news stories. During this phase the system also labels commercials as separate “stories”. We explain how the Informedia system takes advantage of the closed captioning frequently broadcast with the news, how it extracts timing information by aligning the closed captions with the result of the speech recognition, and how the system integrates closed-caption cues with the results of image and audio processing.

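The caption-to-transcript alignment step mentioned here can be sketched with `difflib` standing in for the Informedia system's aligner: match the well-spelled but untimed caption words against the timed but error-prone speech recognition output, and copy timestamps across wherever the two agree. The ASR output and captions below are invented for illustration.

```python
import difflib

def transfer_timestamps(asr_words, caption_words):
    """Align closed-caption words with timed ASR output and copy the
    timestamps across. Words with no match (e.g. ASR misrecognitions)
    are left untimed (None)."""
    asr_tokens = [w for w, _ in asr_words]
    matcher = difflib.SequenceMatcher(a=asr_tokens, b=caption_words)
    timed = {}
    for a_start, b_start, length in matcher.get_matching_blocks():
        for k in range(length):
            timed[b_start + k] = asr_words[a_start + k][1]
    return [(w, timed.get(i)) for i, w in enumerate(caption_words)]

# ASR output: (word, time in seconds); the captions carry no timing
# but have the correct spelling.
asr = [("good", 0.0), ("evening", 0.4), ("tonites", 0.9), ("top", 1.2),
       ("story", 1.5)]
captions = ["good", "evening", "tonight's", "top", "story"]
aligned = transfer_timestamps(asr, captions)
```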