
String Processing and Information Retrieval Symposium, 1999 and International Workshop on Groupware

Date: 24 Sept. 1999


Displaying Results 1 - 25 of 47
  • 6th International Symposium on String Processing and Information Retrieval. 5th International Workshop on Groupware (Cat. No.PR00268)

    Publication Year: 1999
    Freely Available from IEEE
  • Author index

    Publication Year: 1999 , Page(s): 363
    Freely Available from IEEE
  • Developing a tool to assist electronic facilitation of decision-making groups

    Publication Year: 1999 , Page(s): 243 - 252
    Cited by:  Papers (1)  |  Patents (1)

    One resource playing a critical role in electronically supported decision-making groups is the facilitator. Facilitation is a complex task, encompassing social abilities, pre-meeting planning of decision-making processes and supervision of technology usage during meetings. We found two problems with previous support for electronic facilitation: (1) limited support for planning activities; and (2) limited support for remote facilitation. The Facilitation Tool was developed to address these two issues. The tool was built around a comprehensive decision-making model, contributing design patterns for planning activities. The Facilitation Tool also provides a set of techniques to support remote meetings, allowing facilitators to steer and focus group participants, analyse and understand issues, and moderate conflicting or chaotic situations.

  • Practical constructions of L-restricted alphabetic prefix codes

    Publication Year: 1999 , Page(s): 115 - 119

    Information retrieval systems use various search techniques such as B-trees, inverted files and suffix arrays to provide quick response. Many of these techniques rely on string comparison operations. If a record field is coded using Huffman codes (D.A. Huffman, 1952) in order to save storage space, the field must be decoded before performing any comparison. On the other hand, if the field is alphabetically coded, then the comparison can be applied directly to the sequence of codewords, which is faster. This approach also saves storage space compared with the case where no data compression is applied. Experiments with alphabetically coded texts indexed with suffix arrays were reported by E.S. Moura et al. (1997). We consider the construction of an L-restricted ABPC (alphabetic binary prefix code), that is, one whose codeword lengths satisfy l_i ≤ L for i = 1, ..., n. An optimal L-restricted ABPC can be constructed in O(nL log n) time, using O(nL) space (L.L. Larmore and T.M. Przytycka, 1994). Nevertheless, due to its space requirements, this method turns out to be prohibitive for larger values of n. We suggest a simple approach to construct a suboptimal L-restricted ABPC. Our approach is divided into three phases. In the first phase, we verify whether an optimal ABPC is also an optimal L-restricted ABPC. In the second phase, we obtain an L-restricted prefix code (not necessarily alphabetical), and in the third phase we turn this code into an alphabetical one. We call this approach the 3-phase algorithm. The codes generated by this algorithm are called 3-phase codes. We analyze the time and space complexities and compare the average length of the 3-phase code against the Shannon entropy. We also compare the average length of the Huffman code against the average length of an optimal L-restricted ABPC.
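
The three properties named in the abstract above (prefix-free, alphabetic, L-restricted) can be checked mechanically. The following is a minimal sketch, not the paper's 3-phase construction; the sample code table and all function names are hypothetical and chosen only for illustration. It also shows why alphabetic coding allows comparison without decoding: sorting by codeword sequence gives the same order as sorting the original strings.

```python
from typing import Dict, List


def is_prefix_free(code: Dict[str, str]) -> bool:
    """True if no codeword is a prefix of (or equal to) another codeword."""
    words = sorted(code.values())
    # In sorted order, a codeword that prefixes any other word also
    # prefixes its immediate successor, so adjacent checks suffice.
    return all(not b.startswith(a) for a, b in zip(words, words[1:]))


def is_alphabetic(code: Dict[str, str]) -> bool:
    """True if codewords are in the same order as their source symbols."""
    symbols = sorted(code)
    return all(code[a] < code[b] for a, b in zip(symbols, symbols[1:]))


def is_l_restricted(code: Dict[str, str], max_len: int) -> bool:
    """True if every codeword length l_i satisfies l_i <= max_len."""
    return all(len(word) <= max_len for word in code.values())


def encode(text: str, code: Dict[str, str]) -> str:
    """Concatenate the codewords of the symbols in text."""
    return "".join(code[ch] for ch in text)


# Hypothetical 5-symbol alphabetic binary prefix code with L = 3.
SAMPLE_CODE = {"a": "00", "b": "01", "c": "10", "d": "110", "e": "111"}

if __name__ == "__main__":
    fields: List[str] = ["bead", "abc", "dace", "cab", "bad"]
    by_text = sorted(fields)
    by_code = sorted(fields, key=lambda f: encode(f, SAMPLE_CODE))
    print(by_text == by_code)
```

Because the code is both prefix-free and alphabetic, the first position where two encoded fields differ falls inside the codewords of the first differing symbols, so the bit-string comparison agrees with the symbol comparison.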

  • A customizable collaborative virtual environment on the Web

    Publication Year: 1999 , Page(s): 328 - 335
    Cited by:  Patents (2)

    Collaborative virtual environments (CVEs) support collaboration, communication and social interaction among users in virtual spaces. We present a customizable CVE implemented on top of the Web that allows for the creation of various interactive environments. This CVE is designed mainly from the perspective of the desk, rooms and hall metaphors. The architecture of our CVE is based on design patterns and integrates a group of tools in order to support interaction among users in both synchronous and asynchronous ways.

  • An efficient method for in memory construction of suffix arrays

    Publication Year: 1999 , Page(s): 81 - 88
    Cited by:  Papers (2)

    The suffix array is a string-indexing structure and a memory efficient alternative to the suffix tree. It has many advantages for text processing. We propose an efficient algorithm for sorting suffixes. We call this algorithm the two-stage suffix sort. One of our ideas is to exploit the specific relationships between adjacent suffixes. Our algorithm makes it possible to use the suffix array for much larger texts and suggests new areas of application. Our experiments on several text data sets (including 514-MB Japanese newspapers) demonstrate that our algorithm is 4.5 to 6.9 times faster than Quicksort, and 2.5 to 3.6 times faster than K. Sadakane's (1998) algorithm, which is considered to be the fastest algorithm in previous work.
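
For readers unfamiliar with the structure, here is a minimal sketch of what a suffix array is and how it is queried. It builds the array by plain comparison sorting of suffixes, which is the kind of baseline the abstract measures against, and is not the two-stage suffix sort itself. The function names are hypothetical.

```python
from typing import List


def build_suffix_array(text: str) -> List[int]:
    """Return the start positions of all suffixes in lexicographic order.

    Naive baseline: each comparison may scan a whole suffix, so this is
    only suitable for small inputs.
    """
    return sorted(range(len(text)), key=lambda i: text[i:])


def find_occurrences(text: str, sa: List[int], pattern: str) -> List[int]:
    """Return all start positions of pattern, using binary search on sa."""

    def prefix(k: int) -> str:
        # The suffix at rank k, truncated to the pattern length.
        return text[sa[k]:sa[k] + len(pattern)]

    # Lower bound: first rank whose truncated suffix is >= pattern.
    lo, hi = 0, len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) < pattern:
            lo = mid + 1
        else:
            hi = mid
    first = lo

    # Upper bound: first rank whose truncated suffix is > pattern.
    hi = len(sa)
    while lo < hi:
        mid = (lo + hi) // 2
        if prefix(mid) <= pattern:
            lo = mid + 1
        else:
            hi = mid

    return sorted(sa[first:lo])


if __name__ == "__main__":
    text = "banana"
    sa = build_suffix_array(text)
    print(sa)
    print(find_occurrences(text, sa, "ana"))
```

All suffixes that start with the pattern occupy one contiguous range of the array, which is why two binary searches are enough to locate every occurrence.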

  • Top-down extraction of semi-structured data

    Publication Year: 1999 , Page(s): 176 - 183
    Cited by:  Papers (2)  |  Patents (3)

    We propose an innovative approach to extracting semi-structured data from Web sources. The idea is to collect a couple of example objects from the user and to use this information to extract new objects from new pages or texts. We propose a top-down strategy that extracts complex objects, decomposing them into less complex objects, until atomic objects have been extracted. Through experimentation, we demonstrate that with a small number of given examples, our strategy is able to extract most of the objects present in a Web source given as input.

  • Approximate retrieval from multimedia databases using relevance feedback

    Publication Year: 1999 , Page(s): 215 - 223
    Cited by:  Papers (1)  |  Patents (6)

    We address the problem of retrieving stored multimedia presentations using relevance feedback. We model multimedia presentations using a crisp relational or object-oriented database, augmented with a text attribute. We also introduce a language, based on fuzzy logic, for retrieval by content from such databases, and a method for query refinement that uses relevance feedback provided by the user.

  • Cross-domain approximate string matching

    Publication Year: 1999 , Page(s): 120 - 127
    Cited by:  Papers (1)  |  Patents (1)

    Approximate string matching is an important paradigm in domains ranging from speech recognition to information retrieval and molecular biology. We introduce a new formalism for a class of applications that takes two strings as input, each specified in terms of a particular domain, and performs a comparison motivated by constraints derived from a third, possibly different domain. This issue arises, for example, when searching multimedia databases built using imperfect recognition technologies (e.g., speech, optical character, and handwriting recognition). We present a polynomial time algorithm for solving the problem, and describe several variations that can also be solved efficiently.

  • Flexible communication support for CSCW applications

    Publication Year: 1999 , Page(s): 338 - 342
    Cited by:  Papers (4)

    Modern computer-supported cooperative work (CSCW) applications supporting same-time/different-place interaction are required to open several communication channels. Each of these channels has its own quality of service (QoS) and is implemented by a specific protocol stack. Typically, these channels need to be synchronized, but inter-stack dependencies are hard to express with current communication architectures. The paper proposes a novel approach to the development of communication software, supporting a style of micro-protocol composition that satisfies the requirements imposed by CSCW applications.

  • A fast distributed suffix array generation algorithm

    Publication Year: 1999 , Page(s): 97 - 104

    We present a distributed algorithm for suffix array generation, based on the sequential algorithm of U. Manber and E. Myers (1993). The sequential algorithm is O(n log n) in the worst case and O(n log log n) on average, where n is the text size. Using p processors connected through a high bandwidth network, we obtain O((n/p) log log n) average time, which is an almost optimal speedup. Unlike previous algorithms, the text is not transmitted through the network and hence the messages exchanged are much smaller. We present some experimental evidence to show that the new algorithm can be faster than the sequential Manber & Myers counterpart.
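
The sequential Manber and Myers algorithm mentioned above is built on prefix doubling: suffixes are ordered by their first k characters, and that ordering is reused to order them by their first 2k characters. The sketch below illustrates only that idea on a single machine. It is a simplification under stated assumptions: it re-sorts with a comparison sort in every round, so it runs in O(n log² n) rather than the O(n log n) bound quoted in the abstract, and it is not the distributed algorithm. The function name is hypothetical.

```python
from typing import List, Tuple


def suffix_array_doubling(text: str) -> List[int]:
    """Build a suffix array by prefix doubling (simplified, comparison-sorted)."""
    n = len(text)
    if n == 0:
        return []

    # rank[i] orders suffix i by its first k characters (k = 1 initially).
    rank = [ord(ch) for ch in text]
    sa = list(range(n))
    k = 1
    while True:
        def key(i: int) -> Tuple[int, int]:
            # Order by the first k characters, then by the next k characters.
            # A suffix that ends early gets -1 so that it sorts first.
            return (rank[i], rank[i + k] if i + k < n else -1)

        sa.sort(key=key)

        # Re-rank: suffixes with equal keys share a rank.
        new_rank = [0] * n
        for prev, cur in zip(sa, sa[1:]):
            new_rank[cur] = new_rank[prev] + (1 if key(prev) != key(cur) else 0)
        rank = new_rank

        if rank[sa[-1]] == n - 1:  # all ranks distinct: fully sorted
            return sa
        k *= 2


if __name__ == "__main__":
    print(suffix_array_doubling("banana"))
```

Each round doubles the number of characters the ranks account for, so at most about log2(n) rounds are needed before all ranks become distinct.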

  • X-tract: structure extraction from botanical textual descriptions

    Publication Year: 1999 , Page(s): 2 - 7
    Cited by:  Papers (1)

    Most available information today, both from printed books and digital repositories, is in the form of free-format texts. The task of retrieving information from these ever-growing repositories has become a challenge for information retrieval (IR) researchers. In some fields, such as botany and taxonomy, textual descriptions observe a set of rules and use a relatively limited vocabulary. This makes botanical textual descriptions an interesting area in which to explore IR techniques for finding structure and facilitating semantic analysis. This paper presents X-tract, a solution to the problem of text analysis and structure extraction in a specific application domain, namely floristic morphologic descriptions. The solution demonstrates the potential of using a grammar to determine information structure in a botanical digital library. We have developed a prototype based on this approach: given an HTML or plain-text document, X-tract analyzes it and presents the results to the user, who can verify the proposed structure before updating the database. This transformation is also useful in the process of storing morphologic descriptions in a database with a preestablished format. The solution is implemented in the context of the Floristic Digital Library (FDL), a large digital library project comprising a wide variety of botanical documents, formats and services.

  • CoBWeb-a crawler for the Brazilian Web

    Publication Year: 1999 , Page(s): 184 - 191
    Cited by:  Papers (3)  |  Patents (1)

    One of the key components of current Web search engines is the document collector. The paper describes CoBWeb, an automatic document collector whose architecture is distributed and highly scalable. CoBWeb aims at collecting large amounts of documents per time period while observing operational and ethical limits in the crawling process. CoBWeb is part of the SIAM (Information Systems in Mobile Computing Environments) search engine which is being implemented to support the Brazilian Web. Thus, several results related to the Brazilian Web are presented.

  • Bounds for parametric sequence comparison

    Publication Year: 1999 , Page(s): 55 - 62

    We consider the problem of computing a global alignment between two or more sequences subject to varying mismatch and indel penalties. We prove a tight 3(n/2π)^(2/3) + O(n^(1/3) log n) bound on the worst-case number of distinct optimum alignments for two sequences of length n as the parameters are varied. This refines an O(n^(2/3)) upper bound by D. Gusfield et al. (1994). Our lower bound requires an unbounded alphabet. For strings over a binary alphabet, we prove an Ω(n^(1/2)) lower bound. For the parametric global alignment of k ≥ 2 sequences under sum-of-pairs scoring, we prove a 3((k/2)n/2π)^(2/3) + O(k^(2/3) n^(1/3) log n) upper bound on the number of distinct optimality regions and an Ω(n^(2/3)) lower bound. Based on experimental evidence, we conjecture that for two random sequences, the number of optimality regions is approximately √n with high probability.
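
To make the role of the two parameters concrete, the sketch below computes the optimal global alignment score for one fixed choice of mismatch and indel penalty. The scoring convention used here (each match scores +1, each mismatch costs `mismatch`, each inserted or deleted character costs `indel`) is an assumption made for illustration, as are the function and parameter names. It does not enumerate optimality regions or reproduce the paper's bounds.

```python
def global_alignment_score(a: str, b: str, mismatch: float, indel: float) -> float:
    """Best global alignment score of a and b under the given penalties.

    Assumed scoring: +1 per match, -mismatch per mismatch, -indel per gap
    character. Standard dynamic program, O(len(a) * len(b)) time.
    """
    # prev[j] = best score aligning the processed prefix of a with b[:j].
    prev = [-indel * j for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        cur = [-indel * i]
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (1.0 if a[i - 1] == b[j - 1] else -mismatch)
            cur.append(max(diagonal, prev[j] - indel, cur[j - 1] - indel))
        prev = cur
    return prev[-1]


if __name__ == "__main__":
    # With cheap mismatches the optimum pairs every position; with
    # expensive mismatches it switches to an all-gap alignment.
    print(global_alignment_score("AAAA", "TTTT", mismatch=1, indel=1))
    print(global_alignment_score("AAAA", "TTTT", mismatch=3, indel=1))
```

In the second call the optimal alignment itself changes, not just its score. Counting how many such changes can occur as the penalties vary is the question the abstract addresses.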

  • Near optimal multiple sequence alignments using a traveling salesman problem approach

    Publication Year: 1999 , Page(s): 105 - 114

    We present a new method for the calculation of multiple sequence alignments (MSAs). The input to our problem is a set of n protein sequences. We assume that the sequences are related to each other and that there exists some unknown evolutionary tree that corresponds to the MSA. One advantage of our method is that the scoring can be done with reference to this phylogenetic tree, even though the tree structure itself may remain unknown. Instead of computing an evolutionary tree, we only need to compute a circular tour of the tree, which is determined via a traveling salesman problem (TSP) algorithm. Our algorithm can calculate a near optimal MSA and has a performance guarantee of ((n-1)/n)·opt (where opt is the optimal score of the MSA). The algorithm runs in O(k^2 n^2) time, where k is the length of the longest input sequence. From there, we improve the alignment further. Experimental results are shown at the end.

  • Emotional awareness in collaborative systems

    Publication Year: 1999 , Page(s): 296 - 303
    Cited by:  Papers (11)  |  Patents (1)

    Emotions play an important role in human interaction. Both our own emotional state and our perception of that of others with whom we collaborate influence the outcome of cooperative work. With the growing interest in providing computational support for the recognition and representation of emotions, there is a clear interest in adding such facilities to groupware systems and in evaluating the positive and negative effects of using this additional channel of communication. We discuss the issues involved in supporting a new type of collaborative awareness in groupware, namely, emotional awareness. We also present two emotion-based sample applications, and a discussion to further motivate work in this area within the collaborative community.

  • Linear time sorting of skewed distributions

    Publication Year: 1999 , Page(s): 135 - 140

    The article presents an efficient linear average time algorithm to sort lists of integers that follow skewed distributions. It also studies a particular case where the list follows Zipf's distribution, and presents an example application where the algorithm is used to reduce the time to build word-based Huffman codes.
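
The abstract does not describe the algorithm itself, so the following is only an illustration of why skew helps, not the article's method. It is a hybrid sketch that assumes most values are small non-negative integers: values below a cutoff are counted in an array in linear time, and only the rare remaining values are comparison-sorted. The function name and the `threshold` parameter are hypothetical.

```python
from typing import List


def sort_skewed(values: List[int], threshold: int) -> List[int]:
    """Sort integers, counting values in [0, threshold) and sorting the rest.

    If almost all values fall below threshold (as with word frequencies
    under a Zipf-like distribution), the comparison sort touches only a
    small tail and the counting pass dominates.
    """
    counts = [0] * threshold
    tail: List[int] = []
    for v in values:
        if 0 <= v < threshold:
            counts[v] += 1
        else:
            tail.append(v)
    tail.sort()

    # Negative outliers come first, then counted values, then large outliers.
    result = [v for v in tail if v < 0]
    for value, count in enumerate(counts):
        result.extend([value] * count)
    result.extend(v for v in tail if v >= threshold)
    return result


if __name__ == "__main__":
    frequencies = [1, 1, 2, 1, 3, 1, 2, 950, 1, 1, 4, 2, 1, 120]
    print(sort_skewed(frequencies, threshold=8))
```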

  • An efficient uniform-cost normalized edit distance algorithm

    Publication Year: 1999 , Page(s): 8 - 15
    Cited by:  Papers (4)

    A common model for computing the similarity of two strings X and Y of lengths m and n, respectively, with m ≥ n, is to transform X into Y through a sequence of three types of edit operations: insertion, deletion, and substitution. The model assumes a given cost function which assigns a non-negative real weight to each edit operation. The amortized weight for a given edit sequence is the ratio of its weight to its length, and the minimum of this ratio over all edit sequences is the normalized edit distance. Existing algorithms for normalized edit distance computation with proven complexity bounds require O(mn^2) time in the worst case. We give an O(mn log n)-time algorithm for the problem when the cost function is uniform, i.e., the weight of each edit operation is constant within the same type, except that substitutions can have different weights depending on whether they are matching or non-matching.
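
The definition above can be turned directly into a cubic-time dynamic program that tracks, for every edit-sequence length, the minimum weight of a sequence of exactly that length. The sketch below does this under a uniform cost function. It is a straightforward baseline written from the definition, not the O(mn log n) algorithm of the paper, and it assumes that a matching substitution counts as one operation of weight `match_cost`. All names and default weights are hypothetical.

```python
from math import inf


def normalized_edit_distance(
    x: str,
    y: str,
    ins_cost: float = 1.0,
    del_cost: float = 1.0,
    sub_cost: float = 1.0,
    match_cost: float = 0.0,
) -> float:
    """Minimum of weight/length over all edit sequences turning x into y."""
    m, n = len(x), len(y)
    if m == 0 and n == 0:
        return 0.0

    # prev[i][j] = min weight of a sequence of exactly k-1 operations that
    # turns x[:i] into y[:j] (inf if no such sequence exists).
    prev = [[inf] * (n + 1) for _ in range(m + 1)]
    prev[0][0] = 0.0
    best = inf
    for k in range(1, m + n + 1):  # a sequence never needs more than m+n ops
        cur = [[inf] * (n + 1) for _ in range(m + 1)]
        for i in range(m + 1):
            for j in range(n + 1):
                cost = inf
                if i > 0:
                    cost = min(cost, prev[i - 1][j] + del_cost)
                if j > 0:
                    cost = min(cost, prev[i][j - 1] + ins_cost)
                if i > 0 and j > 0:
                    step = match_cost if x[i - 1] == y[j - 1] else sub_cost
                    cost = min(cost, prev[i - 1][j - 1] + step)
                cur[i][j] = cost
        if cur[m][n] < inf:
            best = min(best, cur[m][n] / k)
        prev = cur
    return best


if __name__ == "__main__":
    print(normalized_edit_distance("abc", "abd"))
    print(normalized_edit_distance("a", "ba"))
```

Note that the minimizing sequence is not always the one with the smallest total weight, because a longer sequence can have a smaller ratio. That is why the length must be tracked explicitly.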

  • Problems related to subsequences and supersequences

    Publication Year: 1999 , Page(s): 199 - 205

    We present an algorithm for building the automaton that searches for all non-overlapping occurrences of each subsequence from a set of subsequences. Further, we define the Directed Acyclic Supersequence Graph and use it to solve the generalized Shortest Common Supersequence problem, the Longest Common Non-Supersequence problem, and the Longest Consistent Supersequence problem.
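
As background for the terms used above, the sketch below shows the two basic notions for the simplest case of two strings: a subsequence test and a shortest common supersequence computed by textbook dynamic programming. It does not build the automaton or the Directed Acyclic Supersequence Graph described in the paper, and it does not cover the generalized problem over sets of strings. The function names are hypothetical.

```python
def is_subsequence(s: str, t: str) -> bool:
    """True if s can be obtained from t by deleting zero or more characters."""
    remaining = iter(t)
    # Each `in` consumes the iterator up to and including the match, so
    # characters of s must appear in t in the same relative order.
    return all(ch in remaining for ch in s)


def shortest_common_supersequence(a: str, b: str) -> str:
    """Return one shortest string that has both a and b as subsequences."""
    m, n = len(a), len(b)
    # length[i][j] = length of a shortest common supersequence of a[i:], b[j:].
    length = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m, -1, -1):
        for j in range(n, -1, -1):
            if i == m:
                length[i][j] = n - j
            elif j == n:
                length[i][j] = m - i
            elif a[i] == b[j]:
                length[i][j] = 1 + length[i + 1][j + 1]
            else:
                length[i][j] = 1 + min(length[i + 1][j], length[i][j + 1])

    # Walk the table from the start, emitting one character per step.
    out = []
    i = j = 0
    while i < m and j < n:
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif length[i + 1][j] <= length[i][j + 1]:
            out.append(a[i])
            i += 1
        else:
            out.append(b[j])
            j += 1
    out.append(a[i:])
    out.append(b[j:])
    return "".join(out)


if __name__ == "__main__":
    scs = shortest_common_supersequence("abac", "cab")
    print(scs, is_subsequence("abac", scs), is_subsequence("cab", scs))
```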

  • Searching in metric spaces by spatial approximation

    Publication Year: 1999 , Page(s): 141 - 148
    Cited by:  Papers (8)  |  Patents (2)

    We propose a novel data structure to search in metric spaces. A metric space is formed by a collection of objects and a distance function defined among them, which satisfies the triangle inequality. The goal is, given a set of objects and a query, to retrieve those objects close enough to the query. The number of distances computed to achieve this goal is the complexity measure. Our data structure, called sa-tree (“spatial approximation tree”), is based on approaching spatially the searched objects. We analyze our method and show that the number of distance evaluations to search among n objects is o(n). We show experimentally that the sa-tree is the best existing technique when the metric space is high-dimensional or the query has low selectivity. These are the most difficult cases in real applications.
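
To illustrate how the triangle inequality lets a search skip distance computations, and why the number of distance evaluations is a natural cost measure, here is a minimal sketch of a range query filtered through a single pivot. This is a different and much simpler technique than the sa-tree, shown only to make the cost model concrete. The function names and the sample points are hypothetical.

```python
import math
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]


def build_pivot_table(
    objects: Sequence[Point], pivot: Point, dist: Callable[[Point, Point], float]
) -> List[float]:
    """Precompute the distance from the pivot to every stored object."""
    return [dist(pivot, obj) for obj in objects]


def range_query(
    objects: Sequence[Point],
    pivot: Point,
    pivot_table: Sequence[float],
    dist: Callable[[Point, Point], float],
    query: Point,
    radius: float,
) -> Tuple[List[Point], int]:
    """Return objects within radius of query and the number of distances computed."""
    d_query_pivot = dist(query, pivot)
    evaluations = 1
    result: List[Point] = []
    for obj, d_pivot_obj in zip(objects, pivot_table):
        # Triangle inequality: d(query, obj) >= |d(query, pivot) - d(pivot, obj)|.
        # If that lower bound already exceeds the radius, skip the object
        # without computing its distance to the query.
        if abs(d_query_pivot - d_pivot_obj) > radius:
            continue
        evaluations += 1
        if dist(query, obj) <= radius:
            result.append(obj)
    return result, evaluations


if __name__ == "__main__":
    points = [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (6.0, 5.0), (10.0, 0.0)]
    pivot = (0.0, 0.0)
    table = build_pivot_table(points, pivot, math.dist)
    print(range_query(points, pivot, table, math.dist, (5.0, 6.0), 1.5))
```

In this example only three distances are computed at query time for five stored objects, because the pivot filter discards the other three objects.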

  • A method of describing document contents through topic selection

    Publication Year: 1999 , Page(s): 73 - 80
    Cited by:  Papers (2)  |  Patents (1)

    Given a large hierarchical dictionary of concepts, we consider the task of selecting the concepts that describe the contents of a given document. The problem consists in proper handling of the top-level concepts in the hierarchy. The document is represented by a histogram of the topics with their respective contributions to the document. The contribution is determined by comparing the document with the “ideal” document for each topic in the dictionary. The “ideal” document for a concept is one that contains only the keywords belonging to this concept, in proportion to their occurrences in the training corpus. A fast comparison algorithm for some types of metrics is proposed. The application of the method in a system classifier is discussed.

  • Using AulaNet for Web-based course development

    Publication Year: 1999 , Page(s): 322 - 327

    A variety of technologies are being used to replace or supplement the face-to-face learning process, including the World Wide Web. We present AulaNet, an environment for creating, updating and attending Web-based courses. We illustrate some dynamics of three experiments of course development and delivery with AulaNet, pointing out their features. We discuss how easy and how difficult it is to orchestrate technology for educational purposes.

  • Effects of term segmentation on Chinese/English cross-language information retrieval

    Publication Year: 1999 , Page(s): 149 - 157
    Cited by:  Patents (5)

    The majority of recent Cross-Language Information Retrieval (CLIR) research has focused on European languages. CLIR problems that involve East Asian languages such as Chinese introduce additional challenges, because written Chinese texts lack boundaries between terms. The paper examines three Chinese segmentation techniques in combination with two variants of dictionary-based Chinese-to-English query translation. The results indicate that failure to segment terms, particularly technical terms and names, can have a cascading effect that reduces retrieval effectiveness. Task-tuned segmentation algorithms and alternative term weighting strategies are suggested as productive directions for future work.
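
The abstract does not name the three segmentation techniques, so the sketch below should not be read as one of them. It shows forward maximum matching, a well-known dictionary-based approach, purely to illustrate what segmenting unspaced text involves and how a missing dictionary entry changes the resulting terms. The lexicon contents, function name and `max_len` parameter are hypothetical.

```python
from typing import List, Set


def forward_max_match(text: str, lexicon: Set[str], max_len: int = 4) -> List[str]:
    """Greedily split text into the longest lexicon entries, left to right.

    Characters that start no lexicon entry are emitted as single-character
    terms, so the function always consumes the whole input.
    """
    terms: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:
                terms.append(candidate)
                i += size
                break
    return terms


if __name__ == "__main__":
    query = "信息检索系统"  # "information retrieval system"
    full_lexicon = {"信息", "检索", "信息检索", "系统"}
    small_lexicon = {"信息", "检索", "系统"}
    print(forward_max_match(query, full_lexicon))
    print(forward_max_match(query, small_lexicon))
```

With the compound term in the lexicon the query yields two terms; without it the same query yields three shorter terms, which would then be translated separately by a dictionary-based query translator.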

  • Design patterns for collaborative systems

    Publication Year: 1999 , Page(s): 270 - 277
    Cited by:  Papers (1)

    Collaborative applications provide a group of users with the facility to communicate and share data in a coordinated way. We propose a pattern system to design the basic aspects of data sharing, communication and coordination for collaborative applications. These patterns are useful for the design and development of collaborative applications as well as for the development of platforms for the construction of collaborative applications.

  • Concurrency and recovery in full-text indexing

    Publication Year: 1999 , Page(s): 192 - 198

    An important feature of a document database system is that the documents can be retrieved by searching for words from their contents. In a full-text index, each word of the stored documents can be used as a search key. Inserting a new document into the database automatically triggers a transaction that inserts the words together with their occurrence information into the index. We present solutions to problems that arise when full-text indexing is applied to constantly changing document data, such as WWW pages. We present and analyze an algorithm for full-text indexing with the following properties: concurrent searches are possible and efficient, and the algorithm can be designed such that several indexing processes can be performed concurrently. Moreover, the algorithm allows efficient recovery of the index after failures that can occur while the index is modified. This is important for large indices because, when not prepared for failures, the index may need to be reconstructed from the original documents.
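
The basic structure described above, a map from each word to its occurrence information that is updated when a document is inserted, can be sketched as follows. This is a minimal in-memory illustration, not the paper's algorithm: a single lock stands in for its concurrency control, and recovery after failures is not modeled at all. The class and method names are hypothetical.

```python
import threading
from collections import defaultdict
from typing import Dict, List


class FullTextIndex:
    """In-memory inverted index: word -> {document id -> word positions}."""

    def __init__(self) -> None:
        self._postings: Dict[str, Dict[int, List[int]]] = defaultdict(dict)
        self._lock = threading.Lock()

    def insert_document(self, doc_id: int, text: str) -> None:
        """Index every word of text together with its positions."""
        # Tokenize outside the lock so searches are blocked only briefly.
        occurrences: Dict[str, List[int]] = defaultdict(list)
        for position, word in enumerate(text.lower().split()):
            occurrences[word].append(position)
        # Publish all words of the document in one step, so a search never
        # observes a half-indexed document.
        with self._lock:
            for word, positions in occurrences.items():
                self._postings[word][doc_id] = positions

    def search(self, word: str) -> Dict[int, List[int]]:
        """Return a copy of the postings for word (empty if it is unknown)."""
        with self._lock:
            postings = self._postings.get(word.lower(), {})
            return {doc_id: list(pos) for doc_id, pos in postings.items()}


if __name__ == "__main__":
    index = FullTextIndex()
    index.insert_document(1, "the quick brown fox")
    index.insert_document(2, "The lazy dog and the fox")
    print(index.search("fox"))
```

Returning a copy from `search` keeps callers from holding references into the index while another thread is inserting a document.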
