IEEE Transactions on Computers

Issue 8 • August 2005

  • [Front cover]

    Page(s): c1
  • [Inside front cover]

    Page(s): c2
  • Pattern matching in LZW compressed files

    Page(s): 929 - 938

    Compressed pattern matching is an emerging research area that addresses the following problem: given a text file in compressed format and a pattern, report the occurrence(s) of the pattern in the file with minimal (or no) decompression. In this paper, we report our work on compressed pattern matching in LZW compressed files. The work includes an extension of the well-known "almost-optimal" algorithm of Amir et al., improved to find not only the first occurrence of the pattern but all occurrences. A faster implementation for so-called "simple patterns" is also proposed. The work also includes a novel multiple-pattern matching algorithm based on the Aho-Corasick algorithm. The algorithm takes O(mt+n+r) time with O(mt) extra space, where n is the size of the compressed file, m is the total length of all patterns, t is the size of the LZW trie, and r is the number of occurrences of the patterns. Extensive experiments have been conducted to test the performance of our algorithms and to compare them with other well-known compressed pattern matching algorithms, particularly the BWT-based algorithms and a similar multiple-pattern matching algorithm by Kida et al. that also applies the Aho-Corasick algorithm to LZW compressed data. The results show that our multiple-pattern matching algorithm is competitive with the best compressed pattern-matching algorithms and is in practice the fastest of all approaches when the number of patterns is not very large, making it preferable for general string matching applications. The proposed algorithm is efficient for large files and is particularly efficient when applied to archive search where the archives are compressed with a common LZW trie. LZW is one of the most efficient and most widely used compression algorithms, and our method requires no modification to the compression algorithm. The work reported in this paper, therefore, has great economic and market potential.

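    The problem setting can be illustrated with a toy baseline (this is not the paper's algorithm): decode the LZW stream token by token and feed the emitted characters through an Aho-Corasick automaton, reporting matches as they complete. The Python sketch below, with illustrative function names, shows only this naive combination; the paper's contribution is to avoid the per-character scanning by working over the LZW trie directly.

        from collections import deque

        def build_aho_corasick(patterns):
            # goto/fail/output construction for multi-pattern matching
            goto, fail, out = [{}], [0], [set()]
            for p in patterns:
                s = 0
                for ch in p:
                    if ch not in goto[s]:
                        goto.append({}); fail.append(0); out.append(set())
                        goto[s][ch] = len(goto) - 1
                    s = goto[s][ch]
                out[s].add(p)
            queue = deque(goto[0].values())   # depth-1 states fail to the root
            while queue:
                r = queue.popleft()
                for ch, u in goto[r].items():
                    queue.append(u)
                    f = fail[r]
                    while f and ch not in goto[f]:
                        f = fail[f]
                    fail[u] = goto[f].get(ch, 0)
                    out[u] |= out[fail[u]]
            return goto, fail, out

        def ac_step(state, ch, goto, fail):
            # one automaton transition per decompressed character
            while state and ch not in goto[state]:
                state = fail[state]
            return goto[state].get(ch, 0)

        def lzw_encode(text, alphabet):
            table = {ch: i for i, ch in enumerate(alphabet)}
            w, codes = "", []
            for ch in text:
                if w + ch in table:
                    w += ch
                else:
                    codes.append(table[w])
                    table[w + ch] = len(table)
                    w = ch
            if w:
                codes.append(table[w])
            return codes

        def search_lzw(codes, patterns, alphabet):
            # Decode incrementally; never materialize the whole file.
            goto, fail, out = build_aho_corasick(patterns)
            table = {i: ch for i, ch in enumerate(alphabet)}
            state, pos, hits, prev = 0, 0, [], None
            for code in codes:
                entry = table[code] if code in table else prev + prev[0]
                if prev is not None:
                    table[len(table)] = prev + entry[0]
                prev = entry
                for ch in entry:
                    state = ac_step(state, ch, goto, fail)
                    pos += 1
                    hits.extend((pos - len(p), p) for p in out[state])
            return hits

        text = "abracadabra abracadabra"
        alpha = sorted(set(text))
        print(search_lzw(lzw_encode(text, alpha), ["abra", "cad"], alpha))
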
  • Making LRU friendly to weak locality workloads: a novel replacement algorithm to improve buffer cache performance

    Page(s): 939 - 952

    Although the LRU replacement algorithm has been widely used in buffer cache management, it is well known for its inability to cope with access patterns with weak locality. Previously proposed algorithms that improve on LRU greatly increase complexity and/or cannot provide consistently improved performance; some of them address LRU's problems only in certain specific, predefined cases. Motivated by the limitations of existing algorithms, we propose a general and efficient replacement algorithm called Low Inter-reference Recency Set (LIRS). LIRS effectively addresses the limitations of LRU by using recency to evaluate the Inter-Reference Recency (IRR) of accessed blocks when making a replacement decision. This is in contrast to LRU, which uses recency directly to predict the next reference time. Meanwhile, LIRS mostly retains the simple assumption adopted by LRU for predicting future block access behavior. Conducting simulations with a variety of traces of different access patterns and with a wide range of cache sizes, we show that LIRS significantly outperforms LRU and outperforms other existing replacement algorithms in most cases. Furthermore, we show that the additional cost of implementing LIRS is trivial in comparison with that of LRU. We also show that LIRS can be extended into a family of replacement algorithms, of which LRU is a special member.

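    The metric at the heart of LIRS can be made concrete with a small sketch (a minimal illustration, not the full LIRS implementation, which also needs the LIR/HIR block sets and stack pruning described in the paper). On a cyclic access pattern, a pure LRU cache smaller than the loop gets zero hits, while the IRR of every re-referenced block is small and stable; that stability is exactly the signal LIRS exploits instead of raw recency:

        from collections import OrderedDict

        def irr_trace(accesses):
            # IRR of an access = number of distinct *other* blocks referenced
            # since the previous access to the same block (None on first touch).
            last, irrs = {}, []
            for i, b in enumerate(accesses):
                irrs.append(len(set(accesses[last[b] + 1:i])) if b in last else None)
                last[b] = i
            return irrs

        def lru_hits(accesses, capacity):
            # Plain LRU cache simulator: evict the least recently used block.
            cache, hits = OrderedDict(), 0
            for b in accesses:
                if b in cache:
                    hits += 1
                    cache.move_to_end(b)
                else:
                    if len(cache) >= capacity:
                        cache.popitem(last=False)
                    cache[b] = None
            return hits

        loop = list(range(4)) * 5          # weak-locality cyclic pattern over 4 blocks
        print(lru_hits(loop, capacity=3))  # 0: LRU thrashes on a loop it cannot hold
        print(irr_trace(loop)[4:8])        # [3, 3, 3, 3]: IRR is constant and low
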
  • High-speed multioperand decimal adders

    Page(s): 953 - 963

    There is increasing interest in hardware support for decimal arithmetic as a result of recent growth in commercial, financial, and Internet-based applications. Consequently, new specifications for decimal floating-point arithmetic have been added to the draft revision of the IEEE-754 Standard for floating-point arithmetic. This paper introduces and analyzes three techniques for performing fast decimal addition on multiple binary coded decimal (BCD) operands. Two of the techniques speculate BCD correction values and correct intermediate results while adding the input operands. The first speculates over one addition. The second speculates over two additions. The third technique uses a binary carry-save adder tree and produces a binary sum. Combinational logic is then used to correct the sum and determine the carry into the next more significant digit. Multioperand adder designs are constructed and synthesized for four to 16 input operands. Analyses are performed on the synthesis results and the merits of each technique are discussed. Finally, these techniques are compared to several previous techniques for high-speed decimal addition.

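    The correction step that all three techniques are built around is easy to state in software (a behavioral sketch only; the paper is about hardware structures that speculate or defer this step, which the code below does not model):

        def correct_digit(binary_sum):
            # Two-operand BCD cell (column sum <= 19): a 4-bit binary sum above 9
            # is fixed by the classic +6 correction, which also yields the carry.
            if binary_sum > 9:
                return (binary_sum + 6) & 0xF, 1
            return binary_sum, 0

        def bcd_multiop_add(operands, width):
            # Add several BCD numbers column by column, least significant first.
            # Here digit % 10 and digit // 10 play the roles of the correction
            # logic and of the carry into the next more significant digit.
            # Assumes 'width' digits are enough to hold the sum.
            digit_cols = [[int(d) for d in reversed(str(x).zfill(width))]
                          for x in operands]
            carry, digits = 0, []
            for i in range(width):
                s = carry + sum(col[i] for col in digit_cols)
                digits.append(s % 10)
                carry = s // 10
            return int("".join(map(str, reversed(digits))))

        print(correct_digit(0b1101))                          # (3, 1): 13 -> digit 3, carry 1
        print(bcd_multiop_add([493, 278, 905, 64], width=4))  # 1740
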
  • Robust processing rate allocation for proportional slowdown differentiation on Internet servers

    Page(s): 964 - 977

    A desirable behavior of an Internet server is that a request's queuing delay depends on its service time in a linear fashion. Measuring quality of service in terms of slowdown, the ratio of a request's queuing delay to its service time, provides a simple way to attain this objective. Moreover, it treats client requests equally regardless of their service time, whereas response time favors requests that need more processing resources. In this paper, we propose a proportional slowdown differentiation (PSD) service model for Internet servers, which aims to maintain prespecified slowdown ratios between different classes of client requests. To provide PSD services, we first derive a closed-form expression for the expected slowdown in an M/G/1 FCFS queuing system with a typical heavy-tailed service time distribution, the bounded Pareto distribution. Based on this closed-form expression, we design a queuing-theoretic processing-rate allocation strategy, realized by deploying a virtual server for each class. Simulation results show that the strategy can provide controllable PSD services on Internet servers; however, it comes with large variance and weak predictability due to the dynamics of Internet traffic. To address these issues, we design an integral feedback controller and integrate it into the queuing-theoretic strategy. Simulation results demonstrate that the integrated strategy is robust and delivers predictable PSD services at a fine-grained level. We modified the Apache Web server with an implementation of the integrated processing-rate allocation strategy; experimental results further demonstrate its effectiveness and feasibility in practice.

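    The feedback part of the design can be sketched in a few lines (a toy discrete-time sketch; the queueing-theoretic base allocation derived from the M/G/1 closed form is replaced here by a crude 1/(rate - load) stand-in for the server, and all names and constants are illustrative):

        def toy_server(share0, load0=0.3, load1=0.3):
            # Stand-in for the measured per-class slowdowns under a given rate
            # split: slowdown blows up as a class's rate share nears its load.
            return 1.0 / (share0 - load0), 1.0 / ((1.0 - share0) - load1)

        def psd_integral_control(measure, target_ratio, steps=50, gain=0.05):
            # Pure integral control: the rate share of class 0 accumulates the
            # error between the target slowdown ratio and the observed one.
            share0 = 0.5
            for _ in range(steps):
                s0, s1 = measure(share0)
                error = target_ratio - s1 / s0
                share0 += gain * error
                share0 = min(0.65, max(0.35, share0))  # keep both classes stable
            return share0

        # Drive class 1 to twice the slowdown of class 0.
        share0 = psd_integral_control(toy_server, target_ratio=2.0)
        s0, s1 = toy_server(share0)
        print(round(share0, 3), round(s1 / s0, 2))     # ~0.567, ratio ~2.0
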
  • A distributed coverage- and connectivity-centric technique for selecting active nodes in wireless sensor networks

    Page(s): 978 - 991

    Due to their low cost and small form factors, a large number of sensor nodes can be deployed redundantly in dense sensor networks. The availability of redundant nodes increases network lifetime as well as network fault tolerance. It is, however, undesirable to keep all the sensor nodes active at all times for sensing and communication: an excessive number of active nodes leads to higher energy consumption and places more demand on the limited network bandwidth. We present an efficient technique for selecting active sensor nodes in dense sensor networks. The active node selection procedure aims to provide the highest possible coverage of the sensor field, i.e., the surveillance area, while assuring network connectivity for routing and information dissemination. We first show that the coverage-centric active node selection problem is NP-complete. We then present a distributed approach based on the concept of a connected dominating set (CDS) and prove that the set of active nodes selected by our approach provides full coverage and connectivity. We also describe an optimal coverage-centric centralized approach based on integer linear programming, and we present simulation results obtained using an ns2 implementation of the proposed technique.

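    The flavor of the CDS-based selection can be shown with a small centralized greedy stand-in (the paper's technique is distributed and also reasons about sensing coverage of the field; the sketch below only picks a connected dominating set of the communication graph, with all parameters illustrative):

        import math, random

        def greedy_cds(points, radius):
            # Unit-disk communication graph: nodes within 'radius' are neighbors.
            n = len(points)
            adj = [set() for _ in range(n)]
            for i in range(n):
                for j in range(i + 1, n):
                    if math.dist(points[i], points[j]) <= radius:
                        adj[i].add(j); adj[j].add(i)
            start = max(range(n), key=lambda v: len(adj[v]))
            cds, covered = {start}, {start} | adj[start]
            while len(covered) < n:
                # Grow only through neighbors of the set, so it stays connected;
                # pick the candidate that dominates the most uncovered nodes.
                frontier = {v for u in cds for v in adj[u]} - cds
                best = max(frontier, key=lambda v: len((adj[v] | {v}) - covered),
                           default=None)
                if best is None or not ((adj[best] | {best}) - covered):
                    break                  # disconnected graph: rest unreachable
                cds.add(best)
                covered |= adj[best] | {best}
            return cds

        random.seed(1)
        pts = [(random.random(), random.random()) for _ in range(60)]
        active = greedy_cds(pts, radius=0.25)
        print(f"{len(active)} of {len(pts)} nodes stay active")
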
  • An efficient basis conversion algorithm for composite fields with given representations

    Page(s): 992 - 997

    We describe an efficient method for constructing the basis conversion matrix between two given finite field representations, where one is composite. We are motivated by the fact that certain representations, e.g., low Hamming-weight polynomial or composite field representations, permit arithmetic operations such as multiplication and inversion to be computed more efficiently. An earlier work by Paar defines the conversion problem and outlines an exponential-time algorithm that requires an exhaustive search in the field. Another algorithm by Sunar et al. provides a polynomial-time algorithm for the limited case where the second representation is constructed (rather than initially given). The algorithm we present builds on existing factorization algorithms and provides a randomized polynomial-time algorithm that solves the basis conversion problem when both representations are initially given. We also adapt a fast trace-based factorization algorithm to the composite field setting, which yields a subcubic-complexity algorithm for constructing the basis conversion matrix.

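    The conversion matrix itself is easy to build once a root of the first representation's defining polynomial is known in the second representation; finding that root is the factorization step the paper makes efficient, and the brute-force search below is exactly the exponential step that only a toy field size permits. A minimal GF(2^4) sketch, with fields encoded as bit-mask integers and all names illustrative:

        def gf_mul(a, b, poly, n):
            # Carry-less multiplication modulo an irreducible polynomial.
            r = 0
            while b:
                if b & 1:
                    r ^= a
                b >>= 1
                a <<= 1
                if (a >> n) & 1:
                    a ^= poly
            return r

        def eval_poly(q, x, poly, n):
            # Evaluate the polynomial with coefficient bits q at x, inside
            # the field defined by poly (Horner's rule over GF(2^n)).
            acc = 0
            for bit in range(n, -1, -1):
                acc = gf_mul(acc, x, poly, n) ^ ((q >> bit) & 1)
            return acc

        def conversion_matrix(q, p, n):
            # Columns are the coordinates of r^0 .. r^(n-1), where r is the
            # image in GF(2^n)/p of the generator of GF(2^n)/q.
            r = next(x for x in range(1, 1 << n) if eval_poly(q, x, p, n) == 0)
            cols, power = [], 1
            for _ in range(n):
                cols.append(power)
                power = gf_mul(power, r, p, n)
            return cols

        def convert(a, cols):
            # Matrix-vector product over GF(2): XOR the columns a selects.
            out = 0
            for i, col in enumerate(cols):
                if (a >> i) & 1:
                    out ^= col
            return out

        n, q, p = 4, 0b11001, 0b10011      # x^4+x^3+1 and y^4+y+1
        cols = conversion_matrix(q, p, n)
        a, b = 0b0110, 0b1011
        same = convert(gf_mul(a, b, q, n), cols) == \
               gf_mul(convert(a, cols), convert(b, cols), p, n)
        print(same)                        # True: the map respects multiplication
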
  • Partitioning variables across register windows to reduce spill code in a low-power processor

    Page(s): 998 - 1012

    Low-power embedded processors utilize compact instruction encodings to achieve small code size. Such encodings place tight restrictions on the number of bits available to encode operand specifiers and, thus, on the number of architected registers. As a result, performance and power are often sacrificed because the limited number of registers shifts the burden of operand supply from the register file to memory. In this paper, we investigate the use of a windowed register file to address this problem by providing more registers than the encoding allows. The registers are organized as a set of identical register windows where, at each point in the execution, there is a single active window; special window management instructions change the active window and transfer values between windows. This design gives the appearance of a large register file without compromising the instruction encoding. To support the windowed register file, we designed and implemented a graph partitioning-based compiler algorithm that partitions the program variables and temporaries referenced within a procedure across multiple windows. On a 16-bit embedded processor, an average 11 percent improvement in application performance and a 25 percent reduction in system power were achieved as an 8-register design was scaled from one to two windows.

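    The cost trade-off the compiler algorithm optimizes can be illustrated with a much simpler stand-in (the paper uses a graph partitioning formulation; the greedy local search below merely moves variables between windows whenever doing so reduces the number of window-switch instructions on a toy reference trace, and the cost model and all names are illustrative):

        def switch_count(trace, assign):
            # Window-activate instructions needed to touch the variables of
            # 'trace' in order, given a variable -> window assignment.
            switches, active = 0, None
            for var in trace:
                if assign[var] != active:
                    if active is not None:
                        switches += 1
                    active = assign[var]
            return switches

        def partition(trace, n_windows=2, per_window=8, rounds=3):
            variables = sorted(set(trace))
            assign = {v: i % n_windows for i, v in enumerate(variables)}
            for _ in range(rounds):
                for v in variables:
                    # Try every window for v; keep the cheapest legal choice.
                    best_w, best_c = assign[v], switch_count(trace, assign)
                    for w in range(n_windows):
                        occupancy = sum(1 for u in variables if assign[u] == w)
                        if w == assign[v] or occupancy >= per_window:
                            continue
                        assign[v] = w
                        cost = switch_count(trace, assign)
                        if cost < best_c:
                            best_w, best_c = w, cost
                    assign[v] = best_w
            return assign

        trace = list("aabbaccbdda" "eeffegffgge")   # two clusters of variables
        assign = partition(trace, n_windows=2, per_window=4)
        print(assign, switch_count(trace, assign))

    The greedy search can stop in a local optimum; the paper's graph partitioning formulation is what makes the approach robust on real procedures.
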
  • Optimized broadcast protocol for sensor networks

    Page(s): 1013 - 1024

    Sensor networks usually operate under very severe energy restrictions, so sensor communications should consume the minimum possible amount of energy. While broadcasting is a very energy-expensive protocol, it is also widely used as a building block for a variety of other network layer protocols; reducing energy consumption by optimizing broadcasting is therefore a major improvement in sensor networking. In this paper, we propose an optimized broadcast protocol for sensor networks (BPS). The major novelty of BPS is its adaptive-geometric approach, which enables a considerable reduction in retransmissions by maximizing each hop length. BPS adapts itself to get the best out of existing radio conditions. In BPS, nodes do not need any neighborhood information, which leads to low communication and memory overhead. We analyze the worst-case scenario for BPS and show that the number of transmissions in such a scenario is a constant multiple of the number required in the ideal case. Our simulation results show that BPS is very scalable with respect to network density and is also resilient to transmission errors.

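    The hop-maximizing intuition can be demonstrated with a distance-weighted deferred flooding sketch (a simplification in the spirit of BPS's geometric approach, not the protocol itself: a node that first hears the packet waits for a time that shrinks with its distance from the sender, so the farthest receivers, i.e., the longest hops, rebroadcast first, and hearing a duplicate suppresses a pending rebroadcast):

        import heapq, math, random

        def broadcast(points, radius, source=0, max_delay=1.0):
            n = len(points)
            heard = [None] * n           # time each node first hears the packet
            cancelled = [False] * n
            pending = [(0.0, source)]    # (scheduled transmit time, node)
            heard[source] = 0.0
            transmissions = 0
            while pending:
                t, u = heapq.heappop(pending)
                if cancelled[u]:
                    continue             # a duplicate arrived before the timer fired
                transmissions += 1
                for v in range(n):
                    d = math.dist(points[u], points[v])
                    if v == u or d > radius:
                        continue
                    if heard[v] is None:
                        heard[v] = t
                        delay = max_delay * (1.0 - d / radius)  # far nodes go first
                        heapq.heappush(pending, (t + delay, v))
                    else:
                        cancelled[v] = True
            return transmissions, sum(h is not None for h in heard)

        random.seed(7)
        pts = [(random.random() * 4, random.random() * 4) for _ in range(120)]
        tx, reached = broadcast(pts, radius=1.0)
        print(f"{tx} transmissions reached {reached}/{len(pts)} nodes")

    The aggressive suppression rule used here can strand parts of a sparse network; BPS's geometric forwarding zones are what make the reduction in retransmissions safe.
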
  • Performance evaluation and design trade-offs for network-on-chip interconnect architectures

    Page(s): 1025 - 1040

    Multiprocessor system-on-chip (MP-SoC) platforms are emerging as an important trend in SoC design. Power and wire design constraints are forcing the adoption of new design methodologies for system-on-chip (SoC), namely, those that incorporate modularity and explicit parallelism. To enable these MP-SoC platforms, researchers have recently pursued scalable, communication-centric interconnect fabrics, such as networks-on-chip (NoC), which possess many features that are particularly attractive for these platforms. These interconnect fabrics are characterized by different trade-offs with regard to latency, throughput, energy dissipation, and silicon area requirements. In this paper, we develop a consistent and meaningful evaluation methodology to compare the performance and characteristics of a variety of NoC architectures. We also explore design trade-offs that characterize the NoC approach and obtain comparative results for a number of common NoC topologies. To the best of our knowledge, this is the first effort to characterize different NoC architectures with respect to their performance and design trade-offs. To further illustrate our evaluation methodology, we map a typical multiprocessing platform onto different NoC interconnect architectures and show how system performance is affected by these design trade-offs.

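    One ingredient of such a methodology can be sketched directly: the average zero-load hop count of a topology under uniform random traffic, which per-hop latency and energy figures can then scale (a toy comparison of a 2D mesh against a 2D torus; a real evaluation also needs contention, buffering, and area models):

        from itertools import product

        def avg_hops(k, torus=False):
            # Average minimal hop count between distinct nodes of a k x k
            # 2D mesh, or of a torus whose dimensions wrap around.
            def axis(a, b):
                d = abs(a - b)
                return min(d, k - d) if torus else d
            nodes = list(product(range(k), repeat=2))
            total = sum(axis(ax, bx) + axis(ay, by)
                        for ax, ay in nodes for bx, by in nodes)
            return total / (len(nodes) * (len(nodes) - 1))

        for k in (4, 8):
            print(f"{k}x{k}: mesh {avg_hops(k):.2f}, torus {avg_hops(k, True):.2f}")

    Under uniform traffic, the wraparound links of the torus shorten average paths relative to the mesh at the cost of longer wires; this is the kind of trade-off the paper's methodology quantifies.
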
  • TC Information for authors

    Page(s): c3
  • [Back cover]

    Page(s): c4

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org