IEEE Transactions on Computers

Issue 4 • April 2007

  • ARQ Protocols and Unidirectional Codes

    Page(s): 433 - 443

    Forward error control (FEC) and automatic repeat request (ARQ) are the two main techniques used for reliable data transmission in computer and communication systems. In this paper, some simple, low-cost error control techniques for ARQ protocols used with binary unidirectional channels are described. The proposed schemes can correct up to [t/2] unidirectional errors using t-unidirectional error detecting codes and code combining, with a much smaller number of retransmissions. First, we show how to do code combining for unidirectional errors. To use code combining with unidirectional codes, we need to identify the type of error (0→1 or 1→0) from the received word. We show how this can be done for various unidirectional codes.
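
    For context on the error-type identification step, here is a minimal Python sketch (illustrative only, not the authors' construction) using a Berger code, a classic unidirectional error-detecting code: 1→0 errors can only raise the recomputed check symbol above the received one, and 0→1 errors can only lower it.

        # Illustrative sketch: identifying the direction of a unidirectional
        # error under a Berger code. The check symbol is the number of 0s in
        # the information bits, so all-1->0 data errors raise the recomputed
        # check while all-0->1 errors lower it; comparing the recomputed and
        # received checks therefore reveals the error type.

        def berger_encode(info_bits):
            check = info_bits.count(0)                  # number of 0s in data
            width = max(1, len(info_bits)).bit_length() # bits to hold the count
            return info_bits, [(check >> i) & 1 for i in range(width)]

        def error_direction(info_bits, check_bits):
            received = sum(b << i for i, b in enumerate(check_bits))
            recomputed = info_bits.count(0)
            if recomputed == received:
                return "no unidirectional error detected"
            return "1->0 errors" if recomputed > received else "0->1 errors"

        data, chk = berger_encode([1, 0, 1, 1, 0, 1])
        corrupted = [0, 0, 1, 0, 0, 1]                  # two 1->0 data errors
        print(error_direction(corrupted, chk))          # -> "1->0 errors"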

  • Dynamic Voltage Scaling in Multitier Web Servers with End-to-End Delay Control

    Page(s): 444 - 458

    The energy and cooling costs of Web server farms are among their main financial expenditures. This paper explores the benefits of dynamic voltage scaling (DVS) for power management in server farms. Unlike previous work, which addressed DVS on individual servers and on load-balanced server replicas, this paper addresses DVS in multistage service pipelines. Contemporary Web server installations typically adopt a three-tier architecture in which the first tier presents a Web interface, the second executes scripts that implement business logic, and the third serves database accesses. From a user's perspective, only the end-to-end response time across the entire pipeline is relevant. This paper presents a rigorous optimization methodology and an algorithm for minimizing the total energy expenditure of the multistage pipeline subject to soft end-to-end response-time constraints. A distributed power management service is designed and evaluated on a real three-tier server prototype for coordinating DVS settings in a way that minimizes global energy consumption while meeting end-to-end delay constraints. The service is shown to consume as much as 30 percent less energy than the default Linux energy-saving policy.
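
    The paper's optimization is global and rigorous; as a toy illustration only of the underlying knob, the Python sketch below splits a soft end-to-end deadline evenly across tiers and runs each tier at the lowest discrete frequency that still meets its share (the frequency levels, workloads, and even split are all assumptions).

        # Toy per-tier DVS sketch under an end-to-end delay budget. Assumes a
        # tier's processing time scales inversely with CPU frequency; the
        # paper's algorithm minimizes energy globally instead of splitting
        # the deadline evenly.

        FREQS = [0.6, 0.8, 1.0, 1.2, 1.6]       # hypothetical GHz levels

        def pick_frequencies(work_gcycles, deadline_s):
            budget = deadline_s / len(work_gcycles)     # naive even split
            return [next((f for f in FREQS if cycles / f <= budget), FREQS[-1])
                    for cycles in work_gcycles]

        # Web, application, and database tiers with hypothetical workloads
        print(pick_frequencies([0.18, 0.5, 0.3], deadline_s=1.0))
        # -> [0.6, 1.6, 1.0]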

  • TPF: TCP Plugged File System for Efficient Data Delivery over TCP

    Page(s): 459 - 473

    Most Internet services, including the Web, FTP, and streaming, are built on top of TCP, the de facto protocol for data delivery over the Internet. To achieve high-performance data delivery over TCP, we thoroughly analyze TCP-based data delivery and identify three critical mismatches in general file system designs when supplying data to TCP. The first is the frequent sleeping and waking of a server process, which incurs excessive context-switching overhead because TCP data and ACK segments are processed in different contexts. The second is uniform data prefetching for all TCP connections, irrespective of their characteristics such as bandwidth, latency, and send-buffer status. The third is inefficient disk access that ignores abrupt changes in TCP connections. As a remedy to these mismatches, we design a new TCP-plugged file system (TPF) comprising three novel mechanisms, each of which relieves one of the identified mismatches: an integrated data-sending routine, TCP-aware data prefetching, and eager disk-request cancellation. With these mechanisms, TPF supplies data managed by the file system to TCP connections in a timely, seamless manner and reacts to abrupt changes in TCP connections. As a consequence, TPF provides minimal context-switching overhead, high buffer utilization, and highly effective disk access. We have implemented and tested the mechanisms in Linux 2.4. The experimental results show that the number of context switches is reduced by up to 40 percent and overall system performance is improved by 3-34 percent.
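
    As a rough illustration of the second mechanism's intent (not TPF's implementation), a connection-aware prefetcher might size its read-ahead to the connection's bandwidth-delay product, capped by the free space in the send buffer; all names and constants below are assumptions.

        # Toy sketch of TCP-aware prefetch sizing: fast, long-latency
        # connections with send-buffer room get deep prefetches, slow or
        # buffer-limited connections get shallow ones.

        BLOCK = 4096                            # file system block size, bytes

        def prefetch_blocks(bandwidth_bps, rtt_s, send_buf_free, max_blocks=64):
            bdp = bandwidth_bps / 8 * rtt_s     # bandwidth-delay product, bytes
            window = min(bdp, send_buf_free)    # never exceed send-buffer room
            return max(1, min(max_blocks, int(window // BLOCK)))

        # 100 Mb/s, 20 ms RTT, 128 KB free in the send buffer
        print(prefetch_blocks(100e6, 0.020, 128 * 1024))    # -> 32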

  • Grid-Oriented Storage: A Single-Image, Cross-Domain, High-Bandwidth Architecture

    Page(s): 474 - 487

    This paper describes the grid-oriented storage (GOS) architecture and its implementations. A GOS-specific file system (GOS-FS), the single-purpose intent of a GOS OS, and secure interfaces via the grid security infrastructure (GSI) motivate and enable this new architecture. As an FTP server, GOS with a slimmed-down OS of around 150 MB outperforms standard GridFTP by 20-40 percent. As a file server, GOS-FS acts as a network/grid interface, enabling a user to perform searches and access resources without downloading them locally. In real-world tests between Cambridge and Beijing, over a transfer distance of 10,000 km, multistreamed GOS-FS file opening/saving delivered a remarkable performance increase of about 2-25 times compared to the single-streamed network file system (NFSv4). GOS is expected to be a variant of, or successor to, the widely used network-attached storage (NAS) and storage area network (SAN) products in the grid era.
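
    The multistream gain comes from fetching independent ranges of a file over parallel streams so that per-stream latency on a long path is hidden. A generic Python sketch of that idea follows; it uses HTTP range requests purely as a stand-in, since GOS-FS speaks its own grid protocol stack, and the URL and server-side range support are assumptions.

        # Generic multistream transfer sketch (illustrative, not GOS-FS code):
        # split the file into byte ranges and fetch them concurrently.

        import concurrent.futures
        import urllib.request

        def fetch_range(url, start, end):
            req = urllib.request.Request(
                url, headers={"Range": f"bytes={start}-{end}"})
            with urllib.request.urlopen(req) as resp:
                return start, resp.read()

        def multistream_fetch(url, size, streams=8):
            chunk = (size + streams - 1) // streams
            ranges = [(i, min(i + chunk, size) - 1)
                      for i in range(0, size, chunk)]
            buf = bytearray(size)
            with concurrent.futures.ThreadPoolExecutor(streams) as pool:
                for start, data in pool.map(lambda r: fetch_range(url, *r),
                                            ranges):
                    buf[start:start + len(data)] = data
            return bytes(buf)

        # e.g. data = multistream_fetch("http://example.org/big.iso", 1 << 30)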

  • PABC: Power-Aware Buffer Cache Management for Low Power Consumption

    Page(s): 488 - 501

    Power consumed by memory systems becomes a serious issue as the amount of installed memory grows. With the various low-power modes that can be applied to individual memory units, the operating system can reduce the number of active units by collocating active pages onto a few of them. This paper presents a memory management scheme based on this observation. It differs from other approaches in that all of the memory space is considered, whereas previous methods deal only with pages mapped to user address spaces. The buffer cache usually occupies more than half of total memory, and its page access patterns differ from those of user address spaces. Based on an analysis of buffer cache behavior and its interaction with the user space, our scheme achieves up to 63 percent more power reduction. Migrating a page to a different memory unit increases memory latency, but is shown to reduce the power consumed by an additional 4.4 percent.
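
    A minimal sketch of the consolidation idea (unit size and the greedy fill policy are illustrative assumptions, not the paper's algorithm): pack active pages onto as few memory units as possible so that the remaining units can drop into a low-power mode.

        # Toy page-consolidation sketch for power-aware memory management.

        UNIT_PAGES = 4                          # pages per memory unit (toy)

        def consolidate(active_pages, num_units):
            """Map each active page id to a unit, filling units in order."""
            placement = {page: i // UNIT_PAGES
                         for i, page in enumerate(sorted(active_pages))}
            idle = set(range(num_units)) - set(placement.values())
            return placement, idle              # idle units can power down

        placement, idle = consolidate({3, 9, 12, 20, 31}, num_units=8)
        print(len(idle), "of 8 units can enter a low-power mode")   # -> 6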

  • Low Diameter Interconnections for Routing in High-Performance Parallel Systems

    Page(s): 502 - 510

    A new class of low diameter interconnections (LDI) is proposed for high-performance computer systems that are augmented with circuit-switching networks. In these systems, the network is configured to match the communication patterns of applications when these patterns exhibit temporal locality, and to embed a logical topology to route traffic that does not exhibit locality. The new LDI topology is a surprisingly simple directed graph that minimizes the network diameter for a given node degree and number of nodes. It can be easily embedded in circuit-switching networks to route random traffic with high bandwidth and low latency.
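
    For intuition about the design target (the paper's LDI is its own construction, not shown here), the de Bruijn digraph is a well-known directed graph whose diameter log_d N is within a constant of the Moore bound for degree d and N nodes; the sketch below builds one and verifies its diameter by brute-force BFS.

        # A de Bruijn digraph on N = d**k nodes has out-degree d and
        # diameter exactly k = log_d(N).

        from collections import deque

        def de_bruijn_digraph(d, k):
            n = d ** k
            # edge v -> (v*d + a) mod n for each symbol a: shift-and-append
            return {v: [(v * d + a) % n for a in range(d)] for v in range(n)}

        def diameter(graph):
            worst = 0
            for src in graph:                   # BFS from every node
                dist = {src: 0}
                queue = deque([src])
                while queue:
                    u = queue.popleft()
                    for w in graph[u]:
                        if w not in dist:
                            dist[w] = dist[u] + 1
                            queue.append(w)
                worst = max(worst, max(dist.values()))
            return worst

        g = de_bruijn_digraph(d=2, k=4)         # 16 nodes, out-degree 2
        print(diameter(g))                      # -> 4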

  • Formal Verification of Simulation Traces Using Computation Slicing

    Page(s): 511 - 527

    Concurrent and distributed systems, such as systems-on-chip (SoCs), present an immense verification challenge due to their complexity and inherent concurrency. Traditional approaches for eliminating errors in such systems include formal methods and simulation. We present an approach that combines formal methods and simulation in a technique called predicate detection (aka runtime verification), while avoiding the complexity of formal methods and the pitfalls of ad hoc simulation. Our technique enables efficient formal verification on execution traces of actual, scalable systems. Traditional simulation methodologies are woefully inadequate in the presence of concurrency and subtle synchronization: a bug may appear only when the ordering of concurrent events differs from the ordering in the simulation trace. We use a partial order trace model rather than the traditional total order trace model, gaining the benefit of properly dealing with concurrent events and, especially, of detecting errors from analyzing successful total order traces. Surprisingly, checking properties, even on a finite partial order trace, is NP-complete in the size of the trace description (the state-explosion problem). Our approach to ameliorating state explosion in the partial order trace model uses two techniques: 1) slicing and 2) exploiting the structure of the property itself, by imposing restrictions, to evaluate it efficiently for a given execution trace. Intuitively, the slice of a trace with respect to a property is a subtrace that contains all of the global states of the trace that satisfy the property, computed efficiently (without traversing the state space) and represented concisely (without explicit representation of individual states). We present temporal slicing algorithms with respect to properties in the temporal logic RCTL+ and show how to use them for efficient predicate detection of design properties. We have developed a prototype system, the partial order trace analyzer (POTA), which implements our algorithms. We verify several scalable and industrial protocols, including CORBA's general inter-ORB protocol, a PCI-based system-on-chip, ISO's asynchronous transfer mode ring, cache coherence, and mutual exclusion. Our experimental results indicate that slicing can lead to exponential reduction, in both time and space, over existing techniques such as those in the SPIN model checker.
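
    The core primitive of the partial order trace model can be made concrete with vector clocks: each event carries one, and a global state is consistent exactly when no frontier clock has seen past another process's frontier. The Python sketch below is a minimal illustration; POTA's slicing algorithms build on this model but are far more involved.

        # Minimal partial-order-trace primitives: happened-before on vector
        # clocks, and the consistency check for a cut given by one frontier
        # event per process.

        def happened_before(vc_a, vc_b):
            return all(x <= y for x, y in zip(vc_a, vc_b)) and vc_a != vc_b

        def consistent_cut(frontier):
            """frontier[i]: vector clock of process i's last included event.
            Consistent iff no clock has seen an event on process i beyond
            frontier[i] itself (the cut is left-closed)."""
            n = len(frontier)
            return all(frontier[i][i] >= frontier[j][i]
                       for i in range(n) for j in range(n))

        send = (0, 1)                   # P1 sends a message at its 1st event
        recv = (2, 1)                   # P0 receives it at its 2nd event
        print(happened_before(send, recv))        # True: message edge
        print(consistent_cut([recv, send]))       # True: send is in the cut
        print(consistent_cut([recv, (0, 0)]))     # False: receive w/o send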

  • Properties Incompleteness Evaluation by Functional Verification

    Page(s): 528 - 544

    Verification engineers cannot guarantee the correctness of a system implementation by model checking if the set of proven properties is incomplete. However, model checking lacks widely accepted coverage metrics for evaluating property completeness, and the existing metrics are based on time-consuming formal approaches that cannot be efficiently applied to medium or large systems. In this context, the paper proposes a coverage methodology based on a combination of static and dynamic verification that reduces evaluation time with respect to purely formal approaches. The joining point between static and dynamic verification is a fault model targeting functional descriptions. Functional fault simulation and dynamic automatic test pattern generation are used to quickly estimate the capability of the properties to detect functional faults, providing a first estimate of property completeness. Then, if necessary, model checking is used to complete the analysis, avoiding the underestimation of property coverage that the lack of exhaustiveness of dynamic verification could otherwise cause. The proposed approach is theoretically founded, its effectiveness is compared with existing techniques, and experimental results confirming the theoretical results are provided.
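
    The dynamic half of such a flow can be caricatured in a few lines: inject functional faults (here, naive mutations of a reference function), simulate, and count how many mutants at least one property distinguishes from the golden model. Everything below is an illustrative assumption, not the paper's fault model.

        # Toy mutation-based estimate of property completeness.

        def golden(a, b):
            return (a + b) & 0xFF               # 8-bit adder, golden model

        mutants = [
            lambda a, b: (a - b) & 0xFF,        # operator mutation
            lambda a, b: (a + b) & 0x7F,        # width mutation
            lambda a, b: (a + b) & 0xFF,        # behaviorally identical
        ]

        properties = [
            lambda f: f(0, 0) == 0,             # additive identity
            lambda f: all(f(a, 1) == golden(a, 1) for a in range(256)),
        ]

        detected = sum(any(not p(m) for p in properties) for m in mutants)
        print(f"property coverage estimate: {detected}/{len(mutants)}")  # 2/3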

  • Data-Independent Pattern Run-Length Compression for Testing Embedded Cores in SoCs

    Page(s): 545 - 556

    This paper presents a new compression technique for testing intellectual property (IP) cores in systems-on-chip. Pattern run-length compression applies the well-known run-length coding to equal and complementary consecutive patterns of the precomputed test data. No structural information about the IP cores is required by the encoding procedure. A data-independent decompressor can be realized by an embedded processor or on-chip circuitry, and the decompressed test set can be flexibly applied to a single scan chain or multiple scan chains of each core under test. Experiments on the ISCAS-89 benchmarks show that the new technique achieves superior compression performance and significantly reduces test application time.
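
    A minimal sketch of the coding idea (the paper's codeword format and decompressor architecture are more refined): cut the test set into fixed-size patterns, then extend a run while the next pattern equals the reference pattern or its bitwise complement, recording one flag bit per copy.

        # Toy pattern run-length encoder over equal/complementary patterns.

        PAT = 4                                 # pattern width in bits (toy)

        def complement(p):
            return ''.join('1' if b == '0' else '0' for b in p)

        def prl_encode(bits):
            pats = [bits[i:i + PAT] for i in range(0, len(bits), PAT)]
            out, i = [], 0
            while i < len(pats):
                ref, flags, j = pats[i], [], i
                while j < len(pats) and pats[j] in (ref, complement(ref)):
                    flags.append(0 if pats[j] == ref else 1)  # 1 = complement
                    j += 1
                out.append((len(flags), flags, ref))
                i = j
            return out

        print(prl_encode("0101" "0101" "1010" "0101" "1111"))
        # -> [(4, [0, 0, 1, 0], '0101'), (1, [0], '1111')]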

  • Reconfigured Scan Forest for Test Application Cost, Test Data Volume, and Test Power Reduction

    Page(s): 557 - 562

    A new scan architecture called the reconfigured scan forest is proposed for cost-effective scan testing. Multiple scan flip-flops are grouped based on a structural analysis that avoids introducing new untestable faults due to new reconvergent fanouts. The proposed architecture allows only a few scan flip-flops to be connected to the XOR trees, whose size can therefore be greatly reduced compared with the original scan forest, lowering area overhead and routing complexity. It is shown that test application cost, test data volume, and test power with the proposed scan forest architecture are all greatly reduced compared with conventional full scan design using a single scan chain and with several recent scan testing methods.
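
    For readers unfamiliar with the compaction side, the sketch below shows the plain mechanism an XOR tree implements: the bits shifted out of several scan chains in one cycle are reduced to a single observed response bit. The grouping rules that avoid aliasing and untestable faults are the paper's contribution and are not modeled here.

        # Functional model of XOR-tree response compaction (illustrative).

        from functools import reduce

        def xor_tree(bits):
            """Compact one bit from each scan chain into one response bit."""
            return reduce(lambda a, b: a ^ b, bits)

        chains = [                              # four chains shifting out
            [1, 0, 1, 1],
            [0, 0, 1, 0],
            [1, 1, 0, 0],
            [0, 1, 1, 0],
        ]
        compacted = [xor_tree(cycle) for cycle in zip(*chains)]
        print(compacted)                        # -> [0, 0, 1, 1]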

  • Modified Low-Density MDS Array Codes for Tolerating Double Disk Failures in Disk Arrays

    Page(s): 563 - 566

    In this paper, we present a new class of low-density MDS array codes for tolerating double disk failures in disk arrays. The proposed MDS array code has lower encoding and decoding complexity than the EVENODD code of Blaum et al.
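
    For context, a Python encoding sketch of the classic EVENODD baseline follows (decoding, and the paper's lower-complexity modification, are omitted): for prime p, a (p-1)-row by p-column data array gains a row-parity column and a diagonal-parity column, which together tolerate any two column (disk) erasures.

        # EVENODD encoding sketch (Blaum et al.); an imaginary all-zero row
        # p-1 completes the diagonals, and S is the parity of the special
        # diagonal that is folded into every diagonal-parity bit.

        def evenodd_encode(data, p):
            """data: (p-1) x p bits; returns rows extended with two parities."""
            get = lambda i, j: data[i][j] if i < p - 1 else 0
            s = 0
            for t in range(1, p):
                s ^= get(p - 1 - t, t)          # special diagonal
            rows = []
            for l in range(p - 1):
                row_par = diag_par = 0
                for t in range(p):
                    row_par ^= data[l][t]
                    diag_par ^= get((l - t) % p, t)
                rows.append(data[l] + [row_par, diag_par ^ s])
            return rows

        stripe = [[1, 0, 1, 1, 0],
                  [0, 1, 1, 0, 0],
                  [1, 1, 0, 0, 1],
                  [0, 0, 1, 1, 1]]              # p = 5: 4 rows x 5 data disks
        for row in evenodd_encode(stripe, p=5):
            print(row)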

  • A Bit-Width Optimization Methodology for Polynomial-Based Function Evaluation

    Page(s): 567 - 571

    We present an automated bit-width optimization methodology for polynomial-based hardware function evaluation. Due to the analytical nature of the approach, overflow protection and precision accurate to one unit in the last place (ulp) can be guaranteed. A range analysis technique based on computing the roots of the derivative of a signal is used to determine the minimal number of integer bits. Fractional bit requirements are established using an analytical error expression derived from the functions that occur along the data path. Global fractional bit optimization across multiple computation stages is performed using simulated annealing and circuit area estimation functions.
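
    The integer-bit half of the range analysis can be illustrated directly (a sketch under simplifying assumptions, not the paper's tool): on an interval, a polynomial's extrema lie at the endpoints or at real roots of its derivative, so the exact peak magnitude, and hence the integer bit-width, follows without sampling.

        # Integer bit-width from exact polynomial range analysis (sketch).

        import math
        import numpy as np

        def integer_bits(coeffs, lo, hi):
            """coeffs: highest-degree-first polynomial coefficients."""
            poly = np.poly1d(coeffs)
            candidates = [lo, hi] + [r.real for r in poly.deriv().roots
                                     if abs(r.imag) < 1e-12
                                     and lo <= r.real <= hi]
            peak = max(abs(poly(x)) for x in candidates)
            return max(1, math.floor(math.log2(peak)) + 1)  # magnitude bits

        # x^3 - 6x on [-3, 3]: extrema at x = +/-sqrt(2) and the endpoints;
        # the peak magnitude is 9, needing 4 bits (plus a sign bit).
        print(integer_bits([1, 0, -6, 0], -3.0, 3.0))       # -> 4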

  • Fast Modulo 2^{n} - (2^{n - 2} + 1) Addition: A New Class of Adder for RNS

    Page(s): 572 - 576

    Efficient modular adder architectures are invaluable in the design of residue number system (RNS)-based digital systems; for example, they are used to perform residue encoding and decoding, modular multiplication, and scaling. This work is the first in the literature on modulo 2^n - (2^{n-2} + 1) addition. The algebraic properties of such moduli are exploited in the derivation of the proposed fast adder architecture. Actual VLSI implementations in 130 nm CMOS technology show that our adder significantly outperforms the most competitive generic modular adder design over the entirety of the power-delay-area space.
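
    The standard structure behind fast modulo 2^n - c adders can be shown as a functional model (this is the generic two-sum scheme instantiated for c = 2^{n-2} + 1; the paper's VLSI architecture refines it): compute a + b and a + b + c in parallel, and let the carry-out of the second sum select the corrected result.

        # Functional model of a modulo 2^n - (2^(n-2) + 1) adder.

        def modular_add(a, b, n):
            c = (1 << (n - 2)) + 1
            m = (1 << n) - c                    # the modulus
            assert 0 <= a < m and 0 <= b < m
            s1 = a + b                          # plain sum
            s2 = a + b + c                      # pre-corrected sum
            carry = s2 >> n                     # carry-out of the n-bit adder
            return (s2 & ((1 << n) - 1)) if carry else s1

        n = 8
        m = (1 << n) - ((1 << (n - 2)) + 1)     # 2^8 - (2^6 + 1) = 191
        print(modular_add(100, 150, n), (100 + 150) % m)    # -> 59 59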


Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Albert Y. Zomaya
School of Information Technologies
Building J12
The University of Sydney
Sydney, NSW 2006, Australia
http://www.cs.usyd.edu.au/~zomaya
albert.zomaya@sydney.edu.au