
IEEE Transactions on Computers

Issue 7 • July 2004


Displaying Results 1 - 17 of 17
  • [Front cover]

    Publication Year: 2004 , Page(s): c1
PDF (143 KB)
    Freely Available from IEEE
  • [Cover 2]

    Publication Year: 2004 , Page(s): c2
PDF (75 KB)
    Freely Available from IEEE
  • Fair bandwidth allocation for multicasting in networks with discrete feasible set

    Publication Year: 2004 , Page(s): 785 - 797
    Cited by:  Papers (10)  |  Patents (4)
PDF (664 KB) | HTML

We study fairness in allocating bandwidth for loss-tolerant real-time multicast applications. We assume that the traffic is encoded in several layers so that the network can adapt to the available bandwidth and receiver processing capabilities by varying the number of layers delivered. We consider the case where receivers cannot subscribe to fractional layers. Therefore, the network can allocate only a discrete set of bandwidths to a receiver, whereas a continuous set of rates can be allocated when receivers can subscribe to fractional layers. Fairness issues differ vastly in these two cases. Computation of the lexicographic optimal rate allocation becomes NP-hard in the discrete case, while it is computable in polynomial time when fractional layers can be allocated. Furthermore, a maxmin fair rate vector may not exist in the discrete case. We therefore introduce a new notion of fairness, maximal fairness. Even though maximal fairness is a weaker notion of fairness, it has many intuitively appealing fairness properties; for example, it coincides with lexicographic optimality and maxmin fairness whenever a maxmin fair rate allocation exists. We propose a polynomial-complexity algorithm for computing the maximally fair rates allocated to the various source-destination pairs, which incidentally computes the maxmin fair rate allocation when the latter exists.
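The flavor of "maximal" allocation the abstract describes can be made concrete with a toy greedy sketch (this is not the paper's algorithm, and a single shared link stands in for a general network): each receiver may only take whole layers, and we repeatedly bump up the receiver with the smallest current rate until no one can be raised without exceeding capacity.

```python
# Toy illustration of maximal allocation with discrete layer rates.
# layer_rates[i] = increasing list of cumulative rates receiver i may
# take, starting at 0 (whole layers only).

def greedy_discrete_fill(layer_rates, capacity):
    """Return one rate per receiver such that no single receiver can be
    bumped up a layer without exceeding `capacity` (a 'maximal'
    allocation), preferring the currently worst-off receiver."""
    level = [0] * len(layer_rates)   # every receiver starts at rate 0
    used = 0.0                       # total allocated rate so far
    while True:
        # receivers that still have a higher layer that fits in the link
        candidates = [i for i in range(len(layer_rates))
                      if level[i] + 1 < len(layer_rates[i])
                      and used - layer_rates[i][level[i]]
                              + layer_rates[i][level[i] + 1] <= capacity]
        if not candidates:
            return [layer_rates[i][level[i]] for i in range(len(level))]
        # maxmin-style preference: raise the receiver with the lowest rate
        i = min(candidates, key=lambda i: layer_rates[i][level[i]])
        used += layer_rates[i][level[i] + 1] - layer_rates[i][level[i]]
        level[i] += 1
```

With cumulative layer rates [0, 1, 2, 4] and [0, 2, 4] on a link of capacity 5, the greedy fill settles at rates (2, 2): neither receiver's next layer fits, so the allocation is maximal even though capacity remains.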

  • Design and analysis of a self-timed duplex communication system

    Publication Year: 2004 , Page(s): 798 - 814
    Cited by:  Papers (11)
PDF (1728 KB) | HTML

Communication-centric design is a key paradigm for systems-on-chip (SoCs), where most computing blocks are predesigned IP cores. Due to the problems of distributing a clock across a large die, future system designs will increasingly be asynchronous or self-timed. For portable, battery-powered applications, power and pin efficiency are important properties of a communication system, since the cost of a signal transition on a global interconnect is much greater than on internal wires within logic blocks. We address this issue by designing an asynchronous communication system aimed at power and pin efficiency. Another important issue in SoC design is design productivity, which demands new methods and tools, particularly for designing communication protocols and interconnects. We approach the design of a self-timed communication system using formal techniques supported by verification and synthesis tools. The protocol is formally specified and verified with respect to deadlock-freedom and delay-insensitivity using a Petri-net-based model-checking tool. A protocol controller has been synthesized by direct mapping of the Petri net model derived from the protocol specification. The logic implementation was analyzed using the Cadence toolkit. The results of SPICE simulation show the advantages of the direct mapping method over logic synthesis.
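The deadlock-freedom check the authors run on their protocol's Petri net model can be sketched in miniature (a toy reachability search, not their tool or their net): explore every marking reachable from the initial one and report any marking in which no transition is enabled.

```python
# Toy Petri net deadlock finder: exhaustive reachability over markings.
# A marking is a tuple of token counts per place; a transition is a
# (consume, produce) pair of per-place counts.

def find_deadlocks(initial, transitions):
    """Return all reachable markings with no enabled transition."""
    seen, stack, deadlocked = set(), [tuple(initial)], []
    while stack:
        m = stack.pop()
        if m in seen:
            continue
        seen.add(m)
        enabled = False
        for consume, produce in transitions:
            # a transition is enabled if every place holds enough tokens
            if all(m[i] >= consume[i] for i in range(len(m))):
                enabled = True
                stack.append(tuple(m[i] - consume[i] + produce[i]
                                   for i in range(len(m))))
        if not enabled:
            deadlocked.append(m)   # nothing can fire: a deadlock
    return deadlocked
```

A two-place net whose token simply shuttles back and forth has no deadlock, while a transition that consumes the only token without producing one reaches a dead marking. Real checkers avoid enumerating all markings explicitly, but the verified property is the same.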

  • On the implementation of unreliable failure detectors in partially synchronous systems

    Publication Year: 2004 , Page(s): 815 - 828
    Cited by:  Papers (12)
PDF (600 KB) | HTML

Unreliable failure detectors were proposed by Chandra and Toueg as mechanisms that provide information about process failures. Chandra and Toueg defined eight classes of failure detectors, depending on how accurate this information is, and presented an algorithm implementing a failure detector of one of these classes in a partially synchronous system. This algorithm is based on all-to-all communication and periodically exchanges a number of messages that is quadratic in the number of processes. We study the implementability of different classes of failure detectors in several models of partial synchrony. We first show that no failure detector with perpetual accuracy (namely, P, Q, S, and W) can be implemented in these models in systems with even a single failure. We also show that, in these models of partial synchrony, a majority of correct processes is necessary to implement a failure detector of the class Θ proposed by Aguilera et al. Then, we present a family of distributed algorithms that implement the four classes of unreliable failure detectors with eventual accuracy (namely, ◇P, ◇Q, ◇S, and ◇W). Our algorithms are based on a logical ring arrangement of the processes, which defines the monitoring and failure-information propagation pattern. The resulting algorithms periodically exchange at most a linear number of messages.
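The ring idea behind the linear message count can be sketched as follows (an illustrative timeout-based monitor, not the paper's protocol): each process heartbeats only its ring neighbor instead of everyone, so a monitoring period costs n messages rather than n², and a process is suspected once its heartbeat is overdue.

```python
# Minimal sketch of ring-based heartbeat monitoring.

class RingDetector:
    def __init__(self, n, timeout):
        self.n, self.timeout = n, timeout
        self.last_heard = {p: 0.0 for p in range(n)}

    def successor(self, p, suspected):
        """Next unsuspected process after p on the logical ring — the
        process p would monitor (and forward failure info to)."""
        q = (p + 1) % self.n
        while q in suspected and q != p:
            q = (q + 1) % self.n
        return q

    def heartbeat(self, sender, now):
        """Record a heartbeat received from `sender` at time `now`."""
        self.last_heard[sender] = now

    def suspects(self, now):
        """Processes whose heartbeat is overdue. Suspicion is unreliable:
        a slow-but-correct process may appear here and later be trusted
        again, which is exactly what eventual accuracy permits."""
        return {p for p in range(self.n)
                if now - self.last_heard[p] > self.timeout}
```

The ring also shows why suspicions must propagate: when process 1 is suspected, process 0's monitoring target skips to process 2.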

  • CeRA: a router for symmetrical FPGAs based on exact routing density evaluation

    Publication Year: 2004 , Page(s): 829 - 842
PDF (2248 KB) | HTML

We present a new performance- and routability-driven routing algorithm for symmetrical array-based field-programmable gate arrays (FPGAs). A key contribution of our work is overcoming one essential limitation of previous routing algorithms: inaccurate estimates of routing density that were too general for symmetrical FPGAs. To this end, we formulate an exact routing density calculation based on a precise analysis of the structure (switch block) of symmetrical FPGAs and use it consistently in both global and detailed routing. Building on the proposed accurate routing metrics, we describe a new routing algorithm, called cost-effective net-decomposition-based routing, which is fast and yet produces remarkable routing results in terms of both routability and net/path delays. We performed extensive experiments to show the effectiveness of our algorithm based on the proposed cost metrics.
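The textbook notion underlying routing density (this is the classic channel-density lower bound, not the paper's exact switch-block-aware metric) is simply how many nets must cross a given channel position:

```python
# Classic channel density: nets whose horizontal span crosses a given
# channel position. Its maximum over positions lower-bounds the number
# of routing tracks needed in that channel.

def channel_density(net_spans, position):
    """net_spans: list of (lo, hi) half-open column spans, one per net."""
    return sum(lo <= position < hi for lo, hi in net_spans)
```

The paper's contribution is replacing such coarse estimates with an exact calculation that accounts for the limited connectivity inside a symmetrical FPGA's switch blocks, where not every track-to-track turn is available.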

  • Design and optimization of large size and low overhead off-chip caches

    Publication Year: 2004 , Page(s): 843 - 855
    Cited by:  Papers (3)  |  Patents (1)
PDF (1264 KB) | HTML

Large off-chip L3 caches can significantly improve the performance of memory-intensive applications. However, conventional L3 SRAM caches face two issues as those applications require increasingly large caches. First, an SRAM cache has a limited size due to the low density and high cost of SRAM and thus cannot hold the working sets of many memory-intensive applications. Second, since the tag-checking overhead of large caches is nontrivial, the existence of an L3 cache increases the cache miss penalty and may even harm the performance of some memory-intensive applications. To address these two issues, we present a new memory hierarchy design that uses cached DRAM to construct a large, low-overhead off-chip cache. The high-density DRAM portion of the cached DRAM can hold large working sets, while the small SRAM portion exploits the spatial locality appearing in L2 miss streams to reduce the access latency. The L3 tag array is placed off-chip with the data array, minimizing the processor area overhead of the L3 cache, while a small tag cache is placed on-chip, effectively removing the off-chip tag access overhead. A prediction technique accurately predicts the hit/miss status of an access to the cached DRAM, further reducing the access latency. Conducting execution-driven simulations of a 2 GHz 4-way issue processor running 11 memory-intensive programs from the SPEC 2000 benchmark suite, we show that a system using a cached DRAM (64 MB of DRAM with a 128 KB on-chip SRAM cache) as the off-chip cache outperforms the same system with an 8 MB SRAM L3 off-chip cache by up to 78 percent, measured by total execution time. The average speedup of the system with the cached-DRAM off-chip cache is 25 percent over the system with the L3 SRAM cache.
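Why caching the L3 tags on-chip pays off can be seen with a one-line latency model (the latencies below are made-up round numbers for illustration, not figures from the paper): a hit in the small on-chip tag cache avoids a round trip to the off-chip tag array just to learn whether the data is there.

```python
# Back-of-envelope model of L3 tag-check latency with an on-chip tag
# cache. Latencies are assumed values for illustration only.

def avg_tag_check_latency(tag_cache_hit_rate,
                          on_chip_tag_ns=1.0, off_chip_tag_ns=20.0):
    """Average time just to check the L3 tags for one access: an
    on-chip tag-cache hit costs on_chip_tag_ns; a miss additionally
    pays the off-chip tag array access."""
    return (tag_cache_hit_rate * on_chip_tag_ns
            + (1 - tag_cache_hit_rate) * (on_chip_tag_ns + off_chip_tag_ns))
```

Even a 90 percent tag-cache hit rate cuts the average tag-check cost from 21 ns to 3 ns under these assumed numbers, which is the effect the abstract describes as "effectively removing the off-chip tag access overhead."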

  • PACE: a new approach to dynamic voltage scaling

    Publication Year: 2004 , Page(s): 856 - 869
    Cited by:  Papers (51)  |  Patents (1)
PDF (1576 KB) | HTML

By dynamically varying CPU speed and voltage, it is possible to save significant amounts of energy while still meeting prespecified soft or hard deadlines for tasks; numerous algorithms have been published with this goal. We show that it is possible to modify any voltage scaling algorithm to minimize energy use without affecting perceived performance and present a formula to do so optimally. Because this formula specifies increased speed as the task progresses, we call this approach PACE (Processor Acceleration to Conserve Energy). The optimal formula depends on the probability distribution of the task's work requirement and requires that the speed be varied continuously. We therefore present methods for estimating the task work distribution and evaluate how effective they are on a variety of real workloads. We also show how to approximate the optimal continuous schedule with one that changes speed a limited number of times. Using these methods, we find we can apply PACE practically and efficiently. Furthermore, PACE is extremely effective. Simulations using real workloads and the standard model for energy consumption as a function of voltage show that PACE can reduce the CPU energy consumption of existing algorithms by up to 49.5 percent, with an average of 20.6 percent, without any effect on perceived performance. The resulting PACE-modified algorithms reduce CPU energy consumption by an average of 65.4 percent relative to no dynamic voltage scaling, as opposed to only 54.3 percent without PACE.
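A discretized sketch of the PACE idea (assuming the standard cubic power model, i.e. energy per cycle grows like speed squared, and treating the continuous schedule one work unit at a time): run slowly early, when the task will probably finish soon anyway, with speed growing as the survival probability of the work requirement shrinks, then scale uniformly so a worst-case task still meets the deadline.

```python
# Discretized PACE-style speed schedule under a cubic power model
# (power ~ speed^3), where the optimal speed at work position w is
# proportional to (1 - F(w))^(-1/3) = survival(w)^(-1/3).

def pace_schedule(survival, deadline):
    """survival[i] = Pr(task needs more than i work units), decreasing.
    Returns one speed (work units per second) per unit of work, scaled
    so running all units still finishes exactly at `deadline`."""
    raw = [s ** (-1.0 / 3.0) for s in survival]   # unscaled optimal shape
    t = sum(1.0 / r for r in raw)                  # time at unscaled speeds
    scale = t / deadline                           # uniform scaling factor
    return [r * scale for r in raw]
```

Because the schedule accelerates, tasks that finish early (the common case) spend their cycles at the cheap low speeds, which is where the energy savings come from.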

  • Analytically modeling a fault-tolerant messaging protocol

    Publication Year: 2004 , Page(s): 870 - 878
PDF (680 KB) | HTML

We present a simple analytical model for communication over a discarding network using a fault-tolerant messaging protocol. Our technique is an improvement over existing methods in that it accurately models both packet retransmission and the multiple types of packets exchanged between sender and receiver in order to guarantee message delivery and idempotence. The model can be applied to any network and routing strategy; we consider both circuit switching and wormhole routing on three different network topologies. In all cases, the model agrees closely with simulated results.
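The first-order effect such a model must capture can be written down directly (an illustrative assumption, not the paper's model: packets are dropped independently with probability p, and the sender retransmits until it sees an acknowledgment):

```python
# First-order retransmission model over a discarding network,
# assuming independent drops with probability p per packet.

def expected_attempts(p):
    """Expected send attempts for one message when an attempt succeeds
    only if both the data packet and its acknowledgment get through,
    i.e. with probability (1 - p)^2 (geometric distribution)."""
    return 1.0 / ((1.0 - p) ** 2)
```

Already at a 50 percent drop rate this predicts four attempts per delivered message, illustrating why modeling the acknowledgment traffic (and not just the data packets) matters.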

  • Design and performance analysis of the generalized timed token service discipline

    Publication Year: 2004 , Page(s): 879 - 891
PDF (912 KB) | HTML

Multiservice networks host heterogeneous applications requiring different qualities of service (QoS); their coexistence can be handled efficiently by employing scheduling algorithms capable of providing different QoS levels simultaneously. In a previous work, we defined a reference dual-class (DC) paradigm, according to which rate-guaranteed flows are restrained from using more than their minimum guaranteed rate in the presence of backlogged best-effort flows, and the latter share all the remaining capacity according to predetermined weights. The timed token service discipline (TTSD), which applies at the output link of a switch the same rules used to control medium access by the timed token protocol, was also introduced and analyzed therein. It was proven that TTSD shares most of the capacity not strictly needed by the rate-guaranteed flows among the best-effort ones, thus achieving one of the goals of the DC paradigm. However, in TTSD, best-effort flows can only share the available capacity equally. We address the issue of differentiating the capacity sharing among best-effort flows: we define a generalized TTSD (GTTSD) in which best-effort flows share capacity according to predefined weights, as in a weighted fair queuing service discipline. Formal analysis and simulation results show that GTTSD closely approximates the DC paradigm.
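The target allocation of the dual-class paradigm, which GTTSD approximates, can be stated in a few lines (an idealized fluid split for the fully backlogged case, not the token-timing mechanism itself):

```python
# Idealized dual-class split: rate-guaranteed flows get exactly their
# guaranteed rates while best-effort flows are backlogged; the leftover
# capacity is divided among best-effort flows by weight.

def dual_class_rates(capacity, guaranteed, be_weights):
    """Return (rates for guaranteed flows, rates for best-effort flows)
    under the DC paradigm when all best-effort flows are backlogged."""
    leftover = capacity - sum(guaranteed)   # capacity not strictly needed
    total_w = sum(be_weights)
    return guaranteed, [leftover * w / total_w for w in be_weights]
```

GTTSD's contribution is realizing (a close approximation of) this weighted split with the timed-token machinery, whereas plain TTSD could only divide the leftover equally.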

  • Energy efficient comparators for superscalar datapaths

    Publication Year: 2004 , Page(s): 892 - 904
    Cited by:  Papers (11)
PDF (1112 KB) | HTML

Modern superscalar datapaths use aggressive execution reordering to exploit instruction-level parallelism. Comparators, either explicit or embedded in content-addressable logic, are used extensively throughout such designs to implement several key out-of-order execution mechanisms and to support the memory hierarchy. Traditional comparator designs dissipate energy on a mismatch in any bit position. As mismatches occur far more frequently than matches in many situations, considerable improvements in energy dissipation can be gained by using comparators that dissipate energy predominantly on a full match and little or no energy on partial or complete mismatches. We make two contributions. First, we introduce a series of dissipate-on-match comparator designs, including designs for comparing long arguments. Second, we show how the comparators used in modern datapaths can be chosen and organized judiciously, based on microarchitectural-level statistics, to minimize energy dissipation. We use actual layout data and realistic bit patterns of the comparands (obtained from simulated execution of SPEC 2000 benchmarks) to show the energy impact of the new comparator designs. For the same delay, the proposed 8-bit comparators dissipate 70 percent less energy than the traditional designs if used within issue queues and 73 percent less energy if used within load-store queues. The use of the proposed 6-bit comparators within the dependency-checking logic is shown to reduce energy dissipation by 65 percent on average compared to the traditional designs. We also find that a hybrid 32-bit comparator, comprised of three traditional 8-bit blocks and one proposed 8-bit block, is the most energy-efficient solution for use in the load-store queue, yielding a 19 percent energy reduction compared to four traditional 8-bit blocks implementing a 32-bit comparator.
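The dissipation argument can be made concrete with a toy energy model (the per-bit energy unit and the clean dichotomy are illustrative simplifications, not circuit data from the paper): a dissipate-on-mismatch design pays for every differing bit, while a dissipate-on-match design pays mainly when all bits agree.

```python
# Toy comparator energy model contrasting the two design styles.
# e_bit is an arbitrary per-bit energy unit (assumed, for illustration).

def cmp_energy(a, b, width=8, e_bit=1.0, style="on_mismatch"):
    """Energy of one comparison between `width`-bit values a and b."""
    diff = bin((a ^ b) & ((1 << width) - 1)).count("1")  # mismatching bits
    if style == "on_mismatch":
        return e_bit * diff                  # pay per mismatching bit
    # dissipate-on-match: pay (roughly) only on a full match
    return e_bit * width if diff == 0 else 0.0
```

Since tag comparisons in issue and load-store queues mismatch far more often than they match, the on-match style spends nothing on the common case, which is the intuition behind the reported savings.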

  • Group key agreement efficient in communication

    Publication Year: 2004 , Page(s): 905 - 921
    Cited by:  Papers (36)  |  Patents (3)
PDF (1104 KB) | HTML

In recent years, collaborative and group-oriented applications and protocols have gained popularity. These applications typically involve communication over open networks; security is thus naturally an important requirement. Group key management is one of the basic building blocks in securing group communication. Most prior research in group key management focused on minimizing computation overhead, in particular minimizing expensive cryptographic operations. However, continued advances in computing power have not been matched by a decrease in network communication delay. Thus, communication latency, especially in high-delay long-haul networks, increasingly dominates the key setup latency, replacing computation delay as the main latency contributor. Hence, there is a need to minimize the size of messages and, especially, the number of rounds in cryptographic protocols. Since most previously proposed group key management techniques optimize computational (cryptographic) overhead, they are particularly impacted by high communication delay. In this work, we discuss and analyze a specific group key agreement technique which supports dynamic group membership and handles network failures, such as group partitions and merges. This technique is very communication-efficient and provably secure against hostile eavesdroppers as well as various other attacks specific to group settings. Furthermore, it is simple, fault-tolerant, and well-suited for high-delay networks.
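To make "group key agreement" concrete, here is a minimal chained group Diffie-Hellman sketch (emphatically not the paper's protocol, and with toy-sized parameters; real systems use large safe primes or elliptic curves): each member in turn exponentiates the running value with its secret, so everyone can end up holding g raised to the product of all secrets.

```python
# Toy chained group Diffie-Hellman: everyone ends with g^(x1*x2*...*xn)
# mod P. Parameters are toy-sized for illustration only.

P = 2_147_483_647   # toy prime modulus (2^31 - 1)
G = 7               # toy generator

def chain_group_key(secrets):
    """Fold each member's secret exponent into the running value:
    ((G^x1)^x2)... = G^(x1*x2*...*xn) mod P, independent of order."""
    value = G
    for x in secrets:
        value = pow(value, x, P)   # modular exponentiation
    return value
```

Each chaining step is one message in a naive protocol, which is exactly the communication cost (messages and, above all, rounds) that the technique analyzed in the paper is designed to minimize.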

  • On the nondomination of cohorts coteries

    Publication Year: 2004 , Page(s): 922 - 923
    Cited by:  Papers (1)
PDF (74 KB)

We show that subsets of the Cohorts coteries proposed in [J.-R. Jiang et al., 1997] are nondominated (ND) k-coteries, which are candidates for achieving the highest availability when used to solve the distributed k-mutual exclusion problem and the distributed h-out-of-k mutual exclusion problem.

  • Concurrent support of multiple page sizes on a skewed associative TLB

    Publication Year: 2004 , Page(s): 924 - 927
    Cited by:  Papers (1)  |  Patents (6)
PDF (320 KB) | HTML

Some architecture definitions (e.g., Alpha) allow the use of multiple virtual page sizes, even within a single process. Unfortunately, in current set-associative TLBs (translation lookaside buffers), pages of different sizes cannot coexist. Thus, processors supporting multiple page sizes implement fully associative TLBs. In this research note, we show how a skewed-associative TLB can accommodate the concurrent use of multiple page sizes within a single process. This allows us to envision either medium-sized L1 TLBs or very large L2 TLBs supporting multiple page sizes.
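The indexing idea can be sketched as follows (the hash functions below are made-up stand-ins, not the paper's skewing functions): in a skewed-associative structure each way indexes the array with a different hash of the virtual page number, and folding the probed page size into the hash lets entries for different page sizes share one set-associative array.

```python
# Sketch of per-way skewed indexing with the page size folded in.
# The mixing below is an assumed, illustrative hash, not from the paper.

def way_index(vaddr, way, page_shift, sets=64):
    """Set index used by `way` when probing for a page of size
    2**page_shift containing virtual address vaddr."""
    vpn = vaddr >> page_shift                    # virtual page number
    # different mixing per way, so addresses that conflict in one way
    # tend to spread out in the others (the skewing property)
    h = vpn ^ (vpn >> (3 + way)) ^ (page_shift << way)
    return h % sets
```

A lookup probes every way once per supported page size; a conventional set-associative TLB cannot do this, because without knowing the page size it cannot even compute the index, which is why current designs fall back to full associativity.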

  • Notice of Violation of IEEE Publication Principles [Addendum]

    Publication Year: 2004 , Page(s): 928
PDF (28 KB)

    Notice of Violation of IEEE Publication Principles

    Volume 52, Number 7 (July 2003) of the IEEE Transactions on Computers contained a paper: Ramnath Duggirala, Rahul Gupta, Qing-An Zeng & Dharma P. Agrawal, "Performance Enhancements of Ad Hoc Networks with Localized Route Repair," pages 854-861.

    After careful and considered review of the content and authorship of this paper by a duly constituted committee, this paper has been found to be in violation of the IEEE's Publication Principles.

  • TC Information for authors

    Publication Year: 2004 , Page(s): c3
PDF (75 KB)
    Freely Available from IEEE
  • [Back cover]

    Publication Year: 2004 , Page(s): c4
PDF (143 KB)
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.


Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org