Computers, IEEE Transactions on

Issue 12 • Date Dec. 2006

Displaying Results 1 - 19 of 19
  • [Front cover]

    Page(s): c1
    PDF (141 KB)
    Freely Available from IEEE
  • [Inside front cover]

    Page(s): c2
    PDF (91 KB)
    Freely Available from IEEE
  • Introducing the New Editor-in-Chief of the IEEE Transactions on Computers

    Page(s): 1489 - 1490
    PDF (59 KB)
    Freely Available from IEEE
  • Address-Value Delta (AVD) Prediction: A Hardware Technique for Efficiently Parallelizing Dependent Cache Misses

    Page(s): 1491 - 1508
    PDF (5515 KB) | HTML

    While runahead execution is effective at parallelizing independent long-latency cache misses, it is unable to parallelize dependent long-latency cache misses. To overcome this limitation, this paper proposes a novel hardware technique, address-value delta (AVD) prediction. An AVD predictor keeps track of the address (pointer) load instructions for which the arithmetic difference (i.e., delta) between the effective address and the data value is stable. If such a load instruction incurs a long-latency cache miss during runahead execution, its data value is predicted by subtracting the stable delta from its effective address. This prediction enables the preexecution of dependent instructions, including load instructions that incur long-latency cache misses. We analyze why and for what kinds of loads AVD prediction works and describe the design of an implementable AVD predictor. We also describe simple hardware and software optimizations that can significantly improve the benefits of AVD prediction, and we analyze the interaction of AVD prediction with runahead efficiency techniques and stream-based data prefetching; our analysis shows that AVD prediction is complementary to these techniques. Our results show that augmenting a runahead processor with a simple, 16-entry AVD predictor improves the average execution time of a set of pointer-intensive applications by 14.3 percent (7.5 percent excluding the health benchmark).

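The delta-tracking idea in the abstract is compact enough to sketch. The table layout, confidence counter, and eviction policy below are illustrative assumptions, not the paper's actual predictor design:

```python
# Illustrative sketch of an AVD predictor as described in the abstract.
# Table size, confidence threshold, and replacement policy are assumptions.

class AVDPredictor:
    """Tracks pointer loads whose delta (address - loaded value) is stable."""

    def __init__(self, size=16, confidence_threshold=2):
        self.size = size
        self.threshold = confidence_threshold
        self.table = {}  # load PC -> {"delta": ..., "conf": ...}

    def train(self, pc, address, value):
        """Update the entry for a load once its value is actually known."""
        delta = address - value
        entry = self.table.get(pc)
        if entry is None:
            if len(self.table) >= self.size:          # crude FIFO eviction
                self.table.pop(next(iter(self.table)))
            self.table[pc] = {"delta": delta, "conf": 0}
        elif entry["delta"] == delta:                 # stable delta observed
            entry["conf"] = min(entry["conf"] + 1, self.threshold)
        else:                                         # delta changed: retrain
            entry.update(delta=delta, conf=0)

    def predict(self, pc, address):
        """On a long-latency miss during runahead, predict the value as the
        effective address minus the stable delta (None if not confident)."""
        entry = self.table.get(pc)
        if entry and entry["conf"] >= self.threshold:
            return address - entry["delta"]
        return None
```

Traversing a list whose nodes sit at a fixed stride yields a constant delta: if each next-pointer equals the node's address plus 16, the delta is -16 for every instance of that load.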
  • A Resource Reservation Algorithm for Power-Aware Scheduling of Periodic and Aperiodic Real-Time Tasks

    Page(s): 1509 - 1522
    PDF (1598 KB) | HTML

    Power consumption is an important issue in the design of real-time embedded systems. As many embedded systems are powered by batteries, the goal is to extend the autonomy of the system as much as possible. To reduce power consumption, modern processors can change their voltage and frequency at runtime, and a power-aware scheduling algorithm can exploit this capability to reduce power consumption while preserving the timing constraints of real-time tasks. In this paper, we present GRUB-PA, a novel power-aware scheduling algorithm based on a resource reservation technique. In addition to providing temporal isolation and timing guarantees, and unlike most power-aware algorithms proposed in the literature, GRUB-PA can efficiently handle systems consisting of hard and soft, periodic, aperiodic, and sporadic tasks. We compared our algorithm with existing power-aware scheduling algorithms in an extensive set of simulation experiments on synthetic task sets. The results show that the performance of our algorithm is in line with state-of-the-art power-aware algorithms. We also present an implementation of our algorithm in the Linux operating system and discuss practical issues such as switching overhead and power models. Finally, we show the results of experiments performed on a real testbed application.

  • Improving Accuracy in Mitchell's Logarithmic Multiplication Using Operand Decomposition

    Page(s): 1523 - 1535
    PDF (4965 KB) | HTML

    Logarithmic number systems (LNS) offer a viable alternative to binary number systems, in terms of area, delay, and power, for implementing multiplication and division in signal processing applications. Mitchell's algorithm (MA) reduces the complexity of computing logarithms and antilogarithms by using piecewise straight-line approximations of the logarithm and antilogarithm curves. These approximations, however, introduce some loss of accuracy, and several methods have been proposed in the literature for improving the accuracy of Mitchell's algorithm. In this work, we investigate a new method based on operand decomposition (OD) to improve the accuracy of Mitchell's algorithm when applied to logarithmic multiplication. In the OD technique, originally proposed for reducing switching activity in binary multiplication, the two inputs to be multiplied are jointly decomposed into four binary operands, and the product is expressed as the sum of the products of the decomposed numbers. We show that applying operand decomposition to the inputs as a preprocessing step for Mitchell's multiplication algorithm significantly improves accuracy. Experimental results indicate that the proposed algorithm for logarithmic multiplication reduces the error percentage of Mitchell's algorithm by 44.7 percent on average. It is also shown that the OD method yields further improvement when combined with other correction methods proposed in the literature.

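Mitchell's approximation and the operand-decomposition identity are concrete enough to model in a few lines. This is an illustrative floating-point sketch of the arithmetic, not the paper's hardware design:

```python
# Floating-point model of Mitchell multiplication with operand decomposition.

def mitchell_mul(a, b):
    """Approximate a*b for positive integers using Mitchell's piecewise
    straight-line approximation log2(2**k * (1 + f)) ~= k + f."""
    k1, k2 = a.bit_length() - 1, b.bit_length() - 1
    f1 = a / (1 << k1) - 1.0  # fractional part of the mantissa, in [0, 1)
    f2 = b / (1 << k2) - 1.0
    f = f1 + f2
    if f < 1.0:               # antilog approximation, no mantissa carry
        return (1 << (k1 + k2)) * (1.0 + f)
    return (1 << (k1 + k2 + 1)) * f

def od_mitchell_mul(a, b):
    """Operand decomposition: a*b == (a|b)*(a&b) + (a&~b)*(b&~a).
    Mitchell's algorithm is applied to each partial product separately."""
    total = 0.0
    for x, y in ((a | b, a & b), (a & ~b, b & ~a)):
        if x and y:           # a zero operand contributes nothing
            total += mitchell_mul(x, y)
    return total
```

For example, 5 x 3 = 15: plain Mitchell gives 14, while the decomposition 5*3 = 7*1 + 4*2 happens to be exact under Mitchell's approximation because each partial product involves an operand with a zero fractional part.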
  • Use of Sparse and/or Complex Exponents in Batch Verification of Exponentiations

    Page(s): 1536 - 1542
    PDF (1082 KB) | HTML

    Modular exponentiation in an abelian group is one of the most frequently used mathematical primitives in modern cryptography. Batch verification is an algorithm for verifying many exponentiations simultaneously. We propose two fast batch verification algorithms. The first makes use of exponents of small weight, called sparse exponents, and is asymptotically 10 times faster than individual verification and twice as fast as previous work at the same security level. The second applies only to elliptic curves defined over small finite fields. Using sparse Frobenius expansions with small integer coefficients, we give a complex-exponent test that is four times faster than previous work. For example, each exponentiation in one batch asymptotically requires nine elliptic curve additions on some elliptic curves at the 2^80 security level.

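To illustrate what batch verification of exponentiations means, here is the classic small-exponents test; note this is a generic illustration of the idea, not the paper's sparse-exponent or Frobenius-expansion constructions, which are different (and faster) algorithms:

```python
import secrets

def batch_verify(g, p, q, claims, sec_bits=20):
    """Verify a batch of claims (x_i, y_i) that y_i == g**x_i mod p.

    Assumes g generates a subgroup of prime order q modulo p.  Each claim
    is weighted by a short random exponent s_i; a single wrong claim
    survives with probability about 2**-sec_bits.
    """
    exps = [1 + secrets.randbelow((1 << sec_bits) - 1) for _ in claims]
    lhs = pow(g, sum(s * x for s, (x, _) in zip(exps, claims)) % q, p)
    rhs = 1
    for s, (_, y) in zip(exps, claims):
        rhs = rhs * pow(y, s, p) % p
    return lhs == rhs
```

The payoff is one full-size exponentiation plus n short ones in place of n full-size exponentiations; sparse exponents push the same trade further.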
  • eQoS: Provisioning of Client-Perceived End-to-End QoS Guarantees in Web Servers

    Page(s): 1543 - 1556
    PDF (2487 KB) | HTML

    It is important to guarantee client-perceived end-to-end quality of service (QoS) under heavy load conditions. Existing work focuses on network transfer time or server-side request processing time. In this paper, we propose a novel framework, eQoS, to monitor and control client-perceived response time in heavily loaded Web servers. The response time is measured with respect to Web pages that contain multiple embedded objects. Within the framework, we propose an adaptive fuzzy controller, STFC, to allocate server resources. The controller assumes no knowledge of the pageview traffic model and deals with the effect of process delay in resource allocation through its two-level self-tuning capabilities. We also prove the stability of the STFC. We implement a prototype of eQoS in Linux and conduct comprehensive experiments across a wide range of server workload conditions on PlanetLab and simulated networks. Experimental results demonstrate the effectiveness of the framework: it keeps the deviation of client-perceived pageview response time within 20 percent of a predefined target with both synthetic and real Web traffic. We also compare the STFC with other controllers, including static fuzzy, linear proportional-integral (PI), and adaptive PI controllers. Experimental results show that, although the STFC performs slightly worse than the static fuzzy controller in the environment for which the latter is best tuned, its self-tuning capabilities give it better performance in all other test cases, by around 25 percent on average in terms of deviation from the target response time. In addition, due to its model independence, the STFC outperforms the linear PI and adaptive PI controllers by 50 percent and 75 percent on average, respectively.

  • Online Permutation Routing in Partitioned Optical Passive Star Networks

    Page(s): 1557 - 1571
    PDF (2081 KB) | HTML

    This paper establishes the state of the art in both deterministic and randomized online permutation routing in the POPS network. Indeed, we show that any permutation can be routed online on a POPS(d, g) network either in O((d/g) log g) deterministic slots or, with high probability, in 5c⌊d/g⌋ + o(d/g) + O(log log g) randomized slots, where the constant c = exp(1 + e^(-1)) ≈ 3.927. When d = Θ(g), which we claim to be the "interesting" case, the randomized algorithm is exponentially faster than any other algorithm in the literature, both deterministic and randomized. This is true in practice as well: experiments show that it outperforms its rivals even on a network as small as a POPS(2, 2), and the gap grows exponentially with the size of the network. We can also show that, under proper hypotheses, no deterministic algorithm can asymptotically match its performance.

  • STORM: Scalable Resource Management for Large-Scale Parallel Computers

    Page(s): 1572 - 1587
    PDF (2895 KB) | HTML

    Although clusters are a popular form of high-performance computing, they remain more difficult to manage than sequential systems, or even symmetric multiprocessors. In this paper, we identify a small set of primitive mechanisms that are general enough to serve as building blocks for a variety of resource-management problems. We then present STORM, a resource-management environment that embodies these mechanisms in a scalable, low-overhead, and efficient implementation. The key innovation behind STORM is a modular software architecture that reduces all resource-management functionality to a small number of highly scalable mechanisms, which simplify the integration of resource management with low-level network features. As a result of this design, STORM can launch large parallel applications an order of magnitude faster than the best time reported in the literature and can gang-schedule a parallel application as fast as the node OS can schedule a sequential application. This paper describes the mechanisms and algorithms behind STORM and presents a detailed performance model showing that STORM's performance can scale to thousands of nodes.

  • Optimized Slowdown in Real-Time Task Systems

    Page(s): 1588 - 1598
    PDF (1037 KB) | HTML

    Slowdown factors determine the extent to which a computing system can be slowed down while meeting its functional and performance requirements. Dynamic voltage scaling (DVS) of a processor based on slowdown factors can lead to considerable energy savings. We address the problem of computing slowdown factors for dynamically scheduled tasks with specified deadlines. We present an algorithm, based on the bisection method, to compute a near-optimal constant slowdown factor. As a further generalization, for tasks with varying power characteristics, we compute near-optimal slowdown factors as the solution to a convex optimization problem using the ellipsoid method. The algorithms are fast in practice and have the same time complexity as the algorithms that check the feasibility of a task set. Our simulation results show an average 20 percent energy gain over known slowdown techniques using static slowdown factors and a 40 percent gain with dynamic slowdown.

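The constant-slowdown computation lends itself to a compact sketch: bisect on the slowdown factor, using a schedulability test as the oracle. The test below is the standard EDF processor-demand criterion for periodic tasks with deadlines no larger than periods; the paper's task model and algorithm are more general, so treat this as an assumed toy setting:

```python
from functools import reduce
from math import gcd

def feasible(tasks, eta, hyper):
    """EDF processor-demand test with every execution time scaled by 1/eta.
    tasks: list of (wcet, deadline, period) at full speed, deadline <= period."""
    deadlines = sorted({d + k * T for _, d, T in tasks
                        for k in range(hyper // T)})
    for t in deadlines:
        # Demand bound function: work released and due in any window [0, t].
        demand = sum(((t - d) // T + 1) * c for c, d, T in tasks if t >= d)
        if demand / eta > t:
            return False
    return True

def min_slowdown(tasks, eps=1e-6):
    """Bisect for (near) the smallest constant slowdown factor in (0, 1]
    that keeps the task set schedulable under EDF."""
    hyper = reduce(lambda a, b: a * b // gcd(a, b), (T for _, _, T in tasks))
    if not feasible(tasks, 1.0, hyper):
        return None          # infeasible even at full speed
    lo, hi = 0.0, 1.0
    while hi - lo > eps:
        mid = (lo + hi) / 2
        if feasible(tasks, mid, hyper):
            hi = mid
        else:
            lo = mid
    return hi
```

For two implicit-deadline tasks with utilization 1/4 + 2/8 = 0.5, the bisection converges to a slowdown factor of 0.5, matching the EDF utilization bound.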
  • Schedulability Envelope for Real-Time Radar Dwell Scheduling

    Page(s): 1599 - 1613
    PDF (3535 KB) | HTML

    This paper proposes novel techniques for scheduling radar dwells in phased-array radar systems. To handle complex physical characteristics such as dwell interleaving, the transmit duty cycle constraint, and the energy constraint, we propose the notion of a schedulability envelope. The schedulability envelope, designed offline, hides the details of complex radar dwell scheduling and provides a simple measure for the schedulability check. Using the schedulability envelope, the proposed technique can efficiently perform admission control for dynamic target-tracking tasks. Simulation results show that the proposed approach can significantly improve system utilization by taking advantage of dwell interleaving while guaranteeing schedulability and the physical constraints.

  • A 2-Level TCAM Architecture for Ranges

    Page(s): 1614 - 1629
    PDF (3827 KB) | HTML

    As the demand for high-quality Internet service increases, emerging network applications are spurring the need for faster, feature-rich, and cost-effective routers. Multifield packet classification in routers has been a computation-intensive data-path function for software implementations; hardware-based solutions, such as ternary content addressable memory (TCAM), are therefore necessary to sustain gigabit line rates. Traditionally, TCAMs have been designed to store prefixes. However, multifield packet classification usually involves two fields with arbitrary ranges: the TCP/IP layer-4 source and destination ports. Storing ranges in TCAMs relies on decomposing each range into multiple prefixes, which leads to range-to-prefix blowout. To reduce the total number of prefixes needed to represent all ranges, this paper proposes a 2-level TCAM architecture and two range-to-prefix conversion schemes. In the first scheme, designed for disjoint ranges, at most 2m - 1 TCAM entries are needed for m disjoint ranges. In the second scheme, designed for contiguous ranges, only m TCAM entries are needed. In the general case of n arbitrary ranges, all ranges can first be converted into disjoint or contiguous ranges and the proposed algorithms then applied; as a result, only 4n - 3 TCAM entries are needed for disjoint ranges and only 2n + 1 for contiguous ranges. This paper also proposes insertion and deletion algorithms to accommodate incremental changes to the range sets. Experiments show that the proposed range-to-prefix conversion schemes outperform existing schemes in both the number of required TCAM entries and the execution time of range update operations.

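The range-to-prefix blowout the paper attacks comes from the standard expansion of an arbitrary range into aligned power-of-two blocks. As a baseline for comparison (the paper's 2-level schemes need far fewer entries), the conversion can be sketched as:

```python
def range_to_prefixes(lo, hi, width=16):
    """Expand the integer range [lo, hi] into the minimal list of prefixes,
    each given as (value, prefix_length) over a width-bit field."""
    prefixes = []
    while lo <= hi:
        # Largest power-of-two block aligned at lo ...
        size = lo & -lo if lo else 1 << width
        # ... that still fits inside the remaining range.
        while size > hi - lo + 1:
            size >>= 1
        prefixes.append((lo, width - size.bit_length() + 1))
        lo += size
    return prefixes
```

The worst case for a w-bit field is 2w - 2 prefixes, e.g. the range [1, 14] over a 4-bit field already needs six, which is exactly the blowout that motivates storing ranges more cleverly.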
  • Opens and Delay Faults in CMOS RAM Address Decoders

    Page(s): 1630 - 1639
    PDF (2444 KB) | HTML

    This paper presents a complete electrical analysis of address decoder delay faults (ADFs) caused by resistive opens in RAMs. A classification into intergate and intragate opens is made. A systematic way to explore the space of possible tests for detecting these faults is introduced; it is based on generating appropriate sensitizing address transitions and the corresponding sensitizing operation sequences. Design-for-testability (DFT) features are given to facilitate a BIST implementation of the new tests.

  • Mirrored Disk Organization Reliability Analysis

    Page(s): 1640 - 1644
    PDF (781 KB) | HTML

    Disk mirroring, or RAID level 1 (RAID1), is a popular paradigm for achieving fault tolerance and higher disk access bandwidth for read requests. We consider four RAID1 organizations: basic mirroring, group rotate declustering, interleaved declustering, and chained declustering, where the last three attain a more balanced load than basic mirroring when disk failures occur. We first obtain the number of configurations, A(n, i), that do not result in data loss when i out of n disks have failed. The probability of no data loss in this case is A(n, i)/C(n, i), where C(n, i) is the binomial coefficient. The reliability of each RAID1 organization is the summation, over 1 <= i <= n/2, of A(n, i) r^(n-i) (1 - r)^i, where r denotes the reliability of each disk. A closed-form expression for A(n, i) is obtained easily for the first three organizations. We present a relatively simple derivation of the expression for A(n, i) for the chained declustering method, which includes a correctness proof. We also discuss the routing of read requests to balance disk loads, especially when there are disk failures, to maximize the attainable throughput.

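For the simplest organization, basic mirroring, A(n, i) can be checked by brute-force enumeration against its closed form C(n/2, i) * 2^i; the pairing convention below (disk 2k mirrors disk 2k+1) is an assumption for illustration:

```python
from itertools import combinations

def count_safe(n, i):
    """A(n, i) for basic mirroring by brute force: the number of ways i of
    n disks can fail without any mirrored pair (2k, 2k+1) failing entirely."""
    safe = 0
    for failed in map(set, combinations(range(n), i)):
        if all(not {2 * k, 2 * k + 1} <= failed for k in range(n // 2)):
            safe += 1
    return safe

def reliability(n, r):
    """Probability of no data loss: the sum of A(n, i) * r**(n-i) * (1-r)**i
    over all failure counts i (the i = 0 term is just r**n)."""
    return sum(count_safe(n, i) * r ** (n - i) * (1 - r) ** i
               for i in range(n + 1))
```

Because the n/2 mirrored pairs fail independently, the basic-mirroring reliability also factors as (1 - (1-r)^2)^(n/2), which the brute force reproduces exactly.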
  • Internode Distance and Optimal Routing in a Class of Alternating Group Networks

    Page(s): 1645 - 1648
    PDF (290 KB) | HTML

    Alternating group graphs AG_n, studied by Jwo and others, constitute a class of Cayley graphs that possess certain desirable properties compared with other regular networks considered by researchers in parallel and distributed computing. A variant of such graphs, AN_n, proposed by Youhou and dubbed alternating group networks, has been shown to possess advantages over AG_n. For example, AN_n has a node degree smaller by a factor of about 2 while maintaining a diameter comparable to that of AG_n, is maximally fault-tolerant, and shares some of the positive structural attributes of the well-known star graph. In this paper, we characterize the distance between any two nodes in AN_n and present an optimal (shortest-path) routing algorithm for this class of networks.

  • Annual index

    Page(s): tc06
    PDF (366 KB)
    Freely Available from IEEE
  • TC Information for authors

    Page(s): c3
    PDF (91 KB)
    Freely Available from IEEE
  • [Back cover]

    Page(s): c4
    PDF (143 KB)
    Freely Available from IEEE

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.

Meet Our Editors

Editor-in-Chief
Albert Y. Zomaya
School of Information Technologies
Building J12
The University of Sydney
Sydney, NSW 2006, Australia
http://www.cs.usyd.edu.au/~zomaya
albert.zomaya@sydney.edu.au