
IEEE Transactions on Computers

Issue 1 • Jan. 2007

Contents (18 items)
  • [Front cover]

    Page(s): c1
  • [Inside front cover]

    Page(s): c2
  • Message from the New Editor-in-Chief

    Page(s): 1
  • OS-Aware Branch Prediction: Improving Microprocessor Control Flow Prediction for Operating Systems

    Page(s): 2 - 17

    Many modern applications have a significant operating system (OS) component. OS execution affects various architectural states, including the dynamic branch predictors that today's high-performance microprocessors rely on for performance. This impact grows as designs become more deeply pipelined and more speculative. In this paper, we focus on understanding the effects of the OS on branch prediction and on designing architectural support to alleviate the bottlenecks created by mispredictions. We characterize the control flow transfer of several emerging applications on a commercial OS and observe that the exception-driven, intermittent invocation of OS code and user/OS branch history interference increase mispredictions in both user and kernel code. We propose two simple OS-aware control flow prediction techniques to alleviate the destructive impact of user/OS branch interference: the first captures separate branch correlation information for user and kernel code; the second uses separate branch prediction tables for user and kernel code. We demonstrate that OS-aware branch prediction requires minimal hardware modifications and additions and can be integrated with many existing schemes to further improve their performance. We studied the improvement contributed by OS-aware techniques to branch prediction schemes ranging from the simple Gshare to the more advanced Agree, Multi-Hybrid, and Bi-Mode predictors. On 32K-entry predictors, incorporating the OS-aware techniques yields up to 34 percent, 23 percent, 27 percent, and 9 percent prediction accuracy improvement on the Gshare, Multi-Hybrid, Agree, and Bi-Mode predictors, respectively.

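The split-predictor idea described in the abstract can be sketched in a few lines. This is an illustrative toy under our own assumptions (a gshare-style scheme with 2-bit counters, a 12-bit history, and a class name we invented), not the paper's implementation: each privilege mode gets a private global-history register and pattern table, so kernel branches never shift into user history.

```python
HIST_BITS = 12
TABLE_SIZE = 1 << HIST_BITS

class OSAwareGshare:
    def __init__(self):
        # One 2-bit-counter table and one history register per mode.
        # Counters start at 1 (weakly not-taken).
        self.tables = {"user": [1] * TABLE_SIZE, "kernel": [1] * TABLE_SIZE}
        self.history = {"user": 0, "kernel": 0}

    def _index(self, pc, mode):
        # gshare: XOR the branch PC with the mode-private global history.
        return (pc ^ self.history[mode]) & (TABLE_SIZE - 1)

    def predict(self, pc, mode):
        return self.tables[mode][self._index(pc, mode)] >= 2  # taken?

    def update(self, pc, mode, taken):
        i = self._index(pc, mode)
        ctr = self.tables[mode][i]
        self.tables[mode][i] = min(3, ctr + 1) if taken else max(0, ctr - 1)
        # Shift the outcome into the mode-private history register only,
        # so kernel outcomes never pollute user correlation state.
        self.history[mode] = ((self.history[mode] << 1) | taken) & (TABLE_SIZE - 1)
```

A kernel-mode update leaves all user-mode state untouched, which is the whole point of the split.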
  • Reducing Cache Pollution via Dynamic Data Prefetch Filtering

    Page(s): 18 - 31

    To bridge the growing speed disparity between processors and their memory subsystems, aggressive prefetch mechanisms, either hardware-based or compiler-assisted, are employed to hide memory latencies. As first-level caches get smaller in deep-submicron processor designs to allow fast cache accesses, data cache pollution caused by overly aggressive prefetching becomes a major performance concern. Ineffective prefetches not only offset the benefits of benign prefetches due to pollution but also consume bus bandwidth, leading to overall performance degradation. In this paper, we propose and analyze a number of hardware-based prefetch pollution filtering mechanisms that dynamically differentiate good from bad prefetches based on history information. We design three prefetch pollution filters, organized in one-level, two-level, and gshare styles, and examine two table-indexing schemes: per-address (PA) based and program counter (PC) based. Our prefetch pollution filters work in tandem with both hardware and software prefetchers. As our analysis shows, the filters can reduce ineffective prefetches by more than 90 percent and alleviate the excessive memory bandwidth those prefetches induce. Performance improves by up to 16 percent when our filtering mechanism is incorporated with aggressive prefetchers, as a result of reduced cache pollution and less competition for the limited number of cache ports. A number of sensitivity studies provide further insight into prefetch pollution filter design.

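A minimal sketch of the filtering idea, under assumed details (a PC-indexed, one-level table of 2-bit counters; names and sizes are ours, not the paper's):

```python
FILTER_SIZE = 1024

class PrefetchFilter:
    def __init__(self):
        # 2-bit confidence counter per entry, starting weakly "good".
        self.counters = [2] * FILTER_SIZE

    def _index(self, pc):
        return pc % FILTER_SIZE

    def allow(self, pc):
        """Should a prefetch triggered at this PC be issued?"""
        return self.counters[self._index(pc)] >= 2

    def feedback(self, pc, useful):
        """Called when a prefetched block is demand-accessed (useful=True)
        or evicted without ever being used (useful=False)."""
        i = self._index(pc)
        if useful:
            self.counters[i] = min(3, self.counters[i] + 1)
        else:
            self.counters[i] = max(0, self.counters[i] - 1)
```

Repeated useless prefetches from one instruction drive its counter down and suppress further prefetches from that PC, while later useful ones can re-enable it.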
  • Efficient Construction of Pipelined Multibit-Trie Router-Tables

    Page(s): 32 - 43

    Efficient algorithms are developed to construct multibit tries suitable for pipelined router-table applications. We first enhance the 1-phase algorithm of Basu and Narlikar, obtaining a 1-phase algorithm that is 2.5 to 3 times as fast. Next, we develop 2-phase algorithms that not only minimize the maximum per-stage memory but also use the least total memory subject to that constraint. Our 2-phase algorithms generate better pipelined trees than the 1-phase algorithm while taking much less time. We also propose a node pull-up scheme that guarantees no increase in maximum per-stage memory, as well as a partitioning heuristic that generates pipelined multibit tries requiring less maximum per-stage memory than the tries obtained using the 1-phase and 2-phase algorithms.

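For context, the data structure being optimized can be illustrated with a toy fixed-stride multibit trie for longest-prefix match; the stride, the dict-based layout, and all names here are illustrative choices, not the paper's construction. Each trie level consumes STRIDE bits, so a lookup visits one node per pipeline stage, which is why per-stage memory balance matters.

```python
STRIDE = 4

class MultibitTrie:
    def __init__(self):
        self.root = {"next_hop": None, "children": {}}

    def insert(self, prefix_bits, next_hop):
        """prefix_bits: string of '0'/'1' whose length is a multiple of
        STRIDE (controlled prefix expansion pads shorter prefixes)."""
        node = self.root
        for i in range(0, len(prefix_bits), STRIDE):
            chunk = prefix_bits[i:i + STRIDE]
            node = node["children"].setdefault(
                chunk, {"next_hop": None, "children": {}})
        node["next_hop"] = next_hop

    def lookup(self, addr_bits):
        node, best = self.root, None
        for i in range(0, len(addr_bits), STRIDE):
            chunk = addr_bits[i:i + STRIDE]
            node = node["children"].get(chunk)
            if node is None:
                break
            if node["next_hop"] is not None:
                best = node["next_hop"]  # remember the longest match so far
        return best
```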
  • Low-Weight Polynomial Form Integers for Efficient Modular Multiplication

    Page(s): 44 - 57

    In 1999, Solinas introduced families of moduli called generalized Mersenne numbers (GMNs), which are expressed in low-weight polynomial form, p = f(t), where t is restricted to a power of 2. GMNs are very useful in elliptic curve cryptosystems over prime fields, since modular reduction by a GMN requires only integer additions and subtractions. However, since there are not many GMNs and each GMN requires a dedicated implementation, GMNs are of little use for other cryptosystems. Here, we modify GMNs by removing the restriction on the choice of t and instead restricting the coefficients of f(t) to 0 and ±1. We call such families of moduli low-weight polynomial form integers (LWPFIs) and present an efficient modular multiplication method using LWPFI moduli. LWPFIs allow a general implementation, and there exist many LWPFI moduli. One may consider LWPFIs a trade-off between general integers and GMNs.

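The GMN reduction that LWPFIs generalize can be made concrete with the NIST P-192 prime p = 2^192 − 2^64 − 1. Because 2^192 ≡ 2^64 + 1 (mod p), reducing a double-length product needs only word-aligned additions and a few final subtractions. This is the standard Solinas reduction for P-192, shown for background only; it is not the paper's LWPFI algorithm.

```python
P192 = 2**192 - 2**64 - 1
W = 2**64  # 64-bit words

def reduce_p192(c):
    """Reduce c < P192**2 modulo P192 using additions only."""
    # Split c into six 64-bit words c0..c5 (little-endian).
    c0, c1, c2, c3, c4, c5 = [(c >> (64 * i)) & (W - 1) for i in range(6)]
    s1 = c0 + c1 * W + c2 * W**2
    s2 = c3 + c3 * W               # 2^192 ≡ 2^64 + 1        (mod p)
    s3 = c4 * W + c4 * W**2        # 2^256 ≡ 2^128 + 2^64    (mod p)
    s4 = c5 + c5 * W + c5 * W**2   # 2^320 ≡ 2^128 + 2^64 + 1 (mod p)
    r = s1 + s2 + s3 + s4
    while r >= P192:               # at most a few final subtractions
        r -= P192
    return r
```

The LWPFI idea is to keep this add-and-subtract structure while freeing t from being a power of 2, so many more moduli qualify.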
  • A TCAM-Based Parallel Architecture for High-Speed Packet Forwarding

    Page(s): 58 - 72

    A partitioned TCAM-based search engine is presented that increases the packet forwarding rate severalfold over traditional TCAMs. The model works for both IPv4 and IPv6 packet forwarding. Unlike prior art, the improvement is achieved regardless of the incoming traffic pattern. This is made possible by small, private memories inside the ASIC that dynamically store popular route prefixes and by exploiting the parallelism inherent in Internet traffic. Using four TCAM chips, an embodiment of the proposed model delivered more than six times the throughput of a conventional configuration with equal storage capacity and clock rate. Power consumption is also reduced in the new system, while other parameters, such as storage density and table update performance, are not adversely affected.

  • (t, k)-Diagnosis for Matching Composition Networks under the MM* Model

    Page(s): 73 - 79

    (t, k)-diagnosis, a generalization of sequential diagnosis, requires that at least k faulty processors be identified and repaired in each iteration, provided there are at most t faulty processors, where t ≥ k. In this paper, a (t, k)-diagnosis algorithm under the MM* model is proposed for matching composition networks, which include many well-known interconnection networks such as hypercubes, crossed cubes, twisted cubes, and Möbius cubes. It is shown that a matching composition network of n dimensions is (Ω((2^n log n)/n), n)-diagnosable.

  • A Fault-Tolerant Group Communication Protocol in Large Scale and Highly Dynamic Mobile Next-Generation Networks

    Page(s): 80 - 94

    In recent years, the integration of mobile and wireless networks with wired ones has gained popularity. Many new applications have emerged, and there is increasing demand for enhanced services supporting mobile collaboration. However, many challenging issues, such as scalability and reliability, are much harder to tackle in such an integrated network environment than in a purely wired one. In this paper, we address these issues by proposing a RingNet hierarchy of proxies, a combination of logical rings and logical trees that takes advantage of the simplicity of rings and the scalability of trees. More importantly, this combination makes the hierarchy more reliable than a tree-based one. Based on this hierarchy, we propose a fault-tolerant group communication protocol for large-scale and highly dynamic groups. Both theoretical analysis and simulation show that the protocol scales very well as the network grows and remains highly resilient to failures even when the node failure probability becomes large. The protocol is especially suitable for service providers and network operators who have deployed their machines in a hierarchical setting, where each machine can be locally configured to know about its sibling and parent machines.

  • Coordinated Multilevel Buffer Cache Management with Consistent Access Locality Quantification

    Page(s): 95 - 108

    This paper proposes a protocol for effective coordinated buffer cache management in a multilevel cache hierarchy typical of a client/server system. Currently, such hierarchies are managed suboptimally: decisions about block placement and replacement are made locally at each level without coordination between levels. Though straightforward, this approach has several weaknesses: 1) blocks may be redundantly cached, reducing the effective total cache size; 2) weakened locality at lower-level caches makes recency-based replacement algorithms such as LRU less effective; and 3) high-level caches cannot effectively identify blocks with strong locality and may place them in low-level caches. The fundamental reason for these weaknesses is that the locality information embedded in the streams of access requests from clients is not consistently analyzed and exploited, resulting in globally nonsystematic, and therefore suboptimal, placement and replacement of cached blocks across the hierarchy. To address this problem, we propose a coordinated multilevel cache management protocol based on consistent access-locality quantification. In this protocol, locality is dynamically quantified at the client level to direct servers to place or replace blocks appropriately at each level of the cache hierarchy, so that the block layout in the entire hierarchy dynamically matches the locality of block accesses. Our simulation experiments on both synthetic and real-life traces show that the protocol effectively ameliorates these caching problems. As anecdotal evidence, our protocol reduces block accesses by 11 to 71 percent, with an average of 35 percent, compared to uniLRU, a unified multilevel cache scheme.

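The idea of quantifying locality at the client to direct placement can be sketched with average reuse distance. The thresholds, names, and offline formulation below are our simplification for illustration, not the paper's online protocol: blocks whose accesses recur within the client cache's capacity stay at the client, blocks with longer reuse distances are hinted to the server, and the rest bypass both.

```python
def classify_blocks(trace, l1_size, l2_size):
    """Map block id -> 'client', 'server', or 'bypass' by avg reuse distance."""
    distances = {}
    stack = []  # LRU stack; position = distinct blocks touched since last use
    for blk in trace:
        if blk in stack:
            d = stack.index(blk)
            stack.remove(blk)
            distances.setdefault(blk, []).append(d)
        stack.insert(0, blk)
    placement = {}
    for blk, ds in distances.items():
        avg = sum(ds) / len(ds)
        if avg < l1_size:
            placement[blk] = "client"   # fits in the client-level cache
        elif avg < l1_size + l2_size:
            placement[blk] = "server"   # hint server-level caching
        else:
            placement[blk] = "bypass"   # too little locality anywhere
    return placement
```

Routing each block to exactly one level also avoids the redundant caching that the abstract identifies as weakness 1).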
  • A Provably Secure True Random Number Generator with Built-In Tolerance to Active Attacks

    Page(s): 109 - 119

    This paper is a contribution to the theory of true random number generators based on sampling phase jitter in oscillator rings. After discussing several misconceptions and apparently insurmountable obstacles, we propose a general model which, under mild assumptions, generates provably random bits with some tolerance to adversarial manipulation while running in the megabit-per-second range. A key idea throughout the paper is the fill rate, which measures the fraction of the time domain in which the analog output signal is arguably random. Our study shows that an exponential increase in the number of oscillators is required to obtain a constant-factor improvement in the fill rate. We overcome this problem by introducing a postprocessing step that applies an appropriate resilient function, which allows the designer to extract random samples from a signal with only a moderate fill rate and, therefore, with many fewer oscillators than in other designs. Finally, we develop fault-attack models and employ the properties of resilient functions to withstand such attacks. All of our analysis is based on rigorous methods, enabling us to develop a framework in which we accurately quantify the performance and the degree of resilience of the design.

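For background on the postprocessing step: linear resilient functions can be built from linear codes, since the generator matrix of an [n, m, d] binary code yields an (n, m, d−1)-resilient map. The sketch below uses the [7, 4, 3] Hamming code, our illustrative choice rather than the paper's construction, giving a 2-resilient map from 7 raw samples to 4 output bits: even if an attacker fixes up to 2 input bits, every output pattern remains equally likely.

```python
G = [  # rows generate the [7, 4, 3] Hamming code over GF(2)
    [1, 0, 0, 0, 1, 1, 0],
    [0, 1, 0, 0, 1, 0, 1],
    [0, 0, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
]

def resilient_extract(raw_bits):
    """Map 7 raw oscillator samples to 4 postprocessed bits via G over GF(2)."""
    assert len(raw_bits) == 7
    return [sum(g * b for g, b in zip(row, raw_bits)) % 2 for row in G]
```

The map is linear over GF(2), which is what makes its resilience provable from the code's minimum distance.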
  • Optimization of Dual-Speed TAM Architectures for Efficient Modular Testing of SOCs

    Page(s): 120 - 133

    The increasing complexity of system-on-chip (SOC) integrated circuits has spurred the development of versatile automatic test equipment (ATE) that can simultaneously drive different channels at different data rates. Examples of such ATEs include the Agilent 93000 series tester, based on port scalability and the test-processor-per-pin architecture, and the Tiger system from Teradyne. In practice, however, the number of tester channels with high data rates may be constrained by ATE resource limitations, the power rating of the SOC, and scan frequency limits for the embedded cores. We therefore formulate the following optimization problem: given two available data rates for the tester channels, an SOC-level test access mechanism (TAM) width W, and an upper limit V (V < W) on the number of channels that can transport test data at the higher rate, determine an SOC TAM architecture that minimizes the testing time. We present an efficient heuristic algorithm for TAM optimization that exploits the port scalability of ATEs to reduce SOC testing time and test cost. We present experimental results for the ITC '02 SOC test benchmarks and investigate the impact of dual-speed TAM architectures on power consumption during testing for one of these benchmarks.

  • Relay Node Placement in Wireless Sensor Networks

    Page(s): 134 - 138

    A wireless sensor network consists of many low-cost, low-power sensor nodes that can perform sensing, simple computation, and transmission of sensed information. Long-distance transmission by sensor nodes is not energy efficient, since energy consumption is a superlinear function of transmission distance. One approach to prolonging network lifetime while preserving network connectivity is to deploy a small number of costlier but more powerful relay nodes whose main task is communication with other sensor or relay nodes. In this paper, we assume that sensor nodes have communication range r > 0, while relay nodes have communication range R ≥ r, and we study two versions of the relay node placement problem. In the first version, we want to deploy the minimum number of relay nodes so that, between each pair of sensor nodes, there is a connecting path consisting of relay and/or sensor nodes. In the second version, we want to deploy the minimum number of relay nodes so that, between each pair of sensor nodes, there is a connecting path consisting solely of relay nodes. We present a polynomial-time 7-approximation algorithm for the first problem and a polynomial-time (5 + ε)-approximation algorithm for the second, where ε > 0 can be any given constant.

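As background, a simple baseline for the first problem (taking R = r) is the classic steinerized-MST heuristic: build a minimum spanning tree over the sensors and subdivide each edge longer than r with evenly spaced relays. This is only an illustrative sketch, not the paper's 7-approximation algorithm.

```python
import math

def steinerized_mst_relays(points, r):
    """Return relay positions connecting all sensors with range r."""
    n = len(points)
    dist = lambda a, b: math.dist(a, b)
    # Prim's algorithm for the MST over the complete graph on the sensors.
    in_tree, edges = {0}, []
    while len(in_tree) < n:
        u, v = min(((u, v) for u in in_tree for v in range(n)
                    if v not in in_tree),
                   key=lambda e: dist(points[e[0]], points[e[1]]))
        in_tree.add(v)
        edges.append((u, v))
    relays = []
    for u, v in edges:
        d = dist(points[u], points[v])
        k = math.ceil(d / r) - 1          # relays needed on this edge
        for i in range(1, k + 1):
            t = i / (k + 1)               # evenly spaced along the edge
            relays.append((points[u][0] + t * (points[v][0] - points[u][0]),
                           points[u][1] + t * (points[v][1] - points[u][1])))
    return relays
```

Each subdivided edge has segments of length at most r, so sensors plus relays form a connected graph under range r.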
  • 2006 Reviewers List

    Page(s): 139 - 143
  • In this issue

    Page(s): 144
  • TC Information for authors

    Page(s): c3
  • [Back cover]

    Page(s): c4

Aims & Scope

The IEEE Transactions on Computers is a monthly publication with a wide distribution to researchers, developers, technical managers, and educators in the computer field.


Meet Our Editors

Editor-in-Chief
Paolo Montuschi
Politecnico di Torino
Dipartimento di Automatica e Informatica
Corso Duca degli Abruzzi 24 
10129 Torino - Italy
e-mail: pmo@computer.org