Computer Architecture Letters

Issue 1 • Date Jan. 2009

  • [Front cover]

    Publication Year: 2009 , Page(s): c1
    Freely Available from IEEE
  • Editorial Board [Cover2]

    Publication Year: 2009 , Page(s): c2
    Freely Available from IEEE
  • Weighted Random Routing on Torus Networks

    Publication Year: 2009 , Page(s): 1 - 4
    Cited by:  Papers (3)

    In this paper, we introduce a new closed-form oblivious routing algorithm called W2TURN that is worst-case throughput optimal for 2D-torus networks. W2TURN is based on a weighted random selection of paths that contain at most two turns. In terms of average hop count, W2TURN outperforms the best previously known closed-form worst-case throughput optimal routing algorithm called IVAL. In addition, we present a new optimal weighted random routing algorithm for rings called WRD.
  • Multicore DIMM: an Energy Efficient Memory Module with Independently Controlled DRAMs

    Publication Year: 2009 , Page(s): 5 - 8
    Cited by:  Papers (9)

    Demand for memory capacity and bandwidth keeps increasing rapidly in modern computer systems, and memory power consumption is becoming a considerable portion of the system power budget. However, the current DDR DIMM standard is not well suited to effectively serve CMP memory requests from both a power and performance perspective. We propose a new memory module called a multicore DIMM, where DRAM chips are grouped into multiple virtual memory devices, each of which has its own data path and receives separate commands. The multicore DIMM is designed to improve the energy efficiency of memory systems with little impact on system performance. Dividing each memory module into four virtual memory devices brings a simultaneous 22%, 7.6%, and 18% improvement in memory power, IPC, and system energy-delay product, respectively, on a set of multithreaded applications and consolidated workloads.
  • A Predictive Shutdown Technique for GPU Shader Processors

    Publication Year: 2009 , Page(s): 9 - 12
    Cited by:  Papers (5)

    As technology continues to shrink, reducing leakage is critical to achieving energy efficiency. Previous work on low-power GPUs (graphics processing units) focuses on techniques for dynamic power reduction, such as DVFS (Dynamic Voltage/Frequency Scaling) and clock gating. In this paper, we explore the potential of adopting architecture-level power gating techniques for leakage reduction on GPUs. In particular, we focus on the most power-hungry components, the shader processors. We observe that, due to different scene complexity, the shader resources required to satisfy the target frame rate actually vary across frames. Therefore, we propose the predictive shader shutdown technique to exploit workload variation across frames for leakage reduction on shader processors. The experimental results show that predictive shader shutdown achieves up to 46% leakage reduction on shader processors with negligible performance degradation.
  • An XML-Based ADL Framework for Automatic Generation of Multithreaded Computer Architecture Simulators

    Publication Year: 2009 , Page(s): 13 - 16

    Computer architecture simulation has always played a pivotal role in continuous innovation of computers. However, constructing or modifying a high-quality simulator is time-consuming and error-prone. Thus, architecture description languages (ADLs) are often used to provide an abstraction layer for describing the computer architecture and automatically generating corresponding simulators. Along the line of such research, we present a novel XML-based ADL, its compiler, and a generation methodology to automatically generate multithreaded simulators for computer architecture. We utilize the industry-standard extensible markup language XML to describe the functionality and architecture of a modeled processor. Our ADL framework allows users to easily and quickly modify the structure, register set, and execution of a modeled processor. To prove its validity, we have generated several multithreaded simulators with different configurations based on the MIPS five-stage processor, and successfully tested them with two programs.
  • CPU Accounting in CMP Processors

    Publication Year: 2009 , Page(s): 17 - 20
    Cited by:  Papers (7)

    Chip-multiprocessor (CMP) architectures introduce complexities when accounting CPU utilization to processes because the progress made by a process during an interval of time depends heavily on the activity of the other processes it is co-scheduled with. In this paper, we identify how an inaccurate measurement of the CPU utilization affects several key aspects of the system, such as process scheduling or the charging mechanism in data centers. We propose a new hardware accounting mechanism to improve the accuracy when measuring the CPU utilization in chip multiprocessors and compare it with the previous accounting mechanisms. Our results show that currently known mechanisms could lead to a 12% average error when it comes to CPU utilization accounting. Our proposal reduces this error to less than 1% in a modeled 4-core processor system.
  • A High-Throughput Distributed Shared-Buffer NoC Router

    Publication Year: 2009 , Page(s): 21 - 24
    Cited by:  Papers (6)  |  Patents (2)

    Microarchitectural configurations of buffers in routers have a significant impact on the overall performance of an on-chip network (NoC). This buffering can be at the inputs or the outputs of a router, corresponding to an input-buffered router (IBR) or an output-buffered router (OBR). OBRs are attractive because they have higher throughput and lower queuing delays under high loads than IBRs. However, a direct implementation of OBRs requires a router speedup equal to the number of ports, making such a design prohibitive given the aggressive clocking and power budgets of most NoC applications. In this letter, we propose a new router design that aims to emulate an OBR practically based on a distributed shared-buffer (DSB) router architecture. We introduce innovations to address the unique constraints of NoCs, including efficient pipelining and novel flow control. Our DSB design can achieve significantly higher bandwidth at saturation, with an improvement of up to 20% when compared to a state-of-the-art pipelined IBR with the same amount of buffering, and our proposed microarchitecture can achieve up to 94% of the ideal saturation throughput.
  • Many-Core vs. Many-Thread Machines: Stay Away From the Valley

    Publication Year: 2009 , Page(s): 25 - 28
    Cited by:  Papers (17)

    We study the tradeoffs between many-core machines like Intel's Larrabee and many-thread machines like Nvidia and AMD GPGPUs. We define a unified model describing a superposition of the two architectures, and use it to identify operation zones for which each machine is more suitable. Moreover, we identify an intermediate zone in which both machines deliver inferior performance. We study the shape of this "performance valley" and provide insights on how it can be avoided.
  • Architecture Independent Characterization of Embedded Java Workloads

    Publication Year: 2009 , Page(s): 29 - 32

    This paper presents an architecture-independent characterization of embedded Java workloads based on the industry-standard GrinderBench benchmark, which includes different classes of real-world embedded Java applications. This work is based on a custom-built embedded Java virtual machine (JVM) simulator specifically designed for embedded JVM modeling, which embodies domain-specific details such as thread scheduling, algorithms used for native CLDC APIs, and runtime data structures optimized for use in embedded systems. The results presented include dynamic execution characteristics, dynamic bytecode instruction mix, application and API workload distribution, object allocation statistics, instruction-set coverage, memory usage statistics, and method code and stack frame characteristics.
  • A Comment on "Beyond Fat-tree: Unidirectional Load-Balanced Multistage Interconnection Network"

    Publication Year: 2009 , Page(s): 33 - 34

    A recent work proposed to simplify fat-trees with adaptive routing by means of a load-balancing deterministic routing algorithm. The resultant network has performance figures comparable to the more complex adaptive routing fat-trees when packets need to be delivered in order. In a second work by the same authors published in IEEE CAL, they propose to simplify the fat-tree to a unidirectional multistage interconnection network (UMIN), using the same load-balancing deterministic routing algorithm. They show that comparable performance figures are achieved with much lower network complexity. In this comment, we show that the proposed load-balancing deterministic routing is in fact the routing scheme used by the butterfly network. Moreover, we show that the properties of the simplified UMIN network proposed by them are intrinsic to the standard butterfly and other existing UMINs.
  • Ad - IEEE Computer Society Digital Library

    Publication Year: 2009 , Page(s): 36
    Freely Available from IEEE
  • [Advertisement]

    Publication Year: 2009 , Page(s): 35
    Freely Available from IEEE
  • Information for authors

    Publication Year: 2009 , Page(s): c3
    Freely Available from IEEE
  • IEEE Computer Society [Cover4]

    Publication Year: 2009 , Page(s): c4
    Freely Available from IEEE

Aims & Scope

IEEE Computer Architecture Letters is a rigorously peer-reviewed forum for publishing early, high-impact results in the areas of uni- and multiprocessor computer systems, computer architecture, microarchitecture, workload characterization, performance evaluation and simulation techniques, and power-aware computing. 

Meet Our Editors

Editor-in-Chief
José Martinez
Cornell University
336 Frank H.T. Rhodes Hall
Ithaca, NY 14853 USA