Proceedings of the IEEE

Issue 5 • Date May 2008

  • [Front cover]

    Page(s): C1
    PDF (574 KB) | Freely Available from IEEE
  • Proceedings of the IEEE publication information

    Page(s): C2
    PDF (62 KB) | Freely Available from IEEE
  • Table of contents

    Page(s): 753 - 754
    PDF (211 KB) | Freely Available from IEEE
  • Focusing on Regional Resilience

    Page(s): 755 - 757
    PDF (314 KB) | Freely Available from IEEE
  • Cutting-Edge Computing: Using New Commodity Architectures

    Page(s): 758 - 760
    PDF (175 KB) | Freely Available from IEEE
  • Rise of the Graphics Processor

    Page(s): 761 - 778
    PDF (1043 KB) | HTML

    The modern graphics processing unit (GPU) is the result of 40 years of evolution of hardware to accelerate graphics processing operations. It represents the convergence of support for multiple market segments: computer-aided design, medical imaging, digital content creation, document and presentation applications, and entertainment applications. The exceptional performance characteristics of the GPU make it an attractive target for other application domains. We examine some of this evolution, look at the structure of a modern GPU, and discuss how graphics processing exploits this structure and how nongraphical applications can take advantage of this capability. We discuss some of the technical and market issues around broader adoption of this technology.

  • Graphics Processing Units for Handhelds

    Page(s): 779 - 789
    PDF (1172 KB) | HTML

    During the past few years, mobile phones and other handheld devices have gone from only handling dull text-based menu systems to, on an increasing number of models, being able to render high-quality three-dimensional graphics at high frame rates. This paper is a survey of the special considerations that must be taken when designing graphics processing units (GPUs) on such devices. Starting off by introducing desktop GPUs as a reference, the paper discusses how mobile GPUs are designed, often with power consumption rather than performance as the primary goal. Lowering the bus traffic between the GPU and the memory is an efficient way of reducing power consumption, and therefore some high-level algorithms for bandwidth reduction are presented. In addition, an overview of the different APIs that are used in the handheld market to handle both two-dimensional and three-dimensional graphics is provided. Finally, we present our outlook for the future and discuss directions of future research on handheld GPUs.
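    A back-of-envelope sketch of the bandwidth-reduction argument above, using tile-based rendering as the example technique. All numbers (resolution, overdraw, frame rate) are illustrative assumptions, not figures from the paper, and geometry and texture traffic are ignored on both sides:

```python
# Why resolving fragments in on-chip tile memory cuts external memory
# traffic on a handheld GPU. All parameters are illustrative assumptions.

WIDTH, HEIGHT = 320, 240        # assumed screen resolution
BYTES_PER_PIXEL = 4             # RGBA8 color
OVERDRAW = 3.0                  # assumed average shadings per pixel
FPS = 30

def immediate_mode_traffic():
    # Every shaded fragment reads and writes the off-chip framebuffer.
    per_frame = WIDTH * HEIGHT * OVERDRAW * BYTES_PER_PIXEL * 2  # read + write
    return per_frame * FPS       # bytes per second

def tile_based_traffic():
    # Fragments are resolved in on-chip tile memory; only each pixel's
    # final color is written to external memory, once per frame.
    per_frame = WIDTH * HEIGHT * BYTES_PER_PIXEL
    return per_frame * FPS

imm, tile = immediate_mode_traffic(), tile_based_traffic()
print(f"immediate-mode: {imm / 1e6:.1f} MB/s")
print(f"tile-based:     {tile / 1e6:.1f} MB/s")
print(f"reduction:      {imm / tile:.1f}x")
```

    Under these assumptions the saving grows linearly with overdraw, which is why the technique pays off most on scenes with heavy blending.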

  • Convergence of Recognition, Mining, and Synthesis Workloads and Its Implications

    Page(s): 790 - 807
    PDF (1661 KB) | HTML

    This paper examines the growing need for a general-purpose “analytics engine” that can enable next-generation processing platforms to effectively model events, objects, and concepts based on end-user input and accessible datasets, along with an ability to iteratively refine the model in real time. We find such processing needs at the heart of many emerging applications and services. This processing is further decomposed in terms of an integration of three fundamental compute capabilities: recognition, mining, and synthesis (RMS). The set of RMS workloads is examined next in terms of usage, mathematical models, numerical algorithms, and underlying data structures. Our analysis suggests a workload convergence that is analyzed next for its platform implications. In summary, a diverse set of emerging RMS applications from market segments like graphics, gaming, media mining, unstructured information management, financial analytics, and interactive virtual communities presents a relatively focused, highly overlapping set of common platform challenges. A general-purpose processing platform designed to address these challenges has the potential for significantly enhancing users' experience and programmer productivity.

  • Challenges and Opportunities in Many-Core Computing

    Page(s): 808 - 815
    PDF (832 KB) | HTML

    In this paper, we present some of the challenges and opportunities in software development based on the current hardware trends and the impact of massive parallelism on both the software and hardware industry. We indicate some of the approaches that can enable software development to effectively exploit the many-core architectures. Some of these include encapsulating domain-specific knowledge in reusable components, such as libraries, integrating concurrency with languages, and supporting explicit declarations to help compilers and operating system schedulers. Tighter interaction between software and underlying hardware is required to build scalable and portable applications with predictable performance and higher power efficiency. Overall, many-core computing provides us opportunities to enable new application scenarios that support enhanced functionality and a richer experience for the user on commodity hardware.
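    A minimal sketch of the first approach named above, encapsulating concurrency in a reusable library component: the caller writes sequential-looking code and the executor decides how work is spread across cores. The workload function here is a made-up stand-in, and Python's standard `concurrent.futures` plays the role of the library:

```python
# Concurrency hidden behind a library interface: callers just map a
# function over data; the pool owns the threading details.
from concurrent.futures import ThreadPoolExecutor

def analyze(record):
    # Stand-in for a per-item computation (hypothetical workload).
    return sum(ord(c) for c in record)

records = ["alpha", "beta", "gamma", "delta"]

# The parallelism lives entirely inside the library component.
with ThreadPoolExecutor(max_workers=4) as pool:
    results = list(pool.map(analyze, records))

# Same answer as a serial loop, by construction.
assert results == [analyze(r) for r in records]
print(results)
```

    The design point is that the library, not the application, is the right place to encode scheduling knowledge, so it can be retuned per platform without touching caller code.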

  • Scalable Programming Models for Massively Multicore Processors

    Page(s): 816 - 831
    PDF (874 KB) | HTML

    Including multiple cores on a single chip has become the dominant mechanism for scaling processor performance. Exponential growth in the number of cores on a single processor is expected to lead in a short time to mainstream computers with hundreds of cores. Scalable implementations of parallel algorithms will be necessary in order to achieve improved single-application performance on such processors. In addition, memory access will continue to be an important limiting factor on achieving performance, and heterogeneous systems may make use of cores with varying capabilities and performance characteristics. An appropriate programming model can address scalability and can expose data locality while making it possible to migrate application code between processors with different parallel architectures and variable numbers and kinds of cores. We survey and evaluate a range of multicore processor architectures and programming models with a focus on GPUs and the Cell BE processor. These processors have a large number of cores and are available to consumers today, but the scalable programming models developed for them are also applicable to current and future multicore CPUs.
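    One scalability property such a programming model should provide, sketched in plain Python (an illustration, not code from the paper): the program is written against the problem domain rather than a fixed core count, so the same code gives the same answer on 1, 4, or 100 workers, with contiguous chunks preserving data locality:

```python
# Core-count-independent data-parallel reduction: partition the domain
# into one contiguous chunk per worker, reduce each chunk, combine.
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    return sum(chunk)

def parallel_sum(data, workers):
    # Contiguous chunks keep each worker's accesses local.
    size = max(1, len(data) // workers)
    chunks = [data[i:i + size] for i in range(0, len(data), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_sum, chunks))

data = list(range(10_000))
# The worker count is a tuning knob, not part of the program's meaning.
assert parallel_sum(data, 1) == parallel_sum(data, 4) == sum(data)
```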

  • An Experimental Study of Self-Optimizing Dense Linear Algebra Software

    Page(s): 832 - 848
    PDF (861 KB) | HTML

    Memory hierarchy optimizations have been studied by researchers in many areas including compilers, numerical linear algebra, and theoretical computer science. However, the approaches taken by these communities are very different. The compiler community has invested considerable effort in inventing loop transformations like loop permutation and tiling, and in the development of simple analytical models to determine the values of numerical parameters such as tile sizes required by these transformations. Although the performance of compiler-generated code has improved steadily over the years, it is difficult to retarget restructuring compilers to new platforms because of the need to develop analytical models manually for new platforms. The search for performance portability has led to the development of self-optimizing software systems. One approach to self-optimizing software is the generate-and-test approach, which has been used by the dense numerical linear algebra community to produce high-performance BLAS and fast Fourier transform libraries. Another approach to portable memory hierarchy optimization is to use the divide-and-conquer approach to implementing cache-oblivious algorithms. Each step of divide-and-conquer generates problems of smaller size. When the working set of a subproblem fits in some level of the memory hierarchy, that subproblem can be executed without capacity misses at that level. Although all three approaches have been studied extensively, there are few experimental studies that have compared them. How well does the code produced by current self-optimizing systems perform compared to hand-tuned code? Is empirical search essential to the generate-and-test approach, or is it possible to use analytical models with platform-specific parameters to reduce the size of the search space? The cache-oblivious approach uses divide-and-conquer to perform approximate blocking; how well does approximate blocking perform compared to precise blocking? This paper addresses such questions for matrix multiplication, which is the most important dense linear algebra kernel.
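    A minimal sketch of the cache-oblivious approach the abstract describes: divide-and-conquer matrix multiplication with no machine-specific tile size. Once a subproblem's working set fits in some cache level, the recursion below it runs without capacity misses at that level. Plain illustrative Python for square matrices whose size is a power of two; `threshold` merely bounds the base case and is not a tuning parameter:

```python
# Cache-oblivious matrix multiply via recursive quadrant decomposition.

def matmul(A, B, threshold=16):
    n = len(A)
    if n <= threshold:
        # Base case: ordinary triple loop.
        return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
                for i in range(n)]
    h = n // 2

    def quad(M, r, c):
        # h-by-h quadrant of M starting at row r, column c.
        return [row[c:c + h] for row in M[r:r + h]]

    def add(X, Y):
        return [[x + y for x, y in zip(rx, ry)] for rx, ry in zip(X, Y)]

    A11, A12, A21, A22 = quad(A, 0, 0), quad(A, 0, h), quad(A, h, 0), quad(A, h, h)
    B11, B12, B21, B22 = quad(B, 0, 0), quad(B, 0, h), quad(B, h, 0), quad(B, h, h)
    C11 = add(matmul(A11, B11, threshold), matmul(A12, B21, threshold))
    C12 = add(matmul(A11, B12, threshold), matmul(A12, B22, threshold))
    C21 = add(matmul(A21, B11, threshold), matmul(A22, B21, threshold))
    C22 = add(matmul(A21, B12, threshold), matmul(A22, B22, threshold))
    # Stitch the four result quadrants back together.
    return ([r1 + r2 for r1, r2 in zip(C11, C12)] +
            [r1 + r2 for r1, r2 in zip(C21, C22)])
```

    This is the "approximate blocking" in the question above: recursion produces blocks of every power-of-two size, so some level of the recursion fits each cache without anyone computing a tile size.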

  • Self-Configuring Applications for Heterogeneous Systems: Program Composition and Optimization Using Cognitive Techniques

    Page(s): 849 - 862
    PDF (1072 KB) | HTML

    This paper describes several challenges facing programmers of future edge computing systems, the diverse many-core devices that will soon exemplify commodity mainstream systems. To call attention to the programming challenges ahead, this paper focuses on the most complex of such architectures: integrated, power-conserving systems, inherently parallel and heterogeneous, with distributed address spaces. When programming such complex systems, new concerns arise: computation partitioning across functional units, data movement and synchronization, managing a diversity of programming models for different devices, and reusing existing legacy and library software. We observe that many of these challenges are also faced in programming applications for large-scale heterogeneous distributed computing environments, and that current solutions as well as future research directions in distributed computing can be adapted to commodity computing environments. Optimization decisions are inherently complex due to the large search spaces of possible solutions and the difficulty of predicting performance on increasingly complex architectures. We argue that cognitive techniques are well suited to managing systems of such complexity, citing recent trends of using cognitive techniques for code mapping and optimization support. Combining these observations, we describe a fundamentally new programming paradigm for complex heterogeneous systems, in which programmers design self-configuring applications and the system automates optimization decisions and manages the allocation of heterogeneous resources.
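    A toy sketch of the generate-and-test search that underlies such automated optimization decisions. The cost function below is a made-up synthetic surface; a real system would compile and time each code variant, or query a learned model, and the parameter names (`tile`, `unroll`) are purely illustrative:

```python
# Generate-and-test over a small configuration space: evaluate candidate
# configurations and keep the cheapest. Synthetic cost stands in for a
# measured runtime.

def measured_cost(tile, unroll):
    # Hypothetical cost surface with its optimum at (64, 4).
    return abs(tile - 64) / 64 + abs(unroll - 4) / 4

def empirical_search():
    # Exhaustive enumeration works only because this space is tiny; real
    # spaces are far too large, which is exactly why models and cognitive
    # techniques are brought in to prune the search.
    candidates = [(t, u) for t in (8, 16, 32, 64, 128) for u in (1, 2, 4, 8)]
    return min(candidates, key=lambda c: measured_cost(*c))

print(empirical_search())  # the synthetic optimum, (64, 4)
```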

  • Database Optimizations for Modern Hardware

    Page(s): 863 - 878
    PDF (440 KB) | HTML

    Databases are an important workload for modern commodity microarchitectures. Achieving the best performance requires that careful attention be paid to the underlying architecture, including instruction and data cache usage, data layout, branch prediction, and multithreading. Specialized commodity microarchitectures, such as graphics cards and network processors, have also been investigated as effective query coprocessors. This paper presents a survey of recent architecture-sensitive database research. The insights gained from optimizing database performance on modern microarchitectures are also applicable to other domains, particularly those that are similarly data intensive.
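    To make one of these data-layout concerns concrete, here is a sketch (not from the paper; table contents are invented) of the row-store versus column-store trade-off: a scan over one attribute in a columnar layout touches only the bytes it needs, while a row layout drags every attribute of each row through the cache:

```python
# Row layout vs. decomposed (columnar) layout for a single-column scan.

# Row store: one tuple per record (id, name, qty). Contents are made up.
rows = [(i, f"name{i}", i * 10) for i in range(1000)]

# Column store: one array per attribute.
ids   = [r[0] for r in rows]
names = [r[1] for r in rows]
qtys  = [r[2] for r in rows]

# Query: SELECT SUM(qty) WHERE qty > 5000
row_answer = sum(r[2] for r in rows if r[2] > 5000)  # scans whole tuples
col_answer = sum(q for q in qtys if q > 5000)        # scans one dense column
assert row_answer == col_answer
```

    Both scans return the same answer; the difference is purely in how much irrelevant data travels through the cache hierarchy, which is the kind of effect the surveyed work measures.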

  • GPU Computing

    Page(s): 879 - 899
    PDF (1320 KB) | HTML

    The graphics processing unit (GPU) has become an integral part of today's mainstream computing systems. Over the past six years, there has been a marked increase in the performance and capabilities of GPUs. The modern GPU is not only a powerful graphics engine but also a highly parallel programmable processor featuring peak arithmetic and memory bandwidth that substantially outpaces its CPU counterpart. The GPU's rapid increase in both programmability and capability has spawned a research community that has successfully mapped a broad range of computationally demanding, complex problems to the GPU. This effort in general-purpose computing on the GPU, also known as GPU computing, has positioned the GPU as a compelling alternative to traditional microprocessors in high-performance computer systems of the future. We describe the background, hardware, and programming model for GPU computing, summarize the state of the art in tools and techniques, and present four GPU computing successes in game physics and computational biophysics that deliver order-of-magnitude performance gains over optimized CPU applications.
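    The programming model mentioned above can be sketched in a few lines: a scalar "kernel" is written for one data element, and the runtime maps it across a large index domain in parallel. Plain Python stands in for a CUDA-style launch here; on a real GPU the iterations of `launch` run on thousands of hardware threads:

```python
# GPU-computing programming model in miniature: per-element kernel + launch.

def saxpy_kernel(i, a, x, y, out):
    # Kernel body: executed once per index i, independently of every other i.
    out[i] = a * x[i] + y[i]

def launch(kernel, n, *args):
    # Sequential stand-in for a parallel launch over the index domain;
    # the independence of iterations is what makes real parallelism legal.
    for i in range(n):
        kernel(i, *args)

n = 4
x = [1.0, 2.0, 3.0, 4.0]
y = [10.0, 20.0, 30.0, 40.0]
out = [0.0] * n
launch(saxpy_kernel, n, 2.0, x, y, out)
print(out)  # [12.0, 24.0, 36.0, 48.0]
```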

  • Early Investigations on Ferrite Magnetic Materials by J. L. Snoek and Colleagues of the Philips Research Laboratories Eindhoven

    Page(s): 900 - 904
    PDF (735 KB) | Freely Available from IEEE
  • Future Special Issues/Special Sections of the Proceedings

    Page(s): 905 - 906
    PDF (182 KB) | Freely Available from IEEE
  • IEEE copyright form

    Page(s): 907 - 908
    PDF (1065 KB) | Freely Available from IEEE
  • Put your technology leadership in writing

    Page(s): C3
    PDF (398 KB) | Freely Available from IEEE
  • [Back cover]

    Page(s): C4
    PDF (373 KB) | Freely Available from IEEE

Aims & Scope

The most highly cited general-interest journal in electrical engineering and computer science, the Proceedings is the best way to stay informed on an exemplary range of topics.


Meet Our Editors

Editor-in-Chief
H. Joel Trussell
North Carolina State University