International Workshop on Innovative Architecture for Future Generation High-Performance Processors and Systems (IWIA 2007)

Date: 11-13 Jan. 2007

  • Innovative Architecture for Future Generation High-Performance Processors and Systems - Cover

    Publication Year: 2007, Page(s): c1
  • Innovative Architecture for Future Generation High-Performance Processors and Systems - Title page

    Publication Year: 2007, Page(s): i-iii
  • Innovative Architecture for Future Generation High-Performance Processors and Systems - Copyright

    Publication Year: 2007, Page(s): iv
  • Innovative Architecture for Future Generation High-Performance Processors and Systems - TOC

    Publication Year: 2007, Page(s): v-vi
  • Message from the Editors

    Publication Year: 2007, Page(s): vii
  • Committees

    Publication Year: 2007, Page(s): viii
  • Some Initial Explorations into the Hierarchical Multi-core Chip Design Space for HPC Systems

    Publication Year: 2007, Page(s): 3-10
    Cited by: Papers (1)

    Multi-core designs have emerged as the dominant trend for commodity and high performance microprocessor chips, in virtually all market segments. This includes the high performance supercomputing arena. Using a particular HPC system as a baseline, this paper performs some initial explorations of how the constraints of chip technology, system-imposed memory and bandwidth, and application characteris...
  • Impact of Predictive Switching in 2-D Torus Networks

    Publication Year: 2007, Page(s): 11-19
    Cited by: Papers (2) | Patents (2)

    Predictive switching is a technique for reducing message latency in parallel computer networks. It tries to decide traversal paths of messages by utilizing a prediction mechanism so that processing time for message headers can be shortened. A key issue of predictive switching is the overhead of prediction failures. This paper presents simple and efficient treatments of prediction failures. Our pro...
  • Responsive Link for Distributed Real-Time Processing

    Publication Year: 2007, Page(s): 20-29
    Cited by: Papers (4)

    In this paper, we design and implement responsive link, which is a real-time communication link, for distributed real-time systems including sensor-actuator networked systems, ubiquitous computing systems, robot systems, and mechatronic systems. In order to realize flexible real-time communications, the responsive link has many unique features including priority-based packet overtaking (the packet...
  • Accelerating Brain Circuit Simulations of Object Recognition with CELL Processors

    Publication Year: 2007, Page(s): 33-42
    Cited by: Patents (1)

    Humans outperform computers on many natural tasks including vision. Given the human ability to recognize objects rapidly and almost effortlessly, it is pragmatically sensible to study and attempt to imitate algorithms used by the brain. Analysis of the anatomical structure and physiological operation of brain circuits has led to derivation of novel algorithms that in initial study have successfull...
  • Improving Search Speed on Pointer-Based Large Data Structures Using a Hierarchical Clustering Copying Algorithm

    Publication Year: 2007, Page(s): 43-52

    The increasing processor-memory performance gap makes improving the cache locality as important as the virtual memory locality. In many applications, especially in search algorithms on pointer-based large data structures, breadth-first copying algorithms increase cache misses, page faults and TLB misses. Since the depth-first copying only achieves limited locality improvement, several clustering c...
  • Implementation and Evaluation of Parallel FFT Using SIMD Instructions on Multi-core Processors

    Publication Year: 2007, Page(s): 53-59
    Cited by: Papers (3)

    This paper proposes an implementation of a parallel two-dimensional fast Fourier transform (FFT) using short-vector SIMD instructions on multi-core processors. Combining vectorization with the block two-dimensional FFT algorithm is shown to improve performance effectively. We vectorized the FFT kernels using Intel's Streaming SIMD Extensions 3 (SSE3) instructions. The performance results f...
  • Design Assists for Embedded Systems in the COINS Compiler Infrastructure

    Publication Year: 2007, Page(s): 60-69

    Program design of embedded systems requires special considerations such as compact code size, shorter addressing fields, low-cost parallelization, tuning to the application field, and so on. COINS is a compiler infrastructure that makes compiler development easy by providing a code generator based on target machine description for quick retargeting, and providing modularized analysis/optimization metho...
  • Cache Memory Architecture for Leakage Energy Reduction

    Publication Year: 2007, Page(s): 73-80
    Cited by: Papers (1)

    Recently, energy dissipation by microprocessors has been growing, which poses a serious problem in terms of allowable temperature and performance improvement for future microprocessors. Cache memory is effective in bridging the growing speed gap between a processor and relatively slow external main memory, and its size has increased. Almost all of today's commercial processors, not only high-...
  • Exploring Temperature-Aware Design of Memory Architectures in VLIW Systems

    Publication Year: 2007, Page(s): 81-87

    This paper presents a thermal model to analyze the temperature evolution in the shared register files found in VLIW systems. The model allows the analysis of several factors that have a strong impact on heat transfer: layout topology, placement and memory accesses. Finally, some relevant conclusions are obtained after analyzing the thermal behavior of several multimedia applicatio...
  • A Reconfigurable Processor PARS and its Compiler

    Publication Year: 2007, Page(s): 91-100

    This paper covers two topics: (1) a general-purpose reconfigurable architecture called PARS, and (2) its compiler, which generates high-quality code in a reasonable compile time. The PARS architecture can execute various application programs by avoiding the program size problem through its fold-down mechanism and reconfiguration. The mechanism is to fold down the program int...
  • An Embedded Processor: Is It Ready for High-Performance Computing?

    Publication Year: 2007, Page(s): 101-109
    Cited by: Papers (1)

    A high power-performance ratio is the most important factor in high-performance computing, whose performance is limited by its power budget. The SuperH (SH) embedded processor core SH-X3, implemented in a 90-nm CMOS process and running at 600 MHz, achieved 1080 Dhrystone MIPS, 4.2 GFLOPS, and 55 M polygons/s. Its power-performance ratio reaches as high as 3000 MIPS/W. It is, therefore, a candidate proc...
  • Outline of OROCHI: A Multiple Instruction Set Executable SMT Processor

    Publication Year: 2007, Page(s): 110-117

    In recent years, enjoying multimedia content on portable devices has become popular. These multimedia workloads are too heavy for a conventional processor, so current portable devices implement an additional dedicated processor for multimedia processing. But we still have to leave a conventional processor to execute the OS and miscellaneous processing, so this solution enlarges cost, fo...
  • Author index

    Publication Year: 2007, Page(s): 118
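As background for the entry "Implementation and Evaluation of Parallel FFT Using SIMD Instructions on Multi-core Processors": a two-dimensional FFT can be computed as independent one-dimensional FFTs over every row followed by independent one-dimensional FFTs over every column (the row-column method). The truncated abstract does not detail the block 2-D FFT algorithm, so the following is only a minimal sketch of the row-column idea, not the authors' implementation: it omits blocking, SSE3 vectorization, and multi-core parallelization, assumes both dimensions are powers of two, and the names fft, fft2d, and dft2d are hypothetical.

```python
import cmath


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]  # twiddle factor times odd term
        out[k] = even[k] + t           # butterfly, first half of the output
        out[k + n // 2] = even[k] - t  # butterfly, second half of the output
    return out


def fft2d(a):
    """Hypothetical helper: 2-D FFT by the row-column method."""
    rows = [fft(row) for row in a]               # pass 1: rows are independent
    cols = [fft(list(c)) for c in zip(*rows)]   # pass 2: columns are independent
    return [list(r) for r in zip(*cols)]        # transpose back to row-major order


def dft2d(a):
    """Direct O((MN)^2) 2-D DFT, used only as a reference to check fft2d."""
    m, n = len(a), len(a[0])
    return [[sum(a[x][y] * cmath.exp(-2j * cmath.pi * (u * x / m + v * y / n))
                 for x in range(m) for y in range(n))
             for v in range(n)]
            for u in range(m)]


if __name__ == "__main__":
    a = [[1, 2, 3, 4], [5, 6, 7, 8]]  # 2 x 4 input
    fast, ref = fft2d(a), dft2d(a)
    err = max(abs(fast[u][v] - ref[u][v]) for u in range(2) for v in range(4))
    print("DC term:", fast[0][0].real, "max error vs direct DFT:", err)
```

Because each row transform in the first pass is independent of the others, and likewise each column transform in the second pass, either pass can in principle be divided among cores, while the butterfly loop inside fft is the kind of kernel that short-vector SIMD instructions target.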