2011 Second Workshop on Architecture and Multi-Core Applications (wamca 2011)

26-27 Oct. 2011

  • [Front cover]

    Publication Year: 2011, Page(s): C1
    Freely Available from IEEE
  • [Title page i]

    Publication Year: 2011, Page(s): i
    Freely Available from IEEE
  • [Title page iii]

    Publication Year: 2011, Page(s): iii
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2011, Page(s): iv
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2011, Page(s): v
    Freely Available from IEEE
  • Message from the Program Chairs

    Publication Year: 2011, Page(s): vi
    Freely Available from IEEE
  • Committees

    Publication Year: 2011, Page(s): vii
    Freely Available from IEEE
  • Keynote

    Publication Year: 2011, Page(s): viii

    Summary form only given. In this talk we examine how high performance computing has changed over the last 10 years and look toward the future in terms of trends. These changes have had and will continue to have a major impact on our software. Some of the software and algorithm challenges have already been encountered, such as management of communication and memory hierarchies through a combination ...

  • Industrial Talks

    Publication Year: 2011, Page(s): ix - x
    Freely Available from IEEE
  • Tutorial

    Publication Year: 2011, Page(s): xi

    Provides an abstract of the tutorial presentation and a brief professional biography of the presenter. The complete presentation was not made available for publication as part of the conference proceedings.

  • Large Scale Kronecker Product on Supercomputers

    Publication Year: 2011, Page(s): 1 - 4
    Cited by: Papers (1)

    The Kronecker product, also called tensor product, is a fundamental matrix algebra operation, which is widely used as a natural formalism to express a convolution of many interactions or representations. Given a set of matrices, we need to multiply their Kronecker product by a vector. This operation is a critical kernel for iterative algorithms and thus needs to be computed efficiently. In a previous...

  • Trace-Based Visualization as a Tool to Understand Applications' I/O Performance in Multi-core Machines

    Publication Year: 2011, Page(s): 5 - 11
    Cited by: Papers (1)

    This paper presents the use of trace-based performance visualization of a large scale atmospheric model, the Ocean-Land-Atmosphere Model (OLAM). The trace was obtained with the libRastro library, and the visualization was done with Paje. The visualization was used to analyze OLAM's performance and to identify its bottlenecks. In particular, we are interested in the model's I/O operations, since i...

  • Adaptive Power Optimization of On-chip SNUCA Cache on Tiled Chip Multicore Architecture Using Remap Policy

    Publication Year: 2011, Page(s): 12 - 17
    Cited by: Papers (3)

    Advances in technology have increased the number of cores and the size of caches present on chip multicore platforms (CMPs). As a result, leakage power in on-chip caches has already become a major contributor to the power consumption of the memory subsystem. We propose to reduce leakage power consumption in static nonuniform cache architecture (SNUCA) on a tiled CMP by dynamically varying the number of...

  • Evaluating the Problem of Process Mapping on Network-on-Chip for Parallel Applications

    Publication Year: 2011, Page(s): 18 - 23
    Cited by: Papers (1)

    Process mapping on Networks-on-Chip (NoC) is an important issue for future many-core processors. Mapping strategies can increase performance and scalability by optimizing the communication cost. However, parallel applications use a large set of collective communications that generate high traffic on the Network-on-Chip. Therefore, our goal in this paper is to evaluate the problem related to the...

  • Economical Two-fold Working Precision Matrix Multiplication on Consumer-Level CUDA GPUs

    Publication Year: 2011, Page(s): 24 - 29

    Dot product faithfully rounded after "as if" computed in K-fold working precision (K ≤ 2) is known to be computable only with floating-point numbers defined in IEEE 754 floating-point standard. This paper presents a CUDA GPU implementation of two-fold working precision matrix multiplication based on the dot product computation method. Experimental results on a GeForce GTX580 and a GTX560Ti ...

  • Author index

    Publication Year: 2011, Page(s): 30
    Freely Available from IEEE
  • [Publishers information]

    Publication Year: 2011, Page(s): 32
    Freely Available from IEEE