
2009 IEEE International Symposium on Parallel & Distributed Processing (IPDPS 2009)

Date: 23-29 May 2009


Displaying Results 1 - 25 of 404
  • [Front cover]

    Page(s): c1
    PDF (151 KB)
    Freely Available from IEEE
  • [Copyright notice]

    Page(s): 1
    PDF (68 KB)
    Freely Available from IEEE
  • [Title page]

    Page(s): 1
    PDF (167 KB)
    Freely Available from IEEE
  • Program

    Page(s): 1 - 8
    PDF (176 KB)
    Freely Available from IEEE
  • Workshops

    Page(s): 1 - 18
    PDF (240 KB)
    Freely Available from IEEE
  • Message from general chair

    Page(s): 1 - 2
    PDF (284 KB)
    Freely Available from IEEE
  • Message from the program chair

    Page(s): 1 - 2
    PDF (125 KB)
    Freely Available from IEEE
  • Message from the workshops chairs

    Page(s): 1
    PDF (121 KB)
    Freely Available from IEEE
  • Message from steering co-chairs

    Page(s): 1
    PDF (273 KB)
    Freely Available from IEEE
  • IPDPS 2009 organization

    Page(s): 1 - 4
    PDF (294 KB)
    Freely Available from IEEE
  • IPDPS 2009 reviewers

    Page(s): 1
    PDF (75 KB)
    Freely Available from IEEE
  • IPDPS 2009 technical program

    Page(s): 1 - 4
    PDF (376 KB)
    Freely Available from IEEE
  • Many-core parallel computing - Can compilers and tools do the heavy lifting?

    Page(s): 1
    PDF (36 KB)

    Modern GPUs such as the NVIDIA GeForce GTX280, ATI Radeon 4860, and the upcoming Intel Larrabee are massively parallel, many-core processors. Today, application developers for these many-core chips are reporting 10X-100X speedups over sequential code on traditional microprocessors. According to the semiconductor industry roadmap, these processors could scale up to over 1,000X speedup over single cores by the end of 2016. Such a dramatic performance difference between parallel and sequential execution will motivate an increasing number of developers to parallelize their applications. Today, application programmers have to understand the desirable parallel programming idioms, manually work around potential hardware performance pitfalls, and restructure their application designs in order to achieve their performance objectives on many-core processors. Although many researchers have given up on parallelizing compilers, I will show evidence that by systematically incorporating high-level application design knowledge into the source code, a new generation of compilers and tools can take over the heavy lifting in developing and tuning parallel applications. I will also discuss roadblocks whose removal will require innovations from the entire research community.

    Full text access may be available.
  • Software transactional memory: Where do we come from? What are we? Where are we going?

    Page(s): 1
    PDF (35 KB)

    The transactional memory programming paradigm is gaining momentum as the approach of choice for replacing locks in concurrent programming. Combining sequences of concurrent operations into atomic transactions seems to promise a great reduction in the complexity of both programming and verification, by making parts of the code appear to be sequential without the need to program fine-grained locks. Software transactional memory promises to deliver a transactional programming environment without the need for costly modifications in processor design. However, the story of software transactional memory reminds one of garbage collection in its time: performance is improving, and the semantics are becoming clearer, yet there is still a long road ahead, a road strewn with stones below and crows hovering above, predicting its demise. This talk will try to take a sober look at software transactional memory, its history, the state of research today, and what we can expect to achieve in the foreseeable future.

    Full text access may be available. (An illustrative sketch follows this entry.)
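
    The abstract above describes the transactional programming model only at a high level. As a rough illustration of the idea, and not of any particular STM design from the talk, the Python sketch below implements a toy optimistic STM: transactional variables carry version numbers, a transaction records its read and write sets, validates its reads under a single global commit lock, and retries on conflict. The names (TVar, atomically) and the coarse global lock are simplifications chosen for clarity.

    ```python
    import threading

    _commit_lock = threading.Lock()   # one global commit lock: coarse, but keeps the sketch short

    class TVar:
        """A transactional variable: a value plus a version counter."""
        def __init__(self, value):
            self.value = value
            self.version = 0

    class _Retry(Exception):
        """Raised internally when commit-time validation detects a conflict."""

    class Transaction:
        def __init__(self):
            self.reads = {}    # TVar -> version observed when first read
            self.writes = {}   # TVar -> tentative new value

        def read(self, tvar):
            if tvar in self.writes:              # read-your-own-writes
                return self.writes[tvar]
            self.reads.setdefault(tvar, tvar.version)
            return tvar.value

        def write(self, tvar, value):
            self.writes[tvar] = value

        def commit(self):
            with _commit_lock:
                # Validate: every variable we read must still be at the version we saw.
                for tvar, seen in self.reads.items():
                    if tvar.version != seen:
                        raise _Retry()
                # Publish all writes while holding the commit lock.
                for tvar, value in self.writes.items():
                    tvar.value = value
                    tvar.version += 1

    def atomically(fn):
        """Run fn(tx) as a transaction, re-executing it whenever a conflict is detected."""
        while True:
            tx = Transaction()
            try:
                result = fn(tx)
                tx.commit()
                return result
            except _Retry:
                continue

    # Example: transfer between two accounts with no per-account locks in user code.
    a, b = TVar(100), TVar(0)
    atomically(lambda tx: (tx.write(a, tx.read(a) - 10), tx.write(b, tx.read(b) + 10)))
    print(a.value, b.value)   # 90 10
    ```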
  • Green flash: Designing an energy efficient climate supercomputer

    Page(s): 1
    PDF (35 KB)

    It is clear from both the cooling demands and the electricity costs that the growth in scientific computing capabilities of the last few decades is not sustainable unless fundamentally new ideas are brought to bear. In this talk we propose a novel approach to supercomputing design that leverages the sophisticated tool chains of the consumer electronics marketplace. We analyze our framework in the context of high-resolution global climate change simulations, an application with multi-trillion-dollar ramifications for the world's economies. A key aspect of our methodology is hardware-software co-tuning, which utilizes fast and accurate FPGA-based architectural emulation. This enables the design of future exaflop-class supercomputing systems to be defined by scientific requirements instead of constraining the science to the machine configuration. Our talk will provide detailed design requirements for kilometer-scale, global cloud-system-resolving climate models and point the way toward Green Flash: an application-targeted exascale machine that could be efficiently implemented using mainstream embedded design processes. Overall, we believe that our proposed approach can provide a quantum leap in hardware and energy utilization, and may significantly impact the design of the next generation of HPC systems.

    Full text access may be available.
  • How to build a useful thousand-core manycore system?

    Page(s): 1
    PDF (35 KB)
    Freely Available from IEEE
  • TCPP Ph.D. Forum

    Page(s): 1 - 29
    PDF (1300 KB)

    Full text access may be available.
  • SiCortex high-productivity, low-power computers

    Page(s): 1
    PDF (200 KB)

    In order to work efficiently, clusters for high performance computing require a balance among compute, memory, inter-node communication, and I/O. Fast communication among a thousand multicore nodes requires short wire paths and power-efficient CPUs tightly integrated with memory, communication, and I/O controllers. The tutorial describes the characteristics of a six-thousand-core cluster that puts all of these elements on a single chip, dramatically reducing cost and power consumption while increasing reliability and performance compared to commodity clusters.

    Full text access may be available.
  • Tools for scalable performance analysis on Petascale systems

    Page(s): 1 - 3
    PDF (307 KB)

    Tools are becoming increasingly important to efficiently utilize the computing power available in contemporary large-scale systems. The drastic increase in the size and complexity of these systems requires tools to be scalable while producing meaningful and easily digestible information that helps the user pinpoint problems at scale. The goal of this tutorial is to introduce state-of-the-art performance tools from three different organizations to a diverse audience. Together these tools provide a broad spectrum of capabilities necessary to analyze the performance of scientific and engineering applications on a variety of large- and small-scale systems. The tools include:
    • IBM High Performance Computing Toolkit: a suite of performance-related tools and libraries to assist in application tuning. The toolkit is an integrated environment for performance analysis of sequential and parallel applications using the MPI and OpenMP paradigms. Scientists can collect rich performance data from selected parts of an execution, digest the data at a very high level, and plan for improvements within a single unified interface. It provides a common framework for IBM's mid-range server offerings, including pSeries and eSeries servers and Blue Gene systems, on both AIX and Linux. More information can be found at http://domino.research.ibm.com/comm/research_projects.nsf/pages/hpct.index.html
    • Scalable Performance Analysis of Large-Scale Applications (SCALASCA) toolset: Scalasca is an open-source toolset that can be used to analyze the performance behavior of parallel applications and to identify opportunities for optimization. It has been specifically designed for use on large-scale systems, including Blue Gene and Cray XT, but is also well suited for small- and medium-scale HPC platforms. Scalasca supports an incremental performance-analysis procedure that integrates runtime summaries with in-depth studies of concurrent behavior via event tracing, adopting a strategy of successively refined measurement configurations. A distinctive feature is the ability to identify wait states that occur, for example, as a result of unevenly distributed workloads. Especially when trying to scale communication-intensive applications to large processor counts, such wait states can present severe challenges to achieving good performance. Scalasca is developed by the Jülich Supercomputing Centre and is available under the New BSD open-source license. More information can be found at http://www.scalasca.org
    • CEPBA Toolkit: the CEPBA-tools environment is a trace-based analysis environment consisting of two major components: Paraver, a browser for traces obtained from a parallel run, and Dimemas, a simulator that rebuilds the time behavior of a parallel program from a trace. More information can be found at http://www.bsc.es/plantillaF.php?cat_id=52

    Full text access may be available.
  • HCW 2009 keynote talk: GPU computing: Heterogeneous computing for future systems

    Page(s): 1
    PDF (71 KB)

    Over the last decade, commodity graphics processors (GPUs) have evolved from fixed-function graphics units into powerful, programmable data-parallel processors. Today's GPU is capable of sustaining computation rates substantially greater than those of modern CPUs, with technology trends indicating a widening gap in the future. Researchers in the rapidly evolving field of GPU computing have demonstrated mappings to these processors for a wide range of computationally intensive tasks, and new programming environments offer the promise of a wider role for GPU computing in the coming years. In this talk I will begin by discussing the motivation and background for GPU computing and describe some of the recent advances in the field. The field of GPU computing has substantially changed over its short lifetime due to new applications, techniques, programming models, and hardware. As parallel computing has decidedly moved into the mainstream, the lessons of GPU computing are applicable both to today's systems and to the designers of tomorrow's systems. I will address why a GPU-CPU system is a heterogeneous system and why it is an interesting one, and discuss some case studies on how one can program and use such a system.

    Full text access may be available.
  • Resilient computing: An engineering discipline

    Page(s): 1
    PDF (79 KB)

    The term resiliency has been used in many fields, such as child psychology, ecology, and business, with the common meaning of the ability to successfully accommodate unforeseen environmental perturbations or disturbances. The adjective resilient has been in use for decades in the field of dependable computing systems, however, essentially as a synonym of fault-tolerant, thus generally ignoring the unexpected nature of the phenomena the systems may have to face. These phenomena become of primary relevance when moving to systems like the future large, networked, evolving systems constituting complex information infrastructures, perhaps involving everything from supercomputers and huge server "farms" to myriads of small mobile computers and tiny embedded devices, with humans being a central part of the operation of such systems. Such systems are in fact the dawning of the ubiquitous systems that will support Ambient Intelligence. With such ubiquitous systems, what is at stake is to maintain dependability, i.e., the ability to deliver service that can justifiably be trusted, in spite of continuous changes. Therefore the terms resilience and resilient computing can be applied to the design of ubiquitous systems and defined as the search for the following property: the persistence of service delivery that can justifiably be trusted, when facing changes. Changes may differ in nature, prospect, and timing. Therefore the design of ubiquitous systems requires mastering many, often separate, engineering disciplines that span from advanced probability to logic, from human factors to cryptology and information security, and to the management of large projects. From an educational point of view, very few, if any, universities offer a comprehensive and methodical track that provides students with sufficient preparation to cope with the challenges posed by the design of ubiquitous systems. In Europe, an activity has started towards the identification of an MSc curriculum in Resilient Computing as a timely and necessary answer to the requirements posed by the design of ubiquitous systems. To this aim, a Network of Excellence, ReSIST (Resilience for Survivability in IST), was run from January 2006 to March 2009 (see http://www.resist-noe.org). This presentation will present the results of ReSIST as well as the identified MSc curriculum in Resilient Computing, to share the project's experience and to involve a much larger, open, and qualified community in the discussion of the proposed curriculum.

    Full text access may be available.
  • De novo modeling of GPCR class A structures

    Page(s): 1
    PDF (67 KB)

    In this talk I will describe recent work to develop novel methods for modeling G protein-coupled receptor (GPCR) structures from their sequence information and statistically significant side-chain contacts within a "template" structure. Our approach uses bioinformatics methods to identify likely high-confidence side chain-to-side chain TM helical contacts and then reconstitutes the seven-TM helical domain through a simulated annealing protocol, with refinement using replica exchange and an implicit solvent/implicit membrane sampling scheme. Results will be presented for de novo prediction of the β2-adrenergic receptor, the adenine receptor, and a number of other amine receptors.

    Full text access may be available. (An illustrative sketch follows this entry.)
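
    The abstract mentions a simulated annealing protocol for reconstituting the helical bundle. The Python sketch below shows only the generic simulated-annealing loop on a toy stand-in problem: placing seven points ("helices") on a small grid so that a list of contact pairs ends up adjacent. The scoring function, move set, and cooling schedule are invented for illustration and bear no relation to the paper's energy model, replica exchange, or implicit-membrane sampling.

    ```python
    import math
    import random

    random.seed(0)

    CONTACTS = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 0)]  # toy contact pairs
    GRID = 5  # the seven points live on a GRID x GRID lattice

    def energy(pos):
        """Count contact pairs that are NOT adjacent (Chebyshev distance > 1)."""
        return sum(1 for i, j in CONTACTS
                   if max(abs(pos[i][0] - pos[j][0]), abs(pos[i][1] - pos[j][1])) > 1)

    def anneal(steps=20000, t_start=2.0, t_end=0.01):
        pos = [(random.randrange(GRID), random.randrange(GRID)) for _ in range(7)]
        e = energy(pos)
        for step in range(steps):
            t = t_start * (t_end / t_start) ** (step / steps)   # geometric cooling schedule
            trial = list(pos)
            trial[random.randrange(7)] = (random.randrange(GRID), random.randrange(GRID))
            de = energy(trial) - e
            if de <= 0 or random.random() < math.exp(-de / t):  # Metropolis acceptance
                pos, e = trial, e + de
        return pos, e

    final_pos, final_e = anneal()
    print("violated contacts after annealing:", final_e)
    ```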
  • Crosstalk-free mapping of two-dimensional weak tori on optical slab waveguides

    Page(s): 1 - 8
    PDF (144 KB)

    While optical slab waveguides can deliver huge bandwidth by offering a huge number of communication channels, they require a large number of high-speed lasers and photodetectors, which limits how much of the offered bandwidth can actually be used. Several mappings have been proposed to implement communication networks on slab waveguides [7]. However, the proposed mappings suffer from possible crosstalk among different channels if these are used simultaneously. In this paper, we consider solving the crosstalk problem when mapping weak two-dimensional tori onto optical slab waveguides. We introduce the notion of a diagonal pair and use it in the proposed mapping. The approach assigns edges to channels such that the mapping guarantees crosstalk-free communication between nodes (no two adjacent channels are used at the same communication step). We also consider the cost of the mapping in terms of the number of lasers and the number of photodetectors. Our results show that the cost is within a constant factor of the lower bound.

    Full text access may be available. (An illustrative sketch follows this entry.)
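
    The abstract states the correctness condition for the mapping: within any communication step, no two adjacent channels may be active. The small Python sketch below checks exactly that condition for a given edge-to-channel assignment and communication schedule. The edge set, channel numbers, and schedule in the example are made up, and the paper's diagonal-pair construction is not reproduced here.

    ```python
    def crosstalk_free(schedule, channel_of_edge):
        """schedule: list of communication steps, each a list of edges (u, v).
        channel_of_edge: dict mapping each edge to its assigned channel index.
        Returns (True, None) if no step uses the same or adjacent channels,
        otherwise (False, offending_step)."""
        for step, edges in enumerate(schedule):
            used = sorted(channel_of_edge[e] for e in edges)
            for c1, c2 in zip(used, used[1:]):
                if c2 - c1 <= 1:      # same channel, or neighbouring channel indices
                    return False, step
        return True, None

    # Made-up 4-node example: ring edges mapped to well-separated channels 0, 2, 4, 6.
    channels = {("a", "b"): 0, ("b", "c"): 2, ("c", "d"): 4, ("d", "a"): 6}
    schedule = [[("a", "b"), ("c", "d")],   # step 0 uses channels 0 and 4
                [("b", "c"), ("d", "a")]]   # step 1 uses channels 2 and 6
    print(crosstalk_free(schedule, channels))   # (True, None)
    ```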
  • Table-based method for reconfigurable function evaluation

    Page(s): 1 - 9
    PDF (1105 KB)

    This paper presents a new approach to function evaluation using tables. The proposal argues for the use of a more complete primitive, namely a weighted sum, which converts the calculation of function values into a recursive operation defined by a two-input table. The weighted sum can be tuned through different values of the weighting parameters that capture the features of the specific function to be evaluated. A parametric architecture for reconfigurable FPGA-based hardware implements the design. Our method has been tested on the calculation of the sine function. Comparison with other well-known proposals reveals the advantages of our approach: it provides memory and hardware resource savings as well as a good trade-off between speed and error.

    Full text access may be available. (An illustrative sketch follows this entry.)
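
    The paper's weighted-sum primitive is not spelled out in the abstract, so the Python sketch below illustrates only the general idea of table-based function evaluation, using its simplest variant: a precomputed sine table with linear interpolation between entries. The table size and the error check are arbitrary choices for illustration; they are not the paper's architecture or results.

    ```python
    import math

    TABLE_BITS = 8                          # 256-entry table, e.g. one FPGA block RAM
    N = 1 << TABLE_BITS
    STEP = (math.pi / 2) / N
    TABLE = [math.sin(i * STEP) for i in range(N + 1)]   # precomputed offline

    def sin_table(x):
        """Approximate sin(x) for x in [0, pi/2] via table lookup plus linear interpolation."""
        idx = min(int(x / STEP), N - 1)
        frac = (x - idx * STEP) / STEP
        return TABLE[idx] + frac * (TABLE[idx + 1] - TABLE[idx])

    # Rough error check against the reference implementation.
    worst = max(abs(sin_table(i * 1e-3) - math.sin(i * 1e-3))
                for i in range(int((math.pi / 2) / 1e-3)))
    print(f"max abs error with a {N}-entry table: {worst:.2e}")
    ```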
  • Uniform scattering of autonomous mobile robots in a grid

    Page(s): 1 - 8
    PDF (523 KB)

    We consider the uniform scattering problem for a set of autonomous mobile robots deployed in a grid network: starting from an arbitrary placement in the grid and using purely localized computations, the robots must move so as to reach, in finite time, a state of static equilibrium in which they cover the grid uniformly. The theoretical quest is to determine the minimal capabilities the robots need to solve the problem. We prove that uniform scattering is indeed possible even for very weak robots, and the proof is constructive: we present a provably correct protocol for uniform self-deployment in a grid. The protocol is fully localized, collision-free, and makes minimal assumptions; in particular: (1) it does not require any direct or explicit communication between robots; (2) it makes no assumptions about robot synchronization or timing, hence the robots can be fully asynchronous in all their actions; (3) it requires only a limited visibility range; (4) it uses only a constant-size memory at each robot, hence computationally the robots can be simple finite-state machines; (5) it does not need a global localization system, only orientation in the grid (e.g., a compass); (6) it does not require identifiers, hence the robots can be anonymous and totally identical.

    Full text access may be available. (An illustrative sketch follows this entry.)
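
    The abstract defines the goal as a static equilibrium in which the robots cover the grid uniformly. Assuming one natural reading of uniform coverage, namely robots occupying a regular d-spaced sublattice of an n x n grid with d dividing n, the Python sketch below checks whether a given placement satisfies it. The protocol itself, and the paper's formal definition of uniformity, are not reproduced here.

    ```python
    def is_uniform_cover(positions, n, d):
        """positions: iterable of (row, col) robot cells on an n x n grid.
        True iff the robots occupy exactly the regular d-spaced sublattice."""
        if n % d != 0:
            raise ValueError("d must divide n for this notion of uniformity")
        target = {(r, c) for r in range(0, n, d) for c in range(0, n, d)}
        return set(positions) == target

    # Example: 16 robots on a 16 x 16 grid with spacing 4.
    lattice = [(r, c) for r in range(0, 16, 4) for c in range(0, 16, 4)]
    print(is_uniform_cover(lattice, 16, 4))        # True
    print(is_uniform_cover(lattice[:-1], 16, 4))   # False: one robot missing
    ```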