2014 International Conference on High Performance Computing & Simulation (HPCS)

21-25 July 2014

  • Welcome

    Publication Year: 2014, Page(s):1 - 3
    Freely Available from IEEE
  • Committees

    Publication Year: 2014, Page(s):1 - 14
    Freely Available from IEEE
  • Keynotes

    Publication Year: 2014, Page(s):1 - 3

    Provides an abstract for each of the keynote presentations and a brief professional biography of each presenter. The complete presentations were not made available for publication as part of the conference proceedings.
  • Tutorials

    Publication Year: 2014, Page(s):1 - 19

    The tutorials cover the following topics: certified parallel program calculation in Coq; Intel Xeon Phi programming; HPC and cloud access; Monte Carlo methods and high-performance computing; parallel discrete event simulation; reversible computing; and high performance computing and Big Data analytics in bioinformatics.
  • HPCS 2014 panel and demo sessions [7 abstracts]

    Publication Year: 2014, Page(s):1 - 16

    Provides an abstract for each of the presentations and a brief professional biography of each presenter. The complete presentations were not made available for publication as part of the conference proceedings.
  • Sponsors

    Publication Year: 2014, Page(s):1 - 8
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2014, Page(s):1 - 12
    Freely Available from IEEE
  • Author index

    Publication Year: 2014, Page(s):1 - 13
    Freely Available from IEEE
  • Plenary speech

    Publication Year: 2014, Page(s):1 - 2

    Provides an abstract for each of the keynote presentations and a brief professional biography of each presenter. The complete presentations were not made available for publication as part of the conference proceedings.
  • [Copyright notice]

    Publication Year: 2014, Page(s): 1
    Freely Available from IEEE
  • Multi-Kepler GPU vs. multi-intel MIC: A two test case performance study

    Publication Year: 2014, Page(s):1 - 8
    Cited by:  Papers (1)

    We present and compare the performance of two many-core architectures, the Nvidia Kepler and the Intel MIC, both in a single system and in a cluster configuration, for the simulation of two physical systems. As a first benchmark we consider the time required to update a single spin of the 3D Heisenberg spin glass model by using the Over-relaxation algorithm. The second application we consider is a re...
  • GPU accelerated three dimensional unstructured geometric multigrid solver

    Publication Year: 2014, Page(s):9 - 16
    Cited by:  Papers (1)

    Graphics processing units (GPUs) have started becoming an integral part of high performance computing. We develop a GPU based 3D-unstructured geometric multigrid solver, which is extensively used in Computational Fluid Dynamics (CFD) applications. Parallelization for GPUs is not straightforward because of the irregularity of the mesh. Using a combination of graph coloring and greedy maximal independe...
  • Method of workload balancing in GPU implementation of breadth-first search

    Publication Year: 2014, Page(s):17 - 22

    An optimized version of the parallel breadth-first search algorithm is considered in the paper. The optimization method described in the paper reduces the overhead of the suggested algorithm on each of its iterations. It is shown that the optimized parallel algorithm for GPU is more than five times faster than its sequential analog on CPU.
  • Sparse matrix computations on clusters with GPGPUs

    Publication Year: 2014, Page(s):23 - 30

    Hybrid nodes containing GPUs are rapidly becoming the norm in parallel machines. We have conducted some experiments regarding how to plug GPU-enabled computational kernels into PSBLAS, an MPI-based library specifically geared towards sparse matrix computations. In this paper, we present our findings on which strategies are more promising in the quest for the optimal compromise among raw performance...
  • Burrows-Wheeler Transform based indexed exact search on a multi-GPU OpenCL platform

    Publication Year: 2014, Page(s):31 - 38

    A multi-GPU parallelization of exact string matching algorithms based on the backward-search procedure by using indexing techniques, such as the Burrows-Wheeler Transform and the FM-Index, is proposed in this paper. To attain an efficient execution on highly heterogeneous parallel platforms, the proposed parallelization adopted a unified OpenCL implementation that allows its execution either in C...
  • An area-efficient hexagonal interconnection network for multi-core processors

    Publication Year: 2014, Page(s):39 - 46
    Cited by:  Papers (1)

    With the rapid increase in the number of processor cores on a chip, packet-switching networks on chip (NoCs) have emerged as a promising paradigm for designing scalable communication infrastructures for future multi-core processors. The quest for high-performance networks, however, has led to very area-consuming and complex routers with marginal return in performance. On the other hand, studies sh...
  • Evaluation of vectorization potential of Graph500 on Intel's Xeon Phi

    Publication Year: 2014, Page(s):47 - 54
    Cited by:  Papers (1)

    Graph500 is a data intensive application for high performance computing and it is an increasingly important workload because graphs are a core part of most analytic applications. So far there is no work that examines whether Graph500 is suitable for vectorization, mostly due to a lack of vector memory instructions for irregular memory accesses. The Xeon Phi is a massively parallel processor recently releas...
  • Run-time mechanisms for fine-grained parallelism on network processors: The TILEPro64 experience

    Publication Year: 2014, Page(s):55 - 64
    Cited by:  Papers (1)

    The efficient parallelization of very fine-grained computations is an old problem that remains challenging even on modern shared-memory architectures. Scalable parallelizations are possible if the base mechanisms provided by the run-time support (for inter-thread/inter-process synchronization/communication) are carefully designed and developed on top of parallel architectures. This requires a deep knowle...
  • Analysis of classic algorithms on GPUs

    Publication Year: 2014, Page(s):65 - 73
    Cited by:  Papers (3)

    The recently developed Threaded Many-core Memory (TMM) model provides a framework for analyzing algorithms for highly-threaded many-core machines such as GPUs. In particular, it tries to capture the fact that these machines hide memory latencies via the use of a large number of threads and large memory bandwidth. The TMM model analysis contains two components: computational complexity and memory c...
  • Managing the topology of heterogeneous cluster nodes with hardware locality (hwloc)

    Publication Year: 2014, Page(s):74 - 81
    Cited by:  Papers (5)

    Modern computing platforms are increasingly complex, with multiple cores, shared caches, and NUMA architectures. Parallel application developers have to take locality into account before they can expect good efficiency on these platforms. Thus there is a strong need for a portable tool gathering and exposing this information. The Hardware Locality project (hwloc) offers a tree representation of t...
  • Optimum: Thermal-aware task allocation for heterogeneous many-core devices

    Publication Year: 2014, Page(s):82 - 87

    Temperature management is a key challenge for many-core platforms in the dark silicon era as all the cores cannot be powered-on together at the maximum frequency and either some cores should run at lower frequency or only a portion can be used without burning the device. In addition, due to process variations and/or design optimization, not all the integrated processing elements (PEs) are identica...
  • Low-power vectorial VLIW architecture for maximum parallelism exploitation of dynamic programming algorithms

    Publication Year: 2014, Page(s):88 - 95

    Dynamic Programming algorithms are widely used in many areas to divide a complex problem into several simpler sub-problems with many dependencies. Typical approaches explore data level parallelism by relying on specialized vector instructions. However, the fully-parallelizable scheme is often not compliant with the memory organization of general purpose processors, leading to a less optimal para...
  • Determining map partitioning to accelerate wind field calculation

    Publication Year: 2014, Page(s):96 - 103
    Cited by:  Papers (3)

    Wind speed and direction are parameters that affect forest fire propagation dramatically, so an accurate estimation of these parameters is crucial to predict the fire propagation precisely. WindNinja is a wind field simulator that can easily be coupled to a forest fire propagation simulator such as FARSITE. However, wind field simulators present two main drawbacks: they take too much time to comput...
  • Solving a very large-scale sparse linear system with a parallel algorithm in the Gaia mission

    Publication Year: 2014, Page(s):104 - 111

    Gaia is a 5-year ESA (European Space Agency) cornerstone mission launched at the end of 2013. Its main goal is the production of a 5-parameter astrometric catalogue (i.e. positions, parallaxes and the two components of the proper motions) at the micro-arcsecond level for about 1 billion stars of our Galaxy by means of high-precision measurements. The main task of the code presented in this paper i...
  • Scalable high-quality 1D partitioning

    Publication Year: 2014, Page(s):112 - 119

    The decomposition of one-dimensional workload arrays into consecutive partitions is a core problem of many load balancing methods, especially those based on space-filling curves. While previous work has shown that heuristics can be parallelized, only sequential algorithms exist for the optimal solution. However, centralized partitioning will become infeasible in the exascale era due to the vast am...
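
To make the 1D partitioning problem in the last abstract concrete, here is a minimal sequential sketch of one textbook exact approach: binary search on the bottleneck load combined with a greedy feasibility probe. It is not the method of the listed paper, whose details are not given here. The function names (`can_partition`, `min_bottleneck`, `partition_1d`) are hypothetical, and the sketch assumes a non-empty array of non-negative integer weights.

```python
from typing import List, Tuple


def can_partition(weights: List[int], parts: int, limit: int) -> bool:
    """Greedy probe: can the array be cut into at most `parts`
    consecutive partitions whose loads never exceed `limit`?"""
    used, load = 1, 0
    for w in weights:
        if w > limit:
            return False  # a single item already exceeds the limit
        if load + w > limit:
            used += 1     # close the current partition, open a new one
            load = w
            if used > parts:
                return False
        else:
            load += w
    return True


def min_bottleneck(weights: List[int], parts: int) -> int:
    """Smallest achievable maximum partition load (assumes integer weights)."""
    lo, hi = max(weights), sum(weights)
    while lo < hi:
        mid = (lo + hi) // 2
        if can_partition(weights, parts, mid):
            hi = mid
        else:
            lo = mid + 1
    return lo


def partition_1d(weights: List[int], parts: int) -> Tuple[int, List[int]]:
    """Return the optimal bottleneck and the start index of each partition.
    The greedy cut may use fewer than `parts` partitions."""
    limit = min_bottleneck(weights, parts)
    starts, load = [0], 0
    for i, w in enumerate(weights):
        if load + w > limit:
            starts.append(i)
            load = w
        else:
            load += w
    return limit, starts
```

For example, `partition_1d([7, 2, 5, 10, 8], 2)` returns `(18, [0, 3])`, i.e. the partitions `[7, 2, 5]` and `[10, 8]`. Every probe scans the whole array on one process, which illustrates the kind of centralized, sequential step that the abstract says becomes infeasible at exascale.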