
International Symposium on Parallel Computing in Electrical Engineering, 2006 (PAR ELEC 2006)

Date 13-17 Sept. 2006

Entries 1-25 of 84
  • International Symposium on Parallel Computing in Electrical Engineering - Cover

    Page(s): c1
    Freely Available from IEEE
  • International Symposium on Parallel Computing in Electrical Engineering - Title

    Page(s): i - iii
    Freely Available from IEEE
  • International Symposium on Parallel Computing in Electrical Engineering - Copyright

    Page(s): iv
    Freely Available from IEEE
  • International Symposium on Parallel Computing in Electrical Engineering - TOC

    Page(s): v - ix
    Freely Available from IEEE
  • Scientific Programming for Heterogeneous Systems - Bridging the Gap between Algorithms and Applications

    Page(s): 3 - 8

    High-performance computing in heterogeneous environments is a rapidly developing area. A number of highly efficient heterogeneous parallel algorithms have been designed over the last decade. At the same time, scientific software based on these algorithms lags well behind. The paper analyses the main issues that scientific programmers encounter when implementing heterogeneous parallel algorithms in a portable form. It explains how programming systems can address these issues so as to make implementing parallel algorithms for heterogeneous platforms as easy as possible, and outlines two existing programming systems for high-performance heterogeneous computing: mpC and HeteroMPI.
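
    A basic building block of many heterogeneous parallel algorithms is distributing work in proportion to processor speed. The Python sketch below illustrates only that idea, assuming constant relative speeds and equal-sized work units; it is not taken from the paper, it does not model mpC or HeteroMPI, and `partition_by_speed` is a hypothetical name.

```python
from typing import List, Sequence


def partition_by_speed(n: int, speeds: Sequence[float]) -> List[int]:
    """Split n equal work units among processors in proportion to their speeds.

    Each processor first gets floor(n * s_i / S) units (S = sum of speeds);
    the leftover units go to the processors with the largest fractional
    remainders (largest-remainder rounding), so the total is exactly n.
    """
    if n < 0 or not speeds or any(s <= 0 for s in speeds):
        raise ValueError("need n >= 0 and only positive speeds")
    total = sum(speeds)
    exact = [n * s / total for s in speeds]
    alloc = [int(e) for e in exact]  # floor, since every share is >= 0
    leftover = n - sum(alloc)
    by_remainder = sorted(range(len(speeds)),
                          key=lambda i: exact[i] - alloc[i], reverse=True)
    for i in by_remainder[:leftover]:
        alloc[i] += 1
    return alloc


# Hypothetical platform: three processors, the last one twice as fast.
print(partition_by_speed(100, [1.0, 1.0, 2.0]))  # prints [25, 25, 50]
```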

  • Multi-Core Processors: New Way to Achieve High System Performance

    Page(s): 9 - 13

    Multi-core processors represent an evolutionary change in conventional computing as well as setting the new trend for high-performance computing (HPC), but parallelism is nothing new. Intel has a long history with the concept of parallelism and the development of hardware-enhanced threading capabilities, and has been delivering threading-capable products for more than a decade. The move toward chip-level multiprocessing architectures with a large number of cores continues to offer dramatically increased performance and improved power characteristics. Nonetheless, this move also presents significant challenges. This paper describes how far the industry has progressed, evaluates some of the challenges posed by multi-core processors, and outlines some of the solutions that have been developed.

  • Challenges to the Design of Mobile Middleware Systems

    Page(s): 14 - 19

    Mobile networks provide mobile users with access to computing services and resources anywhere, anytime. While each mobile device has limited resources and services, together, through networking, they can form a powerful mobile computing platform. The role of mobile middleware is to enable this platform. This paper discusses the main features of mobile networks that challenge the design of a cost-effective mobile middleware layer, presents several ongoing middleware projects, and finally focuses on an original solution.

  • Automatic High Voltage Apparatus Optimization: Making it More Engineer-Friendly

    Page(s): 20 - 30

    A key aspect of the design and optimization process for high voltage apparatus is the precise simulation and geometric optimization of the electromagnetic field distribution on electrodes and dielectrics. Because these simulations and optimizations are compute-intensive, engineers need a user-friendly working environment that requires as little knowledge as possible of the computer-specific aspects of the simulation and optimization process. Engineers also want optimization runs to finish as quickly as possible ("push-button solution"); that is, runtimes for extensive optimizations must be kept at an acceptable level. This paper describes a design and optimization working environment for high voltage apparatus that was developed and implemented in a joint cooperation project between Technische Universität München and Asea Brown Boveri (ABB). It also introduces some methods that enable the programmer to accelerate the simulation program and adapt it to specific CPU architectures. Three practical examples on which the working environment has been tested are presented.

  • A Fault-Tolerant Dynamic Fetch Policy for SMT Processors in Multi-Bus Environments

    Page(s): 31 - 36

    Modern microprocessors are increasingly susceptible to transient faults, e.g. those caused by high-energy particles, because of high integration density, high clock frequencies, temperature, and decreasing supply voltages. A newer method of speeding up contemporary processors with only a small increase in chip area is simultaneous multithreading (SMT). With the introduction of SMT, instruction fetch and issue policies have gained importance. SMT processors are able to fetch and issue instructions from multiple instruction streams simultaneously. In this work, we focus on how dynamic bus arbitration and scheduling of hardware threads within the processor's front-end can help to dynamically adjust fault coverage and performance. Two novelties help to reach this goal. The first is a multi-bus scheduling scheme that can be used to tolerate permanent bus faults and single event disturbances (SEDs). The second, which can be used in conjunction with the first, is a dynamic fetch scheduling algorithm for a simultaneous multithreaded processor, leading to the introduction of dynamic multithreading. Dynamically multithreaded processors are able to switch between different SMT fetch policies, thus enabling graceful degradation of the processor's front-end.

  • An Efficient SRA Based Isomorphic Task Allocation Scheme for k-ary n-cube Massively Parallel Processors

    Page(s): 37 - 42

    A good task allocation algorithm should find available processors for incoming jobs, if they exist, with minimum overhead. Owing to its topological generality and flexibility, the k-ary n-cube architecture has been chosen for the task allocation problem. We propose a fast and efficient isomorphic processor allocation scheme for k-ary n-cube systems that combines isomorphic partitioning, in which the processor space is partitioned into higher-dimensional isomorphic subcubes, with a subcube recognition ability (SRA) algorithm that uses simple coordinate calculation and spatial subtraction. The proposed scheme thereby reduces the search space drastically and can locate a free subcube very quickly, providing scalable, fast processor allocation and complete recognition ability with minimal overhead while minimizing fragmentation.
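
    To make the allocation problem concrete, the sketch below is a naive first-fit search for a free block in a k-ary 2-cube (a k x k torus). It is the kind of exhaustive baseline whose search space schemes like the one above aim to reduce, not the SRA algorithm itself; the function name and the restriction to n = 2 are assumptions made for brevity.

```python
from typing import List, Optional, Tuple


def find_free_block(busy: List[List[bool]], a: int, b: int) -> Optional[Tuple[int, int]]:
    """First-fit search for a free a x b block in a k x k torus (k-ary 2-cube).

    busy[x][y] is True if processor (x, y) is allocated. Coordinates wrap
    around modulo k, as torus links do. Returns the block's base corner, or
    None if no free block exists. Exhaustive: O(k^2 * a * b) checks.
    """
    k = len(busy)
    if not (1 <= a <= k and 1 <= b <= k):
        raise ValueError("block must fit inside the torus")
    for x in range(k):
        for y in range(k):
            if all(not busy[(x + i) % k][(y + j) % k]
                   for i in range(a) for j in range(b)):
                return (x, y)
    return None


# Hypothetical 4-ary 2-cube with the whole first row already allocated.
grid = [[x == 0 for y in range(4)] for x in range(4)]
print(find_free_block(grid, 2, 2))  # prints (1, 0)
```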

  • Balanced Spatio-Temporal Data Warehouse with R-MVB, STCAT and BITMAP Indexes

    Page(s): 43 - 48

    In this article, we present new indexing and balancing methods used in a balanced parallel spatio-temporal data warehouse (B-PSTDW). The main motivation for designing the B-PSTDW system was the unsatisfactory results obtained with a spatial data warehouse using a cascaded star model indexed with an aR-tree, which was used for distributed telemetric data processing (DSDW(t)). Both DSDW(t) and B-PSTDW(t) service data from utility meters (electricity, gas, heat and water). We present results of research on the B-PSTDW system with the new indexing structures R-MVB, STCAT, and BITMAP.
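
    BITMAP is one of the index structures named above. The abstract does not describe its design, so the following is only a generic bitmap-index sketch over hypothetical meter data: each distinct value gets a bit vector (here a Python int), and a conjunctive query becomes a bitwise AND.

```python
from typing import Dict, List, Sequence


def build_bitmap_index(column: Sequence[str]) -> Dict[str, int]:
    """One bitmap per distinct value, stored as a Python int: bit i is set
    when row i holds that value."""
    index: Dict[str, int] = {}
    for row, value in enumerate(column):
        index[value] = index.get(value, 0) | (1 << row)
    return index


def rows_of(bitmap: int) -> List[int]:
    """Row numbers whose bits are set in the bitmap."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


# Hypothetical meter readings: one row per reading.
medium = ["electricity", "gas", "water", "electricity", "heat", "gas"]
region = ["north", "north", "south", "south", "north", "south"]
by_medium, by_region = build_bitmap_index(medium), build_bitmap_index(region)

# Conjunctive query = bitwise AND of bitmaps; a disjunction would use OR.
hits = by_medium["electricity"] & by_region["south"]
print(rows_of(hits))  # prints [3]
```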

  • An Algorithm to Embed Hamiltonian Cycles in Crossed Cubes

    Page(s): 49 - 54

    In this paper, we study the problem of embedding a family of regularly structured Hamiltonian cycles in a crossed cube. Since the crossed cube shows performance improvements over a regular hypercube in many aspects, we are interested in whether it has comparable capability in terms of structure embedding, specifically, in this paper, the embedding of Hamiltonian cycles. Note that we consider only a family of Hamiltonian cycles that can be systematically constructed, characterized by a permutation of link dimensions (link permutation for short). The total number h(n) of Hamiltonian cycles in a regular n-dimensional hypercube is huge, and many of them cannot be constructed systematically; the exact value of h(n) has not been established for large n. The main contribution of this paper is an algorithm that, for those Hamiltonian-facilitating permutations, constructs a well-structured Hamiltonian cycle.
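
    For intuition about link permutations, the sketch below builds a Hamiltonian cycle in the *regular* n-dimensional hypercube from the binary reflected Gray code and then relabels its sequence of link dimensions with a permutation. It does not handle crossed cubes, whose links differ from the hypercube's; that case is what the paper's algorithm addresses. All names are hypothetical.

```python
from typing import List, Sequence


def dimension_sequence(n: int) -> List[int]:
    """Link dimensions crossed by the binary reflected Gray code cycle in Q_n.

    Gray codes g(t-1) and g(t) differ in the bit given by the number of
    trailing zeros of t; the closing link back to 0 crosses dimension n-1.
    """
    seq = [(t & -t).bit_length() - 1 for t in range(1, 2 ** n)]
    seq.append(n - 1)
    return seq


def hamiltonian_cycle(n: int, perm: Sequence[int], start: int = 0) -> List[int]:
    """Node order of a Hamiltonian cycle in the regular hypercube Q_n.

    perm relabels link dimensions (a link permutation). Relabelling
    dimensions is an automorphism of Q_n, so the result is still a
    Hamiltonian cycle; its final node is adjacent to start.
    """
    if sorted(perm) != list(range(n)):
        raise ValueError("perm must be a permutation of 0..n-1")
    node, order = start, [start]
    for d in dimension_sequence(n)[:-1]:  # the last link closes the cycle
        node ^= 1 << perm[d]
        order.append(node)
    return order


print(hamiltonian_cycle(3, [2, 0, 1]))  # prints [0, 4, 5, 1, 3, 7, 6, 2]
```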

  • Application-Driven Development of Concurrent Packet Processing Platforms

    Page(s): 55 - 61

    We have developed an application-driven methodology for implementing parallel and heterogeneous programmable platforms. We apply our flow to network access platforms, where flexibility must be traded off against cost and performance. Our methodology therefore focuses on characterizing the application domain as early as possible. With this input, we can narrow the design space to one major design trajectory that starts with the most flexible solution and refines the platform architecture systematically to meet performance and cost constraints. Our flow includes an efficient path to implementation in hardware and software. The software implementation framework takes a modular application description and generates code for embedded processors that can easily be ported to different platforms and used for profiling. Different communication architectures, co-processors, and specializations of programmable processing elements can be derived from the profiling results to refine the platform hardware. A DSL access multiplexer (DSLAM) is used as an example throughout the paper to illustrate the different phases of our design process.

  • Integrating SHECS-Based Critical Sections with Hardware SMP Scheduler in TLP-CMPs

    Page(s): 62 - 67

    This document presents the concept of integrating SHECS (shared explicit cache system)-based critical sections with an SMP scheduler to obtain an efficient, general-purpose hardware mutual exclusion facility in TLP-CMP (thread-level parallelism, chip multiprocessing) SMP (symmetric multiprocessing) architectures. Two solutions are presented: the first integrates SHECS-based critical sections with a software multi-queue SMP scheduler; the second integrates them with a hardware multi-queue SMP scheduler implemented as an additional functional unit within the TLP-CMP. Both proposals are implemented and simulated using a system-on-chip (SoC), the Intel® IXP 2800 network processor. The results of the proof-of-concept simulation (obtained with the IXA SDK 4.2 Workbench simulation environment) are presented and discussed.

  • Evolutionary Multiprocessor Task Scheduling

    Page(s): 68 - 76

    The genetic algorithm has, to date, been applied to a wide range of problems. It is well suited to problems with multiple, often interdependent requirements, because it can search a large solution space while meeting the criteria and constraints within the problem's boundaries. In this paper, we apply this heuristic to multiprocessor task scheduling: assigning a group of predefined tasks to a set of predefined processors. Task execution should take a minimum amount of time while respecting certain constraints, e.g., precedence constraints between tasks. In addition to the genetic algorithm, which serves as the global search, we incorporate a local search method, yielding a memetic algorithm. Since the tasks run in a multiprocessor environment, we also attempt to reduce processor temperature by reducing total power consumption and balancing the load among the processors.
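
    A compact memetic algorithm of the kind described above (genetic global search, with local search applied to every offspring) might look like the sketch below. It is a simplification under stated assumptions: tasks are independent, processors are identical, and the objective is makespan only, so precedence constraints and the power/temperature terms are omitted. The operators and parameters are illustrative choices, not the paper's.

```python
import random
from typing import List, Sequence


def makespan(assign: Sequence[int], times: Sequence[float], m: int) -> float:
    """Finish time of the most loaded of the m processors."""
    loads = [0.0] * m
    for task, proc in enumerate(assign):
        loads[proc] += times[task]
    return max(loads)


def local_search(assign: List[int], times: Sequence[float], m: int) -> List[int]:
    """Memetic refinement: apply single-task moves while they strictly
    reduce the makespan (first-improvement hill climbing)."""
    assign = list(assign)
    best = makespan(assign, times, m)
    improved = True
    while improved:
        improved = False
        for task in range(len(assign)):
            for proc in range(m):
                old = assign[task]
                if proc == old:
                    continue
                assign[task] = proc
                cost = makespan(assign, times, m)
                if cost < best:
                    best, improved = cost, True
                else:
                    assign[task] = old  # undo a non-improving move
    return assign


def memetic_schedule(times: Sequence[float], m: int, pop_size: int = 20,
                     generations: int = 50, mut_rate: float = 0.1,
                     seed: int = 0) -> List[int]:
    """Genetic algorithm (global search) whose offspring are refined by
    local_search; returns the best task -> processor assignment found."""
    rng = random.Random(seed)
    n = len(times)

    def fitness(a: Sequence[int]) -> float:
        return makespan(a, times, m)

    pop = [local_search([rng.randrange(m) for _ in range(n)], times, m)
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        parents = pop[: max(2, pop_size // 2)]  # elitist truncation selection
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, n) if n > 1 else 0
            child = p1[:cut] + p2[cut:]  # one-point crossover
            child = [rng.randrange(m) if rng.random() < mut_rate else g
                     for g in child]  # per-gene mutation
            children.append(local_search(child, times, m))
        pop = parents + children
    return min(pop, key=fitness)


times = [3, 3, 2, 2, 2]  # hypothetical task run times
best = memetic_schedule(times, m=2)
print(best, makespan(best, times, 2))
```

    Because every individual is refined by local search, the population contains only local optima; crossover and mutation are what let the search move between them.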

  • Fast Matrix Multiplication in Dynamic SMP Clusters with Communication on the Fly in Systems on Chip Technology

    Page(s): 77 - 82

    This paper concerns numerical computations in a new shared memory system architecture oriented towards system-on-chip technology. The main architectural features assumed are dynamically reconfigurable processor clusters, which adjust at program run time to the computation and communication requirements of programs, and a new method of data exchange between processors called "communication on the fly". Together they provide a synergy between switching processors between clusters and on-the-fly data reads by many processors in a cluster while the switched processor writes the data into memory. The paper presents results of simulated execution of parallel program graphs for matrix multiplication. The graphs considered are based on two data decomposition methods: recursive division of matrices into squares and division into stripes. The elementary serial multiplications of square submatrices in the parallel algorithms use the Strassen method. The experiments show high efficiency of the proposed matrix multiplication method.
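
    The Strassen method used for the elementary serial multiplications replaces the eight block products of a 2 x 2 block multiplication with seven. The recursive sketch below assumes square matrices whose size is a power of two and falls back to the naive product at small sizes; it models neither the parallel decomposition nor communication on the fly.

```python
import random
from typing import List

Matrix = List[List[int]]


def add(a: Matrix, b: Matrix) -> Matrix:
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def sub(a: Matrix, b: Matrix) -> Matrix:
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def naive(a: Matrix, b: Matrix) -> Matrix:
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def quarters(m: Matrix):
    """Split a square matrix into its four h x h quarters."""
    h = len(m) // 2
    return ([r[:h] for r in m[:h]], [r[h:] for r in m[:h]],
            [r[:h] for r in m[h:]], [r[h:] for r in m[h:]])


def strassen(a: Matrix, b: Matrix, leaf: int = 2) -> Matrix:
    """Strassen product of two n x n matrices, n a power of two.
    Uses 7 recursive products instead of 8; falls back to naive() at n <= leaf."""
    n = len(a)
    assert n > 0 and (n & (n - 1)) == 0, "sketch assumes n is a power of two"
    if n <= leaf:
        return naive(a, b)
    a11, a12, a21, a22 = quarters(a)
    b11, b12, b21, b22 = quarters(b)
    m1 = strassen(add(a11, a22), add(b11, b22), leaf)
    m2 = strassen(add(a21, a22), b11, leaf)
    m3 = strassen(a11, sub(b12, b22), leaf)
    m4 = strassen(a22, sub(b21, b11), leaf)
    m5 = strassen(add(a11, a12), b22, leaf)
    m6 = strassen(sub(a21, a11), add(b11, b12), leaf)
    m7 = strassen(sub(a12, a22), add(b21, b22), leaf)
    c11 = add(sub(add(m1, m4), m5), m7)
    c12 = add(m3, m5)
    c21 = add(m2, m4)
    c22 = add(add(sub(m1, m2), m3), m6)
    return ([r1 + r2 for r1, r2 in zip(c11, c12)] +
            [r1 + r2 for r1, r2 in zip(c21, c22)])


rng = random.Random(1)
A = [[rng.randint(-9, 9) for _ in range(8)] for _ in range(8)]
B = [[rng.randint(-9, 9) for _ in range(8)] for _ in range(8)]
print(strassen(A, B) == naive(A, B))  # prints True
```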

  • Open MP Extension for Multithreaded Computing with Dynamic SMP Processor Clusters with Communication on the Fly

    Page(s): 83 - 88

    This paper presents a possible extension of the OpenMP library for programming parallel multithreaded computations in the architecture of dynamic SMP clusters with communication on the fly. Dynamic SMP clusters consist of processors directly connected to the same local shared memory modules, with the composition of the clusters arranged at program run time. Inter-processor communication in such clusters is based on a new "communication on the fly" paradigm, which enables direct communication between processor data caches and eliminates many data read/write transactions involving memory modules. The paper presents new functions that should be introduced into the standard OpenMP library to enable writing parallel multithreaded programs whose communication is based on these new architectural features. The functions are illustrated with an example C program for parallel matrix multiplication based on data decomposition into quarters.

  • Generalised Resource Model for Parallel Instruction Scheduling

    Page(s): 89 - 94

    In this paper we introduce a generalised resource model for parallel instruction scheduling. This model is used to formulate the resource constraints for periodic loop schedules, which are then rewritten using an efficient flow graph model. The generalisation significantly simplifies and accelerates the otherwise laborious process of modelling new resource classes and incorporating specific processor features. Moreover, the model provides an accurate representation of the processor resources. We illustrate these properties using functional units and processor registers as examples.
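
    For background, a classical resource constraint for periodic (modulo) loop schedules is the resource-constrained minimum initiation interval (II), checked with a modulo reservation table. The sketch below computes both for single-cycle operations; it illustrates the conventional per-class model, not the paper's generalised model or its flow graph formulation, and the resource names are hypothetical.

```python
from typing import Dict, List, Tuple


def res_mii(uses: Dict[str, int], units: Dict[str, int]) -> int:
    """Resource-constrained lower bound on the initiation interval (II).

    uses[r]  -- cycles per loop iteration that operations occupy class r
    units[r] -- number of identical units of class r
    An II-cycle period offers units[r] * II slots of class r, so
    II >= ceil(uses[r] / units[r]) for every class r.
    """
    return max((-(-uses[r] // units[r]) for r in uses), default=1)


def fits_modulo_table(schedule: List[Tuple[int, str]], ii: int,
                      units: Dict[str, int]) -> bool:
    """Check a periodic schedule of single-cycle ops against a modulo
    reservation table: an op issued at cycle t occupies row t mod II."""
    table: Dict[Tuple[str, int], int] = {}
    for start, res in schedule:
        slot = (res, start % ii)
        table[slot] = table.get(slot, 0) + 1
        if table[slot] > units[res]:
            return False
    return True


# Hypothetical loop body: 5 ALU ops and 3 memory ops per iteration.
units = {"alu": 2, "mem": 1}
ii = res_mii({"alu": 5, "mem": 3}, units)
body = [(t, "alu") for t in range(5)] + [(t, "mem") for t in range(3)]
print(ii, fits_modulo_table(body, ii, units))  # prints 3 True
```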

  • Program Graph Structuring for Execution in Dynamic SMP Clusters Using Moldable Tasks

    Page(s): 95 - 100

    The paper concerns task scheduling in dynamic SMP clusters based on the notion of moldable computational tasks. Such tasks have been used as atomic elements in program scheduling algorithms with a guaranteed bound on schedule length. For program execution, a special shared memory system architecture is used. It is based on dynamic processor clusters, organized around shared memory modules by switching processors between memory module buses. Fast shared data transfers between processors inside such clusters can be performed through data reads on the fly. The dynamic SMP clusters are implemented inside system-on-chip (SoC) modules, which are additionally connected by a central global network. A task scheduling algorithm is presented for program macro dataflow graphs to be executed in the assumed architecture. The algorithm first identifies a set of moldable tasks in a given program graph. Next, this set is scheduled using a two-phase algorithm, consisting of allotment of resources to the moldable tasks followed by list scheduling, with a guaranteed bound on schedule length. The complete algorithm has been implemented as a program package and examined using simulated execution of scheduled program graphs.

  • The Centisecond Two Levels Hidden Semi Markov Model (CTLHSMM)

    Page(s): 101 - 104

    A major deficiency of standard hidden Markov models (HMMs) is that spectral and prosodic features are processed uniformly. To combine prosodic cues with acoustic ones more efficiently, a segmental two-level hidden Markov model was recently studied by Suaudeau [Suaudeau 94]. In this paper, we present an adapted version of this model in which the segmental processing is replaced by classical centisecond processing. This new model is called the centisecond two levels hidden semi Markov model (CTLHSMM). Our approach retains the traditional hierarchical structure of an HMM and facilitates the introduction of other prosodic parameters [Caliope 89] (in particular energy) at the phonetic level. Experiments on a French database composed of 20 numbers show that this model reduces recognition error rates.

  • Energy Optimisation in Resilient Self-Stabilizing Processes

    Page(s): 105 - 110

    When executing an algorithm in the self-stabilizing model, a distributed system must reach a desirable global state regardless of its initial state, even though each node has only local information about the system. Depending on the assumptions adopted about simultaneous execution and scheduler fairness, an algorithm's stabilization time may differ, or it may not stabilize at all. Surprisingly, we show that the class of polynomially solvable self-stabilizing problems is invariant with respect to the assumption of weak scheduler fairness. Furthermore, for systems with a single distinguished vertex, we prove a much stronger equivalence: synchronisation, the existence of a central scheduler, and its fairness have no influence on polynomial stabilization time.
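
    A classic self-stabilizing algorithm on a system with a single distinguished node is Dijkstra's K-state token ring. The sketch below simulates it under a central scheduler that lets one randomly chosen privileged node move per step; from any initial state the ring converges to exactly one privileged node, provided K >= n. It illustrates the model only and is not a construction from the paper.

```python
import random
from typing import List


def privileged(x: List[int]) -> List[int]:
    """Nodes whose guard is enabled in Dijkstra's K-state ring."""
    n = len(x)
    nodes = [0] if x[0] == x[n - 1] else []  # node 0 is the distinguished node
    nodes += [i for i in range(1, n) if x[i] != x[i - 1]]
    return nodes


def fire(x: List[int], i: int, k: int) -> None:
    """Execute node i's move."""
    if i == 0:
        x[0] = (x[0] + 1) % k
    else:
        x[i] = x[i - 1]


def stabilize(x: List[int], k: int, rng: random.Random,
              max_steps: int = 100_000) -> int:
    """Central scheduler: one privileged node, chosen at random, moves per
    step. Returns the number of moves until exactly one node is privileged."""
    for step in range(max_steps):
        nodes = privileged(x)
        if len(nodes) == 1:
            return step
        fire(x, rng.choice(nodes), k)
    raise RuntimeError("did not stabilize within max_steps")


rng = random.Random(42)
n = k = 6  # the algorithm requires k >= n
x = [rng.randrange(k) for _ in range(n)]  # arbitrary, possibly corrupted state
print(stabilize(x, k, rng), privileged(x))
```

    Once a single privilege exists, every subsequent move passes it to the next node, so the legitimate state is closed under execution.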

  • Building Mini-Grid Environments with Virtual Private Networks: A Pragmatic Approach

    Page(s): 111 - 115

    At our university, we have a number of small-to-medium-size compute clusters and some technical simulations that could benefit from using several of these clusters simultaneously. To reach this goal, we discuss forming a "mini-grid" by using a virtual private network (VPN) to couple clusters at the message-passing level. This is a simple, pragmatic approach for cases where the functionality (and complexity) of full grid middleware stacks is not strictly necessary. We evaluate the performance of our approach with a cellular automata simulation using MPI. Experiments show that the VPN offers sufficient communication performance for our example application.

  • Specification, Analysis and Testing of Grid Environments Using Abstract State Machines

    Page(s): 116 - 120

    Abstract state machines (ASMs) are a mathematically defined framework for high-level system design, verification, and analysis. This paper proposes a hybrid approach to the specification, analysis, and testing of grid middleware using ASMs. The approach allows easy integration of the specification of the middleware under development with existing components of grid systems. An important advantage of the approach is the automatic generation of test procedures for the implementation, following the model-based testing approach. This allows a smooth transition from the specification stage to the implementation stage, as well as investigation of the features of both the specification and the implementation at every stage of their development. We also propose a software environment that implements the approach; its use in practice helps to create more reliable grid systems.

  • A Two-Level Approach to Building a Campus Grid

    Page(s): 121 - 126

    The article proposes a two-layered system for distributed computation. The setup combines two different tools, Globus and Mosix, to harness the otherwise wasted computing power of unused student laboratories. The system is easy to set up and use. We present the results of experiments on a simple testbed, using the Google PageRank algorithm as an example task.
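
    The example workload, PageRank, can be computed by power iteration. The serial sketch below uses the common damping factor d = 0.85 (an assumption; the abstract gives no parameters), spreads the rank of dangling pages uniformly, and does not model distribution over Globus or Mosix.

```python
from typing import Dict, List


def pagerank(links: Dict[int, List[int]], n: int, d: float = 0.85,
             tol: float = 1e-12, max_iter: int = 1000) -> List[float]:
    """Power iteration for PageRank on nodes 0..n-1.

    links[u] lists the pages u links to. Rank held by dangling pages (no
    out-links) is spread uniformly, so the ranks always sum to 1.
    """
    rank = [1.0 / n] * n
    for _ in range(max_iter):
        dangling = sum(rank[u] for u in range(n) if not links.get(u))
        new = [(1.0 - d) / n + d * dangling / n] * n
        for u, outs in links.items():
            for v in outs:
                new[v] += d * rank[u] / len(outs)
        if sum(abs(a - b) for a, b in zip(new, rank)) < tol:
            return new
        rank = new
    return rank


# Hypothetical 4-page web graph.
ranks = pagerank({0: [1], 1: [2], 2: [0, 1], 3: [2]}, 4)
print([round(r, 3) for r in ranks])
```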

  • Complexity of Collective Communications on NoCs

    Page(s): 127 - 133

    The paper addresses an important issue related to the communication performance of networks on chip (NoCs): the complexity of collective communications, measured by the required number of algorithmic steps. Three NoC topologies, a ring, the Octagon, and a 2D mesh, are investigated because they are easy to manufacture on a chip. Lower complexity bounds are compared with actual values obtained by evolution-based optimization tools. The results indicate what communication overhead can be expected in ring- and mesh-based NoCs with wormhole switching, full-duplex links, and k-port non-combining nodes.
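
    One simple, topology-independent lower bound of this kind is a counting argument for one-to-all broadcast with k-port non-combining nodes. The sketch below computes it; the paper's own bounds for the ring, Octagon, and 2D mesh may be tighter, and link contention can make real schedules longer.

```python
def broadcast_lower_bound(p: int, k: int) -> int:
    """Lower bound on the steps of a one-to-all broadcast among p nodes
    with k-port, non-combining nodes.

    In one step every informed node can inform at most k new nodes, so the
    number of informed nodes grows at most (k + 1)-fold per step:
    steps >= ceil(log_{k+1} p). Integer arithmetic avoids float log error.
    """
    if p < 1 or k < 1:
        raise ValueError("need p >= 1 and k >= 1")
    steps, informed = 0, 1
    while informed < p:
        informed *= k + 1
        steps += 1
    return steps


# 64 nodes (e.g. an 8 x 8 mesh): 1-port vs 4-port nodes.
print(broadcast_lower_bound(64, 1), broadcast_lower_bound(64, 4))  # prints 6 3
```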
