Parallel and Distributed Processing with Applications (ISPA), 2011 IEEE 9th International Symposium on

Date 26-28 May 2011

  • [Front cover]

    Publication Year: 2011 , Page(s): C1
    Freely Available from IEEE
  • [Title page i]

    Publication Year: 2011 , Page(s): i
    Freely Available from IEEE
  • [Title page iii]

    Publication Year: 2011 , Page(s): iii
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2011 , Page(s): iv
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2011 , Page(s): v - ix
    Freely Available from IEEE
  • Message from the General Chairs

    Publication Year: 2011 , Page(s): x
    Freely Available from IEEE
  • Message from the Program Chairs

    Publication Year: 2011 , Page(s): xi
    Freely Available from IEEE
  • Organizing Committee

    Publication Year: 2011 , Page(s): xii - xiii
    Freely Available from IEEE
  • Program Committee

    Publication Year: 2011 , Page(s): xiv - xvi
    Freely Available from IEEE
  • Accelerating DFA Construction by Hierarchical Merging

    Publication Year: 2011 , Page(s): 1 - 6
    Cited by:  Papers (2)

    Regular expression matching is widely used in many network applications to analyze suspicious traffic against predefined signatures, and to discover anomalous events. A Deterministic Finite Automaton (DFA), which recognizes a set of regular expressions, is the basic data structure used to scan input traffic byte by byte. Though a DFA meets the requirement of real-time processing of network traffic, constructing a combined DFA for a set of regular expression signatures is very time-consuming, especially when the signature set is large. To attack this problem, we propose new strategies to accelerate DFA construction. The basic idea of our method is to construct the combined DFA by hierarchical merging of the DFAs of each single regular expression. Our method runs in $O(|Q||\Sigma| \ln n)$ time, which is substantially superior to the time complexity $O(|Q||\Sigma|(\sum_{i=1}^{n}|Q_i|)^2)$ of the classical subset construction algorithm. Experiments on real signatures from open-source systems, such as L7-filter, BRO and SNORT, demonstrate that our method performs 45 times faster than the subset construction algorithm on average.
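The abstract above does not spell out the merge step, so the following is only a minimal sketch of the general idea, assuming each per-expression DFA is complete (every state has a transition for every symbol) and that two DFAs are combined with the textbook product construction restricted to reachable state pairs. The tuple layout `(start, accepting, delta)` and the function names are illustrative assumptions, not the authors' data structures.

```python
from collections import deque


def merge_dfas(a, b, alphabet):
    """Union of two complete DFAs via the product construction.

    Each DFA is a tuple (start, accepting_set, delta) with
    delta[state][symbol] -> state. Only reachable pairs are built.
    """
    start_a, acc_a, delta_a = a
    start_b, acc_b, delta_b = b
    start = (start_a, start_b)
    ids = {start: 0}          # state pair -> integer id in the merged DFA
    delta = {0: {}}
    accepting = set()
    queue = deque([start])
    while queue:
        pair = queue.popleft()
        sa, sb = pair
        sid = ids[pair]
        # The merged DFA accepts when either component accepts.
        if sa in acc_a or sb in acc_b:
            accepting.add(sid)
        for sym in alphabet:
            nxt = (delta_a[sa][sym], delta_b[sb][sym])
            if nxt not in ids:
                ids[nxt] = len(ids)
                delta[ids[nxt]] = {}
                queue.append(nxt)
            delta[sid][sym] = ids[nxt]
    return 0, accepting, delta


def merge_hierarchically(dfas, alphabet):
    """Merge DFAs pairwise, level by level, like a tournament tree."""
    level = list(dfas)
    while len(level) > 1:
        merged = [merge_dfas(level[i], level[i + 1], alphabet)
                  for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:    # odd one out moves up unchanged
            merged.append(level[-1])
        level = merged
    return level[0]


def accepts(dfa, text):
    """Run the DFA over text one symbol at a time."""
    start, accepting, delta = dfa
    state = start
    for ch in text:
        state = delta[state][ch]
    return state in accepting


# Three toy DFAs over the alphabet {a, b} (hypothetical signatures).
contains_aa = (0, {2}, {0: {"a": 1, "b": 0},
                        1: {"a": 2, "b": 0},
                        2: {"a": 2, "b": 2}})
ends_with_b = (0, {1}, {0: {"a": 0, "b": 1},
                        1: {"a": 0, "b": 1}})
starts_with_b = (0, {1}, {0: {"a": 2, "b": 1},
                          1: {"a": 1, "b": 1},
                          2: {"a": 2, "b": 2}})

combined = merge_hierarchically(
    [contains_aa, ends_with_b, starts_with_b], "ab")
```

Merging level by level keeps the intermediate automata small compared with building one subset construction over all signatures at once, which is the intuition the abstract points to; the complexity figures quoted there belong to the authors' method, not to this sketch.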
  • New Parallel Prefix Algorithm for Multicomputers

    Publication Year: 2011 , Page(s): 7 - 12
    Cited by:  Papers (2)

    A new computation-efficient parallel prefix algorithm for message-passing multicomputers is presented. The algorithm uses only half-duplex communications. It provides the flexibility of choosing parameter values for either fewer computation time steps or fewer communication time steps to achieve the minimal running time based on the ratio of the time required by a communication step to the time required by a computation step. Thus, under certain conditions, the new algorithm can run faster than previous ones for the same multicomputer model.
  • Using Quantum Search to Solve Dynamic Maximum Network Flow Problem

    Publication Year: 2011 , Page(s): 13 - 18

    This paper proposes a quantum-search-based algorithm for the dynamic maximum network flow problem. First, we convert the time-dependent process into a time-independent process. Second, we use the draining algorithm to find the maximum flow and compare it with classical algorithms. Third, the draining algorithm is sped up by utilizing quantum search. Finally, we provide an analysis of the time complexity.
  • HSK: A Hierarchical Parallel Simulation Kernel for Multicore Platform

    Publication Year: 2011 , Page(s): 19 - 24

    CPU development has entered the multi-core era. Due to a lack of thread-level support, most simulation platforms cannot take full advantage of multicore processors. To fill this gap, we propose a hierarchical parallel simulation kernel (HSK) model. The model has two layers. The first layer, named the process kernel, is responsible for managing all thread kernels on the second layer. The second layer is a group of thread kernels, which are responsible for scheduling and advancing logical processes. Each thread kernel is mapped onto an executing thread to advance the simulation in parallel. In addition, two algorithms are proposed to support high performance: (1) To improve the communication efficiency between threads, we propose a pointer-based communication mechanism. By using buffers, synchronization between threads can be eliminated. (2) To eliminate redundant Lower Bound on Time Stamp (LBTS) computation and avoid interrupting thread execution, we employ an approximate method to compute LBTS asynchronously. A proof of validity is presented. The execution performance of HSK is demonstrated by a series of simulation experiments with a modified phold model. HSK can achieve good speedup for applications, especially those with coarse-grained events.
  • A Non-blocking Programming Framework for Pipeline Application on Multi-core Platform

    Publication Year: 2011 , Page(s): 25 - 30

    Many applications follow certain programming patterns such as pipeline, fork-join, and do-all. While tools such as OS threads and OpenMP allow programmers only to express task or data parallelism, special support for programming patterns is distinctly lacking. Intel Threading Building Blocks (TBB) was developed to address this problem, but its scheduler is general and not optimized specifically for any of its parallel algorithms, including pipeline. In this paper, we provide a non-blocking framework for pipeline applications on multi-core platforms. We target linear pipelines in which each filter has one entrance and one exit. We design a novel work-stealing scheduler optimized specifically for pipeline applications. First, stealing is priority based: a priority is calculated for each filter in the pipeline so that a worker can easily find the optimal "victim" when it needs to steal. Second, multiple tasks can be stolen at a time, which reduces stealing time. A non-blocking queue is used to store intermediate results to reduce lock overhead and increase scalability. We apply our framework to four case studies: text filter, twofish, ferret, and dedup. Our framework reduces the execution time of TBB by 72% in the best case and 20% on average on an 8-core machine.
  • Parallel Integral Image Generation Algorithm on Multi-core System

    Publication Year: 2011 , Page(s): 31 - 35
    Cited by:  Papers (1)

    The integral image has become a very useful tool in most computer vision applications in recent years, and parallelization is the most popular approach for generating the integral image efficiently. However, most current parallel integral image generation algorithms are developed for dedicated hardware architectures. In this paper, we develop a parallel integral image generation algorithm on Tile64, which is a MIMD-based embedded system. Our parallel integral image generation algorithm is 7.66 times more efficient than the original sequential integral image generation.
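For readers unfamiliar with the data structure, here is a minimal sequential sketch of an integral image, written as two passes of prefix sums. It is not the Tile64 algorithm from the paper; it only illustrates why the problem parallelizes well, since every row in the first pass and every column in the second pass is independent of the others. The function names are illustrative assumptions.

```python
from itertools import accumulate


def integral_image(img):
    """Return ii where ii[y][x] is the sum of img[0..y][0..x]."""
    # Pass 1: running sum along each row (rows are independent).
    rows = [list(accumulate(row)) for row in img]
    # Pass 2: running sum down each column (columns are independent).
    cols = [list(accumulate(col)) for col in zip(*rows)]
    # Transpose back to row-major order.
    return [list(row) for row in zip(*cols)]


def box_sum(ii, top, left, bottom, right):
    """Sum of img[top..bottom][left..right] (inclusive) in O(1)."""
    total = ii[bottom][right]
    if top > 0:
        total -= ii[top - 1][right]
    if left > 0:
        total -= ii[bottom][left - 1]
    if top > 0 and left > 0:
        total += ii[top - 1][left - 1]
    return total


image = [[1, 2, 3],
         [4, 5, 6],
         [7, 8, 9]]
ii = integral_image(image)
```

With `ii` in hand, `box_sum(ii, 1, 1, 2, 2)` adds up the lower-right 2x2 block (5 + 6 + 8 + 9) using four table lookups instead of visiting every pixel.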
  • An Impulse Noise Removal Algorithm by Considering Region-Wise Property for Color Image

    Publication Year: 2011 , Page(s): 36 - 40

    Impulse noise removal is one of the important image preprocessing techniques, since noise can lead image processing procedures in an unexpected direction. The candidate-oriented strategy, which detects the corrupted pixels (noise candidates) and then updates the intensity values of those pixels, can achieve better performance than the brute-force strategy. However, conventional noise detection algorithms identify noise based on the pixel-wise relationship within a fixed-size observation window, and thus misclassify normal edge and detail pixels as noise. In this paper, a novel region feature is presented to avoid the misclassification problem. The noise pixels are treated as small regions and labeled by the multi-scale connected component labeling algorithm. In this way, the region size can be considered as a clue during the noise detection procedure. This newly developed region feature can easily be applied to current noise removal algorithms. The preliminary study results show that the number of misclassifications of the ROAD algorithm is dramatically reduced when considering the region feature, and thus the performance of conventional impulse noise removal algorithms can be improved accordingly.
  • GPGPU Acceleration Algorithm for Medical Image Reconstruction

    Publication Year: 2011 , Page(s): 41 - 46

    Medical imaging techniques such as X-ray, ultrasound, CT and MRI scans are widely used for diagnosis. The 2D medical images from these scans are difficult to interpret because they can only show cross-section views of a human body. Interpreting these images requires experts or trained professionals. Reconstructing 2D images into 3D models can help with the interpretation process. However, such model reconstruction is normally time-consuming and costly. It requires high performance computation, such as grid or parallel computing. This research, thus, proposes a high performance 3D reconstruction method using General-Purpose computation on Graphics Processing Units (GPGPU). The GPGPU has a high computational performance. A parallel computing method on the GPU can thus regenerate a model for real-time 3D visualization. In other words, the GPU computational speed sufficiently improves the visualization effectiveness of both images and models to the point where real-time navigation of the data set is possible. In our work, the 3D reconstruction process reconstructs a set of 2D cross-section images, stacks them to generate volume data, and then transforms the volume data into a 3D model. The generated models are then displayed on a user interface developed with OpenGL. Finally, the performance of the GPU acceleration is presented in this paper.
  • An Equilibrium-Based Approach for Determining Winners in Combinatorial Auctions

    Publication Year: 2011 , Page(s): 47 - 51

    Determining winners in combinatorial auctions is an NP-complete problem. Based on the idea of searching for Nash Equilibria (NE), this paper presents a local search procedure to determine winners. To improve the solution quality calculated by the local search, we propose the Nash Equilibrium Search Approach (NESA) to probe various NE solutions. According to the simulation results, the auctioneer's revenue in NESA is at least 1.08% and 9.10% more than that in a general GA and Casanova, respectively. A stable solution based on NESA is obtained within 400 seconds. Moreover, simulation results show that the solution quality of NESA is near-optimal.
  • A Parallel 3D Delaunay Triangulation Method

    Publication Year: 2011 , Page(s): 52 - 56
    Cited by:  Papers (1)

    Delaunay triangulation is a common mesh generation method in scientific computation. This parallel 3D Delaunay triangulation method uses a domain-decomposition approach. Exploiting the properties of Delaunay triangulation, the method devises an algorithm for merging block triangulations. To reduce the communication between processors, it finds the 3D affected zone that may be modified during the merge of two sub-block triangulations. The merged triangulation can be generated by searching only on the boundary of the block triangulations.
  • Efficient Resource Selection Algorithm for Enterprise Grid Systems

    Publication Year: 2011 , Page(s): 57 - 62

    This paper addresses a resource selection problem for applications that update data in enterprise grid systems. The problem is insufficiently addressed, as most existing resource selection approaches in grid environments primarily deal with read-only jobs. We propose a simple yet efficient algorithm that deals with the complexity of the resource selection problem in enterprise grid systems. The problem is formulated as a Multi Criteria Decision Making (MCDM) problem. Our proposed algorithm hides the complexity of the resource selection process without neglecting important components that affect job response time. The difficulty of estimating job response time is captured by representing it in terms of different QoS criteria levels at each resource. Our experiments show that the proposed algorithm achieves very good results with good system performance as compared to existing algorithms.
  • Performance Evaluation of Wireless Sensor Networks for Mobile Sensor Nodes Considering Goodput and Depletion Metrics

    Publication Year: 2011 , Page(s): 63 - 68

    Sensor networks are a sensing, computing and communication infrastructure that is able to observe and respond to phenomena in the natural environment and in our physical and cyber infrastructure. In this paper, we propose a sensor network with a set-up of mobile and static sensor nodes for performing tasks like sensing a phenomenon or monitoring a region. We investigate how the sensor network performs when the sensor nodes move. We compare the simulation results for two cases: the sensor network with stationary sensor nodes and with multiple mobile sensor nodes. The simulation results show that for multiple mobile sensors, the goodput is unstable. The goodput of stationary sensors is better than that of multiple mobile sensors, but the consumed energy of multiple mobile sensors is better than that of stationary sensors.
  • Improving Performance on Atmospheric Models through a Hybrid OpenMP/MPI Implementation

    Publication Year: 2011 , Page(s): 69 - 74
    Cited by:  Papers (3)

    This work shows how a hybrid MPI/OpenMP implementation can improve the performance of the Ocean-Land-Atmosphere Model (OLAM), a typical HPC application with a many-small-files workload, on a multi-core cluster environment. Previous experiments have shown that the scalability of this application on clusters is limited by the performance of the output operations. We show that the hybrid MPI/OpenMP version of OLAM decreases the number of output files, resulting in better performance for I/O operations. We also observe that the MPI version of OLAM performs better for unbalanced workloads and that further parallel optimizations should be included in the hybrid version in order to improve the parallel execution time of OLAM.
  • On Using Pattern Matching Algorithms in MapReduce Applications

    Publication Year: 2011 , Page(s): 75 - 80
    Cited by:  Papers (6)

    In this paper, we study CPU utilization time patterns of several MapReduce applications. The running patterns extracted from these applications are saved in a reference database, to be used later to tweak system parameters so that unknown applications can be executed efficiently in the future. To achieve this goal, CPU utilization patterns of new applications are compared with the already known ones in the reference database to find/predict their most probable execution patterns. Because the patterns differ in length, Dynamic Time Warping (DTW) is utilized for this comparison; a correlation analysis is then applied to the DTW outcomes to produce feasible similarity patterns. Three real applications (Word Count, Exim Mainlog parsing and Terasort) are used to evaluate our hypothesis in tweaking system parameters when executing similar applications. Results were very promising and showed the effectiveness of our approach on pseudo-distributed MapReduce platforms.
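Dynamic Time Warping is what lets the comparison above work on traces of different lengths. The sketch below is the classic dynamic-programming formulation with an absolute-difference cost, plus a hypothetical `closest_pattern` lookup against a small reference dictionary. The trace values, application names, and the lookup helper are made up for illustration, and the correlation analysis step mentioned in the abstract is not reproduced here.

```python
def dtw_distance(a, b):
    """Classic DTW cost between two numeric sequences of any length."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            step = abs(a[i - 1] - b[j - 1])
            cost[i][j] = step + min(cost[i - 1][j],      # stretch b
                                    cost[i][j - 1],      # stretch a
                                    cost[i - 1][j - 1])  # advance both
    return cost[n][m]


def closest_pattern(reference, trace):
    """Name of the reference pattern with the smallest DTW cost."""
    return min(reference, key=lambda name: dtw_distance(reference[name], trace))


# Hypothetical CPU utilization traces (percent), sampled at fixed intervals.
reference_db = {
    "wordcount": [10, 80, 85, 80, 20],
    "terasort": [50, 50, 95, 95, 95, 50],
}
new_trace = [12, 78, 82, 84, 81, 22]
best = closest_pattern(reference_db, new_trace)
```

Because warping may repeat samples, a trace that is merely a stretched copy of a reference pattern, such as `[1, 2, 2, 3]` against `[1, 2, 3]`, has cost zero.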
  • Dynamic Memory Demand Estimating Based on the Guest Operating System Behaviors for Virtual Machines

    Publication Year: 2011 , Page(s): 81 - 86

    In a virtualized environment, memory can be efficiently utilized if the dynamic memory demands of virtual machines can be estimated at runtime. An efficient memory estimator should report the appropriate size of the memory that the virtual machine can make full use of while keeping reasonable performance. However, the appropriate size is hard to estimate accurately with low overhead. This paper presents a memory demand estimator based on the guest operating system behaviors that are architecturally visible to the virtual machine monitor; it can accurately report the expected appropriate memory size with negligible overhead. The estimator consists of two components, which respectively track the amount of memory residing in virtual address space and the memory used as page cache, which is only accessible in kernel mode. The experimental results show that the estimation error is only 0.4%~2.1%, and the runtime overhead is only 0.8% on average because no additional memory protection traps are introduced.
  • A Priority-Aware NoC to Reduce Squashes in Thread Level Speculation for Chip Multiprocessors

    Publication Year: 2011 , Page(s): 87 - 92

    Thread Level Speculation (TLS) is a technique that aims to boost the performance of sequential programs running on Chip Multiprocessors (CMPs) by automatically parallelizing them. It frees programmers from the heavy task of parallel programming, but its performance may suffer from frequent squashing caused by inter-thread data dependency violations. In this paper, we propose a Network-on-Chip (NoC) for CMPs that employs a priority-aware packet arbitration policy. Packet scheduling guided by this policy reduces the occurrence of TLS squashes. Simulation results with 5 applications show that our policy reduces squashes by 22% in the best case and 15% on average. Moreover, our priority-aware approach could be generalized to similar scenarios in which different threads running on a CMP have different priorities.