The parallelization strategy of the Physically-Constrained Iterative Deconvolution (PCID) algorithm is being altered and optimized to enhance performance on emerging multi-core architectures. This paper reports results from porting PCID to multi-core architectures, including the JAWS supercomputer at the Maui HPC Center (60 TFLOPS of dual-dual Xeon® nodes) and the Cell Cluster at AFRL in Rome, NY (52 TFLOPS of PlayStation 3® nodes with IBM Cell Broadband Engine® multi-cores and 14 dual-quad Xeon headnodes). For 512×512 images, FFT performance exceeding 60 GFLOPS has been observed on dual-quad Xeon nodes. Multi-core architectures programmed with multiple threads delivered significantly better performance for parallelization of the low-level image convolution operations than the earlier parallelization across cluster nodes with MPI. Another focus of the PCID multi-core effort was to move from MPI message passing to a publish-subscribe-query approach to information management. The publish, subscribe, and query infrastructure was optimized for large-scale machines, such as JAWS, and features a “loose coupling” of publishers to subscribers through intervening brokers. This change makes runs on large HPCs with thousands of intercommunicating cores more flexible and more fault tolerant.
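The loose coupling described above can be illustrated with a minimal, in-process sketch. It is not the PCID infrastructure: the `Broker` class, its method names, and the `image.tiles` topic are hypothetical, chosen only to show how a publisher and its subscribers interact solely through an intervening broker, and how a failing subscriber need not disturb the publisher or the other subscribers.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, List


class Broker:
    """Hypothetical in-process broker; publishers and subscribers never reference each other."""

    def __init__(self) -> None:
        # topic -> callbacks registered by subscribers
        self._subscribers: Dict[str, List[Callable[[Any], None]]] = defaultdict(list)
        # topic -> every message published so far, kept so it can be queried later
        self._history: Dict[str, List[Any]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        """Register interest in a topic without knowing who publishes to it."""
        self._subscribers[topic].append(callback)

    def publish(self, topic: str, message: Any) -> int:
        """Store the message and forward it; return the number of successful deliveries."""
        self._history[topic].append(message)
        delivered = 0
        for callback in list(self._subscribers.get(topic, [])):
            try:
                callback(message)
                delivered += 1
            except Exception:
                # A failing subscriber is skipped; the publisher and the
                # remaining subscribers are unaffected.
                continue
        return delivered

    def query(self, topic: str) -> List[Any]:
        """Return a copy of all messages previously published on a topic."""
        return list(self._history.get(topic, []))


def run_demo():
    """Publish two messages to one healthy and one faulty subscriber."""
    broker = Broker()
    received: List[Any] = []

    def faulty(_message: Any) -> None:
        raise RuntimeError("subscriber failed")

    broker.subscribe("image.tiles", received.append)
    broker.subscribe("image.tiles", faulty)

    first_delivered = broker.publish("image.tiles", {"tile": 0})
    broker.publish("image.tiles", {"tile": 1})
    return received, first_delivered, broker.query("image.tiles")


if __name__ == "__main__":
    print(run_demo())
```

In this sketch the query path lets a late-joining consumer recover earlier messages from the broker rather than from the publisher, which is one plausible reason such a design tolerates the loss or restart of individual participants better than tightly paired point-to-point messaging.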