10-16 Nov. 2007
-
Programming bits and atoms
Publication Year: 2007, Page(s): 1
Cited by: Papers (1)
No abstract available
-
A preliminary investigation of a neocortex model implementation on the Cray XD1
Publication Year: 2007, Page(s):1 - 8
Cited by: Papers (1)
In this paper we study the acceleration of a new class of cognitive processing applications based on the structure of the neocortex. Specifically, we examine the speedup of a visual cortex model for image recognition. We propose techniques to accelerate the application on general purpose processors and on reconfigurable logic. We present implementations of our approach on a Cray XD1 and compare the...
-
Anatomy of a cortical simulator
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (20) | Patents (3)
Insights into the brain's high-level computational principles will lead to novel cognitive systems, computing architectures, programming paradigms, and numerous practical applications. An important step towards this end is the study of large networks of cortical spiking neurons. We have built a cortical simulator, C2, incorporating several algorithmic enhancements to optimize the simulation scale and ...
-
Large-scale maximum likelihood-based phylogenetic analysis on the IBM BlueGene/L
Publication Year: 2007, Page(s):1 - 11
Cited by: Papers (11)
Phylogenetic inference is a grand challenge in Bioinformatics due to immense computational requirements. The increasing popularity of multi-gene alignments in biological studies, which typically provide a stable topological signal due to a more favorable ratio of the number of base pairs to the number of sequences, coupled with rapid accumulation of sequence data in general, poses new challenges f...
-
Age-based packet arbitration in large-radix k-ary n-cubes
Publication Year: 2007, Page(s):1 - 11
Cited by: Papers (25) | Patents (14)
As applications scale to increasingly large processor counts, the interconnection network is frequently the limiting factor in application performance. In order to achieve application scalability, the interconnect must maintain high bandwidth while minimizing variation in packet latency. As the offered load in the network increases with growing problem sizes and processor counts, so does the expec...
-
Performance adaptive power-aware reconfigurable optical interconnects for high-performance computing (HPC) systems
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (2) | Patents (1)
As communication distances and bit rates increase, optoelectronic interconnects are being deployed for designing high-bandwidth low-latency interconnection networks for high performance computing (HPC) systems. While bandwidth scaling with efficient multiplexing techniques (wavelengths, time and space) is available, static assignment of wavelengths can be detrimental to network performance for no...
-
Evaluating network information models on resource efficiency and application performance in lambda-grids
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (2)
A critical challenge for wide-area configurable networks is the definition and widespread acceptance of a Network Information Model (NIM). When a network comprises multiple domains, intelligent information sharing is required for a provider to maintain a competitive advantage and for customers to use a provider's network and make good resource selection decisions. We characterize the information that ca...
-
Using MPI file caching to improve parallel write performance for large-scale scientific applications
Publication Year: 2007, Page(s):1 - 11
Cited by: Papers (7)
Typical large-scale scientific applications periodically write checkpoint files to save the computational state throughout execution. Existing parallel file systems improve such write-only I/O patterns through the use of client-side file caching and write-behind strategies. In distributed environments where files are rarely accessed by more than one client concurrently, file caching has achieved s...
-
Virtual machine aware communication libraries for high performance computing
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (38) | Patents (3)
As the size and complexity of modern computing systems keep increasing to meet the demanding requirements of High Performance Computing (HPC) applications, manageability is becoming a critical concern to achieve both high performance and high productivity computing. Meanwhile, virtual machine (VM) technologies have become popular in both industry and academia due to various features designed to ea...
-
Investigation of leading HPC I/O performance using a scientific-application derived benchmark
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (19)
With the exponential growth of high-fidelity sensor and simulated data, the scientific community is increasingly reliant on ultrascale HPC resources to handle its data analysis requirements. However, to utilize such extreme computing power effectively, the I/O components must be designed in a balanced fashion, as any architectural bottleneck will quickly render the platform intolerably inefficie...
-
Automatic resource specification generation for resource selection
Publication Year: 2007, Page(s):1 - 11
Cited by: Papers (14)
With an increasing number of available resources in large-scale distributed environments, a key challenge is resource selection. Fortunately, several middleware systems provide resource selection services. However, a user is still faced with a difficult question: "What should I ask for?" Since most users end up using naïve and suboptimal resource specifications, we propose an automated way t...
-
Performance and cost optimization for multiple large-scale grid workflow applications
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (22)
Scheduling large-scale applications on the Grid is a fundamental challenge and is critical to application performance and cost. Large-scale applications typically contain a large number of homogeneous and concurrent activities, which are the main bottlenecks but also open up great potential for optimization. This paper presents a new formulation of the well-known NP-complete problems and two novel algorithms...
-
Inter-operating grids through delegated matchmaking
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (10)
The grid vision of a single computing utility has yet to materialize: while many grids with thousands of processors each exist, most work in isolation. An important obstacle for the effective and efficient inter-operation of grids is the problem of resource selection. In this paper we propose a solution to this problem that combines the hierarchical and decentralized approaches for interconn...
-
Automatic software interference detection in parallel applications
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (5) | Patents (1)
We present an automated software interference detection methodology for Single Program, Multiple Data (SPMD) parallel applications. Interference comes from the system and unexpected processes. If not detected and corrected, such interference may result in performance degradation. Our goal is to provide a reliable metric for software interference that can be used in soft-failure protection and recov...
-
DMTracker: finding bugs in large-scale parallel programs by detecting anomaly in data movements
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (16)
While software reliability in large-scale systems becomes increasingly important, debugging in large-scale parallel systems remains a daunting task. This paper proposes an innovative technique that automatically finds hard-to-detect software bugs, which can cause severe problems such as data corruption and deadlocks in parallel programs, by detecting their abnormal behaviors in data movements. Based o...
-
Scalable security for petascale parallel file systems
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (10) | Patents (1)
Petascale, high-performance file systems often hold sensitive data and thus require security, but authentication and authorization can dramatically reduce performance. Existing security solutions perform poorly in these environments because they cannot scale with the number of nodes, highly distributed data, and demanding workloads. To address these issues, we developed Maat, a security protocol d...
-
The Cray BlackWidow: a highly scalable vector multiprocessor
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (17) | Patents (3)
This paper describes the system architecture of the Cray BlackWidow scalable vector multiprocessor. The BlackWidow system is a distributed shared memory (DSM) architecture that is scalable to 32K processors, each with a 4-way dispatch scalar execution unit and an 8-pipe vector unit capable of 20.8 Gflops for 64-bit operations and 41.6 Gflops for 32-bit operations at the prototype operating frequen...
-
GRAPE-DR: 2-Pflops massively-parallel computer with 512-core, 512-Gflops processor chips for scientific computing
Publication Year: 2007, Page(s):1 - 11
Cited by: Papers (9)
We describe the GRAPE-DR (Greatly Reduced Array of Processor Elements with Data Reduction) system, which will consist of 4096 processor chips, each with 512 cores operating at a clock frequency of 500 MHz. The peak speed of a processor chip is 512 Gflops (single precision) or 256 Gflops (double precision). The GRAPE-DR chip works as an attached processor to standard PCs. Currently, a PCI-X board w...
-
A case for low-complexity MP architectures
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (1)
Advances in semiconductor technology have driven shared-memory servers toward processors with multiple cores per die and multiple threads per core. This paper presents simple hardware primitives enabling flexible and low-complexity multi-chip designs supporting an efficient inter-node coherence protocol implemented in software. We argue that our primitives and the example design presented in this ...
-
Variable latency caches for nanoscale processor
Publication Year: 2007, Page(s):1 - 10
Cited by: Papers (1)
Variability is one of the important issues in nanoscale processors. Due to the increasing importance of interconnect structures in submicron technologies, physical location and phenomena such as coupling have an increasing impact on the latency of operations. Therefore, the traditional view of rigid access latencies to components will result in suboptimal architectures. In this paper, we devise a cache...
-
Data access history cache and associated data prefetching mechanisms
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (14) | Patents (2)
Data prefetching is an effective way to bridge the increasing performance gap between processor and memory. As computing power is increasing much faster than memory performance, we suggest that it is time to have a dedicated cache to store data access histories and to serve prefetching to mask data access latency effectively. We thus propose a new cache structure, named Data Access History Cache (...
-
Scaling performance of interior-point method on large-scale chip multiprocessor system
Publication Year: 2007, Page(s):1 - 11
Cited by: Papers (4)
In this paper we describe the parallelization of the interior-point method (IPM), aimed at achieving high scalability on large-scale chip-multiprocessors (CMPs). IPM is an important computational technique used to solve optimization problems in many areas of science, engineering and finance. IPM spends most of its computation time in a few sparse linear algebra kernels. While each of these kernels contains...
-
Data exploration of turbulence simulations using a database cluster
Publication Year: 2007, Page(s):1 - 11
Cited by: Papers (8)
We describe a new environment for the exploration of turbulent flows that uses a cluster of databases to store complete histories of Direct Numerical Simulation (DNS) results. This allows for spatial and temporal exploration of high-resolution data that were traditionally too large to store and too computationally expensive to produce on demand. We perform analysis of these data directly on the da...
-
Parallel hierarchical visualization of large time-varying 3D vector fields
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (29)
We present the design of a scalable parallel pathline construction method for visualizing large time-varying 3D vector fields. A 4D (i.e., time and the 3D spatial domain) representation of the vector field is introduced to make a time-accurate depiction of the flow field. This representation also allows us to obtain pathlines through streamline tracing in the 4D space. Furthermore, a hierarchical ...
-
Low-constant parallel algorithms for finite element simulations using linear octrees
Publication Year: 2007, Page(s):1 - 12
Cited by: Papers (6)
In this article we propose parallel algorithms for the construction of conforming finite-element discretizations on linear octrees. Existing octree-based discretizations scale to billions of elements, but the complexity constants can be high. In our approach we use several techniques to minimize overhead: a novel bottom-up tree-construction and 2:1 balance constraint enforcement; a Golomb-Rice enco...