
IBM Journal of Research and Development

Issue 1/2 • January 2008

Contents (19 items)
  • Message

    Publication Year: 2008 , Page(s): 2
    Freely available from IEEE
  • Preface

    Publication Year: 2008 , Page(s): 3 - 5

    The most powerful computer systems of today—or supercomputers—exploit massive parallelism to achieve their superlative performance. A recent TOP500™ list of the highest-performing computers in the world, compiled by TOP500.Org (www.top500.org) in June 2007 as this special issue was being prepared, presents some striking performance numbers. The fastest computer in the world, an IBM Blue Gene/L™ system housed at Lawrence Livermore National Laboratory, harnesses the power of 131,072 processors working in parallel to achieve a performance of 280.6 teraflops per second on LINPACK, a linear algebra benchmark that is used to rank supercomputers. We note that a teraflop is a trillion floating-point operations such as multiplications or additions. The second, third, and fourth most powerful computers harness the power of between 23,000 and 41,000 processors. Indeed, the most powerful computer with fewer than 10,000 processors appears only in the eighth position on the list. (A small arithmetic sketch based on these figures follows this entry.)

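As a rough sense of scale for the numbers quoted in the preface, the sketch below simply divides the reported LINPACK figure by the processor count of the top-ranked Blue Gene/L system. The per-processor value is an illustration derived only from the two figures above, not a benchmark result.

```python
# Rough per-processor arithmetic for the top TOP500 entry cited in the preface:
# 280.6 trillion floating-point operations per second across 131,072 processors.
linpack_tflops = 280.6          # reported LINPACK performance, in teraflops
processors = 131_072            # processors working in parallel

per_processor_gflops = linpack_tflops * 1e12 / processors / 1e9
print(f"~{per_processor_gflops:.2f} gigaflops per processor")  # ≈ 2.14
```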
  • Advances in Rosetta protein structure prediction on massively parallel systems

    Publication Year: 2008 , Page(s): 7 - 17
    Cited by:  Papers (1)

    One of the key challenges in computational biology is prediction of three-dimensional protein structures from amino-acid sequences. For most proteins, the “native state” lies at the bottom of a free-energy landscape. Protein structure prediction involves varying the degrees of freedom of the protein in a constrained manner until it approaches its native state. In the Rosetta protein structure prediction protocols, a large number of independent folding trajectories are simulated, and several lowest-energy results are likely to be close to the native state. The availability of hundred-teraflop, and shortly, petaflop, computing resources is revolutionizing the approaches available for protein structure prediction. Here, we discuss issues involved in utilizing such machines efficiently with the Rosetta code, including an overview of recent results of the Critical Assessment of Techniques for Protein Structure Prediction 7 (CASP7) in which the computationally demanding structure-refinement process was run on 16 racks of the IBM Blue Gene/L™ system at the IBM T. J. Watson Research Center. We highlight recent advances in high-performance computing and discuss future development paths that make use of the next-generation petascale (>10¹⁵ floating-point operations per second) machines. (An illustrative sketch of the independent-trajectory pattern follows this entry.)

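The abstract describes an embarrassingly parallel pattern: many independent folding trajectories, after which only the lowest-energy results are retained. The sketch below shows that pattern with mpi4py; the energy model and move set are toy stand-ins, not Rosetta's scoring function or protocol, and the script name in the run command is hypothetical.

```python
# Minimal sketch of "many independent trajectories, keep the lowest-energy
# results". Run with e.g.:  mpirun -n 4 python trajectory_sketch.py
from mpi4py import MPI
import random

def run_trajectory(seed, n_moves=10_000):
    """Toy Monte Carlo trajectory: returns (final_energy, trajectory_id)."""
    rng = random.Random(seed)
    energy = 0.0
    for _ in range(n_moves):
        energy += rng.gauss(-0.001, 1.0)   # placeholder for scoring one move
    return energy, seed

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

# Each rank runs its own independent trajectories; no communication is needed
# until the final selection step.
local_results = [run_trajectory(seed) for seed in range(rank, 1000, size)]

# Gather all (energy, id) pairs and keep the few lowest-energy candidates.
all_results = comm.gather(local_results, root=0)
if rank == 0:
    flat = [r for chunk in all_results for r in chunk]
    best = sorted(flat)[:5]
    print("lowest-energy candidates:", best)
```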
  • Massively parallel molecular dynamics simulations of lysozyme unfolding

    Publication Year: 2008 , Page(s): 19 - 30

    We have performed molecular dynamics simulations for a total duration of more than 10 µs (with most molecular trajectories being 1 µs in duration) to study the effect of a single mutation on hen lysozyme protein stability and denaturing, using an IBM Blue Gene/L™ supercomputer. One goal of this study was to assess the use of certain force fields to reproduce experimental results of protein unfolding using thermal denaturing techniques. A second and more important goal was to gain microscopic insights into the mechanism of protein misfolding using both thermal and chemical denaturing techniques. We found that the thermal denaturing results were robust and reproducible with various force fields. The chemical denaturing results explained why the single amino-acid mutation on residue Trp62 causes the disruption of long-range interactions in the tertiary structure. Simulation results revealed that the Trp62 residue was the key to a cooperative long-range interaction within the wild-type protein. Specifically, Trp62 acts as a bridge between two neighboring basic residues through a π-type H-bond or π-cation interaction to form an Arg-Trp-Arg “sandwich-like” structure. Our findings support the general conclusions of the experiment and provide an interesting molecular depiction of the disruption of the long-range interactions.

  • Brain-scale simulation of the neocortex on the IBM Blue Gene/L supercomputer

    Publication Year: 2008 , Page(s): 31 - 41
    Cited by:  Papers (11)

    Biologically detailed large-scale models of the brain can now be simulated thanks to increasingly powerful massively parallel supercomputers. We present an overview, for the general technical reader, of a neuronal network model of layers II/III of the neocortex built with biophysical model neurons. These simulations, carried out on an IBM Blue Gene/L™ supercomputer, comprise up to 22 million neurons and 11 billion synapses, which makes them the largest simulations of this type ever performed. Such model sizes correspond to the cortex of a small mammal. The SPLIT library, used for these simulations, runs on single-processor as well as massively parallel machines. Performance measurements show good scaling behavior on the Blue Gene/L supercomputer up to 8,192 processors. Several key phenomena seen in the living brain appear as emergent phenomena in the simulations. We discuss the role of this kind of model in neuroscience and note that full-scale models may be necessary to preserve natural dynamics. We also discuss the need for software tools for the specification of models as well as for analysis and visualization of output data. Combining models that range from abstract connectionist type to biophysically detailed will help us unravel the basic principles underlying neocortical function.

  • Identifying, tabulating, and analyzing contacts between branched neuron morphologies

    Publication Year: 2008 , Page(s): 43 - 55

    Simulating neural tissue requires the construction of models of the anatomical structure and physiological function of neural microcircuitry. The Blue Brain Project is simulating the microcircuitry of a neocortical column with very high structural and physiological precision. This paper describes how we model anatomical structure by identifying, tabulating, and analyzing contacts between 10⁴ neurons in a morphologically precise model of a column. A contact occurs when one element touches another, providing the opportunity for the subsequent creation of a simulated synapse. The architecture of our application divides the problem of detecting and analyzing contacts among thousands of processors on the IBM Blue Gene/L™ supercomputer. Data required for contact tabulation is encoded with geometrical data for contact detection and is exchanged among processors. Each processor selects a subset of neurons and then iteratively 1) divides the points that represent each neuron among column subvolumes, 2) detects contacts in a subvolume, 3) tabulates arbitrary categories of local contacts, 4) aggregates and analyzes global contacts, and 5) revises the contents of a column to achieve a statistical objective. Computing, analyzing, and optimizing local data in parallel across distributed global data objects involve problems common to other domains (such as three-dimensional image processing and registration). Thus, we discuss the generic nature of the application architecture. (An illustrative sketch of the subvolume contact-detection step follows this entry.)

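The sketch below illustrates steps 1 to 3 of the loop described in the abstract on a single process: bin points into subvolumes, detect pairs closer than a touch distance within each subvolume, and tabulate contacts per neuron pair. The random point clouds and parameter values are stand-ins for real branched morphologies, and boundary handling between subvolumes is omitted; the actual application distributes this work across thousands of Blue Gene/L processors.

```python
# Simplified subvolume contact detection: 1) divide points among subvolumes,
# 2) detect contacts within each subvolume, 3) tabulate counts per neuron pair.
import numpy as np
from collections import Counter
from itertools import combinations

rng = np.random.default_rng(0)
n_neurons, pts_per_neuron = 50, 200
points = rng.uniform(0.0, 100.0, size=(n_neurons * pts_per_neuron, 3))
neuron_id = np.repeat(np.arange(n_neurons), pts_per_neuron)

touch_dist = 1.0
subvol_edge = 10.0                          # 1) cube subvolumes of this edge length
keys = np.floor(points / subvol_edge).astype(int)

contacts = Counter()
for key in {tuple(k) for k in keys}:
    idx = np.nonzero(np.all(keys == key, axis=1))[0]   # points in this subvolume
    for i, j in combinations(idx, 2):                  # 2) detect contacts locally
        if neuron_id[i] != neuron_id[j] and np.linalg.norm(points[i] - points[j]) < touch_dist:
            contacts[(neuron_id[i], neuron_id[j])] += 1   # 3) tabulate

print(len(contacts), "neuron pairs in contact")
```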
  • Ligand discovery on massively parallel systems

    Publication Year: 2008 , Page(s): 57 - 67
    Cited by:  Papers (1)

    Virtual screening is an approach for identifying promising leads for drugs and is used in the pharmaceutical industry. We present the parallelization of LIDAEUS (LIgand Discovery At Edinburgh UniverSity), creating a massively parallel high-throughput virtual-screening code. This program is being used to predict the binding modes involved in the docking of small ligands to proteins. Parallelization efforts have focused on achieving maximum parallel efficiency and developing a memory-efficient parallel sorting routine. Using an IBM Blue Gene/L™ supercomputer, runtimes have been reduced from 8 days on a modest seven-node cluster to 62 minutes on 1,024 processors using a standard dataset of 1.67 million small molecules and FKBP12, a protein target of interest in immunosuppressive therapies. Using more-complex datasets, the code scales upward to make use of the full processor set of 2,048. The code has been successfully used for the task of gathering data on approximately 1.67 million small molecules binding to approximately 400 high-quality crystallographically determined ligand-bound protein structures, generating data on more than 646 million protein-ligand complexes. A number of novel ligands have already been discovered and validated experimentally. (A back-of-the-envelope calculation based on these figures follows this entry.)

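The arithmetic below uses only the figures quoted in the abstract to put the reported runtime reduction and dataset size in perspective. Because the 8-day and 62-minute runs used different hardware, the ratio is a wall-clock comparison rather than a strict parallel-efficiency measure.

```python
# Back-of-the-envelope arithmetic from the figures quoted in the abstract.
days_before, minutes_after = 8, 62
speedup = days_before * 24 * 60 / minutes_after
print(f"wall-clock reduction: ~{speedup:.0f}x")        # ≈ 186x (different hardware)

molecules, targets = 1.67e6, 400
complexes_millions = molecules * targets / 1e6
print(f"molecule-target combinations: ~{complexes_millions:.0f} million")
# ≈ 668 million, of the same order as the ">646 million" complexes reported
```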
  • EUDOC on the IBM Blue Gene/L system: Accelerating the transfer of drug discoveries from laboratory to patient

    Publication Year: 2008 , Page(s): 69 - 81

    EUDOC™ is a molecular docking program that has successfully helped to identify new drug leads. This virtual screening (VS) tool identifies drug candidates by computationally testing the binding of these drugs to biologically important protein targets. This approach can reduce the research time required of biochemists, accelerating the identification of therapeutically useful drugs and helping to transfer discoveries from the laboratory to the patient. Migration of the EUDOC application code to the IBM Blue Gene/L™ (BG/L) supercomputer has been highly successful. This migration led to a 200-fold improvement in elapsed time for a representative VS application benchmark. Three focus areas provided benefits. First, we enhanced the performance of serial code through application redesign, hand-tuning, and increased usage of SIMD (single-instruction, multiple-data) floating-point unit operations. Second, we studied computational load-balancing schemes to maximize processor utilization and application scalability for the massively parallel architecture of the BG/L system. Third, we greatly enhanced system I/O interaction design. We also identified and resolved severe performance bottlenecks, allowing for efficient performance on more than 4,000 processors. This paper describes specific improvements in each of the areas of focus. (A generic load-balancing sketch follows this entry.)

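The abstract mentions that load-balancing schemes were studied to maximize processor utilization, without detailing them. The sketch below shows one generic scheme for the underlying problem, uneven per-ligand docking costs: a greedy "largest job to the least-loaded processor" assignment. It is not EUDOC's actual scheme, and the job-cost distribution is invented for illustration.

```python
# Generic load-balancing illustration (not the EUDOC scheme itself): assign
# the most expensive jobs first, each to the currently least-loaded processor.
import heapq, random

random.seed(1)
job_costs = [random.lognormvariate(0, 1) for _ in range(10_000)]  # uneven per-ligand costs
n_procs = 512

loads = [(0.0, p) for p in range(n_procs)]          # min-heap of (load, processor)
heapq.heapify(loads)
for cost in sorted(job_costs, reverse=True):        # largest jobs first
    load, p = heapq.heappop(loads)
    heapq.heappush(loads, (load + cost, p))

finish = max(load for load, _ in loads)
ideal = sum(job_costs) / n_procs
print(f"max load {finish:.1f} vs ideal {ideal:.1f} "
      f"(utilization ~{100 * ideal / finish:.0f}%)")
```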
  • A massively parallel implementation of the common azimuth pre-stack depth migration

    Publication Year: 2008 , Page(s): 83 - 91

    When accompanied by the appropriate algorithmic approach, seismic imaging is an application that can take advantage of massively parallel computer systems. Three-dimensional (3D) pre-stack time migration (PSTM) and pre-stack depth migration (PSDM) are key components of seismic imaging and require very large computing resources. In this paper, we show that execution of these algorithms can be dramatically accelerated by massive parallelism. Many oil exploration and service companies purchase supercomputing clusters for performing 3D PSTM and PSDM seismic imaging. The common azimuth migration (CAM) algorithm, ported to many architectures, is particularly well suited to offshore marine applications. This paper describes the porting of the CAM algorithm to the IBM Blue Gene/L™ supercomputer, which requires introducing a second level of parallelism, building a parallel 3D-FFT (fast Fourier transform) routine, optimizing a tri-diagonal solver for SIMD (single-instruction, multiple-data) floating-point units, and addressing various I/O concerns. We present results obtained by using up to 16,368 processors for actual data provided from a marine seismic acquisition. Finally, we provide recommendations for porting other pre-stack algorithms to a massively parallel environment. (A reference tridiagonal solver sketch follows this entry.)

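One of the porting tasks named above is optimizing a tri-diagonal solver for the SIMD floating-point units. The scalar reference below (the standard Thomas algorithm) only shows what such a solver computes; the hand-tuned SIMD version described in the paper is architecture-specific and not reproduced here.

```python
# Scalar reference tridiagonal solver (Thomas algorithm), checked against a
# dense solve. a = sub-diagonal (a[0] unused), b = main diagonal,
# c = super-diagonal (c[-1] unused), d = right-hand side.
import numpy as np

def solve_tridiagonal(a, b, c, d):
    n = len(d)
    cp, dp = np.empty(n), np.empty(n)
    cp[0], dp[0] = c[0] / b[0], d[0] / b[0]
    for i in range(1, n):                      # forward elimination
        m = b[i] - a[i] * cp[i - 1]
        cp[i] = c[i] / m
        dp[i] = (d[i] - a[i] * dp[i - 1]) / m
    x = np.empty(n)
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):             # back substitution
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x

n = 6
a = np.r_[0.0, np.full(n - 1, -1.0)]
b = np.full(n, 4.0)
c = np.r_[np.full(n - 1, -1.0), 0.0]
d = np.arange(1.0, n + 1)
A = np.diag(b) + np.diag(a[1:], -1) + np.diag(c[:-1], 1)
assert np.allclose(solve_tridiagonal(a, b, c, d), np.linalg.solve(A, d))
print("Thomas algorithm matches dense solve")
```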
  • Massively parallel electrical-conductivity imaging of hydrocarbons using the IBM Blue Gene/L supercomputer

    Publication Year: 2008 , Page(s): 93 - 103
    Cited by:  Patents (1)

    Large-scale controlled-source electromagnetic (CSEM) three-dimensional (3D) geophysical imaging is now receiving considerable attention for electrical-conductivity mapping of potential offshore oil and gas reservoirs. To cope with the typically large computational requirements of the 3D CSEM imaging problem, our strategies exploit computational parallelism and optimized finite-difference meshing. We report on an imaging experiment utilizing 32,768 tasks (and processors) on the IBM Blue Gene/L™ (BG/L) supercomputer at the IBM T. J. Watson Research Center. Over a 24-hour period, we were able to image a large-scale marine CSEM field dataset that previously required more than 4 months of computing time on distributed clusters utilizing 1,024 tasks on an InfiniBand® fabric. The total initial data-fitting errors (i.e., “misfits”) could be decreased by 67% within 72 completed inversion iterations, indicating the existence of an electrically resistive region in the southern survey area below a depth of 1,500 m underneath the seafloor. The major part of the residual misfit stems from transmitter-parallel receiver components that have an offset from the transmitter sail line (broadside configuration). Modeling confirms that improved broadside data fits can be achieved by considering anisotropic electrical conductivities. While delivering a satisfactory gross-scale image for the depths of interest, the experiment provides important evidence for the necessity of discriminating between horizontal and vertical conductivities for maximally consistent 3D CSEM inversions.

  • Large-scale gyrokinetic particle simulation of microturbulence in magnetically confined fusion plasmas

    Publication Year: 2008 , Page(s): 105 - 115
    Cited by:  Papers (3)

    As the global energy economy makes the transition from fossil fuels toward cleaner alternatives, nuclear fusion becomes an attractive potential solution for satisfying growing needs. Fusion, the power source of the stars, has been the focus of active research since the early 1950s. While progress has been impressive—especially for magnetically confined plasma devices called tokamaks—the design of a practical power plant remains an outstanding challenge. A key topic of current interest is microturbulence, which is believed to be responsible for the unacceptably large leakage of energy and particles out of the hot plasma core. Understanding and controlling this process is of utmost importance for operating current devices and designing future ones. In addressing such issues, the Gyrokinetic Toroidal Code (GTC) was developed to study the global influence of microturbulence on particle and energy confinement. It has been optimized on the IBM Blue Gene/L™ (BG/L) computer, achieving essentially linear scaling on more than 30,000 processors. A full simulation of unprecedented phase-space resolution was carried out with 32,768 processors on the BG/L supercomputer located at the IBM T. J. Watson Research Center, providing new insights on the influence of collisions on microturbulence.

  • Scaling climate simulation applications on the IBM Blue Gene/L system

    Publication Year: 2008 , Page(s): 117 - 126

    We examine the ability of the IBM Blue Gene/L™ (BG/L) architecture to provide ultrahigh-resolution climate simulation capability. Our investigations show that it is possible to scale climate models to more than 32,000 processors on a 20-rack BG/L system using a variety of commonly employed techniques. One novel contribution is our load-balancing strategy that is based on newly developed space-filling curve partitioning algorithms. Here, we examine three models: the Parallel Ocean Program (POP), the Community Ice CodE (CICE), and the High-Order Method Modeling Environment (HOMME). The POP and CICE models are components of the next-generation Community Climate System Model (CCSM), which is based at the National Center for Atmospheric Research and is one of the leading coupled climate system models. HOMME is an experimental dynamical “core” (i.e., the CCSM component that calculates atmosphere dynamics) currently being evaluated within the Community Atmosphere Model, the atmospheric component of CCSM. For our scaling studies, we concentrate on 1/10° resolution simulations for CICE and POP, and 1/3° resolution for HOMME. The ability to simulate high resolutions on the massively parallel systems, which will dominate high-performance computing for the foreseeable future, is essential to the advancement of climate science. (An illustrative space-filling-curve partitioning sketch follows this entry.)

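The load-balancing strategy above is based on space-filling curve partitioning. The sketch below shows the basic construction with a Morton (Z-order) curve: order grid blocks along the curve, then cut the curve into contiguous, equal-sized chunks so each processor receives spatially compact work. The partitioning algorithms developed for POP, CICE, and HOMME are more elaborate (for example, weighted and aware of land blocks); this only illustrates the core idea, and the grid and processor counts are toy values.

```python
# Space-filling-curve partitioning sketch: Morton (Z-order) ordering of a 2D
# grid of blocks, cut into equal-sized contiguous chunks per processor.
def morton_key(i, j, bits=8):
    """Interleave the bits of (i, j) to get a position along the Z-order curve."""
    key = 0
    for b in range(bits):
        key |= ((i >> b) & 1) << (2 * b) | ((j >> b) & 1) << (2 * b + 1)
    return key

nx = ny = 16                       # toy 16 x 16 grid of blocks
blocks = sorted(((i, j) for i in range(nx) for j in range(ny)),
                key=lambda ij: morton_key(*ij))

n_procs = 8
chunk = len(blocks) // n_procs
partition = {p: blocks[p * chunk:(p + 1) * chunk] for p in range(n_procs)}
print({p: blks[:3] for p, blks in partition.items()})  # first few blocks per processor
```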
  • Terascale turbulence computation using the FLASH3 application framework on the IBM Blue Gene/L system

    Publication Year: 2008 , Page(s): 127 - 136
    Cited by:  Papers (1)

    Understanding the nature of turbulent flows remains one of the outstanding questions in classical physics. Significant progress has been recently made using computer simulation as an aid to our understanding of the rich physics of turbulence. Here, we present both the computer science and the scientific features of a unique terascale simulation of a weakly compressible turbulent flow that includes tracer particles. (Terascale refers to performance and dataset storage use in excess of a teraflop and terabyte, respectively.) The simulation was performed on the Lawrence Livermore National Laboratory IBM Blue Gene/L™ system, using version 3 of the FLASH application framework. FLASH3 is a modular, publicly available code designed primarily for astrophysical simulations, which scales well to massively parallel environments. We discuss issues related to the analysis and visualization of such a massive simulation and present initial scientific results. We also discuss challenges related to making the database available for public release. We suggest that widespread adoption of an open dataset model of high-performance computing is likely to result in significant advantages for the scientific computing community, in much the same way that the widespread adoption of open-source software has produced similar gains over the last 10 years.

  • Architecture of Qbox: A scalable first-principles molecular dynamics code

    Publication Year: 2008 , Page(s): 137 - 144
    Cited by:  Papers (2)

    We describe the architecture of Qbox, a parallel, scalable first-principles molecular dynamics (FPMD) code. Qbox is a C++/Message Passing Interface implementation of FPMD based on the plane-wave, pseudopotential method for electronic structure calculations. It is built upon well-optimized parallel numerical libraries, such as Basic Linear Algebra Communication Subprograms (BLACS) and Scalable Linear Algebra Package (ScaLAPACK), and also features an Extensible Markup Language (XML) interface built on the Apache Xerces-C library. We describe various choices made in the design of Qbox that led to excellent scalability on large parallel computers. In particular, we discuss the case of the IBM Blue Gene/L™ platform on which Qbox was run using up to 65,536 nodes. Future design challenges for upcoming petascale computers are also discussed. Examples of applications of Qbox to a variety of first-principles simulations of solids, liquids, and nanostructures are briefly described. (A sketch of the block-cyclic distribution used by ScaLAPACK follows this entry.)

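Qbox is built on BLACS and ScaLAPACK, which distribute dense matrices over a 2D process grid in a block-cyclic fashion. The sketch below prints that standard ownership map; it illustrates the library convention only, since Qbox's own data layouts are not detailed in the abstract, and the block and grid sizes are arbitrary.

```python
# ScaLAPACK-style 2D block-cyclic distribution: element (i, j) lies in block
# (i // mb, j // nb), and that block is owned by process
# (block_row % Pr, block_col % Pc) on a Pr x Pc process grid.
def owner(i, j, mb, nb, Pr, Pc):
    """Process-grid coordinates owning matrix element (i, j)."""
    return (i // mb) % Pr, (j // nb) % Pc

mb = nb = 2          # block size
Pr, Pc = 2, 3        # 2 x 3 process grid
n = 8                # 8 x 8 matrix
for i in range(n):
    print(" ".join(str(owner(i, j, mb, nb, Pr, Pc)) for j in range(n)))
```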
  • Blue Matter: Scaling of N-body simulations to one atom per node

    Publication Year: 2008 , Page(s): 145 - 158
    Cited by:  Papers (1)

    N-body simulations present some of the most interesting challenges in the area of massively parallel computing, especially when the object is to improve the time to solution for a fixed-size problem. The Blue Matter molecular simulation framework was developed specifically to address these challenges, to explore programming models for massively parallel machine architectures in a concrete context, and to support the scientific goals of the IBM Blue Gene® Project. This paper reviews the key issues involved in achieving ultrastrong scaling of methodologically correct biomolecular simulations, particularly the treatment of the long-range electrostatic forces present in simulations of proteins in water and membranes. Blue Matter computes these forces using the particle-particle particle-mesh Ewald (P3ME) method, which breaks the problem up into two pieces, one that requires the use of three-dimensional fast Fourier transforms with global data dependencies and another that involves computing interactions between pairs of particles within a cutoff distance. We summarize our exploration of the parallel decompositions used to compute these finite-ranged interactions, describe some of the implementation details involved in these decompositions, and present the evolution of strong-scaling performance achieved over the course of this exploration, along with evidence for the quality of simulation achieved. View full abstract» (A sketch of the short-range/long-range split follows this entry.)

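The abstract explains that P3ME splits the electrostatics into a finite-range piece (pairs within a cutoff) and a long-range piece handled on a mesh with 3D FFTs. The sketch below shows only the structure of that split: a screened real-space pair sum with a cutoff, plus a deliberately unimplemented placeholder for the mesh piece, since a correct P3ME/Ewald implementation needs far more care than a few lines. Parameter values and the random system are illustrative, not taken from Blue Matter.

```python
# Structure of the particle-particle / particle-mesh split: a short-range
# pairwise sum restricted to a cutoff, plus a long-range mesh piece (FFT-based)
# left here as a placeholder.
import numpy as np
from math import erfc

def short_range_energy(pos, charges, box, cutoff, alpha):
    """Screened real-space Coulomb sum over pairs within the cutoff."""
    energy = 0.0
    n = len(pos)
    for i in range(n):
        for j in range(i + 1, n):
            d = pos[i] - pos[j]
            d -= box * np.round(d / box)          # minimum-image convention
            r = np.linalg.norm(d)
            if r < cutoff:
                energy += charges[i] * charges[j] * erfc(alpha * r) / r
    return energy

def long_range_energy(pos, charges, box, mesh=32):
    """Placeholder for the FFT-based mesh piece (charge spreading, k-space
    solve, force interpolation) -- intentionally omitted in this sketch."""
    raise NotImplementedError

rng = np.random.default_rng(0)
pos = rng.uniform(0.0, 20.0, size=(100, 3))
q = rng.choice([-1.0, 1.0], size=100)
print("short-range energy:", short_range_energy(pos, q, box=20.0, cutoff=6.0, alpha=0.3))
```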
  • Fine-grained parallelization of the Car-Parrinello ab initio molecular dynamics method on the IBM Blue Gene/L supercomputer

    Publication Year: 2008 , Page(s): 159 - 175
    Cited by:  Papers (2)

    Important scientific problems can be treated via ab initio-based molecular modeling approaches, wherein atomic forces are derived from an energy function that explicitly considers the electrons. The Car–Parrinello ab initio molecular dynamics (CPAIMD) method is widely used to study small systems containing on the order of 10 to 10³ atoms. However, the impact of CPAIMD has been limited until recently because of difficulties inherent to scaling the technique beyond processor numbers about equal to the number of electronic states. CPAIMD computations involve a large number of interdependent phases with high interprocessor communication overhead. These phases require the evaluation of various transforms and non-square matrix multiplications that require large interprocessor data movement when efficiently parallelized. Using the Charm++ parallel programming language and runtime system, the phases are discretized into a large number of virtual processors, which are, in turn, mapped flexibly onto physical processors, thereby allowing interleaving of work. Algorithmic and IBM Blue Gene/L™ system-specific optimizations are employed to scale the CPAIMD method to at least 30 times the number of electronic states in small systems consisting of 24 to 768 atoms (32 to 1,024 electronic states) in order to demonstrate fine-grained parallelism. The largest systems studied scaled well across the entire machine (20,480 nodes).

  • Scalable molecular dynamics with NAMD on the IBM Blue Gene/L system

    Publication Year: 2008 , Page(s): 177 - 188
    Cited by:  Papers (2)

    NAMD (nanoscale molecular dynamics) is a production molecular dynamics (MD) application for biomolecular simulations that include assemblages of proteins, cell membranes, and water molecules. In a biomolecular simulation, the problem size is fixed and a large number of iterations must be executed in order to understand interesting biological phenomena. Hence, we need MD applications to scale to thousands of processors, even though the individual timestep on one processor is quite small. NAMD has demonstrated its performance on several parallel computer architectures. In this paper, we present various compiler optimization techniques that use single-instruction, multiple-data (SIMD) instructions to obtain good sequential performance with NAMD on the embedded IBM PowerPC® 440 processor core. We also present several techniques to scale the NAMD application to 20,480 nodes of the IBM Blue Gene/L™ (BG/L) system. These techniques include topology-specific optimizations to localize communication, new messaging protocols that are optimized for the BG/L torus, topology-aware load balancing, and overlap of computation and communication. We also present performance results of various molecular systems with sizes ranging from 5,570 to 327,506 atoms. (A sketch of computation/communication overlap follows this entry.)

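One of the techniques listed above is overlapping computation with communication. The minimal mpi4py sketch below shows the generic idea: post nonblocking sends and receives, do useful local work while messages are in flight, then wait before touching the received data. NAMD achieves its overlap through Charm++'s message-driven execution rather than raw MPI calls like these, and the array sizes and script name are illustrative only.

```python
# Overlap illustration: nonblocking halo exchange hidden behind local work.
# Run with e.g.:  mpirun -n 2 python overlap_sketch.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()
left, right = (rank - 1) % size, (rank + 1) % size

local = np.random.rand(1_000_000)
halo_out = local[:10].copy()
halo_in = np.empty(10)

reqs = [comm.Isend(halo_out, dest=right, tag=0),
        comm.Irecv(halo_in, source=left, tag=0)]

interior_sum = local[10:].sum()       # useful work while messages are in flight
MPI.Request.Waitall(reqs)             # communication now complete
total = interior_sum + halo_in.sum()  # safe to use the received boundary data
print(rank, total)
```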
  • Massively parallel quantum chromodynamics

    Publication Year: 2008 , Page(s): 189 - 197

    Quantum chromodynamics (QCD), the theory of the strong nuclear force, can be numerically simulated on massively parallel supercomputers using the method of lattice gauge theory. We describe the special programming requirements of lattice QCD (LQCD) as well as the optimal supercomputer hardware architectures for which LQCD suggests a need. We demonstrate these methods on the IBM Blue Gene/L™ (BG/L) massively parallel supercomputer and argue that the BG/L architecture is very well suited for LQCD studies. This suitability arises from the fact that LQCD is a regular lattice discretization of space into lattice sites, while the BG/L supercomputer is a discretization of space into compute nodes. Both LQCD and the BG/L architecture are constrained by the requirement of short-distance exchanges. This simple relation is technologically important and theoretically intriguing. We demonstrate a computational speedup of LQCD using up to 131,072 CPUs on the largest BG/L supercomputer available in 2007. As the number of CPUs is increased, the speedup increases linearly with sustained performance of about 20% of the maximum possible hardware speed. This corresponds to a maximum of 70.5 sustained teraflops. At these speeds, LQCD and the BG/L supercomputer are able to produce theoretical results for the next generation of strong-interaction physics.

  • Overview of the IBM Blue Gene/P project

    Publication Year: 2008 , Page(s): 199 - 220
    Cited by:  Papers (43)  |  Patents (1)

    On June 26, 2007, IBM announced the Blue Gene/P™ system as the leading offering in its massively parallel Blue Gene® supercomputer line, succeeding the Blue Gene/L™ system. The Blue Gene/P system is designed to scale to at least 262,144 quad-processor nodes, with a peak performance of 3.56 petaflops. More significantly, the Blue Gene/P system enables this unprecedented scaling via architectural and design choices that maximize performance per watt, performance per square foot, and mean time between failures. This paper describes our vision of this petascale system, that is, a system capable of delivering more than a quadrillion (10¹⁵) floating-point operations per second. We also provide an overview of the system architecture, packaging, system software, and initial benchmark results. (A small arithmetic sketch based on these figures follows this entry.)

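For a sense of what the quoted design point implies per processor, the sketch below divides the stated peak by the total processor count. It uses only the two figures given in the abstract (262,144 quad-processor nodes, 3.56 petaflops peak); the result is an implied per-processor peak, not a benchmark measurement.

```python
# Implied per-processor peak from the figures quoted in the abstract.
peak_pflops = 3.56
nodes, processors_per_node = 262_144, 4

per_processor_gflops = peak_pflops * 1e15 / (nodes * processors_per_node) / 1e9
print(f"~{per_processor_gflops:.1f} gigaflops per processor")  # ≈ 3.4
```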

Aims & Scope

The IBM Journal of Research and Development is a peer-reviewed technical journal, published bimonthly, which features the work of authors in the science, technology and engineering of information systems.


Meet Our Editors

Editor-in-Chief
Clifford A. Pickover
IBM T. J. Watson Research Center