IBM Journal of Research and Development

Issue 1 • January 2002

  • Preface

    Page(s): 3 - 4

    In October 2001 the IBM eServer pSeries 690 was announced. The p690, known in development by the code name Regatta, introduced the POWER4 microprocessor to the high end of the UNIX market. The system embraces the IBM eServer objectives. Initially targeted for the pSeries, the POWER4 will also be used in iSeries servers, and this technology will be rapidly deployed in the midrange and low end of the pSeries and iSeries offerings. Introduced at 1.3 GHz, the POWER4 is the fastest 64-bit microprocessor shipping in the industry today.

  • POWER4 system microarchitecture

    Page(s): 5 - 25

    The IBM POWER4 is a new microprocessor organized within a system structure that includes new technology for building systems. In this context, the name POWER4 refers not only to a chip but also to the structure used to interconnect chips into systems. In this paper we describe the processor microarchitecture as well as the interconnection architecture used to build symmetric multiprocessor (SMP) systems of up to 32 ways.

  • The circuit and physical design of the POWER4 microprocessor

    Page(s): 27 - 51

    The IBM POWER4 processor is a 174-million-transistor chip that runs at a clock frequency of more than 1.3 GHz. It contains two microprocessor cores, high-speed buses, and an on-chip memory subsystem. The complexity and size of POWER4, together with its high operating frequency, presented a number of significant challenges for its multi-site design team. This paper describes the circuit and physical design of POWER4 and presents the results achieved. Emphasis is placed on the aspects of the design methodology, clock distribution, circuits, power, integration, and timing that enabled the design team to meet the project goals and to complete the design on schedule.

  • Functional verification of the POWER4 microprocessor and POWER4 multiprocessor systems

    Page(s): 53 - 76

    This paper describes the methods and simulation techniques used to verify the microarchitecture design and functional performance of the IBM POWER4 processor and the POWER4-based Regatta system. The approach was hierarchical, building on, and considerably expanding, the practice used to verify the CMOS-based IBM S/390 Parallel Enterprise Server™ G4. For POWER4, verification began at the abstract, high-level design phase and continued through the designer and unit levels, the multi-unit level, and finally the multiple-chip system level. The abstract (high-level design) phase permitted early validation of the POWER4 processor design before it was committed to HDL. The designer and unit-level stages focused on ensuring the correctness of the microarchitectural components. Multi-unit-level verification, performed on storage and I/O components as well as on the processor, confirmed architectural compliance for each of the chips and subsystems. Finally, system-level verification tested multiprocessor coherence and system-level function, including processor-to-I/O communication and validation of multiple hardware configurations. In parallel with design and functional validation, reliability functions, performance, and degraded configurations were also verified at most levels of the hierarchy.

  • Fault-tolerant design of the IBM pSeries 690 system using POWER4 processor technology

    Page(s): 77 - 86

    The POWER4-based p690 systems offer the highest performance of the IBM eServer pSeries™ line of computers. Within the general-purpose UNIX® server market, they also offer the highest levels of concurrent error detection, fault isolation, recovery, and availability. High availability is achieved by minimizing component failure rates through improvements in the base technology, and through design techniques that permit hard- and soft-failure detection, recovery, and isolation, repair deferral, and component replacement concurrent with system operation. In this paper, we discuss the fault-tolerant design techniques that were used for the array, logic, storage, and I/O subsystems of the p690. We also present the diagnostic strategy and the fault-isolation and recovery techniques. New features such as the POWER4 synchronous machine-check interrupt, PCI bus error recovery, array dynamic redundancy, and minimum-element dynamic reconfiguration are described. The design process used to verify error detection, fault isolation, and recovery is also described.

  • Infrastructure requirements for a large-scale, multi-site VLSI development project

    Page(s): 87 - 95

    This paper describes the design infrastructure and environment that were established to support the multi-site design of the IBM POWER4 microprocessor. The Common Tools Environment was created to provide a consistent, site-independent means of accessing design tools and initializing operating-system variables from multiple sites. The AIX® operating system and the Common Tools Environment masked local, site-specific details of the design environment, allowing site-specific design practices, shared storage, and information-system policies to be maintained transparently. The design data-management system, the importance of highly reliable wide- and local-area networks, and the establishment of automated network monitoring are also discussed.

  • Fast pseudorandom-number generators with modulus 2^k or 2^k − 1 using fused multiply-add

    Page(s): 97 - 116

    Many numerically intensive computations done in a scientific computing environment require uniformly distributed pseudorandom numbers in the ranges (0, 1) and (−1, 1). For multiplicative congruential generators with modulus 2^k, k ≤ 52, and period 2^(k−2), we show that the cost per random number for these two distributions is 3 and 3.125 multiply–adds, respectively, on RS/6000® processors. Our code, on the IBM POWER2 Model 590, produces more than 40 million uniformly distributed pseudorandom numbers per second for both ranges (0, 1) and (−1, 1). Additionally, our code sustains the 40-million-per-second rate for data out of cache. The Numerical Aerodynamic Simulation (NAS) parallel benchmarks use a linear congruential generator with modulus 2^46. Our result is about 50 times faster than the generic implementation given in the benchmarks. The extra-accuracy fused multiply-add instruction of RS/6000 machines, combined with a few algorithmic innovations, gives rise to the 50-fold increase. If IEEE 64-bit arithmetic is used with our Fortran code on POWER and PowerPC® architectures, the results we obtain are bit-wise identical to those of the generic algorithms. The paper gives several illustrations of a general technique called the Algorithm and Architecture approach. We demonstrate herein that programmer-controlled unrolling of loops is equivalent to “customized vectorization of RISC-type code.” Customized vectorization is more powerful than ordinary vectorization, and it is only possible on RISC-type machines. We illustrate its use to show that RS/6000 processors can compute the distribution (−1, 1) at a cost of 3.125 multiply–adds per random number. We also specify a linear congruential generator that is related to the multiplicative congruential generator referred to above. It has a full period of 2^k, where 2^k is the modulus. The cost per random number [in the range (0, 1)] for this generator is four multiply–adds on RS/6000 processors.
    Our code, on the IBM POWER2 Model 590, for this generator produces more than 30 million uniformly distributed pseudorandom numbers per second for the range (0, 1). We show that this generator is “embarrassingly parallel,” or EP. Using the Algorithm and Architecture approach, we describe a new concept called “generalized unrolling.” Finally, we present a multiplicative congruential generator for which the modulus is not a power of 2. Such a generator, as well as one with modulus 2^k, is selectable as the generator used in the RANDOM_NUMBER intrinsic function of IBM XL Fortran and XL High Performance Fortran. All of the generators reported here are EP. Using an IBM SP2 machine with 250 wide nodes, it is possible to compute more than ten billion uniform random numbers per second.

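    The congruential recurrences summarized in this abstract can be illustrated with a short Python sketch. It is not the authors' code: it uses exact integer arithmetic instead of their fused multiply-add formulation, the constants A_NAS = 5**13 and SEED_NAS = 271828183 are assumed to be the NAS EP kernel's multiplier and seed, and all function names are hypothetical. The sketch checks the period 2^(k−2) of a multiplicative generator with an odd seed and a multiplier ≡ 5 (mod 8), checks the full period 2^k of a linear generator with an odd increment and a multiplier ≡ 1 (mod 4), maps odd states into (0, 1) and (−1, 1), and shows one common way to unroll the recurrence (leapfrogging by a^m mod 2^k), which yields m independent multiplies per iteration.

```python
"""Congruential generators modulo 2**k: a hedged sketch.

Assumptions (not taken from the paper): exact integer arithmetic stands in
for the fused multiply-add formulation, and A_NAS / SEED_NAS are assumed to
be the NAS EP kernel's multiplier and seed. All function names are
hypothetical.
"""

A_NAS = 5 ** 13        # assumed NAS multiplier; 5**13 % 8 == 5
SEED_NAS = 271828183   # assumed NAS seed; odd, so MCG states stay odd


def mcg_stream(seed, a, k, n):
    """Return x_1..x_n of the multiplicative generator x_{i+1} = a*x_i mod 2**k."""
    mask = (1 << k) - 1          # reduction mod 2**k is a bit mask
    x, out = seed, []
    for _ in range(n):
        x = (a * x) & mask
        out.append(x)
    return out


def to_unit(x, k):
    """Map an odd state to (0, 1); exact in double precision for k <= 52."""
    return x / float(1 << k)


def to_symmetric(x, k):
    """Map an odd state 1 <= x < 2**k strictly inside (-1, 1)."""
    return 2.0 * x / float(1 << k) - 1.0


def period(step, x0):
    """Count steps until the orbit of step() returns to x0."""
    x, n = step(x0), 1
    while x != x0:
        x, n = step(x), n + 1
    return n


def leapfrog(seed, a, k, m, rounds):
    """Unrolled MCG: m lanes, each advanced by a**m; output equals x_1..x_{m*rounds}."""
    mod = 1 << k
    am = pow(a, m, mod)                                          # jump-ahead multiplier
    lanes = [seed * pow(a, j + 1, mod) % mod for j in range(m)]  # lanes start at x_1..x_m
    out = []
    for _ in range(rounds):
        out.extend(lanes)                       # emit the next m values in sequence order
        lanes = [am * x % mod for x in lanes]   # m independent multiplies per round
    return out


if __name__ == "__main__":
    k = 12
    # MCG, odd seed, multiplier = 5 (mod 8): period 2**(k-2), i.e. 1024
    print(period(lambda x: A_NAS * x % (1 << k), 1))
    # LCG, odd increment, multiplier = 1 (mod 4): full period 2**k, i.e. 4096
    print(period(lambda x: (5 * x + 1) % (1 << k), 0))
    # First few NAS-style uniforms in (0, 1)
    print([to_unit(x, 46) for x in mcg_stream(SEED_NAS, A_NAS, 46, 5)])
```

    Because each lane advances independently, the same jump-ahead idea also lets processor p start its own block of B numbers at x0·a^(pB) mod 2^k, which is one way a congruential generator can be run in an embarrassingly parallel fashion.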

Aims & Scope

The IBM Journal of Research and Development is a peer-reviewed technical journal, published bimonthly, which features the work of authors in the science, technology and engineering of information systems.

Meet Our Editors

Editor-in-Chief
Clifford A. Pickover
IBM T. J. Watson Research Center