
Proceedings of the Seventh International Conference on High Performance Computing and Grid in Asia Pacific Region, 2004

Date: 20-22 July 2004


Displaying Results 1 - 25 of 86
  • Comparative analysis of high-performance clusters' communication environments using HPL test

    Page(s): 473 - 479

    We develop system software for high-performance clusters and build such clusters using our own methodology. The design study is performed on the basis of analytical models of the cluster (a modification of the well-known LogGP model) and of the benchmark being used. In the present paper, the different communication environments used in high-performance clusters are compared. The notion of a cluster's efficiency is defined as the ratio between the actual and peak performance of the cluster. We discuss the results of a comparative efficiency analysis of clusters using Myrinet, Fast Ethernet and Gigabit Ethernet, as well as of the clusters forming the Top500 and of the Myrinet clusters built in the Institute. We have presently built several clusters, including a 64-node dual-Xeon 3.06 GHz Myrinet cluster for the National Academy of Sciences of the Republic of Armenia. The OSCAR package is used as the control system, with some original software added.
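    In HPL terms, a cluster's efficiency is the fraction of theoretical peak performance actually delivered by the benchmark. A minimal sketch (not from the paper; all figures below are hypothetical, and 2 flops/cycle is an assumed value for an SSE2-era Xeon):

```python
def peak_gflops(nodes, cpus_per_node, ghz, flops_per_cycle):
    """Theoretical peak of a homogeneous cluster, in Gflop/s."""
    return nodes * cpus_per_node * ghz * flops_per_cycle

def efficiency(actual_gflops, peak):
    """Fraction of peak actually delivered (HPL Rmax / Rpeak)."""
    return actual_gflops / peak

# Hypothetical figures loosely modeled on a 64-node dual-Xeon 3.06 GHz
# cluster; the measured 500 Gflop/s is invented for illustration.
peak = peak_gflops(nodes=64, cpus_per_node=2, ghz=3.06, flops_per_cycle=2)
print(round(peak, 2))                       # 783.36
print(round(efficiency(500.0, peak), 3))    # 0.638
```

    Comparing this ratio across interconnects (Myrinet vs. Fast/Gigabit Ethernet) is what the abstract's comparative analysis amounts to.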

  • Model development for the global warming prediction by using the Earth Simulator

    Page(s): 480 - 486

    A high-resolution atmosphere-ocean coupled model has been developed in order to improve global warming simulation. Based on preliminary experiments, it was decided that the atmospheric component would be a T106 global spectral model with 56 levels and the ocean component a 1/4° x 1/6° grid-point model with 46 levels. A land surface model (MATSIRO) and a run-off model are incorporated. Corresponding to the increase in horizontal and vertical resolution, the subroutines for physical processes were reexamined; various options were evaluated and the final specification of the model was decided. Coupling between the atmosphere and ocean components is done through an MPMD (multiple program, multiple data) scheme: 10 nodes of the Earth Simulator (ES) are allocated to the atmospheric component and 76 nodes to the oceanic component. Version 0 of the coupled model was completed in 2003; a 40-year control run was conducted, as was a 1% CO2-increase run, also for 40 years. Simulation of atmospheric and oceanic phenomena was improved in many respects, suggesting that simulation of regional climate change may be achievable.

  • Colored Petri nets based modeling and simulation of mixed workload interaction in a nondedicated cluster

    Page(s): 294 - 303

    This work presents colored Petri net based modeling and performance evaluation of mixed (interactive and parallel) workloads in a nondedicated cluster environment. To control the interaction between the two types of workload, we propose to constrain the scheduling of local interactive processes by a measure of the maximum response time expected by the workstation (WS) user, assumed to be obtained through empirical studies. We propose a scheduling scheme that, within the max-response-time cycle, computes time quanta to satisfy both the local interactive processes present in the system and the parallel task process. We build a colored Petri net (CPN) model of the scheduling scheme, and simulations have shown the effectiveness of the proposed method in allowing the parallel task to ensure a minimum speedup even in heavily loaded situations and to maximize the speedup adaptively depending on the load conditions.

  • A compensation-based scheduling scheme for grid computing

    Page(s): 334 - 342

    Wide fluctuations in the availability of idle processor cycles and in communication latencies across multiple resource administrative domains make it challenging to provide quality of service when executing grid applications. In this paper, we propose an adaptive scheduling framework, compensation-based scheduling, that provides predictable execution times by using feedback control: resource loss during application execution is compensated by dynamically allocating additional resources. The framework is evaluated by experiments conducted on the ALICE scheduler. Scalability simulation studies show that the gaps between actual and estimated execution times are reduced to less than 15% of the estimated execution times.
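    The feedback idea can be sketched in a few lines: when a resource loss drops the aggregate processing rate below what the estimated finish time requires, allocate enough extra workers to close the gap. This is an illustrative toy with a linear progress model, not the ALICE scheduler's actual mechanism; every name and figure below is invented:

```python
import math

def compensate(work_left, time_left, rate_per_worker, current_workers):
    """Extra workers needed so work_left finishes within time_left."""
    required_rate = work_left / time_left          # work units per second
    have_rate = current_workers * rate_per_worker  # current aggregate rate
    deficit = max(0.0, required_rate - have_rate)
    return math.ceil(deficit / rate_per_worker)    # whole workers only

# 600 work units left, 50 s to the estimated finish, 1.5 units/s per
# worker, 4 workers survive a loss: need 12 units/s, have 6 -> add 4.
print(compensate(600, 50, 1.5, 4))   # 4
```

    Running this check periodically is one simple way a feedback controller could keep the gap between actual and estimated execution time bounded.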

  • Queuing network modeling of a cluster-based parallel system

    Page(s): 304 - 307

    In this work we present two analytical models for a cluster-based parallel system, based on an open queuing network model (QNM). The parallel system under consideration uses a hypercube topology for its interconnection network. The proposed models are general enough to model various types of parallel applications. A multichain QNM is developed that can account for task migration between the nodes; an equivalent single-chain model is also developed to enhance computational efficiency. Each can model real systems with different policies for executing parallel tasks. A numerical study of the proposed models confirms their validity.

  • Relis-G: remote library install system for computational grids

    Page(s): 44 - 53

    When a grid user installs a library (one used by GridRPC, for example) on servers distributed over a grid, the user must solve complicated and laborious problems: the effort of entering commands for remote operation, creating an installation package of the library for heterogeneous server systems, avoiding redundant compilation, and observing the administration policy of each server. To let the user solve these problems easily, we propose Relis-G, a remote library install system that is highly portable and multipurpose in the grid. Besides automatic remote library installation, the system offers automatic creation of an installation package, automatic avoidance of redundant compilation, and automatic observance of each server's administration policy. Operation tests confirmed that the system greatly mitigates the user's burden described above.

  • Drug discovery using grid technology

    Page(s): 352 - 356

    A number of computing resources, such as CPUs and storage, can be connected over networks to construct a huge virtual computing environment using grid technologies. Our project "g-Drug Discovery" aims to develop a grid-based platform for drug discovery on which various analyses and calculations are conducted, such as the molecular mechanics method, the replica exchange method, docking with proteins, the molecular orbital method, and 3-dimensional quantitative structure-activity relationships. To this end we have specified a markup language for drug discovery (DrugML) and constructed a database system. In this note we report results of a ligand-receptor docking simulation calculated on this platform using the grid RPC system "OmniRPC" and the virtual screening software "Xsi" (ku-su-shi). Assuming only the 2-dimensional structure of the ligand and the 3-dimensional structure of the receptor, we reproduce the complex of these molecules.

  • The impact of local priority policies on grid scheduling performance and an adaptive policy-based grid scheduling algorithm

    Page(s): 343 - 346

    This work presents a performance evaluation of a scheduling algorithm on a computational grid where each site employs a different local scheduling policy. It is demonstrated that when some sites apply a priority policy in favor of local jobs, other sites suffer much longer response times. We propose an adaptive site selection algorithm for a grid scheduler, based on the priority policies of the local schedulers, that reduces the severity of this effect without interfering with the autonomy of the local schedulers. The results show that the proposed algorithm can lower the difference in average wait times among sites with different priority-based scheduling policies, and that it performs effectively under various workload levels and fractions of sites with different policies.

  • A novel algorithm for mapping parallel applications in computational grid environments

    Page(s): 347 - 350

    This work describes a heuristic algorithm, the task self-mapping algorithm (TSMA), for mapping parallel applications in computational grids. The strategy is that each task of a parallel application has an associated execution cost, namely the cost of executing on the processor to which it is mapped, and each task minimises its associated cost by remapping itself onto a new processor. As each task optimises its own execution cost, the execution cost of the whole application is optimised as well. Experimental results show that TSMA produces better mapping solutions than graph-partitioning-based mapping algorithms.
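    The self-mapping strategy can be illustrated with a toy greedy loop: each task in turn moves itself to the processor where its own cost is smallest, until no task wants to move. The cost table below is invented, and real TSMA must handle inter-task and load interactions that this sketch deliberately ignores:

```python
def self_map(cost):
    """cost[t][p] = execution cost of task t on processor p (toy input).
    Each task greedily remaps itself until the placement is stable."""
    n_tasks = len(cost)
    placement = [0] * n_tasks          # start every task on processor 0
    changed = True
    while changed:
        changed = False
        for t in range(n_tasks):
            best = min(range(len(cost[t])), key=lambda p: cost[t][p])
            if best != placement[t]:
                placement[t] = best    # task t moves to its cheapest host
                changed = True
    return placement

cost = [[4, 1, 3],    # task 0 is cheapest on processor 1
        [2, 5, 1],    # task 1 is cheapest on processor 2
        [3, 3, 2]]    # task 2 is cheapest on processor 2
print(self_map(cost))  # [1, 2, 2]
```

    Because each move lowers one task's cost and costs here are independent, the sum over tasks, i.e. the application cost in this toy model, can only decrease.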

  • Fast mapping on Myrinet networks

    Page(s): 462 - 466

    This work presents an alternate method for discovering and comparing switches when mapping a Myrinet network. Existing methods for mapping Myrinet networks rely on timeouts and are prone to false negatives due to network deadlock. Our algorithm increases the mapper's speed by providing a fast negative answer without relying on a timeout, and increases the mapper's reliability by allowing it to detect whether network deadlock has occurred. Mapping time on our 1024-port network has been reduced from approximately 7 minutes to 15 seconds.

  • Pink: a 1024-node single-system image Linux cluster

    Page(s): 454 - 461

    This work describes our experience designing and building Pink, a 1024-node (2048-processor) Myrinet-based single-system image Linux cluster installed in January 2003 at Los Alamos National Laboratory. At the time of its installation, Pink was the largest single-system image Linux cluster in the world, and it was based entirely on open-source software, from the BIOS up. Pink was the proof-of-concept prototype for Lightning, a production 1408-node (2816-processor) cluster that began operation at LANL and is currently number 6 on the Top500 list. In this work we examine the issues encountered and the problems that had to be overcome to scale a cluster to this size, and we present some performance numbers that demonstrate the scalability and manageability of the cluster software suite.

  • An improvement of SAINV and RIF preconditionings of CG method by double dropping strategy

    Page(s): 142 - 149

    Preconditioning based on incomplete factorization of the matrix A is among the best-known and most popular methods for solving a linear system of equations with a symmetric positive definite coefficient matrix. However, the existence of an incomplete factorization is a delicate issue that must be overcome if one wishes to design a reliable preconditioner. Stabilized AINV (approximate inverse) and RIF (robust incomplete factorization) preconditionings with single dropping have been proposed; the dropping procedure is key to improving computational efficiency. In this paper, a new dropping strategy that improves both SAINV and RIF preconditionings is proposed. Moreover, comparisons with other incomplete factorizations and with the original SAINV and RIF preconditionings, using challenging linear systems from realistic structural analysis, are presented. We discuss the double dropping strategy in terms of the computation time of the preconditioned CG method to successful convergence and the memory required for the factorization.

  • GBTK: a toolkit for grid implementation of BLAST

    Page(s): 378 - 382

    We describe the implementation of GBTK (Grid BLAST Tool Kit), a lightweight application-specific grid framework for BLAST on geographically distributed high performance computing (HPC) systems. The framework is built on the concept of synchronized Web services using RPC (remote procedure calls) encoded as XML (eXtensible Markup Language).

  • The design and implementation of a fault-tolerant RPC system: Ninf-C

    Page(s): 9 - 18

    We describe the design and implementation of a fault-tolerant GridRPC system, Ninf-C, designed for easy programming of large-scale master-worker programs whose execution in a grid environment takes from a few days to a few months. Ninf-C employs Condor, developed at the University of Wisconsin, as the underlying middleware supporting remote file transmission and checkpointing, providing system-wide robustness for application users on the grid. Ninf-C layers all the GridRPC communication and task-parallel programming features on top of Condor in a non-trivial fashion, assuming that the entire program is structured in a master-worker style; in fact, older Ninf master-worker programs can be run directly or trivially ported to Ninf-C. In contrast to the original Ninf, Ninf-C exploits and extends Condor features extensively for robustness and transparency, including 1) checkpointing and stateful recovery of the master process, 2) master-worker communication via (remote) files rather than IP sockets, and 3) automated throttling of parallel GridRPC calls; and in contrast to using Condor directly, programmers can set up complex dynamic workflows as well as master-worker parallel structures with almost no learning curve. To prove the robustness of the system, we performed an experiment on a heterogeneous cluster consisting of x86 and SPARC CPUs, running a simple but long-running master-worker program with staged rebooting of multiple nodes to simulate serious fault situations. The program finished normally despite all the fault scenarios, demonstrating the robustness of Ninf-C.

  • A parallel distributed application of the wireless sensor network

    Page(s): 81 - 88

    The paper describes the use of a wireless sensor network (WSN) for performing parallel pattern recognition computations. A complexity analysis indicates that the proposed algorithm is independent of the number of nodes and hence may scale up indefinitely with the network. It is shown that any material object, once overlaid with a WSN, develops a latent associative memory, which enables the object to memorise some of its critical internal states for real-time comparison with those induced by transient external conditions.

  • Implementations and performance of nonlinear CG methods by TAO on Dawning2000+

    Page(s): 252 - 255

    Nonlinear conjugate gradient (CG) methods are typical unconstrained optimization methods. As the optimization problems to be solved grow larger, the need for efficient and scalable software becomes acute. The Toolkit for Advanced Optimization (TAO) is a parallel package that can currently solve several kinds of optimization problems. In this paper, we present the framework of several CG variants (CGFR, CGPR and CGPRP) and their implementations in TAO 1.5, which have been tested on up to 64 processors of Dawning2000+ on problems with up to 10 variables. The results show that the scalability of the CG implementations in TAO 1.5 is excellent.

  • Development of nonhydrostatic coupled ocean-atmosphere simulation code on the Earth Simulator

    Page(s): 487 - 495

    A non-hydrostatic coupled ocean-atmosphere simulation code has been developed at the Earth Simulator Center. As a first step toward validating the code, various test experiments were performed, and the results are presented in this paper. Optimization of the code has been pursued to draw out the maximum capability of the Earth Simulator: 60% of the theoretical peak performance has been achieved using 512 nodes.

  • A coarse-grained reconfigurable architecture supporting flexible execution

    Page(s): 448 - 451

    In our research, we have proposed a general-purpose reconfigurable architecture, PARS. To develop the software assets required of a general-purpose processor, the PARS architecture introduces the I-PARS execution model as an idealized execution model for coarse-grained reconfigurable processors. The I-PARS execution model is based on the execution model of an extremely wide VLIW processor. Using it, we can generate configuration data not only for a step-by-step execution mode, as in conventional processors, but also for a streaming execution mode, which works well on reconfigurable processors. Further, as a processor that supports the above model efficiently, we designed a prototype PARS processor, UNITE. We present the design of the UNITE processor, show how the two execution modes are supported on it, and introduce the compiler we are developing for it.

  • Reconfigurable neural network using DAP/DNA

    Page(s): 432 - 433

    We seek ways of realizing a neural network on DAP/DNA, which was chosen as a convenient experimental platform because it allows the hardware configuration to be changed flexibly and dynamically, in one clock cycle per application. Implementing a neural network in this way provides high computational power while retaining flexibility and scalability.

  • The parallel communication protocol in BCL-4

    Page(s): 98 - 103

    As CLUMPs become the mainstream of clusters and the number of nodes per cluster increases, the bandwidth and availability of the communication system used in clusters must be enhanced. Parallel communication based on multiple system area networks (SANs) can fulfill these requirements. This work introduces the parallel communication protocol used in BCL-4, a highly efficient communication system used in DAWNING-4000A, a large-scale Linux cluster. It dispatches small messages, and sub-messages striped from large messages, over multiple SANs while preserving the original communication semantics. The parallel communication process is transparent both to users and to the control program on the network interface card (NIC), and an efficient load-balancing mechanism is provided. Using the parallel communication protocol, BCL-4 offers key features such as multiplied throughput, high availability, and backward compatibility. Experimental results show that the peak bandwidth of BCL-4 over two Myrinet networks is 494.7 MB/s, almost twice that over one, while adding only 0.02 us of overhead for short messages.
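    The striping idea, splitting a large message into sub-messages dispatched round-robin over the available SANs and reassembling them at the receiver, can be sketched as follows. This is a toy model, not the BCL-4 implementation: the chunk size, function names, and two-SAN setup are all invented for illustration (real systems stripe in far larger units):

```python
STRIPE_SIZE = 4   # bytes per sub-message; purely illustrative

def stripe(message, n_sans):
    """Split a message into (san_id, chunk) pairs, round-robin over SANs."""
    chunks = [message[i:i + STRIPE_SIZE]
              for i in range(0, len(message), STRIPE_SIZE)]
    return [(i % n_sans, c) for i, c in enumerate(chunks)]

def reassemble(pairs):
    """Receiver re-concatenates chunks in dispatch order."""
    return b"".join(c for _, c in pairs)

msg = b"0123456789ABCDEF"
pairs = stripe(msg, n_sans=2)
print(pairs)                    # chunks alternate between SAN 0 and SAN 1
assert reassemble(pairs) == msg
```

    With the chunks spread evenly over the SANs, aggregate bandwidth scales with the number of networks, which is why two Myrinet links yield nearly double the single-link peak.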

  • On emerging trends and future challenges in aerospace CFD using the CeNSS system of JAXA NS-III

    Page(s): 388 - 395

    The Japan Aerospace Exploration Agency has introduced a terascale SMP-cluster-type parallel supercomputer system as the main compute engine of Numerical Simulator III for aerospace science and engineering. The computing system, called CeNSS, has a peak performance of 9.3 Tflop/s and 3.6 TB of user memory, using about 1,800 scalar processors. It also has mass storage consisting of 57 TB of disk and a 620 TB tape library, and a visualization system is tightly integrated with the computing system. In this paper, after reviewing the history of the Numerical Simulator project, we describe the system configuration of NS-III. Next, we discuss performance issues on the CeNSS. Finally, illustrating some examples of recent application results obtained on the JAXA NS-III, we discuss the emerging trends and future challenges in aerospace CFD.

  • Design, implementation and performance of fault-tolerant message passing interface (MPI)

    Page(s): 120 - 129

    Fault-tolerant MPI (FTMPI) adds fault tolerance to MPICH, an open-source, GPL-licensed implementation of the MPI standard by Argonne National Laboratory's Mathematics and Computer Science Division. FTMPI is a transparent fault-tolerance environment based on a synchronous checkpointing and restart mechanism, relying on a non-multithreaded single-process checkpointing library to synchronously checkpoint an application process. A globally replicated system controller and per-node node controllers monitor and control the checkpointing and recovery activities of all MPI applications within the cluster. This work details the architecture providing the fault-tolerance mechanism for MPI-based applications running on clusters, and reports the performance of the NAS parallel benchmarks and of the parallelized medium-range weather forecasting models P-T80 and P-T126. The architecture also addresses replicating the system controller to avoid a single point of failure, ensuring the consistency of checkpoint files through a distributed two-phase commit protocol, and a robust fault-detection hierarchy.

  • Implementation of protein tertiary structure prediction system with NetSolve

    Page(s): 320 - 327

    In this study, protein tertiary structure prediction systems on the grid are proposed to advance bioinformatics. The prediction is mainly performed by minimizing the protein energy, a method that in most cases requires many iterated calculations of the protein energy, so using the grid as a large-scale computing environment is valuable for such a system. In our system, parallel simulated annealing using genetic crossover (PSA/GAc) is the minimization engine and NetSolve is the basic tool for using the grid. Two implementations are prepared: the first, naive implementation has a critical overhead due to large communication delays over the Internet; the second, an asynchronous crossover model, improves the performance. The details of the system and experimental results for C-peptide are shown as an example of a grid application.

  • Reconfigurable HPC: torpedoed by deficits in education?

    Page(s): 428 - 429

    Currently there is a deep chasm between reconfigurable computing (RC) and the way "classical" CS people look at parallelism (Hartenstein, 2004). In education, until recently RC has been a subject of embedded systems or SoC design within EE departments, whereas most classical CS departments have ignored the enormous speed-up opportunities this field offers. Only a few departments provide special courses, mostly attended by a small percentage of graduate students. Conferences like ISCA have stubbornly refused to include RC and related areas in their scope, and many major players in the IT market have largely ignored the area. In recent months this situation has begun to change: an increasing number of colleagues from the classical parallel computing and supercomputing communities are ready to discuss fundamental issues. A major breakthrough in CS education is also overdue. It is not only the HPC community that urgently needs a curricular revision: a rapidly increasing percentage of programmers implements code for embedded systems, yet most CS graduates are not qualified for this changing labour market. With their procedural-only mindset they cannot cope with hardware/configware/software partitioning, so such tasks are currently carried out mainly by EE professionals. In order not to lose this competition, and to avoid a disaster for future CS graduates looking for their first job, CS departments have to wake up.

  • Scaling evaluation of the lattice solid model on the SGI Altix 3700

    Page(s): 226 - 233

    The lattice solid model is a particle-based method that has been successfully employed for simulating the fracturing of rocks, the dynamics of faults, earthquakes, and gouge processes. However, results from initial simulations demonstrate that models consisting of only thousands of particles are inadequate to accurately reproduce the micro-physics of seismic phenomena; models with millions or tens of millions of particles are required to produce realistic simulations. Parallel computing architectures such as the SGI Altix 3700 provide the opportunity to solve much larger computational problems than traditional single-processor systems. To take advantage of such high-performance systems, a message passing interface version of the lattice solid model has been implemented. Benchmarks presented in this paper demonstrate 80% parallel efficiency for the parallel lattice solid model on 128 processors of the SGI Altix 3700. These results, for a two-dimensional wave propagation problem, indicate the potential of the lattice solid model for simulating more computationally challenging three-dimensional geophysical processes.
