High-Performance Computing, 1997. Proceedings. Fourth International Conference on

Date 18-21 Dec. 1997

Displaying Results 1 - 25 of 83
  • Proceedings Fourth International Conference on High-Performance Computing

    Publication Year: 1997
    Freely Available from IEEE
  • Conference Organization

    Publication Year: 1997, Page(s):xviii - xxiii
    Freely Available from IEEE
  • Author index

    Publication Year: 1997, Page(s):539 - 541
    Freely Available from IEEE
  • A tight layout of the cube-connected cycles

    Publication Year: 1997, Page(s):422 - 427
    Cited by:  Papers (4)

    F.P. Preparata and J. Vuillemin (1981) proposed the cube-connected cycles (CCC) and, in the same paper, gave an asymptotically optimal layout scheme for the CCC. We give a new layout scheme for the CCC which requires less than half of the area of the Preparata-Vuillemin layout. We also give a nontrivial lower bound on the layout area of the CCC. There is a constant factor of 2 between the new layo...
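
    As a rough illustration of the network whose layout is discussed above, the sketch below builds the edge set of a CCC under the standard definition: node (i, x) has cycle position i in 0..d-1 and hypercube label x in 0..2^d-1. The function name `ccc_edges` is hypothetical and not taken from the paper, and neither layout scheme is reproduced here.

```python
def ccc_edges(d):
    """Return the undirected edges of a d-dimensional cube-connected cycles graph.

    Assumes d >= 3 so that every cycle has at least three distinct nodes.
    Each edge is a frozenset holding two (position, label) node tuples.
    """
    if d < 3:
        raise ValueError("this sketch assumes d >= 3")
    edges = set()
    for x in range(1 << d):
        for i in range(d):
            # Cycle edge: next position on the same cycle (same hypercube label).
            edges.add(frozenset({(i, x), ((i + 1) % d, x)}))
            # Cube edge: same position, label differing in bit i.
            edges.add(frozenset({(i, x), (i, x ^ (1 << i))}))
    return edges
```

    With d = 3 this gives 24 nodes, each of degree 3, which is the constant-degree property that makes the CCC attractive for layout.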

  • FP-map-an approach to the functional pipelining of embedded programs

    Publication Year: 1997, Page(s):415 - 420
    Cited by:  Papers (6)  |  Patents (1)

    Practice shows that increasing the amount of instruction-level parallelism offered by an architecture (such as adding instruction slots to VLIW instructions) does not necessarily lead to significant performance gains. Instead, high hardware costs and inefficient use of this hardware may occur. Mapping embedded applications onto multiprocessor systems forms a very interesting extension to ILP. We prop...
  • A study of tree-based control flow prediction schemes

    Publication Year: 1997, Page(s):28 - 33
    Cited by:  Papers (3)

    In order to fetch a large number of instructions per cycle from a sequential program, wide-issue superscalar processors have to predict the outcome of multiple branches in a cycle, and fetch instructions from non-contiguous portions of code. Past research has developed schemes that predict the outcome of multiple branches by performing a single prediction. That is, instead of predicting the outcom...
  • Modeling multi-threaded architectures in PAMELA for real-time high performance applications

    Publication Year: 1997, Page(s):407 - 414
    Cited by:  Papers (1)

    Presents a method to explore the design space of multi-threaded architectures using PAMELA (PerformAnce ModEling LAnguage). The domain of applications we consider is digital signal processing (DSP), where high performance is derived by exploiting both fine-grain and coarse-grain parallelism in the application. The modeling scheme takes a unified view of both fine-grain and coarse-grain parallelis...
  • A different approach to high performance computing

    Publication Year: 1997, Page(s):22 - 27
    Cited by:  Papers (2)  |  Patents (1)

    A common approach to enhance the performance of processors is to increase the number of function units which operate concurrently. We observe this development in all recent superscalar and VLIW (very-long instruction word) processors. VLIWs are more easily extended to high performance ranges because they lack much of the superscalar hardware required for dependence checking and hardware resource allo...
  • Single step undirected reconfigurable networks

    Publication Year: 1997, Page(s):284 - 289

    The reconfigurable mesh (RN-MESH) can solve a large class of problems in constant time, including problems that require logarithmic time by other, even shared memory, models such as the PRAM with a similar number of processors. In this work we show that for the RN-MESH these constants can always be reduced to one, still using a polynomial number of processors. Given a reconfigurable mesh that comp...
  • Global I/O optimizations for out-of-core computations

    Publication Year: 1997, Page(s):401 - 406
    Cited by:  Papers (2)

    The use of parallel machines to solve large-scale computational problems in science and engineering has increased considerably in recent times. Many of these problems have computational requirements which stretch the capabilities of even the fastest machine available today. In addition to requiring a great deal of computational power, these problems usually deal with large quantities of data up to...
  • A comparison of two context allocation approaches for fast protected calls

    Publication Year: 1997, Page(s):16 - 21
    Cited by:  Papers (1)

    Secure computing systems require the implementation of protection domains and a safe way of transferring control across such domains. Isolating the contexts (activation stacks) of the caller and the callee, to avoid unintended information flow, is a fundamental requirement for implementing cross-domain transfers. We present and evaluate two approaches for implementing contexts for cross-domain cal...
  • PiSMA: an upgradable fault tolerant approach to parallel processing

    Publication Year: 1997, Page(s):277 - 283
    Cited by:  Papers (1)

    Parallel processors reduce the communication overhead problem with the employment of some form of global communication network. This network, however, imposes restrictions on the scalability and technological evolution of the parallel processor. In this paper a novel architecture called PiSMA (Parallel Virtual Shared Memory Architecture) is proposed, which consists of a basic substrate, without a n...
  • Evaluating the performance implications of binding threads to processors

    Publication Year: 1997, Page(s):393 - 400
    Cited by:  Papers (1)  |  Patents (1)

    The default scheduling algorithm in Solaris and other operating systems may result in frequent relocation of threads at run-time. Excessive thread relocation causes poor memory performance. This can be avoided by binding threads to processors. However, binding threads to processors may result in an unbalanced load. By considering a previously obtained theoretical result and by evaluating a set of m...
  • Parallel domain decomposition and load balancing using space-filling curves

    Publication Year: 1997, Page(s):230 - 235
    Cited by:  Papers (23)

    Partitioning techniques based on space-filling curves have received much recent attention due to their low running time and good load balance characteristics. The basic idea underlying these methods is to order the multidimensional data according to a space-filling curve and partition the resulting one-dimensional order. However, space-filling curves are defined for points that lie on a uniform gr...
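
    The basic idea stated above (order the points along a space-filling curve, then cut the one-dimensional order into pieces) can be sketched as follows. This is only an illustration, assuming 2D points with non-negative integer grid coordinates and using the Z-order (Morton) curve as one possible choice; the names `morton_key` and `partition_by_curve` are hypothetical, and the paper's own method for data that does not lie on a uniform grid is not shown.

```python
def morton_key(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into a single Z-order key."""
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        key |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return key


def partition_by_curve(points, parts):
    """Sort points along the Z-order curve and cut the order into `parts` chunks.

    Chunk k takes indices [k*n//parts, (k+1)*n//parts), so chunk sizes
    differ by at most one point.
    """
    ordered = sorted(points, key=lambda p: morton_key(p[0], p[1]))
    n = len(ordered)
    return [ordered[k * n // parts:(k + 1) * n // parts] for k in range(parts)]
```

    For example, `partition_by_curve([(x, y) for x in range(4) for y in range(4)], 4)` assigns each of the four 2x2 quadrants of the grid to its own chunk, which shows how contiguity along the curve tends to preserve spatial locality.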

  • Parallel data cube construction for high performance on-line analytical processing

    Publication Year: 1997, Page(s):10 - 15
    Cited by:  Papers (2)  |  Patents (1)

    Decision support systems use online analytical processing (OLAP) to analyze data by posing complex queries that require different views of data. Traditionally, a relational approach (ROLAP) has been taken to build such systems. More recently, multi-dimensional database techniques (MOLAP) have been applied to decision-support applications. Data is stored in multi-dimensional arrays, which is a natu...
  • Virtual control channel and its application to the massively parallel computer RWC-1

    Publication Year: 1997, Page(s):443 - 448

    Global operation and system control are important issues in massively parallel systems. The paper discusses virtual control networks (VCNs), which are substitutes for the current dedicated control networks. First we introduce a new mechanism called Virtual Control Channel (VCC) used to conduct control information over data network links. The network nodes have control finite state machines (CFSMs)...
  • Gradient method based design methodology for time and area optimization of a pipelined attached processor architecture

    Publication Year: 1997, Page(s):272 - 276

    A procedure for producing a design of a pipelined attached processor is described. It assumes that a set of algorithms and their frequencies of execution are specified. Then it determines designs that tend to minimize the execution time-cost product and execution time²-cost product, using gradient methods involving steepest descent. The designs are produced by allocating hardware subsys...
  • Supporting unbounded process parallelism in the SPC programming model

    Publication Year: 1997, Page(s):168 - 173

    In automatic mapping of parallel programs to target parallel machines the efficiency of the compile-time cost estimation needed to steer the optimization process is highly dependent on the choice of programming model. Recently a new parallel programming model, called SPC, has been introduced that specifically aims at the efficient computation of reliable cost estimates, paving the way for automati...
  • Predicting the speedup of multithreaded Solaris programs

    Publication Year: 1997, Page(s):386 - 392
    Cited by:  Papers (2)

    Presents a method and a set of tools for predicting the speedup of multithreaded Solaris programs. The predictions are based on recordings from a single-processor execution of the multithreaded program. The routines in the thread library are overloaded with an instrumented thread library developed by us. We do not need to have access to the source code of the multithreaded program and no recompila...
  • Sparse matrix decomposition with optimal load balancing

    Publication Year: 1997, Page(s):224 - 229
    Cited by:  Papers (9)

    Optimal load balancing in sparse matrix decomposition without disturbing the row/column ordering is investigated. Exact algorithms that are efficient both asymptotically and at run time are proposed and implemented for one-dimensional (1D) striping and two-dimensional (2D) jagged partitioning. The binary search method is successfully adapted to 1D striped decomposition by deriving and exploiting a good upper boun...
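
    To make the 1D striping problem concrete, here is a generic sketch of binary search over the bottleneck load, assuming each row carries a non-negative integer weight (for instance its nonzero count) and that rows must stay in their original order. It is a textbook probe-based formulation, not the paper's algorithm, and it uses simple bounds rather than the bounds derived there; `stripes_needed` and `min_bottleneck` are hypothetical names.

```python
def stripes_needed(weights, cap):
    """Greedy probe: fewest contiguous stripes with no stripe load above cap.

    Assumes cap >= max(weights), so every row fits in some stripe.
    """
    count, load = 1, 0
    for w in weights:
        if load + w > cap:
            count += 1      # close the current stripe and start a new one
            load = w
        else:
            load += w
    return count


def min_bottleneck(weights, parts):
    """Smallest achievable maximum stripe load when cutting into `parts` stripes."""
    total = sum(weights)
    lo = max(max(weights), -(-total // parts))  # heaviest row or ceil(average)
    hi = total                                  # a single stripe always works
    while lo < hi:
        mid = (lo + hi) // 2
        if stripes_needed(weights, mid) <= parts:
            hi = mid        # feasible: try a smaller bottleneck
        else:
            lo = mid + 1    # infeasible: the bottleneck must grow
    return lo
```

    Because the weights are integers, the search terminates on the exact optimum; tighter starting bounds shrink the search interval and hence the number of probes.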

  • Concurrency control of nested cooperative transactions in active DBMS

    Publication Year: 1997, Page(s):4 - 9
    Cited by:  Papers (1)

    Active database management systems (ADBMSs) use event-condition-action (ECA) rules. Each ECA rule specifies what action is to be taken when an event occurs and the specified condition is satisfied. In this paper, we introduce a concurrency control scheme for handling nested cooperative transactions using detached-mode ECA rules of an ADBMS. A state transition model has been proposed to specify dif...
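
    The event-condition-action structure described above can be pictured with a small sketch. Everything here is illustrative: `EcaRule` and `dispatch` are hypothetical names, and actions run immediately in the caller, whereas detached-mode rules as discussed in the paper would run in a separate transaction. No concurrency control is modeled.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass
class EcaRule:
    event: str                                   # event name that triggers the rule
    condition: Callable[[Dict[str, Any]], bool]  # predicate over the event payload
    action: Callable[[Dict[str, Any]], None]     # effect to run when it holds


def dispatch(rules: List[EcaRule], event: str, payload: Dict[str, Any]) -> int:
    """Run the action of every rule whose event matches and whose condition holds.

    Returns the number of rules that fired.
    """
    fired = 0
    for rule in rules:
        if rule.event == event and rule.condition(payload):
            rule.action(payload)
            fired += 1
    return fired
```

    A rule such as `EcaRule("update", lambda p: p["qty"] < 10, reorder)` (with `reorder` a user-supplied function) would fire only on `update` events whose payload reports a quantity below 10.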

  • Distributed delay constrained multicast path setup algorithm for high speed networks

    Publication Year: 1997, Page(s):438 - 442
    Cited by:  Papers (2)

    The problem of finding an optimal multicast tree in a point-to-point network translates to the Steiner problem in graphs. Since the Steiner problem is NP-complete, heuristic approaches are required for path setup. The problem takes a new dimension in wide area networks, where centralized algorithms are not feasible, and distributed schemes are needed. It is also desirable that node participation f...
  • Fast multiplier schemes using large parallel counters and shift switches

    Publication Year: 1997, Page(s):302 - 308
    Cited by:  Papers (1)

    We present novel fast parallel multiplier schemes. In contrast to traditional designs based on full-adder binary logic, we use (incomplete) large parallel counters and large shift switch compressors, which are built based on shift switch logic, a logic with shift switches as logic elements performing modulo arithmetic operations on (non-binary) state signals. With the unique feature of shift swit...
  • Reconfigurable custom computing as a supercomputer replacement

    Publication Year: 1997, Page(s):260 - 269
    Cited by:  Papers (2)

    Reconfigurable computers are a new class of customisable computers constructed using field programmable gate arrays (FPGAs). They offer us the potential for supercomputing performance at high-end workstation costs for a range of niche applications. The performance achievable is a direct consequence of the machine architecture which gives the user direct exposure to the inherent parallelism present...
  • Integer sorting algorithms for coarse-grained parallel machines

    Publication Year: 1997, Page(s):159 - 164
    Cited by:  Papers (2)

    Integer sorting is a subclass of the sorting problem where the elements have integer values and the largest element is polynomially bounded in the number of elements to be sorted. It is useful for applications in which the maximum value of the elements to be sorted is bounded. In this paper, we present a new distributed radix-sort algorithm for integer sorting. The structure of our algorith...
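
    For readers unfamiliar with radix sort, the following is a plain sequential sketch for non-negative integers, processing one group of `radix_bits` bits per pass from least to most significant. It is not the distributed algorithm the paper presents; the name `radix_sort` and the default digit width are assumptions made for illustration.

```python
def radix_sort(values, radix_bits=8):
    """LSD radix sort for non-negative integers, one stable bucket pass per digit."""
    if not values:
        return []
    mask = (1 << radix_bits) - 1
    out = list(values)
    largest = max(out)
    shift = 0
    # Keep going while the largest value still has nonzero bits at or above `shift`.
    while (largest >> shift) > 0:
        buckets = [[] for _ in range(mask + 1)]
        for v in out:
            buckets[(v >> shift) & mask].append(v)  # appending keeps each pass stable
        out = [v for bucket in buckets for v in bucket]
        shift += radix_bits
    return out
```

    The number of passes depends only on the bit length of the largest value, which is why a bound on the maximum element translates directly into a bound on the work.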
