Application-Specific Array Processors, 1993. Proceedings, International Conference on

25-27 Oct. 1993

  • Proceedings of International Conference on Application Specific Array Processors (ASAP '93)

    Publication Year: 1993
    Freely Available from IEEE
  • The Xor embedding: An embedding of hypercubes onto rings and toruses

    Publication Year: 1993, Page(s):15 - 28
    Cited by:  Papers (3)

    Many parallel algorithms use hypercubes as the communication topology among processes, which makes them suitable for execution on a hypercube multicomputer. In this way the communication cost is kept to a minimum, since processes can be allocated to processors in such a way that only communication between neighbor processors is required. However, the scalability of hypercube multicomputer is constr...

  • A wavefront array processor for on the fly processing of digital video streams

    Publication Year: 1993, Page(s):101 - 108

    The authors present a wavefront array processor architecture developed at ETCA and dedicated to real-time processing of digital video streams. The core of the architecture is a mesh-connected three-dimensional network of 1024 custom processing elements. Each processing element can perform up to 50 million 8- or 16-bit operations per second, working with a 25 MHz clock frequency. Thus algorithms a...

  • A real-time systolic algorithm for on-the-fly hidden surface removal

    Publication Year: 1993, Page(s):238 - 249

    Hidden surface removal for real-time realistic display of complex scenes requires intensive computation and justifies the use of parallelism to provide the needed response time. The authors present a systolic algorithm that identifies visible segments on a scanline with the "real-time" characteristic: visible segments are output on-the-fly as soon as segments are input to the systolic array. The pro...

  • A 1D linearly expandable interconnection network performance analysis

    Publication Year: 1993, Page(s):572 - 582
    Cited by:  Papers (4)

    The authors describe the design and evaluation of a linear interconnection network for the image processing parallel architecture GFLOPS. This is a SIMD/MIMD architecture which can treat a wide range of applications from low level to high level. The different memory banks of this structure are connected to the processors through a one-stage interconnection network. This network is linearly expandab...

  • A period-processor-time-minimal schedule for cubical mesh algorithms

    Publication Year: 1993, Page(s):261 - 272
    Cited by:  Papers (1)

    The paper, using a directed acyclic graph (DAG) model of algorithms, investigates precedence-constrained multiprocessor schedules for the n × n × n directed mesh. This cubical mesh is fundamental, representing the standard algorithm for square matrix product, as well as many other algorithms. Its completion requires at least 3n - 2 multiprocessor steps. Time-minimal multiprocessor schedu...
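
The 3n - 2 bound quoted in this abstract can be checked with a minimal sketch. It assumes (the truncated abstract does not spell this out) that the n × n × n directed mesh has nodes (i, j, k) with arcs to (i+1, j, k), (i, j+1, k), and (i, j, k+1). The function name `critical_path_length` is hypothetical and is not taken from the paper.

```python
from itertools import product


def critical_path_length(n):
    """Count the nodes on the longest directed path in the n x n x n mesh.

    Assumed mesh: node (i, j, k) depends on (i-1, j, k), (i, j-1, k)
    and (i, j, k-1) whenever those coordinates are non-negative.
    """
    depth = {}
    # Lexicographic order visits every predecessor before its successor,
    # so it is a valid topological order for this mesh.
    for i, j, k in product(range(n), repeat=3):
        preds = [(i - 1, j, k), (i, j - 1, k), (i, j, k - 1)]
        depth[(i, j, k)] = 1 + max(
            (depth[p] for p in preds if min(p) >= 0), default=0
        )
    return depth[(n - 1, n - 1, n - 1)]
```

Each node on the longest chain must run in its own step, so under these assumptions no schedule, on any number of processors, can finish in fewer than `critical_path_length(n)` steps.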

  • Efficient architecture of a programmable block matching processor

    Publication Year: 1993, Page(s):560 - 571
    Cited by:  Papers (3)

    An efficient VLSI architecture of a programmable block matching processor for the emulation of a wide spectrum of full search and reduced complexity search block matching algorithms is presented. Optimized efficiency is obtained by using a quadratic systolic array architecture with global accumulation, combined with a programmable meander-like data flow. Flexibility is further increased by cascada...

  • An efficient algorithm for image-template product on SIMD mesh connected computers

    Publication Year: 1993, Page(s):250 - 260
    Cited by:  Papers (3)

    Convolutions and correlations are fundamental operations in computer vision and image processing. The image-template product expresses convolutions and correlations. In this paper, the authors present an efficient algorithm for the image-template product on SIMD mesh connected computers. The image-template product is computed along disjoint convolution paths of the template. For an M × N ima...

  • An optimal algo-tech-cuit for the knapsack problem

    Publication Year: 1993, Page(s):548 - 559
    Cited by:  Papers (3)

    The authors first present a formal derivation and proof of correctness of a systolic array for the knapsack problem, an NP-complete problem whose dependency graph is not completely known statically. With q PEs, each with a fixed size memory, the array runs in Θ(mc/q), which gives optimal speedup of the algorithm. However, it has an intricate tag-based control mechanism which is diffic...

  • Efficient scalable architectures for Viterbi decoders

    Publication Year: 1993, Page(s):89 - 100
    Cited by:  Papers (4)  |  Patents (2)

    Viterbi decoders (VDs) are widely used today for the decoding of convolutional codes in forward error correction schemes. Efficient deeply pipelined VLSI architectures, the generalized cascade VD and the trellis pipeline-interleaving (TPI) VD, are adaptable to a given data rate only to a limited extent. The authors propose a novel unified class of deeply pipelined architectures, the scalable parall...

  • An algorithm for accurate data dependence test

    Publication Year: 1993, Page(s):404 - 415
    Cited by:  Papers (1)

    Testing whether there is a dependence between different iterations of a loop can be converted to checking whether there exist integral points in a polyhedron described by a set of linear equations and inequalities. In this paper, a method for accurate data dependence testing is proposed. In this method, first, data dependence test problems with any number of linear equations are transformed equivalently to tes...
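
To make the "integral points" formulation concrete, here is a minimal sketch of the classic GCD test for a single linear equation. This is a textbook necessary condition, not the accurate method the paper proposes, and it ignores the inequalities (loop bounds) entirely. The function name `may_have_integer_solution` is hypothetical.

```python
from functools import reduce
from math import gcd


def may_have_integer_solution(coeffs, rhs):
    """Classic GCD test for a1*x1 + ... + an*xn == rhs over the integers.

    Returns False when no integer solution exists (so no dependence).
    Returns True when an unbounded integer solution exists; loop bounds
    are NOT checked here, so True only means "dependence is possible".
    """
    g = reduce(gcd, coeffs, 0)  # gcd of all coefficients, 0 if none are non-zero
    if g == 0:
        return rhs == 0
    return rhs % g == 0
```

For example, a write to `a[2*i]` and a read of `a[2*j + 1]` touch the same element only if 2i - 2j = 1, so `may_have_integer_solution([2, -2], 1)` is `False` and the accesses are independent.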

  • Reconfigurable hardware for molecular biology computing systems

    Publication Year: 1993, Page(s):184 - 187
    Cited by:  Papers (2)

    The authors explore the scope of possibilities for implementing molecular biology algorithms on a reconfigurable hardware based architecture. In order to demonstrate both the flexibility and power of reconfigurable hardware, two algorithms have been implemented on an architecture consisting of 23 FPGAs (Xilinx XC3090) and 4 MB of SRAM. These two algorithms have been chosen because of their...

  • Mapping algorithms onto a multiple-chip data-driven array

    Publication Year: 1993, Page(s):41 - 52
    Cited by:  Papers (1)

    Data-driven arrays provide high levels of parallelism and pipelining for algorithms with no internal regularity. Most of the methods previously developed for mapping algorithms onto processor arrays assumed an unbounded array (i.e., one in which there will always be a sufficient number of processing elements (PEs) for the mapping). Implementing such an array is not practical. A more practical appr...

  • Subband filtering: Cordic modulation and systolic quadrature mirror filter tree

    Publication Year: 1993, Page(s):109 - 123

    The decomposition (analysis) of a finite-energy signal into a relatively small number of mutually independent signals which allows reconstruction (synthesis) of the original signal is called subband filtering. Subbands can be processed in parallel or recursively. In the latter case, one obtains a so-called quadrature mirror filter tree. The former case leads to cosine-modulated filter banks. The a...
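
The analysis/synthesis idea in this abstract can be illustrated with the simplest two-band case. The sketch below uses an unnormalized Haar filter pair on an even-length signal; this is an assumption chosen for brevity, not the CORDIC-based or systolic design of the paper, and the names `analyze` and `synthesize` are hypothetical.

```python
def analyze(signal):
    """Split an even-length signal into a low band and a high band.

    Unnormalized Haar pair: pairwise averages (low) and pairwise
    half-differences (high), each at half the input rate.
    """
    pairs = list(zip(signal[0::2], signal[1::2]))
    low = [(a + b) / 2 for a, b in pairs]
    high = [(a - b) / 2 for a, b in pairs]
    return low, high


def synthesize(low, high):
    """Rebuild the original samples from the two subbands."""
    out = []
    for s, d in zip(low, high):
        out.extend([s + d, s - d])
    return out
```

Applying `analyze` again to the low band, and repeating, gives the recursive tree structure the abstract refers to; `synthesize` undoes one level at a time.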

  • Mapping Monte Carlo-Metropolis algorithm onto a double ring architecture

    Publication Year: 1993, Page(s):192 - 195

    Mathematical models are frequently used to evaluate the physical properties of matter. They involve intensive computation, so processing times and loan costs soar on general-purpose computers. Remarkable improvements are promised by special-purpose architectures, in particular for the parallel execution of the most time-consuming routines; the Monte Carlo-Metropolis method has been examined, and ...

  • Communication-minimal mapping of uniform loop nests onto distributed memory architectures

    Publication Year: 1993, Page(s):1 - 14
    Cited by:  Papers (5)

    The authors deal with mapping techniques for uniform loop nests. Target machines are SPMD distributed memory parallel computers. They use affine-by-variable mapping to synthesize a virtual grid architecture from the original loop nest. The key to the mapping strategy is the communication graph, which enables us to derive optimal mappings, i.e., where the number of communications is proved to be mi...

  • VLSI array synthesis for polynomial GCD computation

    Publication Year: 1993, Page(s):536 - 547
    Cited by:  Papers (3)  |  Patents (1)

    Polynomial GCD (greatest common divisor) finding is an important problem in algebraic computation, especially in decoding error correcting codes. The authors show a new systolic array structure for the polynomial GCD problem using a systematic array synthesis technique. The VLSI implementation of the array structure is area-efficient and achieves maximum throughput with pipelining. The dependency ...
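
For readers unfamiliar with the underlying problem, the following is a minimal sequential sketch of polynomial GCD by Euclid's algorithm. It is the textbook method, not the systolic array of the paper. As an assumption made for brevity, polynomials are taken over GF(2) and packed into Python integers (bit i holds the coefficient of x^i); the names `gf2_mod` and `gf2_gcd` are hypothetical.

```python
def gf2_mod(a, b):
    """Remainder of polynomial a divided by non-zero polynomial b over GF(2)."""
    deg_b = b.bit_length()
    while a.bit_length() >= deg_b:
        # Cancel the leading term of a with a shifted copy of b (XOR = subtraction).
        a ^= b << (a.bit_length() - deg_b)
    return a


def gf2_gcd(a, b):
    """Euclid's algorithm: replace (a, b) by (b, a mod b) until b is zero."""
    while b:
        a, b = b, gf2_mod(a, b)
    return a
```

For example, x^3 + 1 = (x + 1)(x^2 + x + 1) and x^2 + 1 = (x + 1)^2 over GF(2), so `gf2_gcd(0b1001, 0b101)` yields `0b11`, which encodes x + 1.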

  • A fast, storage-efficient parallel sorting algorithm

    Publication Year: 1993, Page(s):369 - 379

    A parallel sorting algorithm is presented for storage-efficient internal sorting on MIMD machines. The algorithm first sorts the elements within each node using a serial algorithm, then performs a two-phase parallel merge. It requires additional storage on the order of the square root of the number of elements in each node. Performance of the algorithm on two general-purpose MIMD machines, the Fujitsu AP...

  • Systolic evaluation of functions: Digit-level algorithm and realization

    Publication Year: 1993, Page(s):514 - 525
    Cited by:  Papers (1)

    The author presents a novel algorithm for the evaluation of functions. The algorithm is systolic and may be realized as a fully scalable and very regular design consisting of merely full-adders and registers. The algorithm evaluates a polynomial according to the Horner scheme, i.e., it performs a cascade of multiply-and-add operations. Data are represented as two's complement fixed-point numbers t...
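
The Horner scheme named in this abstract can be sketched at word level as follows. This shows only the cascade of multiply-and-add operations; the digit-level systolic realization described in the paper is not modeled. The function name `horner` and the coefficient ordering (highest degree first) are assumptions made for illustration.

```python
def horner(coeffs, x):
    """Evaluate a polynomial with one multiply-and-add per coefficient.

    coeffs lists the coefficients from the highest degree down to the
    constant term, e.g. [2, -3, 1] stands for 2*x**2 - 3*x + 1.
    """
    acc = 0
    for c in coeffs:
        acc = acc * x + c  # one stage of the multiply-and-add cascade
    return acc
```

Each loop iteration depends only on the previous accumulator value, which is what makes the scheme a natural fit for a pipelined chain of identical stages.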

  • An application-specific array architecture for feedforward with backpropagation ANNs

    Publication Year: 1993, Page(s):333 - 344
    Cited by:  Papers (1)

    An application-specific array architecture for Artificial Neural Networks (ANNs) computation is proposed. This array is configured as a mesh-of-appendixed-trees (MAT). Algorithms to implement both the recall and the training phases of the multilayer feedforward with backpropagation ANN model are developed on MAT. The proposed MAT architecture requires only O(log N) time, while other reported techn...

  • A novel framework for multi-rate scheduling in DSP applications

    Publication Year: 1993, Page(s):77 - 88
    Cited by:  Papers (6)  |  Patents (7)

    The authors present a novel framework for multi-rate scheduling of signal processing programs represented by regular stream flow graphs (RSFGs). The nodes of an RSFG may execute at different rates to avoid unbounded storage requirements under repetitive computation. A distinct feature of the scheduling framework, called multi-rate software pipelining, is to allow maximum overlapping of operatio...

  • Multi-rate transformation of directional affine recurrence equations

    Publication Year: 1993, Page(s):392 - 403
    Cited by:  Papers (4)

    There has been increased attention to the synthesis of algorithm-specific pipeline arrays such as systolic arrays. Most of the existing synthesis techniques are based on a transformation of the algorithm from a class of Recurrence Equations such as Uniform Recurrence Equations (UREs). However, many algorithms cannot be transformed to a URE and the temporal locality of systolic arrays results ...

  • A novel architecture for a decision-feedback equalizer using extended signed-digit feedback

    Publication Year: 1993, Page(s):490 - 501
    Cited by:  Papers (1)

    A novel bit-level systolic array architecture for implementing a bit-parallel decision-feedback equalizer (DFE) is presented. The core of the architecture is an array multiplier using redundant arithmetic in combination with bit-level feedback. The use of signed-digit (SD) circuitry allows one to feed back each digit as soon as it is available. So the recursive computation can be executed with the mos...

  • The PAPRICA SIMD array: Critical reviews and perspectives

    Publication Year: 1993, Page(s):309 - 320
    Cited by:  Papers (9)  |  Patents (3)

    The PAPRICA project started in 1988 as an experimental VLSI architecture devoted to the efficient computation of data with two-dimensional structure. The main goal of the project is to develop a subsystem that could operate as an attached processing unit to a standard workstation and in perspective as a specialized processing module in dedicated systems devoted to low level image analysis, cellula...

  • Heterogeneous BISR techniques for yield and reliability enhancement using high level synthesis transformations

    Publication Year: 1993, Page(s):454 - 465
    Cited by:  Papers (2)

    Built-In-Self-Repair (BISR) is a fault tolerance technique against permanent faults, where in addition to core operational modules, a set of spare modules is provided. If a faulty core module is detected, it is replaced with a spare module. The BISR methodology has been used only in situations where a failed module of one type can only be replaced by a backup module of the same type. The authors p...
