25-27 Oct. 1993
-
Proceedings of International Conference on Application Specific Array Processors (ASAP '93)
Publication Year: 1993
-
Communication-minimal mapping of uniform loop nests onto distributed memory architectures
Publication Year: 1993, Page(s):1 - 14
Cited by: Papers (5)
The authors deal with mapping techniques for uniform loop nests. Target machines are SPMD distributed memory parallel computers. They use affine-by-variable mapping to synthesize a virtual grid architecture from the original loop nest. The key to the mapping strategy is the communication graph, which enables us to derive optimal mappings, i.e., where the number of communications is proved to be mi...
-
The Xor embedding: An embedding of hypercubes onto rings and toruses
Publication Year: 1993, Page(s):15 - 28
Cited by: Papers (4)
Many parallel algorithms use hypercubes as the communication topology among processes, which makes them suitable to be executed on a hypercube multicomputer. In this way the communication cost is kept to a minimum since processes can be allocated to processors in such a way that only communication between neighbor processors is required. However, the scalability of hypercube multicomputers is constr...
-
Resource constrained scheduling of uniform algorithms
Publication Year: 1993, Page(s):29 - 40
Cited by: Papers (20)
A method for optimizing the schedule and allocation of uniform algorithms onto processor arrays is derived. The main results described in the following paper are: (1) single (integer) linear programs are given for the optimal schedule of regular algorithms with and without resource constraints, (2) the class of algorithms is extended by allowing certain nonconvex index domains, (3) efficient branc...
-
Mapping algorithms onto a multiple-chip data-driven array
Publication Year: 1993, Page(s):41 - 52
Cited by: Papers (1)
Data-driven arrays provide high levels of parallelism and pipelining for algorithms with no internal regularity. Most of the methods previously developed for mapping algorithms onto processor arrays assumed an unbounded array (i.e., one in which there will always be a sufficient number of processing elements (PEs) for the mapping). Implementing such an array is not practical. A more practical appr...
-
Scheduling partitioned algorithms on processor arrays with limited communication supports
Publication Year: 1993, Page(s):53 - 64
Cited by: Papers (7)
Array designs, especially the scheduling of partitioned arrays, must cope with various kinds of communication constraints such as interconnection topology, channel bandwidth, and inhomogeneous communication delay. The interprocessor communication requirements can be dictated by the dependence vectors and size of the partitioned tiles. A folded constraint graph is created to de...
-
Parallel processing architectures for rank order and stack filters
Publication Year: 1993, Page(s):65 - 76
Cited by: Papers (2)
Achieving additional speedup in rank order and stack filter architectures requires the use of parallel processing techniques such as pipelining and block processing. Pipelining is well understood, but few block architectures have been developed for rank order and stack filtering. Block processing is essential when the architecture reaches the throughput limits caused by the underlying technology. ...
-
A novel framework for multi-rate scheduling in DSP applications
Publication Year: 1993, Page(s):77 - 88
Cited by: Papers (7) | Patents (7)
The authors present a novel framework for multi-rate scheduling of signal processing programs represented by regular stream flow graphs (RSFGs). The nodes of an RSFG may execute at different rates to avoid unbounded storage requirements under repetitive computation. A distinct feature of the scheduling framework, called multi-rate software pipelining, is to allow maximum overlapping of operatio...
-
Efficient scalable architectures for Viterbi decoders
Publication Year: 1993, Page(s):89 - 100
Cited by: Papers (5) | Patents (2)
Viterbi decoders (VDs) are widely used today for the decoding of convolutional codes in forward error correction schemes. Efficient deeply pipelined VLSI architectures, the generalized cascade VD and the trellis pipeline-interleaving (TPI) VD, are adaptable to a given data rate only to a limited extent. The authors propose a novel unified class of deeply pipelined architectures, the scalable parall...
-
A wavefront array processor for on the fly processing of digital video streams
Publication Year: 1993, Page(s):101 - 108
Cited by: Papers (1)
The authors present a wavefront array processor architecture developed at ETCA and dedicated to real-time processing of digital video streams. The core of the architecture is a mesh-connected three-dimensional network of 1024 custom processing elements. Each processing element can perform up to 50 million 8- or 16-bit operations per second, working with a 25 MHz clock frequency. Thus algorithms a...
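The figures quoted in this abstract imply an aggregate peak rate that is easy to work out. The short Python sketch below does only that arithmetic; the aggregate number is a theoretical upper bound that assumes every processing element is busy on every cycle, and it is not a figure stated in the listing.

```python
# Figures taken from the abstract: 1024 PEs, up to 50 million
# operations per second per PE, 25 MHz clock.
NUM_PES = 1024
OPS_PER_PE_PER_SECOND = 50_000_000
CLOCK_HZ = 25_000_000

# Operations each PE must complete per clock cycle to reach its peak rate.
ops_per_cycle_per_pe = OPS_PER_PE_PER_SECOND // CLOCK_HZ

# Aggregate peak rate, assuming (hypothetically) that all PEs are fully used.
peak_ops_per_second = NUM_PES * OPS_PER_PE_PER_SECOND

print(ops_per_cycle_per_pe)        # 2
print(peak_ops_per_second / 1e9)   # 51.2 (billions of operations per second)
```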
-
Subband filtering: Cordic modulation and systolic quadrature mirror filter tree
Publication Year: 1993, Page(s):109 - 123
The decomposition (analysis) of a finite-energy signal into a relatively small number of mutually independent signals which allows reconstruction (synthesis) of the original signal is called subband filtering. Subbands can be processed in parallel or recursively. In the latter case, one obtains a so-called quadrature mirror filter tree. The former case leads to cosine-modulated filter banks. The a...
-
On synthesizing application-specific array architectures from behavioral specifications
Publication Year: 1993, Page(s):124 - 127
Cited by: Patents (1)
The authors describe a design framework, Architect, being developed for synthesizing application-specific array architectures from behavioral specifications to Register-Transfer (RT) descriptions, which can be identified as a number of cooperating tasks: signal transformations, hardware mapping expressed as, in general, nonlinear mapping and scheduling function with hardware constraints, memory m...
-
A simple expert system for the reasoning of systolic designs
Publication Year: 1993, Page(s):128 - 131
The author presents a simple expert system developed for the reasoning of systolic designs. It is based on the STA formalism, the spatial inductive techniques developed earlier, and a temporal induction technique (briefly introduced in this paper) to perform formal verification of systolic array designs. Induction techniques exploit the regularity and locality attributes of systolic arrays. The sy...
-
RELACS for systolic programming
Publication Year: 1993, Page(s):132 - 135
Cited by: Papers (9)
The RELACS language is a systolic programming language, which simplifies the programmer's task by making explicit the data-flow of systolic algorithms, and by exposing the data delivery mechanism. The underlying architecture model is different from other SIMD architectures in that it physically separates computation and data management. The authors introduce the RELACS language as a syntactic and a...
-
Data flow graphs granularity for overhead reduction within a PE in multiprocessor systems
Publication Year: 1993, Page(s):136 - 139
Cited by: Patents (1)
The authors propose a method to implement Acyclic Data Flow Graphs (ADFG) in any general purpose multiprocessor system supporting a CSP type language. The granularity of ADFG nodes is discussed. During ADFG analysis the authors use fine granularity to exploit all the parallelism inherent in the problem. When the graph G has been allocated, it is divided into P subgraphs G_k (P is the number of...
-
A massively parallel diagonal-fold array processor
Publication Year: 1993, Page(s):140 - 143
Cited by: Papers (3) | Patents (4)
Image processing for multimedia workstations is a computationally intensive task typically requiring special purpose hardware, for example a nearest neighbor mesh parallel machine organization. One type of nearest neighbor mesh computer consists of a K × K square array of Processor Elements (PEs) where each PE is connected to the North, South, East, and West PEs only. In a torus configur...
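The nearest neighbor mesh described in this abstract can be illustrated with a small Python sketch. It uses the standard definition of a torus, in which the links at the array edges wrap around to the opposite side. The function name `torus_neighbors` and the convention that row 0 is the "north" edge are assumptions made for illustration; the paper's diagonal-fold organization itself is not reproduced here.

```python
def torus_neighbors(row, col, k):
    """Return the four neighbors of PE (row, col) in a k x k torus.

    Hypothetical helper: row 0 is taken as the north edge and
    column 0 as the west edge; indices wrap around modulo k.
    """
    return {
        "north": ((row - 1) % k, col),
        "south": ((row + 1) % k, col),
        "east": (row, (col + 1) % k),
        "west": (row, (col - 1) % k),
    }


# A corner PE in a 4 x 4 torus still has four distinct neighbors,
# because the edge links wrap around.
corner = torus_neighbors(0, 0, 4)
print(corner)
```

In a plain (non-torus) mesh the same corner PE would have only two neighbors, which is why the wraparound links matter for algorithms that shift data uniformly across the array.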
-
Response-pipelined CAM chips - Building blocks for large associated arrays
Publication Year: 1993, Page(s):144 - 147
The authors introduce the architecture of a new type of fully parallel content addressable memory chips that serve as building blocks for large associated arrays. These new chips can be easily cascaded to increase the logical word size or the number of words and yet allow the search rate to be maintained constant irrespective of the logical word size or word count. Prototype CMOS implementations o...
-
An array-processor based architecture for classification problems
Publication Year: 1993, Page(s):148 - 151
The authors describe the design and implementation of an application-specific digital architecture aimed at the solution in real time of the "K nearest neighbors" algorithm for classification problems, whose computational weight is very high. The system described here is an array-processor architecture in which the data base is split among several units that at the same time apply the same operati...
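The idea of splitting the data base among several units can be sketched in software. The Python example below is a sequential illustration only, not the authors' hardware design: each simulated unit finds the k nearest entries in its own share of the data, and the partial results are then merged. The function names, the round-robin split, and the use of squared Euclidean distance are all assumptions made for this sketch.

```python
import heapq


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def local_k_nearest(chunk, query, k):
    """k nearest (distance, label) pairs within one unit's share of the data."""
    return heapq.nsmallest(
        k, ((squared_distance(point, query), label) for point, label in chunk)
    )


def k_nearest_partitioned(database, query, k, num_units):
    """Split the database round-robin over num_units, search each, then merge."""
    chunks = [database[u::num_units] for u in range(num_units)]
    candidates = []
    for chunk in chunks:  # in hardware these searches would run at the same time
        candidates.extend(local_k_nearest(chunk, query, k))
    return heapq.nsmallest(k, candidates)


# Hypothetical toy data: (feature vector, class label) pairs.
database = [
    ((0.0, 0.0), "a"), ((1.0, 1.0), "a"), ((5.0, 5.0), "b"),
    ((6.0, 5.0), "b"), ((0.5, 0.2), "a"), ((9.0, 9.0), "b"),
]
print(k_nearest_partitioned(database, (0.4, 0.4), 3, num_units=2))
```

Because every global nearest neighbor is also among the k nearest of the unit that holds it, merging the per-unit lists gives the same result as a search over the whole data base.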
-
Realization of a real time phasecorrelation chipset used in a hierarchical two step HDTV motion vector estimator
Publication Year: 1993, Page(s):152 - 155
The phasecorrelation algorithm - as a method for motion estimation - is a key component of today's TV and tomorrow's HDTV systems. One advantage of hardware realization of this algorithm for efficient real time processing - in contrast to block matching - is the possibility to process multiple pixels per system clock cycle. A suitable partition using three different VLSI-circuits to perform the phase...
-
Asynchronous relaxation of locally-coupled automata networks, with application to parallel VLSI implementation of iterative image processing algorithms
Publication Year: 1993, Page(s):156 - 159
Cited by: Papers (2)
Array processors tailored to mesh-based iterative algorithms benefit from shifting to an asynchronous mode. An architecture implementing this functionally asynchronous state-space update with self-timed elementary processors can dispense with the overhead of classical data exchange protocols and offer a flexible hierarchical mapping of the state lattice onto the array. The performance and practica...
-
Mapping arbitrary projections for volume rendering onto an array processor
Publication Year: 1993, Page(s):160 - 163
Cited by: Patents (5)
The authors propose a new mapping technique for volume rendering on massively parallel computers. Previous mappings of volume rendering algorithms onto array processors either required large amounts of interprocessor communication or lacked the generality needed to deal with arbitrary rotations and perspective projections. The mapping developed here is based on an enhanced ray-tracking algorithm. I...
-
M³: A high performance signal processor for RADAR applications
Publication Year: 1993, Page(s):164 - 167
Real time radar computing requires high processing performance and fast and efficient I/O capabilities. These goals have been achieved by means of a new multiprocessor architecture based on the Motorola DSP 96002. This system was developed entirely in the FIAR laboratories, and is now a state-of-the-art unit in their avionic radar family. The authors describe a computer system developed specifica...
-
COLUMNUS - An SIMD architecture for pattern recognition and simulations of statistical physics
Publication Year: 1993, Page(s):168 - 171
Cited by: Papers (1)
Many interesting problems including simulations of statistical physics, pattern recognition and neural networks can be treated efficiently by performing calculations in parallel on a large number of discrete, often even binary variables. As general-purpose computers are not well adapted to these problems, the authors have developed an SIMD array of bit-sequential processors providing an extended s...
-
Processing of variable size images on a cellular array: Performance analysis with the Abingdon Cross Benchmark
Publication Year: 1993, Page(s):172 - 175
Handling a continuous flow of variable size images is a requirement for real time computer vision machines. A modular system based on a small size SIMD cellular array of 1-bit processing elements has been developed with this goal in mind and it is now evaluated against the Abingdon Cross Benchmark specifications. The benchmark tests the combination of algorithms and architecture and generates a qu...
-
Matrix-matrix multiplications and fault tolerance on hypercube multiprocessors
Publication Year: 1993, Page(s):176 - 180
Several new algorithms for matrix-matrix multiplications on hypercube multiprocessors are presented and evaluated based on the number of multiplications, additions, and transfers. The matrices to be multiplied are uniformly distributed to all processors of a hypercube system. Each processor owns some submatrices which are derived by dividing the source matrices. Each submatrix multiplication can n...
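Dividing the source matrices into submatrices, as this abstract describes, rests on the block form of the matrix product: each block of the result is a sum of products of submatrices. The Python sketch below shows that identity sequentially for square matrices. It is not one of the paper's hypercube algorithms and models no processor assignment, data transfers, or fault tolerance. The function names are hypothetical, and the matrix size is assumed to be divisible by the number of blocks per side `q`.

```python
def matmul(a, b):
    """Plain matrix product of two list-of-lists matrices."""
    inner = len(b)
    return [
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def split_blocks(m, q):
    """Split a square matrix into a q x q grid of equal square blocks."""
    s = len(m) // q
    return [
        [[row[bj * s:(bj + 1) * s] for row in m[bi * s:(bi + 1) * s]]
         for bj in range(q)]
        for bi in range(q)
    ]


def block_matmul(a, b, q):
    """Compute a @ b as sums of submatrix products: C_ij = sum_k A_ik B_kj."""
    n = len(a)
    s = n // q  # block size; assumes n is divisible by q
    a_blocks, b_blocks = split_blocks(a, q), split_blocks(b, q)
    c = [[0] * n for _ in range(n)]
    for bi in range(q):
        for bj in range(q):
            for bk in range(q):
                # Each submatrix product could be assigned to a different processor.
                partial = matmul(a_blocks[bi][bk], b_blocks[bk][bj])
                for i in range(s):
                    for j in range(s):
                        c[bi * s + i][bj * s + j] += partial[i][j]
    return c


a = [[1, 2, 3, 4], [5, 6, 7, 8], [9, 10, 11, 12], [13, 14, 15, 16]]
b = [[1, 0, 2, 0], [0, 1, 0, 2], [3, 0, 1, 0], [0, 3, 0, 1]]
print(block_matmul(a, b, 2) == matmul(a, b))
```

The submatrix products inside the innermost loop are independent of one another, which is the property a distributed implementation can exploit.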