Date 25-27 May 1988

Proceedings of the International Conference on Systolic Arrays (Cat. No.88CH26039)

An efficient systolic array for MVDR beamforming
Page(s): 11-20
An efficient systolic array for computing the minimum variance distortionless response (MVDR) from an adaptive antenna array is described. The MVDR beamforming problem amounts to minimizing, in a least-squares sense, the combined output from an antenna array subject to K independent linear equality constraints, each of which corresponds to a chosen 'look direction'. The array is fully pipelined and based on a numerically stable algorithm that requires O(p^2 + Kp) arithmetic operations per sample time, where p is the number of antenna elements.
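For orientation, the constrained least-squares problem named in this abstract is commonly written as below. This is the textbook formulation, read here as K separate single-constraint problems, and not necessarily the form used in the paper. The symbols X, w, c_k, beta_k, and M are notation assumed for illustration.

```latex
% Assumed notation (not from the paper): X is the n-by-p matrix of array
% snapshots, w the p-element weight vector, c_k the constraint vector for
% look direction k, and \beta_k the prescribed gain in that direction.
\begin{align}
  &\min_{w}\ \lVert X w \rVert_2^2
    \quad \text{subject to} \quad c_k^{H} w = \beta_k,
    \qquad k = 1, \dots, K, \\
  &w_k = \frac{\beta_k\, M^{-1} c_k}{c_k^{H} M^{-1} c_k},
    \qquad M = X^{H} X .
\end{align}
```

Substituting w_k into the constraint gives c_k^H w_k = beta_k, so each solution is distortionless in its own look direction.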
Implementation of synthetic aperture radar algorithms on a systolic/cellular architecture
Page(s): 21-30
Two sequences of operations necessary for implementation of high-resolution image formation in strip and spotlight modes of the synthetic-aperture radar (SAR) are presented. The sequences are mapped onto a systolic/cellular architecture. The mapping includes parallel implementation of all the basic operations and the pertinent data communication. Detailed estimates of the computation times are provided.
Synthesizing optimal family of linear systolic arrays for matrix computations
Page(s): 51-60
A method is proposed for designing a family of linear systolic arrays for matrix-oriented problems for which two-dimensional arrays have been designed. The design exhibits a tradeoff between local storage, s, and number of processing elements, n. The arrays are linear, with each processor having storage O(s),1
Theory for systolizing global computational problems
Page(s): 61-71
A theory is presented for rasterizing a class of two-dimensional problems including signal/image processing, computer vision, and linear algebra. The rasterization theory is steered by an isomorphic relationship between the multidimensional shuffle-exchange network (mDSE) and the multidimensional butterfly network (mDBN). Many important multidimensional signal-processing problems can be solved on an mDSE with a solution time approaching known theoretical lower bounds. The isomorphism between mDSE and mDBN is exploited by transforming an mDSE solution into its equivalent mDBN solution. A methodology for rasterizing the mDBN solution is developed. It turns out that not all mD algorithms can be rasterized. A sufficient condition for algorithm rasterization is given.
New architectures for systolic hashing
Page(s): 73-82
Two- and three-dimensional systolic architectures are proposed for the hash table data structure (hashing). The parallel systolic hashing architecture implements the hash operations Insert, Delete, and Member in constant time. The importance and advantages of extending sequential hashing to a parallelized form are discussed. An implementation is presented that sorts N numbers in O(L) time, where L is a constant, using a three-dimensional parallelized systolic hashing process. This is compared to a sequential hashing process, which requires O(N) time.
Linear systolic array for leastsquares estimation
Page(s): 83-92
The use of a square-root-free linear systolic array structure to perform the QR decomposition needed in the solution of least-squares (LS) problems is proposed. A form of the Kalman filter algorithm is applied to perform the recursive LS estimation. Compared with the conventional triangular systolic array structure for LS estimation, the linear array has the advantage of requiring less area and being simpler for VLSI implementation.
A cellular algorithm for straight line extraction
Page(s): 93-102
Straight-line-edge extraction can be carried out in two successive phases: identifying the pixels that belong to edges and constructing straight-line segments from these edge pixels. A parallel approach based on a cellular algorithm is proposed for the second phase. Each cell sends a message that compiles distances between a pattern segment and the real segment on the image. The value of the message identifies a segment and codifies its length and endpoints. If the parameters of the algorithm are properly chosen, it can be adjusted to different kinds of contours: noisy or blurred edges and disconnected segments. The algorithm takes computation time proportional to the linear dimension of the image (for an image of N x N pixels the linear dimension is N) and the number of generalized directions.
A one dimensional systolic array for solving arbitrarily large least mean square problems
Page(s): 103-112
The design is presented of a one-dimensional systolic array for solving arbitrarily large least-mean-square problems involving QR decomposition and a triangular system of equations. The main characteristics of this array are maximization of array utilization, thus achieving a minimum global computation time, and low complexity of the resulting array, which can also be used in problems such as matrix-by-vector and matrix-by-matrix multiplication and LU decomposition. Two systolic algorithms for QR decomposition have been designed. Their chained execution is shown.
Performance evaluation of the HERMES multibit systolic array architecture for low level processing tasks
Page(s): 113-124
The performance of the various parts of the HERMES multiprocessor vision system is evaluated. HERMES is an autonomous, hierarchical, heterogenic vision processing system, consisting of N^2/4^i, 0
On partitioning the Faddeev algorithm
Page(s): 125-134
Partitioned schemes for computing the Faddeev algorithm are derived, using a graph-based methodology. Such implementations are obtained by performing transformations on the fully parallel dependence graph of the algorithm. Linear and two-dimensional structures are derived and evaluated in terms of throughput, I/O bandwidth, utilization of processing elements, and overhead due to partitioning. The partitioned implementations are compared with schemes previously proposed. It is shown that throughput of both the linear and two-dimensional arrays derived here tends to 3m/7n^3 for n x n matrices, where m is the number of cells, and utilization tends to 1. A two-dimensional scheme that is more efficient and has less overhead than others previously proposed is derived. It is shown that for partitioned implementations with the same number of cells, a linear array performs better, its implementation is easier, and it is better suited for fault-tolerant capabilities than a two-dimensional one.
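As background for the abstract above, the Faddeev algorithm computes D + C A^-1 B by eliminating the lower-left block of the compound matrix [[A, B], [-C, D]]. The sketch below is a plain sequential version using exact rational arithmetic. It is not the partitioned array scheme of the paper, and the function name `faddeev` is chosen here for illustration.

```python
from fractions import Fraction


def faddeev(A, B, C, D):
    """Return D + C * A^-1 * B via elimination on [[A, B], [-C, D]].

    A is n x n, B is n x q, C is m x n, D is m x q (lists of rows).
    Sequential illustration only; names and layout are assumptions.
    """
    n, m = len(A), len(C)
    # Build the compound matrix with exact rational entries.
    M = [[Fraction(x) for x in A[i]] + [Fraction(x) for x in B[i]]
         for i in range(n)]
    M += [[-Fraction(x) for x in C[i]] + [Fraction(x) for x in D[i]]
          for i in range(m)]

    for col in range(n):
        # Pick a nonzero pivot among the remaining rows of the top block.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("A is singular")
        M[col], M[pivot] = M[pivot], M[col]
        # Zero out this column in every row below, including the -C rows.
        for r in range(col + 1, n + m):
            factor = M[r][col] / M[col][col]
            if factor != 0:
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]

    # The lower-right block now holds D + C * A^-1 * B.
    return [row[n:] for row in M[n:]]
```

Choosing B and C as identity matrices and D as zero turns the same routine into a matrix inverter, which is one reason the algorithm is attractive as a single array primitive.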
A systolic architecture for the symmetric tridiagonal eigenvalue problem
Page(s): 145-150
The first step in the development of a chip set to support eigenvalue-eigenvector-based estimation algorithms is presented. It is based on the assumption that an averaging technique will produce a symmetric covariance matrix. Such a matrix can be reduced to a symmetric tridiagonal matrix, and hence the eigenvalues and eigenvectors can be found by successive iterations involving QR decomposition. The architecture is unique in that other architectures either solve only for the eigenvalues or use methods other than QR iteration. It has potential for use in a systolic computer for compute-intensive digital signal processing based on modern spectral-analysis techniques.
Stereo matching of satellite images with transputers
Page(s): 175-182
A demanding problem involving several algorithmic phases with varying degrees of regularity and data dependence is used to show that a network of transputers programmed in OCCAM has all the attributes needed to explore several processing paradigms. Two alternative organizations of the problem on a network of 21 transputers are compared from the standpoints of speed, hardware efficiency, and ease of programming. Two highly parallel implementations of an algorithm that constitutes part of a real-time system to generate terrain relief maps from satellite stereo image pairs have been programmed. An optimum strategy that demonstrates the power of MIMD (multiple instruction, multiple data streams) parallel computing is determined.
A million transistor systolic array graphics engine
Page(s): 193-202
A description is given of a million-transistor systolic array graphics engine (SAGE) that can render a horizontal 3D span in every clock cycle at the rate of 25 million spans/s, independent of the pixel length of the span. For the average span length in the 10-32 pixel range, this translates into 250-800 million pixels/s. Assuming that the front end of the system can generate a span in every clock cycle, then in the best case SAGE polygon performance is 2,500,000 polygons/s for 100-pixel polygons and drops down to 750,000 polygons/s as the average area of the polygons increases to 1000 pixels. For example, a system using the SAGE chip has the potential to interactively display the motion and time behavior of a 20,000-polygon scene at a rate of 30 frames/s. In the extreme case where all spans are 1024 pixels long, SAGE operates at a peak parallel pixel-processing rate of 25,000 million pixels/s.
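The pixel-throughput figures quoted for SAGE follow from multiplying the span rate by the span length. The sketch below redoes that arithmetic using only numbers stated in the abstract; the function names are illustrative.

```python
# Span rate stated in the abstract: one span per clock cycle, 25 million spans/s.
SPANS_PER_SECOND = 25_000_000


def pixel_rate(span_length_pixels):
    """Pixels per second when every span has the given length."""
    return SPANS_PER_SECOND * span_length_pixels


def required_polygon_rate(polygons_per_frame, frames_per_second):
    """Polygons per second needed to redraw a scene at the given frame rate."""
    return polygons_per_frame * frames_per_second


if __name__ == "__main__":
    for length in (10, 32, 1024):
        rate = pixel_rate(length) / 1e6
        print(f"{length:5d}-pixel spans: {rate:,.0f} million pixels/s")
    print("20,000 polygons at 30 frames/s:",
          f"{required_polygon_rate(20_000, 30):,} polygons/s")
```

Spans of 10 and 32 pixels give 250 and 800 million pixels/s. Spans of 1024 pixels give 25,600 million pixels/s, which the abstract rounds to 25,000 million. A 20,000-polygon scene at 30 frames/s needs 600,000 polygons/s.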
A massively parallel systolic array processor system
Page(s): 217-225
The design of a massively parallel processor, comprising 2304 bit-serial processor elements arranged in a 48 by 48 systolic array, is described. The system consists of the processor array, a microstore controller, and a host computer interface. Program development tools are available on the host computer. The processor array uses 32 NCR GAPP (Geometric Arithmetic Parallel Processor) microprocessor chips, while the microstore controller is implemented with a TMS32010 DSP chip and TTL (transistor-transistor logic) circuitry. Utilizing the nearest-neighbor communication capabilities of the GAPP, the array receives data from the host at the south edge of the array, outputs data to the host at the north edge of the array, and can wrap data between either the east and west or north and south edges. The array can also be configured as a linear array of 2304 processor elements. The microstore controller interfaces with the host and facilitates downloading of GAPP array machine code, provides for the debugging and monitoring of GAPP array execution from the host, and implements user-defined instructions.
Implementation of array structured maximum likelihood decoders
Page(s): 227-236
Efficient VLSI array processor architectures for maximum-likelihood decoding (MLD) have been developed to meet the high throughput and data processing requirements of modern communication systems. Both 1D and 2D MLD processors with large constraint length (>8) have been derived. Radix-4p processing elements and delay commutating switching processors for MLD have been concatenated to construct a pipeline MLD processor. The pipeline length can be adapted to meet the time/area constraints for various applications. A 2D MLD array processor is also presented. Processing data are modularized, data transmissions are embedded into processing elements, and a fixed-size 2D MLD array is derived to meet high-data-throughput requirements.
Regular processor arrays for matrix algorithms with pivoting
Page(s): 237-246
It is shown how to obtain regular (though nonsystolic) processor arrays for algorithms with pivoting. First, the fact that pivoting algorithms cannot be systolic is established. Then it is shown how regular iterative algorithms can be formulated for the Gaussian elimination algorithm with partial pivoting and how the algorithm can then be implemented on the so-called regular iterative arrays (locally connected arrays of essentially identical processor modules, with register pipelines and/or LIFO (last-in/first-out) buffers in some of the links).
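For readers unfamiliar with the algorithm named above, here is a minimal sequential sketch of Gaussian elimination with partial pivoting. It is not the regular iterative array formulation of the paper. The function name is hypothetical, and the pivot search over a whole column is included to show the data-dependent step that distinguishes pivoting from plain elimination.

```python
def solve_partial_pivoting(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting.

    Sequential illustration only; A is a list of n rows, b a list of n values.
    """
    n = len(A)
    # Augmented matrix [A | b] as floats, so the inputs are left untouched.
    M = [[float(v) for v in row] + [float(rhs)] for row, rhs in zip(A, b)]

    for k in range(n):
        # Partial pivoting: choose the row with the largest |entry| in column k.
        p = max(range(k, n), key=lambda r: abs(M[r][k]))
        if M[p][k] == 0.0:
            raise ValueError("matrix is singular")
        M[k], M[p] = M[p], M[k]
        # Eliminate column k from the rows below the pivot row.
        for r in range(k + 1, n):
            f = M[r][k] / M[k][k]
            for c in range(k, n + 1):
                M[r][c] -= f * M[k][c]

    # Back substitution on the resulting upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(M[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (M[i][n] - s) / M[i][i]
    return x
```

The system with rows (0, 2) and (1, 1) has a zero in the first pivot position, so it can only be solved after the row exchange that the pivot search selects.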
Systolic algorithms for some scheduling and graph problems
Page(s): 247-256
A simple model of a linear systolic array with serial input/output and one-way data communication is considered. It is shown that such an array can be used to solve some scheduling and graph problems efficiently. The systolic algorithms are developed in two stages. First an algorithm on a restricted type of sequential machine is constructed. Then the sequential machine algorithm is transformed into a systolic algorithm. The transformation can be done automatically and efficiently.
The design of a systolic array system for linear state equations
Page(s): 275-284
The dependence-graph (DG) approach is extended and applied to the systematic design of a systolic array system. Two DGs that represent two different but data-dependent process algorithms are first linked together. Tag bits are added onto index nodes in this linked DG and used to indicate the different functions to be executed on a single processor element. By applying the conventional time-scheduling and node-assignment procedures to this tagged DG, the interfacing communication problem of a systolic array system can be solved and the optimal latency can be easily obtained. Using this method, an optimal linear-state solver has been designed.
Mapping strategy for automatic design of systolic arrays
Page(s): 285-294
A mapping strategy for automatic design of systolic arrays is presented. Algorithms are specified in terms of data dependency and identity, and implementations are specified in terms of data propagation and sequence behavior. By establishing a relation between data propagation and sequence, an optimal mapping strategy is formulated as a problem of finding an integer solution of a set of linear equations. This approach provides a uniform framework to design a variety of systolic arrays. An automatic design program and some design examples are presented.
The derivation of regular synchronous circuits
Page(s): 305-314
An approach to derive parameterized representations of regular synchronous circuits from their specification is presented. The derivation of designs consists of two steps: rewriting the specification in terms of predefined structures to obtain a draft architecture, and optimizing that architecture by successive correctness-preserving transformations using algebraic theorems. These steps can be repeated to obtain, at a lower level of abstraction, architectures that still satisfy the original specification. A number of word-level and bit-level rank evaluator designs are developed to illustrate the techniques described.
Systolic arrays for group explicit methods for solving parabolic partial differential equations
Page(s): 315-329
A systolic array implementation for solving parabolic equations numerically is presented. The finite-difference methods used are stable asymmetric approximations to the partial differential equations, which when coupled in groups of two adjacent points on the grid result in implicit equations that are easily converted to explicit form, thus offering many advantages suitable for solution by VLSI techniques. The regularity obtained from the grid structure and locality of data from groups of small size, combined with the attributes of truncation error cancellations and alternating the strategies of grid points, give unconditional stability and an efficient, systolic design.
Parallel algorithms and systolic array designs for RSA cryptosystem
Page(s): 341-350
Two algorithms for computing very large integer modular exponentiation are proposed. One is based on a recoding technique that significantly reduces the total number of modular multiplications. The second is a parallel algorithm that can be implemented by two parallel processors and achieves optimal performance. Two corresponding systolic array designs are developed. The main advantage of these systolic architectures is to provide a potentially higher throughput for a large number of computations, namely, encryptions and decryptions in an RSA cryptosystem.
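As a point of reference for the abstract above, the sketch below shows ordinary binary (square-and-multiply) modular exponentiation and counts its modular multiplications. This is the standard baseline, not either of the paper's two algorithms. The function name and the returned counter are assumptions made for illustration.

```python
def mod_exp(base, exponent, modulus):
    """Right-to-left binary modular exponentiation.

    Returns (base**exponent % modulus, number of modular multiplications).
    Baseline illustration only; not the recoded or parallel algorithm.
    """
    if modulus <= 0 or exponent < 0:
        raise ValueError("modulus must be positive and exponent non-negative")
    result = 1 % modulus
    base %= modulus
    multiplications = 0
    while exponent:
        if exponent & 1:
            # Current exponent bit is 1: fold this power of base into the result.
            result = result * base % modulus
            multiplications += 1
        exponent >>= 1
        if exponent:
            # More bits remain: square the base for the next bit position.
            base = base * base % modulus
            multiplications += 1
    return result, multiplications
```

In this baseline the multiplication count depends on the bit length of the exponent and on how many of its bits are 1. Both are large for RSA-sized exponents, which is why reducing the number of modular multiplications matters.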
A systolic algorithm and architecture for solving sets of linear equations with multiband coefficient matrix
Page(s): 361-371
A parallel block-iterative algorithm for solving sets of linear equations with a positive multiband coefficient matrix is presented. The parallel structure is obtained by decoupling the sets of equations into subsets instead of partitioning the coefficient matrix into a lower and upper (block) triangular matrix. An important feature of the algorithm is that the coefficient matrices of the decoupled subsets are inverted by a novel direct algorithm. The global algorithm iterates using a Gauss-Seidel-like method toward a solution. For this problem, a systolic algorithm/architecture is designed.
Scheduling a system of affine recurrence equations onto a systolic array
Page(s): 373-382
Most work on the problem of scheduling computations on a systolic array is restricted to systems of uniform recurrence equations. This restriction is relaxed to include systems of affine recurrence equations. In this broader class, a sufficient condition is given for the system to be computable. Necessary and sufficient conditions are given for the existence of an affine schedule, along with a procedure that constructs the schedule vector when one exists.