Proceedings of the International Conference on Systolic Arrays, 1988

Date 25-27 May 1988

Displaying Results 1 - 25 of 67
  • Proceedings of the International Conference on Systolic Arrays (Cat. No.88CH2603-9)

    Freely Available from IEEE
  • An efficient systolic array for MVDR beamforming

    Page(s): 11 - 20

    An efficient systolic array for computing the minimum variance distortionless response (MVDR) from an adaptive antenna array is described. The MVDR beamforming problem amounts to minimizing, in a least-squares sense, the combined output from an antenna array subject to K independent linear equality constraints, each of which corresponds to a chosen 'look direction'. The array is fully pipelined and based on a numerically stable algorithm that requires O(p^2 + Kp) arithmetic operations per sample time, where p is the number of antenna elements.

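For orientation, the constrained least-squares problem named in the MVDR abstract above can be written out for a single look direction. This is the standard textbook form, not a formula taken from the paper. The symbols are assumptions introduced here: R for the sample covariance matrix of the p antenna outputs, c_k for the steering vector of look direction k, mu_k for the required gain, and w_k for the weight vector. Treating the K constraints as K separate single-constraint problems is also an assumption.

```latex
% Single-constraint MVDR problem for look direction k (assumed notation)
\min_{w_k} \; w_k^{H} R \, w_k
\quad \text{subject to} \quad c_k^{H} w_k = \mu_k ,
\qquad k = 1, \dots, K

% Closed-form solution obtained with a Lagrange multiplier
w_k = \mu_k \, \frac{R^{-1} c_k}{c_k^{H} R^{-1} c_k}
```

Evaluated directly, this requires a solve with R for every look direction. The abstract's O(p^2 + Kp) count per sample refers to the paper's own recursive scheme, which is not reproduced here.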
  • Implementation of synthetic aperture radar algorithms on a systolic/cellular architecture

    Page(s): 21 - 30

    Two sequences of operations necessary for implementation of high-resolution image formation in strip and spotlight modes of the synthetic-aperture radar (SAR) are presented. The sequences are mapped onto a systolic/cellular architecture. The mapping includes parallel implementation of all the basic operations and the pertinent data communication. Detailed estimates of the computation times are provided.

  • Synthesizing optimal family of linear systolic arrays for matrix computations

    Page(s): 51 - 60

    A method is proposed for designing a family of linear systolic arrays for matrix-oriented problems for which two-dimensional arrays have been designed. The design exhibits a tradeoff between local storage, s, and number of processing elements, n. The arrays are linear, with each processor having storage O(s), 1 [...]

  • Theory for systolizing global computational problems

    Page(s): 61 - 71

    A theory is presented for rasterizing a class of two-dimensional problems including signal/image processing, computer vision, and linear algebra. The rasterization theory is steered by an isomorphic relationship between the multidimensional shuffle-exchange network (mDSE) and the multidimensional butterfly network (mDBN). Many important multidimensional signal-processing problems can be solved on an mDSE with a solution time approaching known theoretical lower bounds. The isomorphism between mDSE and mDBN is exploited by transforming an mDSE solution into its equivalent mDBN solution. A methodology for rasterizing the mDBN solution is developed. It turns out that not all mD algorithms can be rasterized. A sufficient condition for algorithm rasterization is given.

  • New architectures for systolic hashing

    Page(s): 73 - 82

    Two- and three-dimensional systolic architectures are proposed for the hash table data structure (hashing). The parallel systolic hashing architecture implements the hash operations Insert, Delete, and Member in constant time. The importance and advantages of extending sequential hashing to a parallelized form are discussed. An implementation is presented that sorts N numbers in O(L) time, where L is a constant, using a three-dimensional parallelized systolic hashing process. This is compared to a sequential hashing process, which requires O(N) time.

  • Linear systolic array for least-squares estimation

    Page(s): 83 - 92

    The use of a square-root-free linear systolic array structure to perform the QR decomposition needed in the solution of least-squares (LS) problems is proposed. A form of the Kalman filter algorithm is applied to perform the recursive LS estimation. Compared with the conventional triangular systolic array structure for LS estimation, the linear array has the advantage of requiring less area and being simpler for VLSI implementation.

  • A cellular algorithm for straight line extraction

    Page(s): 93 - 102

    Straight-line-edge extraction can be carried out in two successive phases: identifying the pixels that belong to edges and constructing straight-line segments from these edge pixels. A parallel approach based on a cellular algorithm is proposed for the second phase. Each cell sends a message that compiles distances between a pattern segment and the real segment on the image. The value of the message identifies a segment and codifies its length and endpoints. If the parameters of the algorithm are properly chosen, it can be adjusted to different kinds of contours: noisy or blurred edges and disconnected segments. The algorithm takes computation time proportional to the linear dimension of the image (for an image of N × N pixels the linear dimension is N) and the number of generalized directions.

  • A one dimensional systolic array for solving arbitrarily large least mean square problems

    Page(s): 103 - 112

    The design of a one-dimensional systolic array for solving arbitrarily large least-mean-square problems involving QR decomposition and a triangular system of equations is presented. The main characteristics of this array are maximization of array utilization, thus achieving a minimum global computation time, and low complexity of the resulting array, which can also be used in problems such as matrix-by-vector and matrix-by-matrix multiplication and LU decomposition. Two systolic algorithms for QR decomposition have been designed. Their chained execution is shown.

  • Performance evaluation of the HERMES multibit systolic array architecture for low level processing tasks

    Page(s): 113 - 124

    The performance of the various parts of the HERMES multiprocessor vision system is evaluated. HERMES is an autonomous, hierarchical, heterogeneous vision processing system, consisting of N^2/4^i, 0 [...]

  • On partitioning the Faddeev algorithm

    Page(s): 125 - 134

    Partitioned schemes for computing the Faddeev algorithm are derived, using a graph-based methodology. Such implementations are obtained by performing transformations on the fully parallel dependence graph of the algorithm. Linear and two-dimensional structures are derived and evaluated in terms of throughput, I/O bandwidth, utilization of processing elements, and overhead due to partitioning. The partitioned implementations are compared with schemes previously proposed. It is shown that the throughput of both the linear and two-dimensional arrays derived here tends to 3m/7n^3 for n × n matrices, where m is the number of cells, and utilization tends to 1. A two-dimensional scheme that is more efficient and has less overhead than others previously proposed is derived. It is shown that for partitioned implementations with the same number of cells, a linear array performs better, its implementation is easier, and it is better suited for fault-tolerant capabilities than a two-dimensional one.

  • A systolic architecture for the symmetric tridiagonal eigenvalue problem

    Page(s): 145 - 150

    The first step in the development of a chip set to support eigenvalue-eigenvector-based estimation algorithms is presented. It is based on the assumption that an averaging technique will produce a symmetric covariance matrix. Such a matrix can be reduced to a symmetric tridiagonal matrix, and hence the eigenvalues and eigenvectors can be found by successive iterations involving QR decomposition. The architecture is unique in that other architectures either solve only for the eigenvalues or use methods other than QR iteration. It has potential for use in a systolic computer for computation-intensive digital signal processing based on modern spectral-analysis techniques.

  • Stereo matching of satellite images with transputers

    Page(s): 175 - 182

    A demanding problem involving several algorithmic phases with varying degrees of regularity and data dependence is used to show that a network of transputers programmed in OCCAM has all the attributes needed to explore several processing paradigms. Two alternative organizations of the problem on a network of 21 transputers are compared from the standpoints of speed, hardware efficiency, and ease of programming. Two highly parallel implementations of an algorithm that constitutes part of a real-time system to generate terrain relief maps from satellite stereo image pairs have been programmed. An optimum strategy that demonstrates the power of MIMD (multiple instruction, multiple data streams) parallel computing is determined.

  • A million transistor systolic array graphics engine

    Page(s): 193 - 202

    A description is given of a million-transistor systolic array graphics engine (SAGE) that can render a horizontal 3-D span in every clock cycle at the rate of 25 million spans/s, independent of the pixel length of the span. For an average span length in the 10-32 pixel range, this translates into 250-800 million pixels/s. Assuming that the front end of the system can generate a span in every clock cycle, then in the best case SAGE polygon performance is 25,000,000 polygons/s for 100-pixel polygons and drops to 750,000 polygons/s as the average area of the polygons increases to 1000 pixels. For example, a system using the SAGE chip has the potential to interactively display the motion and time behavior of a 20,000-polygon scene at a rate of 30 frames/s. In the extreme case where all spans are 1024 pixels long, SAGE operates at a peak parallel pixel-processing rate of 25,000 million pixels/s.

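The pixel-rate figures in the SAGE abstract above follow from multiplying the span rate by the span length. The short sketch below reproduces that arithmetic. The function name `pixel_rate` is made up for illustration, and the only input taken from the abstract is the 25 million spans/s rate.

```python
# Span rate quoted in the abstract: one span per clock cycle, 25 million spans/s.
SPANS_PER_SECOND = 25_000_000


def pixel_rate(span_length_pixels: int, spans_per_second: int = SPANS_PER_SECOND) -> int:
    """Pixels rendered per second if every span has the given length."""
    return spans_per_second * span_length_pixels


if __name__ == "__main__":
    # Span lengths mentioned in the abstract: 10, 32 and 1024 pixels.
    for length in (10, 32, 1024):
        rate = pixel_rate(length)
        print(f"{length:>4}-pixel spans: {rate / 1e6:,.0f} million pixels/s")
```

For 10- and 32-pixel spans this gives the 250 and 800 million pixels/s quoted above. For 1024-pixel spans the product is 25,600 million pixels/s, which the abstract appears to state in rounded form.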
  • A massively parallel systolic array processor system

    Page(s): 217 - 225

    The design of a massively parallel processor, comprising 2304 bit-serial processor elements arranged in a 48 by 48 systolic array, is described. The system consists of the processor array, a microstore controller, and a host computer interface. Program development tools are available on the host computer. The processor array uses 32 NCR GAPP (Geometric Arithmetic Parallel Processor) microprocessor chips, while the microstore controller is implemented with a TMS32010 DSP chip and TTL (transistor-transistor logic) circuitry. Utilizing the nearest-neighbor communication capabilities of the GAPP, the array receives data from the host at the south edge of the array, outputs data to the host at the north edge of the array, and can wrap data between either the east and west or north and south edges. The array can also be configured as a linear array of 2304 processor elements. The microstore controller interfaces with the host and facilitates downloading of GAPP array machine code, provides for the debugging and monitoring of GAPP array execution from the host, and implements user-defined instructions.

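The array geometry in the abstract above (48 by 48 elements, 32 chips, optional edge wraparound) can be illustrated with a small indexing sketch. It is only a model of nearest-neighbor addressing, not the GAPP instruction set. The function `neighbor`, the row/column numbering (row 0 taken as the north edge), and the `wrap_ns`/`wrap_ew` flags are all assumptions introduced here.

```python
ROWS, COLS = 48, 48   # array dimensions from the abstract
CHIPS = 32            # number of GAPP chips from the abstract

TOTAL_PES = ROWS * COLS             # 2304 processor elements
PES_PER_CHIP = TOTAL_PES // CHIPS   # derived here, not stated in the abstract

# Row/column offsets per direction; row 0 is assumed to be the north edge.
_OFFSETS = {"north": (-1, 0), "south": (1, 0), "east": (0, 1), "west": (0, -1)}


def neighbor(row, col, direction, wrap_ns=False, wrap_ew=False):
    """Return the (row, col) of the neighbor in `direction`, or None at an open edge."""
    d_row, d_col = _OFFSETS[direction]
    r, c = row + d_row, col + d_col
    if wrap_ns:
        r %= ROWS   # north and south edges joined
    if wrap_ew:
        c %= COLS   # east and west edges joined
    if 0 <= r < ROWS and 0 <= c < COLS:
        return (r, c)
    return None


if __name__ == "__main__":
    print("processor elements:", TOTAL_PES)
    print("elements per chip:", PES_PER_CHIP)
    print("north of (0, 0), no wrap:", neighbor(0, 0, "north"))
    print("north of (0, 0), N-S wrap:", neighbor(0, 0, "north", wrap_ns=True))
```

With both wrap flags off, edge elements have no neighbor on the open side, which corresponds to the edges where the abstract says data enters from and leaves to the host.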
  • Implementation of array structured maximum likelihood decoders

    Page(s): 227 - 236

    Efficient VLSI array processor architectures for maximum-likelihood decoding (MLD) have been developed to meet the high throughput and data processing requirements of modern communication systems. Both 1-D and 2-D MLD processors with large constraint length (>8) have been derived. Radix-4p processing elements and delay commutating switching processors for MLD have been concatenated to construct a pipeline MLD processor. The pipeline length can be adapted to meet the time/area constraints for various applications. A 2-D MLD array processor is also presented. Processing data are modularized, data transmission is embedded into processing elements, and a fixed-size 2-D MLD array is derived to meet high-data-throughput requirements.

  • Regular processor arrays for matrix algorithms with pivoting

    Page(s): 237 - 246

    It is shown how to obtain regular (though nonsystolic) processor arrays for algorithms with pivoting. First, the fact that pivoting algorithms cannot be systolic is established. Then it is shown how regular iterative algorithms can be formulated for the Gaussian elimination algorithm with partial pivoting and how the algorithm can then be implemented on the so-called regular iterative arrays (locally connected arrays of essentially identical processor modules, with register pipelines and/or LIFO (last-in/first-out) buffers in some of the links).

  • Systolic algorithms for some scheduling and graph problems

    Page(s): 247 - 256

    A simple model of a linear systolic array with serial input/output and one-way data communication is considered. It is shown that such an array can be used to solve some scheduling and graph problems efficiently. The systolic algorithms are developed in two stages. First an algorithm on a restricted type of sequential machine is constructed. Then the sequential machine algorithm is transformed into a systolic algorithm. The transformation can be done automatically and efficiently.

  • The design of a systolic array system for linear state equations

    Page(s): 275 - 284

    The dependence-graph (DG) approach is extended and applied to the systematic design of a systolic array system. Two DGs that represent two different but data-dependent process algorithms are first linked together. Tag bits are added onto index nodes in this linked DG and used to indicate the different functions to be executed on a single processor element. By applying the conventional time-scheduling and node-assignment procedures to this tagged DG, the interfacing communication problem of a systolic array system can be solved and the optimal latency can be easily obtained. Using this method, an optimal linear-state solver has been designed.

  • Mapping strategy for automatic design of systolic arrays

    Page(s): 285 - 294

    A mapping strategy for automatic design of systolic arrays is presented. Algorithms are specified in terms of data dependency and identity, and implementations are specified in terms of data propagation and sequence behavior. By establishing a relation between data propagation and sequence, an optimal mapping strategy is formulated as a problem of finding an integer solution of a set of linear equations. This approach provides a uniform framework to design a variety of systolic arrays. An automatic design program and some design examples are presented.

  • The derivation of regular synchronous circuits

    Page(s): 305 - 314

    An approach to deriving parameterized representations of regular synchronous circuits from their specification is presented. The derivation of designs consists of two steps: rewriting the specification in terms of predefined structures to obtain a draft architecture, and optimizing that architecture by successive correctness-preserving transformations using algebraic theorems. These steps can be repeated to obtain, at a lower level of abstraction, architectures that still satisfy the original specification. A number of word-level and bit-level rank evaluator designs are developed to illustrate the techniques described.

  • Systolic arrays for group explicit methods for solving parabolic partial differential equations

    Page(s): 315 - 329

    A systolic array implementation for solving parabolic equations numerically is presented. The finite-difference methods used are stable asymmetric approximations to the partial differential equations, which when coupled in groups of two adjacent points on the grid result in implicit equations that are easily converted to explicit form, thus offering many advantages suitable for solution by VLSI techniques. The regularity obtained from the grid structure and locality of data from groups of small size, combined with the attributes of truncation error cancellations and alternating the strategies of grid points, give unconditional stability and an efficient systolic design.

  • Parallel algorithms and systolic array designs for RSA cryptosystem

    Page(s): 341 - 350

    Two algorithms for computing very large integer modular exponentiation are proposed. One is based on a recoding technique that significantly reduces the total number of modular multiplications. The second is a parallel algorithm that can be implemented by two parallel processors and achieves optimal performance. Two corresponding systolic array designs are developed. The main advantage of these systolic architectures is to provide a potentially higher throughput for a large number of computations, namely, encryptions and decryptions in an RSA cryptosystem.

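As background for the RSA abstract above, the operation in question is modular exponentiation. The sketch below shows the ordinary right-to-left binary (square-and-multiply) method, which is a common baseline and is not either of the two algorithms proposed in the paper. The function name `mod_exp` and the toy key values are illustrative assumptions, and real RSA moduli are far larger.

```python
def mod_exp(base: int, exponent: int, modulus: int) -> int:
    """Compute (base ** exponent) % modulus by right-to-left square-and-multiply."""
    if modulus <= 0:
        raise ValueError("modulus must be positive")
    if exponent < 0:
        raise ValueError("exponent must be non-negative")
    result = 1 % modulus          # handles modulus == 1
    base %= modulus
    while exponent > 0:
        if exponent & 1:          # current exponent bit is 1: multiply
            result = (result * base) % modulus
        base = (base * base) % modulus   # square once per exponent bit
        exponent >>= 1
    return result


if __name__ == "__main__":
    # Toy RSA parameters for illustration only (hypothetical, far too small to be secure).
    n, e, d = 3233, 17, 2753      # n = 61 * 53
    message = 65
    ciphertext = mod_exp(message, e, n)
    recovered = mod_exp(ciphertext, d, n)
    print("ciphertext:", ciphertext)
    print("recovered:", recovered)
```

This baseline performs one squaring per exponent bit plus one extra multiplication per 1 bit. That multiplication count is the cost the abstract says its first algorithm reduces.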
  • A systolic algorithm and architecture for solving sets of linear equations with multi-band coefficient matrix

    Page(s): 361 - 371

    A parallel block-iterative algorithm for solving sets of linear equations with a positive multiband coefficient matrix is presented. The parallel structure is obtained by decoupling the sets of equations into subsets instead of partitioning the coefficient matrix into a lower and upper (block) triangular matrix. An important feature of the algorithm is that the coefficient matrices of the decoupled subsets are inverted by a novel direct algorithm. The global algorithm iterates using a Gauss-Seidel-like method toward a solution. For this problem, a systolic algorithm/architecture is designed.

  • Scheduling a system of affine recurrence equations onto a systolic array

    Page(s): 373 - 382

    Most work on the problem of scheduling computations on a systolic array is restricted to systems of uniform recurrence equations. This restriction is relaxed to include systems of affine recurrence equations. In this broader class, a sufficient condition is given for the system to be computable. Necessary and sufficient conditions are given for the existence of an affine schedule, along with a procedure that constructs the schedule vector when one exists.
