Proceedings of the International Conference on Application Specific Array Processors, 1994

22-24 Aug. 1994

  • On the injectivity of modular mappings

    Publication Year: 1994, Page(s):236 - 247
    Cited by: Papers (6)

    Affine space-time mappings have been extensively studied for systolic array design and parallelizing compilation. However, there are practically important cases that require other types of transformations. This paper considers so-called modular mappings described by linear transformations modulo a constant vector. Sufficient conditions for these mappings to be one-to-one are investigated for rectang...
  • Proceedings of IEEE International Conference on Application Specific Array Processors (ASSAP'94)

    Publication Year: 1994
  • Automated design of DSP array processor chips

    Publication Year: 1994, Page(s):33 - 44

    This paper presents details of the DAC (DSP ASIC Compiler) silicon compiler framework. DAC allows a non-specialist to automatically design DSP ASICs and DSP ASIC cores directly from a high-level specification. Typical designs take only several minutes, and the resulting layouts are comparable in area and performance to handcrafted designs.
  • Behavioral synthesis of high performance, low cost, and low power application specific processors for linear computations

    Publication Year: 1994, Page(s):45 - 56
    Cited by: Papers (2)

    Throughput has traditionally been widely recognized as the most popular performance metric for implementation of application specific computations. However, applications such as embedded controllers increasingly impose constraints on both throughput and latency as important metrics of speed. Although throughput alone can be arbitrarily improved for several classes of systems using previously publi...
  • Access and alignment of arrays for a bidimensional parallel memory

    Publication Year: 1994, Page(s):346 - 356
    Cited by: Patents (4)

    The paper describes the use of a parallel memory system for a SIMD architecture for signal processing. It develops the Chinese linear skewing scheme in order to: (1) provide conflict-free access to the vectors of interest in signal processing; (2) allow a simple computation of local addresses; and (3) use 100% of the memory capacity. With a linear skewing scheme, the vectors fetched from the parallel...
  • Distributed control synthesis for data-dependent iterative algorithms

    Publication Year: 1994, Page(s):57 - 68
    Cited by: Papers (2)

    Data-dependent control flow changes are typically implemented in complex general-purpose controllers. However, in medium to fine-grained iterative algorithms found in DSP and arithmetic, it is desirable for both cost and performance reasons to develop simplified and distributed control structures throughout the array architectures. We present a transformation technique to systematically convert an...
  • Constant-time triangulation problems on reconfigurable meshes

    Publication Year: 1994, Page(s):357 - 368
    Cited by: Papers (1)

    Triangulating a set of points in the plane is a central theme in computer-aided manufacturing, robotics, CAD, VLSI design, geographic data processing, and computer graphics. Even more challenging are constrained triangulations, where a triangulation is sought in the presence of a number of constraints such as prescribed edges and/or forbidden areas. In this paper, we show that the flexibility of t...
  • Rapid prototyping with programmable control paths

    Publication Year: 1994, Page(s):69 - 74

    The provision of a programmable control path allows a designer to experimentally build and evaluate many different instruction sets and data paths in a short period of time. For this approach to be practical, the designer needs a way to quickly modify the control path hardware to reflect the changes in the instruction set. To this end, we describe a flexible and efficient method for generating con...
  • Architectures for lattice structure based orthonormal discrete wavelet transforms

    Publication Year: 1994, Page(s):259 - 270
    Cited by: Papers (7) | Patents (45)

    This paper presents efficient single-rate architectures for the orthonormal discrete wavelet transform (DWT). Folded and digit-serial architectures are derived from an efficient lattice implementation of two-channel FIR paraunitary systems known as the quadrature mirror filter (QMF) lattice. Folded architectures are derived by applying systematic folding techniques to multirate systems. For digit-...
  • Data alignment of loop nests without nonlocal communications

    Publication Year: 1994, Page(s):439 - 450
    Cited by: Papers (1)

    This paper addresses how to distribute data to different memory modules, and computations to different processors, in a distributed-memory parallel computer so that execution requires no nonlocal communications, or as few as possible. Nonlocal communications are much more expensive than local communications, e.g., nearest-neighbor shifts of data. Algorithms ...
  • A scalable bit-sequential SIMD array for nearest-neighbor classification using the city-block metric

    Publication Year: 1994, Page(s):369 - 380
    Cited by: Patents (3)

    We present a fully scalable SIMD array architecture for a highly efficient implementation of pattern classification by nearest-neighbor algorithms using the city-block metric. The elementary accumulator cell is highly optimized for sequential accumulation of absolute integer differences, so that several hundred of them can easily be integrated on a single chip. A two-dimensional M×N array s...
  • Minimizing memory requirements in rate-optimal schedules

    Publication Year: 1994, Page(s):75 - 86
    Cited by: Papers (20)

    We address the problem of minimizing buffer storage requirements in constructing rate-optimal compile-time schedules for multi-rate dataflow graphs. We demonstrate that this problem, called the Minimum Buffer Rate-Optimal (MBRO) scheduling problem, can be formulated as a unified linear programming problem. A novel feature of the method is that it tries to minimize the memory requirement while simul...
  • A data path array with shared memory as core of a high performance DSP

    Publication Year: 1994, Page(s):271 - 282
    Cited by: Papers (8)

    A data path array has been designed as the core of a digital signal processor architecture for image processing applications. Data supply to the data paths and data exchange among them are performed via an on-chip shared memory with a two-dimensional address space. Distribution of data onto these memory blocks enables simultaneous, conflict-free access to the shared memory by the data paths. Data th...
  • Fast linear Hough transform

    Publication Year: 1994, Page(s):1 - 9
    Cited by: Papers (6)

    The Hough transform is the technique of choice for identifying straight lines in digital images, with applications to high energy physics and computer vision. Classical methods for implementing the Hough transform of an N×N binary image require N³ additions of n-bit integers, where n = log₂ N, hence nN³ bit operations per transform. We introduce a new ...
  • Optimal mapping of systolic algorithms by regular instruction shifts

    Publication Year: 1994, Page(s):224 - 235
    Cited by: Papers (2)

    This paper addresses the problem of determining efficient mappings of systems of affine recurrence equations into regular arrays in a nearly space-optimal fashion. A new nonlinear allocation technique is presented: the Instruction Shift. It makes it possible to synthesize planar regular arrays without increasing the initial linear schedule. This technique is illustrated with the LLᵀ Cholesky fact...
  • A parallel DSP-based neural network emulator with CMOS VLSI packet switching hardware

    Publication Year: 1994, Page(s):381 - 391
    Cited by: Papers (2)

    This work describes a parallel neural network emulator which uses standard DSPs and application-specific VLSI communication processors with an integrated hardware routing algorithm. The use of DSPs as programmable processing elements enables the emulation of different types of neurons including biologically inspired models with learnable synaptic weights and delays, variable neuron gain, and stati...
  • A methodology for performance prediction of Sphinx I in multi-computer architectures

    Publication Year: 1994, Page(s):87 - 98

    A methodology is proposed for performance prediction of an application on a multi-computer architecture. The methodology takes a two-step approach. In the first step, the application (Sphinx I) is treated as a benchmark and its performance on uniprocessor systems is predicted from other standard benchmarks. In the second step, the predicted performance values for the component modules of t...
  • Synthesis of a class of data format converters with specified delays

    Publication Year: 1994, Page(s):283 - 294
    Cited by: Papers (6)

    We propose a design methodology for synthesizing a special class of Data Format Converters (DFCs) in which the size of the I/O sequences and the delays between I/O sequences are specified. The need for such DFCs arises in many signal and image processing applications. Our DFCs are based on a two-dimensional architecture. The designs produced by our methodology have maximum throughput rate and are area-effic...
  • Algorithms and architectures for hierarchical compression of video

    Publication Year: 1994, Page(s):10 - 21
    Cited by: Patents (6)

    The paper addresses the problem of collaborative video over “heterogeneous” networks. Current standards for video compression are not designed to deal with this problem. We define an additional set of metrics (i.e., in addition to the standard rate versus distortion measure) to evaluate compression algorithms for this application. We also present an efficient algorithm and corresponding...
  • A fast pipelined FFT unit

    Publication Year: 1994, Page(s):143 - 151
    Cited by: Papers (3)

    This paper presents the architecture of a VLSI butterfly processing element (PE) for computing the FFT in serial arithmetic. The butterfly PE uses complex samples and weights, with real and imaginary parts represented separately in full fractional two's complement form. The PE is based on a compact serial/parallel to serial complex multiplier, which optimises complex multiplica...
  • An efficient VLSI architecture for digital geometry

    Publication Year: 1994, Page(s):392 - 403
    Cited by: Papers (6)

    The main contribution of this work is to show that a number of fundamental digital geometry tasks can be solved fast on a novel VLSI architecture obtained by augmenting the mesh with multiple broadcast architecture (MMB) with precharged 1-bit row and column buses. The new architecture that we call mesh with hybrid buses (MHB) is readily implementable in VLSI with no increase in the area or the wir...
  • A processor-time-minimal schedule for the standard tensor product algorithm

    Publication Year: 1994, Page(s):176 - 187
    Cited by: Papers (1)

    The paper, using a directed acyclic graph (dag) model of algorithms, investigates precedence constrained multiprocessor schedules for the n×n×n×n directed mesh. Its completion requires at least 4n-3 multiprocessor steps. Time-minimal multiprocessor schedules that use as few processors as possible are called processor-time-minimal. For the 4D mesh, such a schedule requires at leas...
  • Optimal synthesis of application specific heterogeneous pipelined multiprocessors

    Publication Year: 1994, Page(s):99 - 110
    Cited by: Papers (5) | Patents (2)

    We present a technique and formal model for optimal synthesis of specialized heterogeneous multiprocessors, given task flow graphs to be executed in a pipelined (periodic) fashion. SOS is a formal approach to system synthesis using mixed integer-linear programming, ensuring optimality of the final solutions. SOS was extended to cover the pipelined design style. The extensions were made while trying...
  • Designing systolic arrays for integer GCD computation

    Publication Year: 1994, Page(s):295 - 301
    Cited by: Papers (2)

    We improve the classical result of Brent and Kung (1985) by a factor of 12 in area consumption while maintaining the same average running time. Global broadcasting is eliminated using a novel technique that is more efficient than Leiserson's (1982) semisystolic-to-systolic transformation and can also be applied to other arithmetic algorithms. Experiments using field programmable gate arrays demon...
  • A high performance IIR filter chip and its evaluation system

    Publication Year: 1994, Page(s):22 - 32

    A highly flexible programmable IIR filter chip has been designed and fabricated to commercial requirements within a collaborative project involving several industrial partners. The device uses eight highly regular 16-bit array multiplier-accumulators, which have been pipelined to achieve an overall computational rate of 30 MHz using a 1 micron gate array process. Most significant bit first arithmetic h...
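For "On the injectivity of modular mappings" (pp. 236 - 247): the abstract describes mappings given by a linear transformation taken modulo a constant vector, and asks when they are one-to-one on rectangular domains. The sketch below does not reproduce the paper's sufficient conditions; it is a brute-force checker for small cases, assuming a mapping x ↦ (T x) mod m applied componentwise, with a hypothetical integer matrix `T`, modulus vector `m` and box 0 ≤ x[i] < bounds[i].

```python
from itertools import product

def modular_map(T, m, x):
    """Apply x -> (T x) mod m componentwise; T is a list of integer rows, m a modulus vector."""
    return tuple(sum(t * xi for t, xi in zip(row, x)) % mk
                 for row, mk in zip(T, m))

def is_injective_on_box(T, m, bounds):
    """Exhaustively test whether the modular mapping is one-to-one on
    the rectangular domain {x : 0 <= x[i] < bounds[i]}."""
    seen = set()
    for x in product(*(range(b) for b in bounds)):
        y = modular_map(T, m, x)
        if y in seen:          # two domain points share an image: not injective
            return False
        seen.add(y)
    return True
```

Exhaustive enumeration costs one map evaluation per domain point, so it only suits small boxes, but it is handy for sanity-checking a candidate (T, m) against a closed-form condition.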
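For "Access and alignment of arrays for a bidimensional parallel memory" (pp. 346 - 356): under a generic linear skewing scheme, element (i, j) is stored in memory module (a·i + b·j) mod M, and a vector can be fetched in one parallel access when its M elements land in M distinct modules. The particular "Chinese" linear skewing scheme is not described in the abstract, so the coefficients `a`, `b`, `M` and the vector shapes (rows, columns, diagonals) below are illustrative assumptions.

```python
def module_of(i, j, a, b, M):
    """Memory module holding element (i, j) under the linear skewing (a*i + b*j) mod M."""
    return (a * i + b * j) % M

def conflict_free(elements, a, b, M):
    """True if the M given elements fall into M distinct modules (one conflict-free access)."""
    modules = {module_of(i, j, a, b, M) for i, j in elements}
    return len(elements) == M and len(modules) == M

# Hypothetical vectors of interest, each of length M, starting at (i0, j0).
def row(i0, j0, M):
    return [(i0, j0 + k) for k in range(M)]

def column(i0, j0, M):
    return [(i0 + k, j0) for k in range(M)]

def diagonal(i0, j0, M):
    return [(i0 + k, j0 + k) for k in range(M)]
```

For a row, column or main diagonal the condition reduces to gcd(b, M) = 1, gcd(a, M) = 1 or gcd(a + b, M) = 1 respectively, which the checker confirms numerically.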
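For "A scalable bit-sequential SIMD array for nearest-neighbor classification using the city-block metric" (pp. 369 - 380): the arithmetic each accumulator cell supports is the city-block (L1) distance, the sum of absolute differences between integer feature vectors, followed by selection of the closest stored prototype. This word-level sketch shows only that arithmetic; the bit-sequential datapath and the M×N array organization are not modeled, and the function names are hypothetical.

```python
def city_block(x, y):
    """Accumulate |x_k - y_k| one component at a time, as a sequential accumulator would."""
    acc = 0
    for xk, yk in zip(x, y):
        acc += abs(xk - yk)
    return acc

def nearest_neighbor(query, prototypes, labels):
    """Label of the prototype with the smallest city-block distance (ties: first prototype)."""
    best = min(range(len(prototypes)),
               key=lambda p: city_block(query, prototypes[p]))
    return labels[best]
```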
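For "Minimizing memory requirements in rate-optimal schedules" (pp. 75 - 86): compile-time scheduling of a multi-rate dataflow graph typically starts from its balance equations, r[src]·prod = r[dst]·cons for every edge, whose smallest positive integer solution gives how often each actor fires per schedule period. The sketch below computes only this repetitions vector, as background; it is not the MBRO linear program from the paper, and it assumes a connected graph described by hypothetical (src, dst, prod, cons) tuples.

```python
from fractions import Fraction
from functools import reduce
from math import gcd, lcm  # math.lcm requires Python 3.9+

def repetitions(actors, edges):
    """Smallest positive integer r with r[src]*prod == r[dst]*cons on every edge.
    edges: list of (src, dst, prod, cons); the graph is assumed connected.
    Returns None if the balance equations are inconsistent."""
    rate = {actors[0]: Fraction(1)}
    changed = True
    while changed:                      # propagate relative rates along edges
        changed = False
        for src, dst, p, c in edges:
            if src in rate and dst not in rate:
                rate[dst] = rate[src] * p / c
                changed = True
            elif dst in rate and src not in rate:
                rate[src] = rate[dst] * c / p
                changed = True
    if any(rate[s] * p != rate[d] * c for s, d, p, c in edges):
        return None                     # no consistent solution exists
    scale = reduce(lcm, (f.denominator for f in rate.values()))
    ints = {a: int(f * scale) for a, f in rate.items()}
    g = reduce(gcd, ints.values())      # reduce to the smallest solution
    return {a: v // g for a, v in ints.items()}
```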
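For "Fast linear Hough transform" (pp. 1 - 9): the N³ count of the classical method follows from accumulating N pixels along each of N × N candidate lines. The brute-force sketch below makes that count explicit. The line parameterization (N slopes between 0 and 1, N intercepts, digital lines rounded down, with wrap-around at the image border) is an assumption chosen so that every line touches exactly N pixels; it is not the paper's new method.

```python
def classical_hough(image):
    """Brute-force line accumulation on an N x N binary image (image[y][x], N >= 2).
    Line (s, b) visits pixels (x, (b + (s * x) // (N - 1)) % N) for x = 0..N-1,
    so s = 0 is horizontal and s = N - 1 has slope 1.
    Returns the accumulator acc[s][b] and the number of additions performed."""
    N = len(image)
    acc = [[0] * N for _ in range(N)]
    additions = 0
    for s in range(N):              # N slopes
        for b in range(N):          # N intercepts
            total = 0
            for x in range(N):      # N pixels per line
                y = (b + (s * x) // (N - 1)) % N
                total += image[y][x]
                additions += 1
            acc[s][b] = total
    return acc, additions
```

With n-bit accumulators, the N³ additions counted here correspond to the nN³ bit operations quoted in the abstract.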
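For "A fast pipelined FFT unit" (pp. 143 - 151): a radix-2 butterfly combines two complex samples a and b with a complex weight w into a + w·b and a − w·b, and the product w·b expands into four real multiplications when real and imaginary parts are kept separate, as in the PE described. The floating-point sketch below illustrates only that dataflow inside a recursive decimation-in-time FFT; the serial arithmetic, fractional two's complement format and serial/parallel multiplier are not modeled.

```python
import cmath

def complex_mul(wr, wi, br, bi):
    """(wr + j*wi) * (br + j*bi), with real and imaginary parts kept separate."""
    return wr * br - wi * bi, wr * bi + wi * br

def butterfly(a, b, w):
    """Radix-2 decimation-in-time butterfly on (re, im) pairs: (a + w*b, a - w*b)."""
    pr, pi = complex_mul(w[0], w[1], b[0], b[1])
    return (a[0] + pr, a[1] + pi), (a[0] - pr, a[1] - pi)

def fft(x):
    """Recursive radix-2 FFT of a list of (re, im) pairs; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return list(x)
    even, odd = fft(x[0::2]), fft(x[1::2])
    out = [None] * n
    for k in range(n // 2):
        w = cmath.exp(-2j * cmath.pi * k / n)   # twiddle factor W_n^k
        out[k], out[k + n // 2] = butterfly(even[k], odd[k], (w.real, w.imag))
    return out
```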
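For "A processor-time-minimal schedule for the standard tensor product algorithm" (pp. 176 - 187): assuming each node (i, j, k, l) of the n×n×n×n directed mesh has arcs to the nodes with one coordinate increased by 1, and each node takes one step, the longest path visits 4(n − 1) + 1 = 4n − 3 nodes, which matches the lower bound on multiprocessor steps quoted in the abstract. The sketch below checks this with a longest-path dynamic program; it does not construct the processor-time-minimal schedule.

```python
from itertools import product

def min_steps_4d_mesh(n):
    """Number of nodes on the longest path of the n x n x n x n directed mesh,
    i.e. the minimum number of unit-time multiprocessor steps."""
    step = {}
    # Lexicographic order is a topological order: every arc increases one coordinate.
    for v in product(range(n), repeat=4):
        preds = [v[:d] + (v[d] - 1,) + v[d + 1:] for d in range(4) if v[d] > 0]
        step[v] = 1 + max((step[p] for p in preds), default=0)
    return max(step.values())
```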
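For "Designing systolic arrays for integer GCD computation" (pp. 295 - 301): bit-level GCD hardware is commonly built around binary-style GCD algorithms, which need only parity tests, shifts and subtractions rather than division. As a sequential reference only, here is Stein's binary GCD; it is not the Brent-Kung array or the improved design from the paper.

```python
def binary_gcd(u: int, v: int) -> int:
    """Stein's binary GCD for non-negative integers, using only
    parity tests, shifts and subtractions."""
    if u == 0:
        return v
    if v == 0:
        return u
    shift = 0
    while ((u | v) & 1) == 0:   # both even: factor out a common 2
        u >>= 1
        v >>= 1
        shift += 1
    while (u & 1) == 0:         # make u odd
        u >>= 1
    while v != 0:
        while (v & 1) == 0:     # remaining factors of 2 in v are not common
            v >>= 1
        if u > v:
            u, v = v, u         # keep u <= v
        v -= u                  # difference of two odd numbers is even
    return u << shift
```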