Field-Programmable Custom Computing Machines, 1999. FCCM '99. Proceedings. Seventh Annual IEEE Symposium on

Date 23-23 April 1999

  • Seventh Annual IEEE Symposium on Field-Programmable Custom Computing Machines (Cat. No.PR00375)

    Freely Available from IEEE
  • Author index

    Page(s): 318 - 319
    Freely Available from IEEE
  • A super-serial Galois fields multiplier for FPGAs and its application to public-key algorithms

    Page(s): 232 - 239

    This contribution introduces a scalable multiplier architecture for Galois fields GF(2^k) amenable to field-programmable gate array (FPGA) implementations. This architecture is well suited for the implementation of public-key cryptosystems which require programmable multipliers in large Galois fields. The architecture trades a reduction in resources for an increase in the number of clock cycles. This architecture is also fine-grain scalable in both the time and the area (or logic) dimensions, thus facilitating implementations that maximize their use of finite FPGA resources while achieving fast computational speed. This leads to an architecture that requires fewer resources than traditional bit-serial multipliers, which we demonstrated with implementations of multipliers in the field GF(2^167). Our results demonstrate that for this field one can realize super-serial multipliers that use 2.76 times fewer function generators and 6.84 times fewer flip-flops than their serial multiplier counterparts. We also extrapolated the performance of these multipliers in an elliptic curve cryptosystem.
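
For orientation, the conventional bit-serial multiplication that the abstract uses as its baseline can be sketched in software. This is a minimal illustration of shift-and-XOR multiplication in GF(2^k), not the paper's super-serial architecture; the function name `gf2k_mul` and the choice of test field are assumptions made here.

```python
def gf2k_mul(a: int, b: int, modulus: int, k: int) -> int:
    """Multiply a and b in GF(2^k), consuming one bit of b per iteration.

    Field elements are integers whose bits are polynomial coefficients.
    `modulus` is the irreducible polynomial, including its x^k term.
    """
    result = 0
    for i in reversed(range(k)):      # most significant bit of b first
        result <<= 1                  # multiply the partial product by x
        if (result >> k) & 1:         # degree reached k: reduce once
            result ^= modulus
        if (b >> i) & 1:
            result ^= a               # addition in GF(2^k) is XOR
    return result


# GF(2^8) with the polynomial x^8 + x^4 + x^3 + x + 1 (0x11B)
print(hex(gf2k_mul(0x57, 0x83, 0x11B, 8)))
```

Each loop iteration corresponds to one clock cycle of a bit-serial multiplier, which is why k iterations are needed for a k-bit field.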

  • Hybrid data/configuration caching for striped FPGAs

    Page(s): 294 - 295

    Most custom computing machine (CCM) design has centered around field-programmable gate array (FPGA) technology and rapid prototyping applications. FPGAs are reconfigured to map parts of the application. The performance of an FPGA when used as a virtual hardware engine depends on its reconfiguration granularity. We study the striped FPGA and propose a hybrid mechanism to process a large amount of data using a combination of data and configuration caching.

  • An edge-endpoint-based configurable hardware architecture for VLSI CAD layout design rule checking

    Page(s): 158 - 167

    Design rule checking (DRC) is an important step in VLSI design in which the widths and spacings of design features in a VLSI circuit layout are checked against the design rules of a particular fabrication process. In the past, some efforts to build hardware accelerators for DRC have been proposed, but these efforts were hobbled by the fact that it is often impractical to build a different rule-checking ASIC each time design rules or fabrication processes change. In this paper, we propose a configurable hardware approach to DRC. Because the rule-checking is built in configurable hardware, it can garner impressive speedups over software approaches, while retaining the flexibility needed to easily change the rule checker as rules or processes change. Our work proposes an edge-endpoints-based method for performing Manhattan geometry checking; this approach is particularly well-suited to the constraints of configurable hardware. Although design rules do change over time, their intrinsic similarity allows us to propose a general scalable architecture for DRC. We then demonstrate our approach by applying this architecture to a set of design rules for the MOSIS SCN4N SUB process. The hardware required per rule is quite small; we have implemented several design rule checks within a single Xilinx XC4013 FPGA. Our hardware, implemented on a Pamette board, runs at a clock rate of 33 MHz. We also compare the performance of our approach to software methods and demonstrate overall speedups in excess of 25X.
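
To make the width and spacing checks concrete, here is a minimal software sketch on axis-aligned (Manhattan) rectangles. It is a brute-force illustration, not the edge-endpoint method of the paper; the `Rect` type, the function names, and the corner-to-corner convention (largest axis gap) are all assumptions made for this example.

```python
from itertools import combinations
from typing import NamedTuple


class Rect(NamedTuple):
    x0: int
    y0: int
    x1: int
    y1: int


def width_violations(rects, min_width):
    """Return rectangles whose narrower side is below min_width."""
    return [r for r in rects if min(r.x1 - r.x0, r.y1 - r.y0) < min_width]


def spacing_violations(rects, min_space):
    """Return pairs of disjoint rectangles closer than min_space."""
    bad = []
    for a, b in combinations(rects, 2):
        dx = max(a.x0 - b.x1, b.x0 - a.x1, 0)   # horizontal gap, 0 if overlapping in x
        dy = max(a.y0 - b.y1, b.y0 - a.y1, 0)   # vertical gap, 0 if overlapping in y
        gap = max(dx, dy)                       # assumed convention for diagonal neighbours
        if 0 < gap < min_space:                 # touching or overlapping shapes are skipped
            bad.append((a, b))
    return bad


layout = [Rect(0, 0, 2, 10), Rect(3, 0, 6, 10), Rect(10, 0, 14, 10)]
print(width_violations(layout, 3))
print(spacing_violations(layout, 2))
```

The pairwise loop is quadratic in the number of shapes, which hints at why dedicated checking hardware is attractive for large layouts.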

  • ICARUS: a dynamically reconfigurable computer architecture

    Page(s): 278 - 279

    ICARUS (Image Computing, Automatically Reconfigurable, Unlimited Scale) is an architecture for general-purpose parallel computing. The current implementation uses standard FPGAs in novel ways, has no host CPU and differs in many ways from the “fixed CPU plus variable FPGA” approach to computing. Different instruction set architectures (ISAs) are loaded automatically during runtime. Two key architectural elements are S-Machines (Symbolic Machines) and T-Machines (Transaction Machines).

  • A virtual hardware handler for RTR systems

    Page(s): 262 - 263

    The design of a Virtual Hardware Handler for run-time reconfiguration is presented. A Windows-based system that works with the VCC Hotworks board has been implemented and results are presented.

  • Pipeline vectorization for reconfigurable systems

    Page(s): 52 - 62

    This paper presents pipeline vectorization, a method for synthesizing hardware pipelines in reconfigurable systems based on software vectorizing compilers. The method improves efficiency and ease of development of reconfigurable designs, particularly for users with little electronics design experience. We propose several loop transformations to customize pipelines to meet hardware resource constraints, while maximising available parallelism. For run-time reconfigurable systems, we apply hardware specialization to increase circuit utilization. Our approach is especially effective for highly repetitive computations in DSP and multimedia applications. Case studies using FPGA-based platforms are presented to demonstrate the benefits of our approach and to evaluate trade-offs between alternative implementations. The loop tiling transformation, for instance, has been found to improve performance by 30 to 40 times above a PC-based software implementation, depending on whether run-time reconfiguration is used.
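
Loop tiling, the transformation named in the abstract, can be illustrated in plain software. The sketch below tiles a matrix multiplication so that each inner block touches only a `tile` x `tile` working set; it shows the general transformation only, and the function names and tile size are assumptions, not the paper's compiler output.

```python
def matmul_naive(a, b):
    """Reference triple loop: c[i][j] = sum over k of a[i][k] * b[k][j]."""
    n, m, p = len(a), len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(m)) for j in range(p)]
            for i in range(n)]


def matmul_tiled(a, b, tile=2):
    """Same computation with the i, j and k loops split into tiles."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0] * p for _ in range(n)]
    for ii in range(0, n, tile):              # outer loops step over tiles
        for jj in range(0, p, tile):
            for kk in range(0, m, tile):
                for i in range(ii, min(ii + tile, n)):      # inner loops stay
                    for j in range(jj, min(jj + tile, p)):  # inside one tile
                        acc = c[i][j]
                        for k in range(kk, min(kk + tile, m)):
                            acc += a[i][k] * b[k][j]
                        c[i][j] = acc
    return c


a = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
b = [[9, 8, 7], [6, 5, 4], [3, 2, 1]]
print(matmul_tiled(a, b) == matmul_naive(a, b))
```

In a hardware setting the tile size would be chosen so that one tile's data fits the available on-chip resources; here it only changes the iteration order.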

  • A virtual logic algorithm for solving satisfiability problems using reconfigurable hardware

    Page(s): 306 - 307

    Satisfiability (SAT) is a computationally expensive algorithm central to computer science. In this paper, we present a virtual logic algorithm that allows an FPGA-based reconfigurable computing platform to process SAT solver circuits much larger than its available capacity. Our algorithm is based on decomposition techniques that create independent subproblems (pages) that fit the size of the available reconfigurable hardware. Those pages can take turns reusing the platform, creating a virtual logic environment.

  • Algorithm analysis and mapping environment for adaptive computing systems: further results

    Page(s): 264 - 265

    We are developing an integrated algorithm analysis and mapping environment particularly tailored for signal processing applications on Adaptive Computing Systems (ACS). Our environment allows a designer to map signal processing algorithms to an ACS faster, by an order of magnitude, than is currently possible.

  • Reducing compilation time of Zhong's FPGA-based SAT solver

    Page(s): 308 - 309

    We present schemes to reduce the compilation time of configurable hardware that solves Boolean satisfiability. The SAT solver presented by Zhong et al. (1998) has a large compilation time overhead mainly due to placement and routing of many FPGAs. We attack the problem on three fronts. First, we partition the SAT solver into instance-specific and instance non-specific components. Secondly, we transform SAT instances into canonical forms; and finally we present a board-level multiple-chip architecture to solve large SAT instances. All these efforts amount to a reduction in placement and routing time to configure the configurable hardware. We are able to reduce the compilation time to mere routing time of the implication circuits for each instance of the SAT problem, given the best scenario.

  • Field programmable gate array based radar front-end digital signal processing

    Page(s): 178 - 187

    As field-programmable gate array (FPGA) technology has steadily improved, FPGAs are now viable alternatives to other technology implementations for high-speed classes of digital signal processing (DSP) applications. In particular, radar front-end signal processing, an application formerly dominated by custom very large scale integration (VLSI) chips, may now be a prime candidate for migration to FPGA technology. As this paper demonstrates, current FPGA devices have the power and capacity to implement a FIR filter with the performance and specifications of an existing, in-system, front-end signal processing custom VLSI chip. A 512-tap, 18-bit FIR filter was built that could achieve sample rates of 5 MHz (with a clock rate of at least 40 MHz) using Xilinx Virtex FPGA technology, and was demonstrated through simulation. Distributed arithmetic was determined to be the optimal structure for an FPGA FIR design, although future research may show that fast FIR algorithms or filtering in the frequency domain might give better results.
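
Distributed arithmetic replaces the multipliers of a dot product with a lookup table indexed by one bit from each input sample. The following is a minimal sketch for unsigned samples and a small number of taps; the function names are assumptions, and a real 512-tap filter could not use a single table of this form (the table has 2^K entries for K taps), so it would have to be partitioned into smaller tables.

```python
def build_da_lut(coeffs):
    """Table entry `addr` holds the sum of the coefficients whose bit is set in addr."""
    k = len(coeffs)
    return [sum(c for j, c in enumerate(coeffs) if (addr >> j) & 1)
            for addr in range(1 << k)]


def da_dot(coeffs, samples, bits):
    """Compute sum(c * x) for unsigned `bits`-wide samples, one bit plane at a time."""
    lut = build_da_lut(coeffs)
    acc = 0
    for b in range(bits):                     # one pass per bit position
        addr = 0
        for j, x in enumerate(samples):       # gather bit b of every sample
            addr |= ((x >> b) & 1) << j
        acc += lut[addr] << b                 # weight the table output by 2^b
    return acc


coeffs = [3, -1, 4, 2]
samples = [5, 7, 0, 9]
print(da_dot(coeffs, samples, bits=4))        # same value as the direct dot product
```

The inner computation uses only a table read, a shift and an add, which maps naturally onto FPGA lookup tables.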

  • A reconfigurable platform for academic purposes

    Page(s): 282 - 283

    Labomat 3 is a reconfigurable platform for teaching and research purposes developed by our laboratory. The main features of the board are: (1) a microprocessor associated with two mid-range FPGAs, (2) a powerful multitasking real-time operating system including a JavaVM, (3) easy to use design tools, and (4) a networking interface. In this paper we describe the hardware and software of the board as well as some application domains.

  • Fast online placement for reconfigurable computing systems

    Page(s): 300 - 302

    Advances in programmable hardware have led to new architectures where the hardware can be dynamically adapted to the application to gain better performance. There are still many challenging problems to be solved before any practical general-purpose reconfigurable system is built. One fundamental issue is the placement of the modules on the reconfigurable functional unit (RFU). In this paper we present an online heuristic placement method with overall O(n log n) space complexity and O(log n) time complexity for each insertion/deletion of modules on the RFU chip, n being the number of modules currently on the RFU. Our proposed method is O(n) faster than an algorithm which considers all possible locations for placing a new module but, as experimental results show, its quality is 7% worse.

  • Optimizing FPGA-based vector product designs

    Page(s): 188 - 197

    This paper presents a method, called multiple constant multiplier trees (MCMTs), for producing optimized reconfigurable hardware implementations of vector products. An algorithm for generating MCMTs has been developed and implemented, which is based on a novel representation of common subexpressions in constant data patterns. Our optimization framework covers a wider solution space than previous approaches; it also supports exploitation of full and partial run-time reconfiguration as well as technology-specific constraints, such as fanout limits and routing. We demonstrate that while distributed arithmetic techniques require storage size exponential in the number of coefficients, the resource utilization of MCMTs usually grows linearly with problem size. MCMTs have been implemented in Xilinx 4000 and Virtex FPGAs, and their size and speed efficiency are confirmed in comparisons with Xilinx LogiCore and ASIC implementations of FIR filter designs. Preliminary results show that the size of MCMT circuits is less than half of that of comparable distributed arithmetic cores.
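
The idea of sharing common subexpressions in constant bit patterns can be shown on a two-term vector product. The sketch below hand-derives a shift-and-add network for the constants 5 (binary 101) and 7 (binary 111); it illustrates the general principle only and is not the MCMT generation algorithm, and the function name and constants are chosen here for illustration.

```python
def vector_product_5_7(x0: int, x1: int) -> int:
    """Compute 5*x0 + 7*x1 with shifts and adds only.

    Bit columns of the constants:  5 = 1 0 1
                                   7 = 1 1 1
    Columns 2 and 0 both select (x0 + x1), so that sum is computed once.
    """
    shared = x0 + x1                      # common subexpression, reused twice
    return (shared << 2) + (x1 << 1) + shared


print(vector_product_5_7(3, 4))           # equals 5*3 + 7*4
```

Without sharing, the same result needs four adders (three partial sums for 7*x1 and 5*x0 combined bitwise, plus the final accumulation); with the shared term it needs three.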

  • Development system for FPGA-based digital circuits

    Page(s): 266 - 267

    The paper discusses some new hardware and software tools that can be used for the design of virtual circuits based on dynamically reconfigurable FPGAs. With the aid of these tools we can implement a system that requires some hardware resources Rc on available hardware that has resources Rh, where Rc>Rh. The main idea of the approach supported by these tools is the rational combination of FPGA capabilities with some proposed methods for producing a modifiable specification, together with a novel technique for architectural and logic synthesis, which has been incorporated into the new design environment.

  • Implementation and evaluation of a prototype reconfigurable router

    Page(s): 44 - 50

    The evolution of computer networking technology will likely require hardware that is flexible enough to adapt to changing standards while maintaining the highest possible performance. Much research has recently been done in active networks, which increase network flexibility by allowing the routers to be reprogrammed, often at the cost of lower throughput. A reconfigurable router implemented on a Custom Computing Machine (CCM) can provide the flexibility required for active networking while approaching the high throughput of inflexible application-specific integrated circuit (ASIC)-based routers. This paper presents an implementation of a prototype reconfigurable router on the Wildforce platform. The prototype implements IPv4 routing with a throughput of up to 576 Mbps, using a stream-based approach that facilitates dynamic reconfiguration.
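
The core lookup in IPv4 forwarding is a longest-prefix match against a routing table. The abstract does not describe how the prototype performs this lookup, so the following is only a software sketch of the operation itself; the table contents, next-hop labels, and function name are hypothetical.

```python
import ipaddress


def longest_prefix_match(table, dst):
    """Return the next hop of the most specific prefix containing dst, or None."""
    addr = ipaddress.ip_address(dst)
    best_net, best_hop = None, None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        # Keep the matching route with the longest prefix length.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop


routes = [
    ("0.0.0.0/0", "default"),
    ("10.0.0.0/8", "port-A"),
    ("10.1.0.0/16", "port-B"),
]
print(longest_prefix_match(routes, "10.1.2.3"))
```

A linear scan like this is far too slow for line-rate forwarding, which is one reason routers implement the lookup in hardware.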

  • Accelerating run-time reconfiguration on FCCMs

    Page(s): 260 - 261

    The paper describes the implementation of the arithmetic operations of multiplication, division and square root on a Xilinx XC6200 FPGA. By using a design approach to enhance similarities across circuits, partial reconfiguration has been used to allow reductions in reconfiguration times of up to 75% on trials using the VCC HOTWorks board.

  • Accelerating an IR automatic target recognition application with FPGAs

    Page(s): 290 - 291

    An infrared automatic target recognition (IR ATR) application is accelerated with an FPGA co-processor board. The board features and the application are first stated. The FPGA design is then described. The achieved performance is reported and analyzed at the end.

  • PCI-PipeRench and the SWORDAPI: a system for stream-based reconfigurable computing

    Page(s): 200 - 208

    Reconfigurable hardware accelerators have been shown to be flexible and efficient in stream-based applications. In this paper, we discuss the design of PCI-PipeRench and the SWORDAPI. PCI-PipeRench is a coprocessor utilizing the PipeRench architecture which includes on-chip control and data buffering to interface with a host processor over a PCI bus. SWORDAPI calls resemble standard C file control functions, and allow developers to create applications independent of underlying reconfigurable hardware details. In addition, the SWORDAPI provides a cosimulation environment so that verification can be performed using unmodified application source with a hardware simulator. Efficient utilization of the bus is of critical importance in the design of such a system; various methods used to address this issue are presented.

  • Reconfigurable pipelines in VLIW execution units

    Page(s): 298 - 299

    The basic question addressed was whether the greater hardware utilization offered by reconfigurable functional units could compensate for a reduction in clock rate. That is, would the average number of clock cycles per instruction improve sufficiently to compensate for a slower clock rate to yield a net improvement in performance? The architecture limited the number of execution units because of constraints external to the execution units. An example of such a constraint would be limited efficient access to a common register file. Under these constraints, inefficient matching of program needs with hardware capabilities limits performance because of hardware idle cycles. The work found very limited conditions that can yield performance improvement using reconfigurable pipelines when compared with static pipelines. Greater promise was found with instruction set enhancements made possible by the reconfigurable pipelines.
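
The trade-off posed in the abstract follows from the standard relation: execution time = instruction count x cycles per instruction / clock rate. The sketch below computes the break-even CPI for a slower clock; all numbers are hypothetical and are not taken from the paper.

```python
def exec_time(instructions: float, cpi: float, clock_hz: float) -> float:
    """Execution time in seconds for a given instruction count, CPI and clock."""
    return instructions * cpi / clock_hz


def breakeven_cpi(static_cpi: float, static_clock_hz: float,
                  reconf_clock_hz: float) -> float:
    """CPI the reconfigurable design must beat to match the static design."""
    return static_cpi * reconf_clock_hz / static_clock_hz


# Hypothetical example: a static pipeline at 100 MHz with CPI 1.5,
# and a reconfigurable pipeline that can only run at 80 MHz.
target = breakeven_cpi(1.5, 100e6, 80e6)
print(target)                                  # reconfigurable CPI must be below this
print(exec_time(1e9, 1.5, 100e6))              # static pipeline run time
print(exec_time(1e9, target, 80e6))            # equal run time at the break-even CPI
```

With these assumed figures, a 20% clock reduction must be offset by a CPI reduction of the same proportion before any net gain appears.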

  • FAFNER-accelerating nesting problems with FPGAs

    Page(s): 168 - 176

    The nesting problem consists of defining the cutting plan of a piece of raw material in smaller irregular shapes, and has applications in the apparel and footwear industries. Due to its NP-hard nature, the optimal solution can only be guaranteed by exhaustively trying all possible solutions and choosing the best one. Because this is impractical in real-life industrial problems, automatic approaches are based on optimization meta-heuristics that search for sub-optimal but good enough solutions. These optimization techniques rely on the construction and evaluation of several solutions, thus requiring heavy geometric manipulation of the irregular polygons that constitute the problem data. Efficient processing of this geometric information is thus necessary to make fully automatic approaches to nesting problems effective in industrial environments. This paper describes FAFNER, an FPGA-based custom computing machine that is used to accelerate the geometric operations that are at the core of heuristic solutions to the nesting problem. The system is used as an auxiliary processor attached to a low-cost personal computer, and combines a custom programmable processor with an array of custom circuits for the processing of irregular polygons.

  • An FPGA-based fan beam image reconstruction module

    Page(s): 312 - 313

    Filtered Back-Projection (FBP) is a well-known algorithm for reconstruction of tomographic images from projections. Some of FBP's highlights are: (i) it allows agile software implementations; and (ii) it produces images of good quality, i.e., relatively free of artifacts. Our goal is to reconstruct images from fan beam projections collected by detectors set in a linear array.

  • ConCISe: a compiler-driven CPLD-based instruction set accelerator

    Page(s): 92 - 101

    We propose a smart compilation chain in which the compiler is no longer limited by a pre-defined instruction set, but can generate application-specific custom instructions and synthesise them in Field-Programmable Logic. We also present a RISC micro-architecture enhanced by a CPLD-based Reconfigurable Functional Unit (RFU) which supports our compiler approach. The main difference between our smart compiler and similar methods is the ability to encode multiple custom instructions in a single RFU configuration, cross-minimising the logic among them. The objective is to reduce (or eliminate) the reconfiguration overhead and optimise the utilisation of resources. The CPLD core that implements the RFU is based on the Philips XPLA2 architecture. We discuss the advantages of using the XPLA2 instead of conventional FPGAs. Application examples are also presented, which show that our RFU-extended CPU can achieve speed-ups of more than 40% for encryption algorithms, when compared to the standard CPU core alone.

  • Parallelizing applications into silicon

    Page(s): 70 - 80

    The next decade of computing will be dominated by embedded systems, information appliances and application-specific computers. In order to build these systems, designers will need high-level compilation and CAD tools that generate architectures that effectively meet the needs of each application. In this paper we present a novel compilation system that allows sequential programs, written in C or FORTRAN, to be compiled directly into custom silicon or reconfigurable architectures. This capability is also interesting because trends in computer architecture are moving towards more reconfigurable hardware-like substrates, such as FPGA-based systems. Our system works by successfully combining two resource-efficient computing disciplines: Small Memories and Virtual Wires. For a given application, the compiler first analyzes the memory access patterns of pointers and arrays in the program and constructs a partitioned memory system made up of many small memories. The computation is implemented by active computing elements that are spatially distributed within the memory array. A space-time scheduler assigns instructions to the computing elements in a way that maximizes locality and minimizes physical communication distance. It also generates an efficient static schedule for the interconnect. Finally, specialized hardware for the resulting schedule of memory accesses, wires, and computation is generated as a multi-process state machine in synthesizable Verilog. With this system, implemented as a set of SUIF compiler passes, we have successfully compiled programs into hardware and achieved specialization performance enhancements of up to an order of magnitude versus a single general-purpose processor. We also achieve additional parallelization speedups similar to those obtainable using a tightly-interconnected multiprocessor.
