2017 IEEE 25th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM)

April 30-May 2, 2017


Entries 1-25 of 67
  • [Front cover]

    Publication Year: 2017, Page(s): c1
    Freely Available from IEEE
  • [Title page i]

    Publication Year: 2017, Page(s): i
    Freely Available from IEEE
  • [Title page iii]

    Publication Year: 2017, Page(s): iii
    Freely Available from IEEE
  • [Copyright notice]

    Publication Year: 2017, Page(s): iv
    Freely Available from IEEE
  • Table of contents

    Publication Year: 2017, Page(s): v-ix
    Freely Available from IEEE
  • A Message from the General Chair and Program Chair

    Publication Year: 2017, Page(s): x-xi
    Freely Available from IEEE
  • Program Committee

    Publication Year: 2017, Page(s): xii-xiii
    Freely Available from IEEE
  • Additional Reviewers

    Publication Year: 2017, Page(s): xiv
    Freely Available from IEEE
  • Sponsors

    Publication Year: 2017, Page(s): xv
    Freely Available from IEEE
  • High-Performance Hardware Merge Sorter

    Publication Year: 2017, Page(s): 1-8

    State-of-the-art studies show that FPGA-based hardware merge sorters (HMSs) can achieve superior performance compared with optimized algorithms on CPUs and GPUs. The performance of any HMS is proportional to its operating frequency (F) and the number of records that can be output each cycle (E). However, in all existing HMSs, F drops significantly as E increases due to the incre...

  • Communication-Aware MCMC Method for Big Data Applications on FPGAs

    Publication Year: 2017, Page(s): 9-16

    Markov Chain Monte Carlo (MCMC) methods have been the main tool for Bayesian inference for some years, and they have recently found increasing application in modern statistics and machine learning. Nevertheless, with the availability of large datasets and the increasing complexity of Bayesian models, MCMC methods are becoming prohibitively expensive for real-world problems. At the heart of these ...

  • Terabyte Sort on FPGA-Accelerated Flash Storage

    Publication Year: 2017, Page(s): 17-24

    Sorting is one of the most fundamental and useful applications in computer science, and continues to be an important tool in analyzing large datasets. An important and challenging subclass of sorting problems involves sorting terabyte-scale datasets with hundreds of billions of records. The conventional method of sorting such large amounts of data is to distribute the data and computation over a c...

  • Improved Synthesis of Compressor Trees on FPGAs in High-Level Synthesis

    Publication Year: 2017, Page(s): 25

    This paper proposes an approach to synthesizing compressor trees in High-Level Synthesis (HLS) for FPGAs. Our approach uses bit-level information to improve compressor tree synthesis. To obtain this bit-level information, a modified bitmask analysis technique based on prior work is proposed. Experimental results show that, compared to...

  • SWiF: A Simplified Workload-Centric Framework for FPGA-Based Computing

    Publication Year: 2017, Page(s): 26

    In this paper, we introduce SWiF (Simplified Workload-intuitive Framework), a workload-centric application programming framework designed to simplify the large-scale deployment of FPGAs in end-to-end applications. SWiF can intelligently mediate access to shared resources by orchestrating the distribution and scheduling of tasks across a heterogeneous mix of FPGA and CPU resources in order to imp...

  • Megrez: Parallelizing FPGA Routing with Strictly-Ordered Partitioning

    Publication Year: 2017, Page(s): 27

    FPGAs are expected to play a crucial role in the space of customizable accelerators over the next few years. A chief limiting factor is that FPGA CAD tools are cumbersome and time-consuming for most application developers. Routing is the most complex step in the FPGA design flow and is an NP-complete problem. The PathFinder routing algorithm is the dominant routing algorithm in FPGA CAD research. However, PathFinder is sequential in natur...

  • An FPGA Design Framework for CNN Sparsification and Acceleration

    Publication Year: 2017, Page(s): 28

    Convolutional neural networks (CNNs) have recently broken many performance records in image recognition and object detection problems. The success of CNNs, to a great extent, is enabled by the fast scaling-up of the networks that learn from a huge volume of data. The deployment of big CNN models can be both computation-intensive and memory-intensive, leaving severe challenges to hardware implement...

  • FPGA Delay Model Considering Logic-Level and Transistor-Level Parameters

    Publication Year: 2017, Page(s): 29

    Field-programmable gate arrays (FPGAs) have been widely used in various application areas, such as industrial control, telecommunications and signal processing. To meet these diverse demands, each generation of commercial FPGAs comes with architectural enhancements and a number of options covering specific application domains. The optimized architectures are usually generated from a time-consumi...

  • Scheduling Considerations for Voter Checking in TMR-MER Systems

    Publication Year: 2017, Page(s): 30

    Field-Programmable Gate Arrays (FPGAs) are susceptible to radiation-induced Single Event Upsets (SEUs). A common technique for dealing with SEUs is Triple Modular Redundancy (TMR) combined with Module-based configuration memory Error Recovery (MER). By triplicating components and voting on their outputs, TMR helps localize the configuration memory errors, and by reconfiguring the faulty component,...

  • Bit-Width Based Resource Partitioning for CNN Acceleration on FPGA

    Publication Year: 2017, Page(s): 31

    Convolutional neural networks (CNNs) have achieved great success in many applications. Recently, various FPGA-based accelerators have been proposed to improve the performance of CNNs. However, most current FPGA-based methods select a single bit-width for all CNN layers, which leads to very low resource utilization efficiency and makes further performance improvement difficult. In this paper, we ...

  • On Bit-Serial NoCs for FPGAs

    Publication Year: 2017, Page(s): 32-39

    We can build lightweight bit-serial FPGA NoC routers that cost 20 LUTs and 17 FFs per router and operate at 800-900 MHz. Each bit-serial router implements deflection routing on a unidirectional torus topology, requiring a 1b-wide connection per port. The key ideas that enable this implementation are (1) reformulation of the dimension-ordered routing (DOR) function using compact 1 LUT, 1 FF streamin...

  • Implementing FPGA Overlay NoCs Using the Xilinx UltraScale Memory Cascades

    Publication Year: 2017, Page(s): 40-47
    Cited by:  Papers (1)

    We can enhance the performance and efficiency of deflection-routed FPGA overlay NoCs by exploiting the cascading feature of the Xilinx UltraScale BlockRAMs. This allows us to (1) harden the multiplexers in the NoC switch crossbars, and (2) efficiently add buffering support to deflection routing. While buffering is not required for correct operation of a deflection-routed NoC, it can boost network thro...

  • Efficient GPGPU Computing with Cross-Core Resource Sharing and Core Reconfiguration

    Publication Year: 2017, Page(s): 48-55

    GPUs are capable of running a variety of applications; however, their generic parallel architecture can lead to inefficient use of resources and reduced power efficiency due to algorithmic or architectural constraints. In this work, taking inspiration from CGRAs (coarse-grained reconfigurable architectures), we demonstrate resource sharing and re-distribution as a solution that can be leveraged by...

  • An Architecture for the Acceleration of a Hybrid Leaky Integrate and Fire SNN on the Convey HC-2ex FPGA-Based Processor

    Publication Year: 2017, Page(s): 56-63

    Neuromorphic computing is expanding by leaps and bounds through custom integrated circuits (digital and analog), and large scale platforms developed by industry or government-funded projects (e.g. TrueNorth and BrainScaleS, respectively). Whereas the trend is for massive parallelism and neuromorphic computation in order to solve problems, such as those that may appear in machine learning and deep ...

  • FPGA-Based Real-Time Charged Particle Trajectory Reconstruction at the Large Hadron Collider

    Publication Year: 2017, Page(s): 64-71

    The upgrades of the Compact Muon Solenoid particle physics experiment at CERN's Large Hadron Collider provide a major challenge for the real-time collision data selection. This paper presents a novel approach to pattern recognition and charged particle trajectory reconstruction using an all-FPGA solution. The challenges include a large input data rate of about 20 to 40 Tbps, processing a new batch...

  • Bonded Force Computations on FPGAs

    Publication Year: 2017, Page(s): 72-75
    Cited by:  Papers (1)

    While acceleration of Molecular Dynamics has received much attention, a significant part of that application, the bonded force calculation, has not. We present what we believe to be the first description and analysis of bonded force calculations outside of ASICs. We characterize the computational requirements. We find that a naive direct implementation requires FPGA resources out of proportion wit...

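The "Scheduling Considerations for Voter Checking in TMR-MER Systems" entry summarizes how TMR triplicates a component and votes on the three outputs to localize configuration memory errors. The following is a minimal sketch of that voting step only. It assumes a bitwise 2-of-3 majority voter over integer-encoded outputs. The function names are hypothetical, and the sketch does not reproduce the paper's scheduling scheme. It pairs the voter with a mismatch check that flags which replica a module-based recovery step would reconfigure:

```python
def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority: each output bit takes the value held by at least two replicas."""
    return (a & b) | (a & c) | (b & c)


def mismatched_replicas(a: int, b: int, c: int) -> list[int]:
    """Indices (0, 1, 2) of replicas whose output differs from the voted value.

    A single entry identifies the replica that a module-based recovery step
    would reconfigure (a hypothetical policy, for illustration only); an empty
    list means all three replicas agree.
    """
    voted = majority_vote(a, b, c)
    return [i for i, out in enumerate((a, b, c)) if out != voted]


if __name__ == "__main__":
    # Replica 2 suffers an upset in bit 3: 0b1011 becomes 0b0011.
    print(bin(majority_vote(0b1011, 0b1011, 0b0011)))  # 0b1011
    print(mismatched_replicas(0b1011, 0b1011, 0b0011))  # [2]
```

Because the voter works bit by bit, a single upset in one replica is always masked. Upsets in the same bit position of two replicas would not be masked, which is why TMR is typically paired with a recovery mechanism such as MER that repairs the faulty module before a second error accumulates.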