Abstract:
A significant trend in machine learning is sparsifying the training of neural networks to reduce the amount of computation required. Algorithms like the Sub-LInear Deep learning Engine (SLIDE) [2] use locality-sensitive hashing (LSH) to create sparsity. These sparse training algorithms were originally developed for multi-threaded multicore CPUs, but they have not been well studied or optimized for accelerator platforms such as GPUs and reconfigurable dataflow architectures (RDAs). In this paper, we study the different variants of the SLIDE algorithm and investigate accuracy-performance tradeoffs on CPUs, GPUs, and RDAs. The implementation targeting an RDA outperforms the GPU by 7.5×. Performance on a limited-memory RDA is further improved by a smart caching algorithm, which is 2× faster than the baseline RDA. Furthermore, we achieve another 2× speedup by keeping all of the weights on-chip using an RDA with sufficient memory. We believe our work will pave the way for the future development of both algorithms and hardware architectures for sparse training.
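To make the LSH-based sparsification concrete, the following is a minimal, hypothetical sketch (not the authors' implementation) of SimHash-style neuron selection as used in SLIDE-like training: the layer's weight vectors and the incoming activation are hashed with the same random hyperplanes, and only the neurons whose bucket matches the input's bucket are computed in the forward pass. All names, sizes, and parameters below are illustrative assumptions.

```python
import numpy as np

# Minimal SimHash-style LSH sketch: hash each neuron's weight vector and the
# input activation into buckets; treat only neurons in the input's bucket as
# "active" for this forward pass (the source of sparsity in SLIDE-like training).

rng = np.random.default_rng(0)

d = 128            # layer input dimension (hypothetical)
n_neurons = 4096   # neurons in the layer (hypothetical)
n_bits = 8         # hash length; more bits -> smaller buckets -> more sparsity

W = rng.standard_normal((n_neurons, d))         # layer weights, one row per neuron
projections = rng.standard_normal((n_bits, d))  # random hyperplanes shared by all hashes

def simhash(v):
    """Signed random projection: one bit per hyperplane, packed into an integer."""
    bits = (projections @ v) > 0
    return int(np.packbits(bits.astype(np.uint8), bitorder="big")[0])

# Build the hash table once; in SLIDE-style training it is rebuilt periodically
# as the weights drift during training.
table = {}
for j in range(n_neurons):
    table.setdefault(simhash(W[j]), []).append(j)

# Sparse forward pass for one input: compute only the matching bucket's neurons.
x = rng.standard_normal(d)
active = table.get(simhash(x), [])
sparse_out = W[active] @ x   # dense math restricted to the active neurons

print(f"active neurons: {len(active)} / {n_neurons}")
```

In practice, SLIDE-style implementations use several hash tables and longer hash codes so that enough neurons are retrieved per input; this sketch uses a single table only for brevity.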
Published in: 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Date of Conference: 30 May 2022 - 03 June 2022
Date Added to IEEE Xplore: 01 August 2022
Index Terms:
- Accelerator Architecture
- Neural Network
- Deep Learning
- Graphics Processing Unit
- Training Algorithm
- Amount Of Computation
- Hardware Architecture
- Locality Sensitive Hashing
- Neuronal Activity
- Network Layer
- Multilayer Perceptron
- Calculation Error
- Hash Function
- Green Curve
- End Of Training
- Forward Propagation
- Big Gap
- Hardware Configuration
- Backward Propagation
- Sparsity Pattern
- Small Kernel
- Large Kernel
- Previous Epoch
- Hardware Performance
- Domain-specific Languages
- Entire Weight
- Yellow Curve