Abstract:
Deep learning is ubiquitous in a wide field of applications ranging from research to industry. In comparison to the time-consuming iterative training of convolutional neural networks (CNNs), inference is a relatively lightweight operation, making it amenable to execution on mobile devices. Nevertheless, lower latency and higher computational efficiency are crucial to allow for complex models and prolonged battery life. Addressing these challenges, we propose FeatherCNN, a fast inference library for ARM CPUs that targets the performance ceiling of mobile devices. FeatherCNN employs three key techniques: 1) a highly efficient TensorGEMM (generalized matrix multiplication) routine is applied to accelerate Winograd convolution on ARM CPUs; 2) general layer optimization based on custom high-performance kernels improves both the computational efficiency and the locality of memory access patterns for non-Winograd layers; 3) the framework design emphasizes joint layer-wise optimization using layer fusion to remove redundant calculations and memory movements. Performance evaluation reveals that FeatherCNN significantly outperforms state-of-the-art libraries. A forward propagation pass of VGG-16 on a 64-core ARM server is 48, 14, and 12 times faster than Caffe using OpenBLAS, Caffe2 using Eigen, and NNPACK, respectively. In addition, FeatherCNN is 3.19 times faster than the recently released TensorFlow Lite library on an iPhone 7 Plus. In terms of GEMM performance, FeatherCNN achieves 14.8 and 39.0 percent higher performance than Apple's Accelerate framework on an iPhone 7 Plus and Eigen on a Samsung Galaxy S8, respectively. The source code of the FeatherCNN library is publicly available at https://github.com/tencent/feathercnn.
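As an illustration of the Winograd convolution mentioned in the abstract, the following minimal sketch computes two outputs of a 3-tap 1D filter with the standard F(2,3) Winograd transform, using 4 multiplications instead of the 6 needed by the direct method. It is not taken from FeatherCNN: the function names `winograd_f23` and `direct_conv` are hypothetical, and a real library would apply the 2D variant across many tiles and channels so that the elementwise products can be batched into matrix multiplications.

```python
def winograd_f23(d, g):
    """Two outputs of a 3-tap 1D convolution via Winograd F(2,3).

    d: 4 input samples, g: 3 filter taps (illustrative helper, not a library API).
    """
    d0, d1, d2, d3 = d
    g0, g1, g2 = g

    # Filter transform: depends only on the weights, so it can be precomputed.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2

    # Input transform: additions and subtractions only.
    v0 = d0 - d2
    v1 = d1 + d2
    v2 = d2 - d1
    v3 = d1 - d3

    # Four elementwise multiplications (the direct method needs six).
    m0, m1, m2, m3 = u0 * v0, u1 * v1, u2 * v2, u3 * v3

    # Output transform.
    return [m0 + m1 + m2, m1 - m2 - m3]


def direct_conv(d, g):
    """Reference: sliding dot product producing the same two outputs."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]


if __name__ == "__main__":
    d = [1.0, 2.0, 3.0, 4.0]
    g = [1.0, 1.0, 1.0]
    print(winograd_f23(d, g), direct_conv(d, g))
```

The saving in multiplications comes at the cost of extra additions and a transformed weight buffer, which is why this approach is typically reserved for small filters such as 3x3.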
Published in: IEEE Transactions on Parallel and Distributed Systems (Volume: 31, Issue: 3, 01 March 2020)

Tencent AI Lab, Shenzhen, China
Haidong Lan received the BS and PhD degrees in computer science from Shandong University, Jinan, China, in 2013 and 2018, respectively. He is currently an engineer at the Tencent AI Platform Department. His research interests include high performance computing, especially in the areas of bioinformatics and deep learning.

Tencent AI Lab, Shenzhen, China
Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Jintao Meng received the BS and MS degrees in computer science from Central China Normal University, Wuhan, in 2005 and 2008, respectively, and the PhD degree in computer architecture from the Institute of Computing Technology, Chinese Academy of Sciences, Beijing, in 2016. From 2008 to 2016, he was an engineer with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. Since 2017, he has been a senior engineer with Tencent. He is the author of SWAP-Assembler and has 20 published articles and 18 inventions to his name. His research interests include high performance computing, bioinformatics, and graph computing.

Parallel and Distributed Architectures Group, Institute of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany
Christian Hundt received the diploma degree in theoretical physics for the analysis of quantization maps on curved manifolds and the PhD degree in computer science for the efficient subsequence alignment of time series on CUDA-enabled accelerators from the University of Mainz, Germany, in 2010 and 2015. In his current position, as a postdoctoral researcher at the Parallel and Distributed Architectures group, he investigates the design and parallelization of algorithms in the field of bioinformatics.

Parallel and Distributed Architectures Group, Institute of Computer Science, Johannes Gutenberg University Mainz, Mainz, Germany
Bertil Schmidt (M'04-SM'07) is a tenured full professor and a chair for Parallel and Distributed Architectures at the University of Mainz, Germany. Prior to that, he was a faculty member at Nanyang Technological University (Singapore) and at the University of New South Wales (UNSW). His research group has designed a variety of algorithms and tools for Bioinformatics (mainly focusing on the analysis of large-scale sequence and short read datasets) and Data Mining. For his research work, he has received a CUDA Research Center award, a CUDA Academic Partnership award, a CUDA Professor Partnership Award, and the Best Paper Award at IEEE ASAP 2009 and IEEE ASAP 2015. He is a senior member of the IEEE.

Tencent AI Lab, Shenzhen, China
Minwen Deng received the BS and MS degrees in computer science from Sun Yat-Sen University and the Institute of Computing Technology, Chinese Academy of Sciences, in 2006 and 2009, respectively. From 2009 to 2010, he worked as an engineer at Alibaba Cloud. Since 2010, he has been a senior engineer at Tencent, leading the work on AI infrastructure construction.

Tencent AI Lab, Shenzhen, China
Xiaoning Wang received the BS and MS degrees in computer science from Shandong University, Jinan, China, in 2015 and 2018, respectively. She is now an engineer in the Tencent AI Platform Department. Her research interests include high performance computing and deep learning.

Shandong University, Jinan, China
Weiguo Liu received the bachelor's and master's degrees from Xi'an Jiaotong University, China, in 1998 and 2002, and the PhD degree from Nanyang Technological University (NTU), Singapore, in 2006. He is currently a professor and the director of the High-performance Computing & Big Data Processing Lab at Shandong University and has published more than 50 articles. He was nominated as a “Taishan Scholar” in Shandong province and has received numerous awards (e.g., the ACM Gordon Bell Prize at SC 2017, the Fraunhofer IGD Best Paper award, and the CCF HPC China Best Paper award). He is also a member of the high performance computing committee of the China Computer Federation. His research interests include high-performance computing, bioinformatics, and data mining. His research group has designed tools and algorithms for applications in data processing and computational science using parallel computing technologies such as CUDA-enabled GPUs, CPU/GPU/Xeon Phi clusters, and supercomputers.

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Yu Qiao received the bachelor of engineering degree in automation and the master of engineering degree in control theory and engineering from Chongqing University, in 2000 and 2003, respectively, and the PhD degree from the University of Electro-Communications, Tokyo, Japan. He is the director of the Institute of Advanced Computing and Digital Engineering in the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences. He was previously a project assistant professor in the Graduate School of Information Science and Technology, The University of Tokyo, Japan. Since 2010, he has been a professor with the Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, where he co-directs the Multimedia Research Center. He was a scholar supported by the “Hundred Talents Program” of the Chinese Academy of Sciences. He received the “Lu Jiaxi Young Researcher Award” from the Chinese Academy of Sciences in 2012. His research interests include computer vision, deep learning, pattern recognition, speech processing, and robotics. Recently, he has focused on developing novel deep learning techniques for action recognition, scene understanding, facial analysis and recognition, and object detection, with applications in surveillance, robotics, and human-machine interfaces. His ultimate goal is to enable machines to perceive and understand complex visual scenes as humans do.

Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences, Shenzhen, China
Sheng-Zhong Feng received the bachelor's degree from the University of Science and Technology of China, in 1991, and the PhD degree from the Beijing Institute of Technology, in 1997. He is the director of the National Supercomputing Center in Shenzhen and special director assistant of the Shenzhen Institutes of Advanced Technology (SIAT), Chinese Academy of Sciences. Before joining SIAT in 2009, he worked as an associate professor with the Institute of Computing Technology, Chinese Academy of Sciences, and as a visiting professor with the University of Toronto. His research interests include high performance computing, cloud computing, and bioinformatics. He is a member of the general expert group on the National Key Research and Development Program of HPC and of the high performance computing committee of the China Computer Federation. As the technology leader, he is responsible for the establishment of the National Supercomputing Center in Shenzhen. He also participated in the development of the Dawning series of supercomputers and served as the principal investigator of many national-level projects, such as the 863 program, NSFC projects, and the knowledge innovation project of the Chinese Academy of Sciences. He has received many awards, such as the Hundred-Talent Program from the Chinese Academy of Sciences, the Outstanding Science and Technology Progress Award from the Chinese Academy of Sciences, and the second prize of the National Science and Technology Progress Award. He has published more than 60 research papers indexed by SCI/EI and applied for more than 20 patents in the most recent five years.