MobileNetV3 for Image Classification


Abstract:

A convolutional neural network (CNN) is a kind of deep neural network that extracts image features through multiple convolution layers and is widely used in image classification. With the increasing amount of image data processed by mobile devices, applications of neural networks on mobile terminals are becoming popular. However, these networks need massive computation and advanced hardware support, which makes them difficult to adapt to mobile devices. This paper demonstrates that MobileNetV3 can achieve a superior balance between efficiency and accuracy for real-life image classification tasks on mobile terminals. In our experiments, the classification performance of MobileNetV3 is compared with that of several other commonly used pre-trained CNN models on various image datasets. The chosen datasets are all good representatives of the application scenarios of mobile devices. The results show that, as a lightweight neural network, MobileNetV3 achieves good accuracy efficiently compared with other, larger networks. Furthermore, ROC analysis confirms the advantages of MobileNetV3 over the other models tested. Some conjectures are also offered about the characteristics of image datasets that are suitable for MobileNetV3.
Notes: As originally published, text, pages, or figures in the document were missing or not clearly visible. A corrected replacement file was provided by the authors.
Date of Conference: 26-28 March 2021
Date Added to IEEE Xplore: 02 April 2021
Conference Location: Nanchang, China

I. Introduction

Convolutional neural networks (CNNs) have recently received great attention because of their wide application in image classification [1], segmentation, and other computer vision problems. A CNN usually consists of two parts: a feature-extraction part, made of convolutional layers and pooling layers, and a classification part, which contains multiple stacked fully connected layers. In the first part, the kernels in each convolutional layer scan the input image step by step, multiplying the weights in each kernel by the pixel values beneath them and summing the products to create a new image (a feature map) that is passed to the next layer. Pooling layers down-sample the feature maps to reduce the amount of data and save computational resources. In the second part, the feature maps first pass through a flatten layer, which converts them to a one-dimensional array. The following fully connected layers take this array as input and produce the predicted label by applying a linear combination followed by a non-linear activation function.
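The two-part pipeline described above can be illustrated with a minimal sketch in plain Python. This is not the paper's implementation and not MobileNetV3 itself. It assumes a single-channel image, one kernel, stride 1 with no padding, non-overlapping max pooling, and ReLU as the non-linear activation. All function and parameter names (`conv2d`, `max_pool`, `flatten`, `dense`, `predict`, and the toy weights) are hypothetical and chosen only for illustration.

```python
def conv2d(image, kernel):
    """Slide the kernel over the image (stride 1, no padding); at each position,
    multiply kernel weights by the pixel values under them and sum the products."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(kernel[i][j] * image[r + i][c + j]
                for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def max_pool(feature_map, size=2):
    """Down-sample by keeping the largest value in each non-overlapping size x size block."""
    out_h = len(feature_map) // size
    out_w = len(feature_map[0]) // size
    return [
        [
            max(feature_map[r * size + i][c * size + j]
                for i in range(size) for j in range(size))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]


def flatten(feature_map):
    """Convert a 2-D feature map into a one-dimensional list."""
    return [value for row in feature_map for value in row]


def relu(value):
    """Non-linear activation assumed for this sketch."""
    return max(0.0, value)


def dense(inputs, weights, biases, activation=None):
    """Fully connected layer: one weight row per output unit, then an optional activation."""
    outputs = [
        sum(w * x for w, x in zip(row, inputs)) + b
        for row, b in zip(weights, biases)
    ]
    return [activation(v) for v in outputs] if activation else outputs


def predict(image, kernel, w_hidden, b_hidden, w_out, b_out):
    """Feature extraction (conv + pool), then classification (flatten + dense layers).
    Returns the index of the highest-scoring class."""
    features = flatten(max_pool(conv2d(image, kernel)))
    hidden = dense(features, w_hidden, b_hidden, activation=relu)
    scores = dense(hidden, w_out, b_out)
    return scores.index(max(scores))


# Toy example with made-up weights (not trained, not from the paper).
image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
kernel = [[1, 1], [1, 1]]
label = predict(image, kernel,
                w_hidden=[[1.0], [-1.0]], b_hidden=[0.0, 0.0],
                w_out=[[0.0, 1.0], [1.0, 0.0]], b_out=[0.0, 0.0])
print(label)
```

Real networks stack many such layers, use many kernels per layer, and learn the weights from data. The sketch only traces how a single image flows from pixels to a predicted class index.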
