Generative Data Augmentation for Automatic Meter Reading Using CNNs

While smart meters are still not widely installed in many countries, automatic reading of traditional meters is useful from the perspectives of both cost and safety. Although convolutional neural networks (CNNs) have shown high potential for automatic meter reading in unconstrained environments, they still face several challenges. One is the difficulty of collecting a sufficient amount of training data, since some digits of a meter may take a long time to update. Another is how to recognize the transitional state between two consecutive numbers. To solve these problems, we propose a new data augmentation technique that automatically generates annotated images of numbers, including the transitional states. By taking advantage of a state-of-the-art generative neural network model, the generated numbers resemble the local appearance of those in the original meter images. Evaluation experiments confirm that the proposed generative data augmentation techniques improve the robustness of the recognition model and achieve outstanding results compared with previous work.


I. INTRODUCTION
Despite their significant advantages, smart meters are still not widely installed in many countries, especially developing ones [1]-[3]. Image-based automatic meter reading (AMR), which uses computer vision to read traditional meters automatically from their captured images, is useful for reducing the labor and errors of manual reading [3]. AMR can be used to retrofit traditional meters with image processing and communication add-ons, avoiding the replacement of already installed meters [1], [2]. It can also be adapted to other scenarios, such as inspection robots in plants with dangerous locations [4].
AMR has been studied by many researchers over the past decade. Early works used handcrafted features and required specific algorithm design and parameter tuning for each kind of image degradation in order to recognize meter images captured in unconstrained environments [5]-[11]. Recently, researchers have started exploring the potential of deep learning-based approaches to AMR [12]-[14]. However, deep learning needs a large quantity of training data to generalize and yield high classification accuracy on unseen data, whereas meter datasets are usually not publicly available because the images belong to service companies [15]. Therefore, in real applications, it is usually necessary to collect and annotate sufficient training data for the meters installed in a particular environment, which is expensive in terms of both human effort and time. Especially for rotating meters, as shown in Fig. 1(a), updating the significant digits may take a long time, even several years, which makes it impossible to collect sufficient training data for those digits within a practically acceptable period. Laroca et al. [15] proposed a data augmentation technique that increases the variation of numbers at all digits using character permutation, that is, by swapping digits. However, because different digits of a meter may have different appearances depending on the lighting and shooting conditions, such a method may fail to generate training samples that realistically resemble the real images. Fig. 1(b) shows examples generated with Laroca et al.'s method from the meter image in Fig. 1(a); these generated numbers do not have the appearance of the original number "0". Moreover, because each meter has a limited number of digits, the variation in the generated numbers for the significant digits is also limited.
Another challenging issue in automatically reading a rotating meter is recognizing a state between two consecutive numbers. Because the same state can look different depending on the camera's shooting angle, it is necessary to collect training data for each particular installation to achieve high recognition accuracy. However, manually labeling the continuously varying states between every two consecutive numbers can be highly expensive.
To address the above-mentioned issues, this paper proposes a novel data augmentation technique. We employ a generative deep neural network called Y-Autoencoder (Y-AE) [16] to generate images of new numbers while transferring the local appearance of a meter. As shown in Fig. 1(c), from a single image (Fig. 1(a)) of the number "0", the proposed method can generate the numbers "1"-"9" for the first digit, resembling the appearance of the original meter image. To address the second issue, we also present a novel method for automatically generating annotated states between two consecutive numbers. Some annotated transitional states automatically generated with the proposed method for the first (most significant) digit are shown in Fig. 1(d). Fig. 1(e) shows some final recognition results of our automatic meter reading framework trained on a dataset that includes images generated by the proposed data augmentation technique. The major contributions of this paper can be summarized as follows: 1) A novel data augmentation method, called the appearance preserving number generator, that automatically generates annotated images of numbers resembling the local appearance of the target digits using a generative neural network. 2) A novel data augmentation method, called the rotating digit generator, that automatically generates annotated images of transitional states between two consecutive numbers. 3) A new algorithm for creating a training dataset with a well-balanced amount of all numbers for each digit, which can be adapted to any rotating meter by collecting only a few images from the assumed environment. 4) Evaluation experiments using an originally designed deep learning-based AMR framework that demonstrate the effectiveness of the proposed generative data augmentation techniques in comparison with previous works. The remainder of the paper is organized as follows: Section II reviews the related works.
The generative augmentation technique is introduced in Section III. The automatic meter reading framework is presented in Section IV. The evaluation experiments and the results are described in Section V. Finally, Section VI offers our conclusions and future research directions.

II. RELATED WORKS
Pioneering works on AMR mainly rely on traditional image processing techniques combined with handcrafted features. Recently, methods that leverage the advancements of convolutional neural networks (CNNs) have been developed. In this section, we review the previous works considering three aspects related to ours: automatic meter reading, meter image data augmentation, and generative neural networks.

A. AUTOMATIC METER READING (AMR)
In early AMR implementations, many traditional image processing techniques, such as color-based [5], [6] and projection-based methods [7], [8], were employed for locating the meter area and for segmentation. Some works on pointer-based meters used the region growing method [9], [10] and a matrix border grayscale detection technique [11].
However, these methods are easily affected by various factors such as blur, noise, distortion, meter fonts, and lighting or shadows.
More recently, researchers have started exploring deep learning-based approaches to AMR, which deliver impressive results in real-world scenarios and can perform digit recognition in a segmentation-free way (the trained model predicts all digits simultaneously).
For counter detection, object detectors are trained to detect the counter region in meter images. S. Ren et al. compared Faster R-CNN [17] with other models, such as YOLO [18], RetinaNet [19], Fast R-CNN, and SSD, in [20]. Although Faster R-CNN achieved the best accuracy in detecting counters, its computational complexity is higher than that of the other models. Therefore, most recent studies have adopted YOLO-family models for their outstanding speed and light weight [3], [15], [18], [21]. Our AMR framework uses YOLOv5 [22], the newest YOLO model, for counter detection.
For the recognition stage, early works used a small network (a 3-layer artificial neural network) [12]-[14]. Later, Tesseract [23], an open-source OCR engine, was widely used [6], [24], [25], although its recognition results were not satisfactory. On the other hand, BLSTM [26], [27] and FCSRN [28] employ CNN architectures for sequence recognition and achieve impressive recognition rates. However, these methods were evaluated on manually cropped counter images, and hence their effectiveness in real-world scenarios is unclear.
Laroca et al. [15] designed a two-stage AMR approach that employs the YOLOv4 object detector for fast counter detection. For number recognition, three CNN-based approaches, CR-NET, Multi-Task Learning, and Convolutional Recurrent Neural Network (CRNN), were evaluated on their public dataset of 2,000 meter images. Recently, they upgraded their work by introducing a larger meter dataset with 12,500 fully annotated images [3]. Their work also introduced a preprocessing step in the AMR pipeline that rectifies the corners of the meter to improve reading. Although their AMR framework, trained on a large dataset captured in various environments, achieved a plausible recognition rate, the dataset is released for academic use only, and in real applications it is usually difficult to collect a dataset with a well-balanced amount of images for all numbers, since some digits of a meter may take a very long time to update. To solve this problem, we propose a novel technique that automatically generates any desired amount of high-quality training data from just a very small number of images collected in the assumed capturing environment. The proposed data augmentation method makes it easy to apply state-of-the-art deep learning-based AMR technologies to real applications.
One challenging issue of deep learning-based meter recognition is how to improve the accuracy when reading the transitional state between two consecutive numbers. Several experiments [5], [15], [26] reported that errors mainly occur in such situations. Although some researchers have tried to address this problem by treating the transitional states as separate classes [28], no dedicated solution has been given, since it is difficult to collect and annotate such images.
We solve the transitional-state digit problem with a novel technique that generates annotated transitional-state images in a fully automatic way. To validate the effectiveness of the proposed data augmentation techniques, we also implemented a deep learning-based AMR framework by extending YOLOv5 [22], the most recent version of the YOLO detection model, for counter detection and by using CR-NET [29], which achieved an impressive recognition rate in [15], for the recognition stage.

B. CHARACTER AUGMENTATION
In the image-based AMR literature, although several CNN approaches achieved impressive accuracy, many models were trained on private datasets [6], [26], [30]. A few meter datasets are publicly available, but they are either limited to academic use [3] or small in size [25]. In a real application, it is usually necessary to collect and annotate sufficient training data for the meters installed in a particular environment, which is expensive in terms of both human effort and time.
Data augmentation is the technique of deriving new data from existing data. Many traditional data augmentation approaches, such as translation, rotation, and zooming in/out, have been applied to AMR [31]. These methods, however, cannot increase the variety of characters or numbers, which is required to address the lack of sufficient variation in numbers at particular digits, such as the most significant digit of rotating meters. To the best of our knowledge, this problem has only been addressed in [15], which used a technique called character permutation, originally proposed for license plate augmentation [32], to obtain a balanced amount of all characters. The method controls the frequency of each character by replacing a character with a high probability of occurrence with a character with a low probability of occurrence in the same meter image. Although promising results have been reported in [15], this method cannot be applied to meter images in which the appearance varies across digits, such as the one shown in Fig. 1(a). Moreover, because each meter has a limited number of digits, the variation in the generated numbers is also limited.
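To make both limitations concrete, the following is a minimal Python sketch of the character permutation idea as summarized above. It is not Laroca et al.'s implementation: the function name permute_characters, the equal-width digit-slot layout, and the use of dataset-wide label counts to pick the swapped slots are illustrative assumptions.

```python
import numpy as np

def permute_characters(counter_img, labels, global_counts):
    """Hypothetical sketch of character permutation on one counter image.

    Assumes the counter is split into len(labels) equal-width digit slots.
    The slot whose number is most frequent in the dataset (global_counts)
    is overwritten with a copy of the slot whose number is rarest.
    """
    n_digits = len(labels)
    slot_w = counter_img.shape[1] // n_digits
    # Source slot: rarest number in the dataset; destination: most common one.
    src = min(range(n_digits), key=lambda k: global_counts[labels[k]])
    dst = max(range(n_digits), key=lambda k: global_counts[labels[k]])
    out = counter_img.copy()
    out[:, dst * slot_w:(dst + 1) * slot_w] = counter_img[:, src * slot_w:(src + 1) * slot_w]
    new_labels = list(labels)
    new_labels[dst] = labels[src]
    return out, new_labels
```

Because the copied slot always comes from the same image, the sketch can only produce numbers already visible on that counter, and the pasted pixels keep the appearance of the source slot rather than that of the destination digit.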
Besides meter recognition, data augmentation has been studied for other digit recognition scenarios. V. Kukreja et al. [33] proposed a license plate augmentation method that addresses the noise issue by applying a Generative Adversarial Network (GAN) to create high-resolution images from a low-resolution image. However, their method does not address the limited variation of characters and numbers. Another license plate recognition study also relies on the permutation method [34]. The authors proposed to eliminate the bias of the permutation method by creating synthetic license plate images to train the recognition network in the fully connected layers. However, details about how the synthetic images were created, as well as the number of images and the training process, were not given.
In this paper, we propose a novel augmentation method that automatically generates annotated images of numbers resembling the local appearance of the target digits. By taking advantage of the generative deep neural network, we are able to produce a larger variation in numbers and control the frequency of numbers at each digit in the training data in a fully automatic way.

C. GENERATIVE NETWORK
To generate new numbers with the appearance of a given digit, we need a generic interpretable representation and a bidirectional network to reconstruct images of these new numbers. Over the last decade, the most prominent generative models have been the Variational Autoencoder (VAE) [35] and the GAN [36]. However, both GAN and VAE originally perform unconditional generation, which is unsuitable for our approach. Therefore, for better control over the generated results, we review their extended versions.
Kingma et al. [37] introduced CVAE, an extension of their VAE that adds a condition to control the output. In addition, [38] introduced a framework that uses partially-specified graphical model structures and semi-supervised learning in the VAE domain to perform conditional generation. However, CVAE performs poorly for the style transfer of number images compared with other methods in [16].
Similarly, an extension of GAN, called InfoGAN, was proposed in [39]. InfoGAN learns interpretable and disentangled representations and gains control over the content and style of the generated images. CVAE-GAN [40], which combines VAE with GAN to build a conditional generative model, takes a fine-grained category label as input and generates images in a specific category. However, both VAEs and GANs have difficulty exploiting prior structure and are difficult to train.
Patacchiola et al. [16] achieved impressive results with a simple architecture called Y-AE. Without changing the structure of the conditional autoencoder (cAE), the authors presented a new training procedure that disentangles the representation in the latent space without using variational methods or adversarial losses. In their experiments, they compared Y-AE with other generative networks (cAE, cVAE, adversarial-AE [41], and beta-VAE [42]) in many aspects to verify its effectiveness and to show that it can be used in a large variety of domains with minimal adjustments. Considering its simplicity of training and its practicability for our problem, we choose Y-AE for generating new numbers while transferring the local appearance of the target digit.

III. GENERATIVE AUGMENTATION
In this section, we describe in detail, step by step, the three main stages of the proposed generative augmentation: (i) the appearance preserving number generator, (ii) the rotating digit generator, and (iii) counter image composition. Lastly, we explain the algorithm for generating a balanced dataset.

A. APPEARANCE PRESERVING NUMBER GENERATOR
To solve the insufficient training data problem, we need to generate annotated images of all possible numbers (from "0" to "9") for a digit without altering the local appearance of the digit. For this task, we employ Y-AE [16] to transfer the local appearance of a digit to an arbitrary number. Although other generative models, such as GAN [36], are also gaining attention for style transfer, Y-AE can adapt to a large variety of image domains with minimal adjustments.
The Y-AE architecture is based on cAEs, which contain an encoder and a decoder. Fig. 2 depicts the training procedure of Y-AE. The encoding phase of Y-AE, which we call the input branch in this paper, is identical to that of a standard cAE, but its reconstruction phase is unique: it consists of two branches, called the sample branch and the target branch.
In the input branch, the input image is encoded with two types of activation functions. Because content information can be represented by discrete latent units while style information needs a continuous representation, the softmax function is used for the content and the sigmoid function for the style. The output of the encoder is then split into two paths, and the style information is given as input to both branches. The sample content $c_S$ is compared with the sample label $l_S$ using the cross-entropy loss $L_1$, computed as a separate loss for each class $j$ of an observation and summed over the classes, so that the content identified from the input image is supervised:

$$L_1 = -\sum_{j} l_{S,j} \log c_{S,j} \tag{1}$$

The sample branch performs the reconstruction by taking the style information produced by the input branch together with the sample label to generate the sample label image, which is then re-encoded. The sample label image $s_S$ is compared with the input sample image $s_I$ using the standard least-squares reconstruction loss $L_2$ to ensure appropriate reconstructions:

$$L_2 = \lVert s_I - s_S \rVert_2^2 \tag{2}$$

The target branch, on the other hand, takes the same style information together with the target label as input. At the training stage, the target label is a random number drawn from a uniform distribution, which the decoder uses to produce a random label image for re-encoding. The cross-entropy loss $L_3$ then verifies that the re-encoded content of the target branch $c_T$ matches the target label $l_T$:

$$L_3 = -\sum_{j} l_{T,j} \log c_{T,j} \tag{3}$$

After the two branch reconstruction phases, a loss based on the Euclidean distance is computed to confirm that the style information has not been changed and remains consistent across the two branches.
That is, the style loss $L_4$ is computed from the style of the sample branch $z_S$ and that of the target branch $z_T$ as follows:

$$L_4 = \lVert z_S - z_T \rVert_2 \tag{4}$$

The losses defined above are then integrated into the global loss function $L$, where the content loss $L_3$ and the style loss $L_4$ are controlled by the explicit weight $\lambda_e$ and the implicit weight $\lambda_i$, respectively:

$$L = L_1 + L_2 + \lambda_e L_3 + \lambda_i L_4 \tag{5}$$
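The loss terms above can be written down compactly. The NumPy sketch below computes them for one batch, assuming the raw encoder output is a vector whose first 10 entries are content logits and whose remaining entries are style logits, and combining them as the sum of the sample content and reconstruction losses plus the weighted target content and style losses. The latent layout, the function names, and the defaults lam_e = lam_i = 1.0 are assumptions for illustration, not values from [16] or from our training configuration.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def split_latent(logits, n_classes=10):
    """Split raw encoder output into discrete content (softmax) and continuous style (sigmoid)."""
    return softmax(logits[..., :n_classes]), sigmoid(logits[..., n_classes:])

def cross_entropy(probs, one_hot, eps=1e-12):
    """Per-observation sum over classes j, averaged over the batch."""
    return float(-np.mean(np.sum(one_hot * np.log(probs + eps), axis=-1)))

def y_ae_loss(c_S, l_S, s_I, s_S, c_T, l_T, z_S, z_T, lam_e=1.0, lam_i=1.0):
    """Hypothetical Y-AE global loss for one batch (first axis = batch)."""
    L1 = cross_entropy(c_S, l_S)                                   # sample content vs. sample label
    L2 = float(np.mean(np.sum((s_I - s_S) ** 2,
                              axis=tuple(range(1, s_I.ndim)))))    # least-squares reconstruction
    L3 = cross_entropy(c_T, l_T)                                   # target content vs. random target label
    L4 = float(np.mean(np.linalg.norm(z_S - z_T, axis=-1)))        # Euclidean style consistency
    return L1 + L2 + lam_e * L3 + lam_i * L4
```

In a real training loop these quantities would be computed with an autodiff framework so that gradients flow through the encoder and decoder; NumPy is used here only to show the arithmetic.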
It was shown in [16] that the structure and training procedure of Y-AE can successfully separate content and style in the latent space for different tasks.
After training, the model can generate an image of any number while preserving the local appearance of a particular input image. Fig. 3 shows some of the images generated in our experiment.

B. ROTATING DIGIT GENERATOR
By taking advantage of digit style transfer, we can generate images of all numbers for each digit of the meter image. To generate annotated images simulating a rotating meter, we first generate a master image by arranging the images of numbers "0" to "9" vertically, as shown in Fig. 4(a).
To stitch the images of consecutive numbers seamlessly, we adopt the biharmonic function-based image inpainting technique [43], which smoothly fills a specified region using information from its surroundings, to blend the borders between numbers, as shown in Fig. 5. Fig. 4(b) shows an example of a master image after applying the inpainting technique to Fig. 4(a). Once a master image has been produced, we apply our rotating digit generator to generate images mimicking a real rotating meter. First, we generate a random floating-point number $r \in [0.0, 1.0]$. Then, $r$ defines the cropping position $p_{crop}$ in the master image, as shown in Fig. 4(c), and cropping at this position yields an image mimicking a particular state of a rotating digit. The label $l$ of the cropped image is then set as

$$l = \lfloor r \times 10 \rfloor$$

Note that the master image satisfies a periodic condition: if the end of the cropping window exceeds the bottom of the master image, the exceeding part is taken from the top of the master image, as shown in Fig. 4(d). This periodic operation also mimics a real rotating meter digit. The detailed procedure is summarized in Algorithm 1. The algorithm can generate digits with specific labels simply by limiting the range of the random number.
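The cropping procedure above can be sketched in a few lines of NumPy. This is a minimal illustration, not Algorithm 1 itself: it assumes the master image has height 10h for digit images of height h, takes $p_{crop} = \lfloor rH \rfloor$ for a master of height H, and uses a label of $\lfloor 10r \rfloor \bmod 10$ so that r = 1.0 wraps to "0". The function names and signatures are hypothetical, and the seam blending by inpainting is omitted.

```python
import numpy as np

def build_master(digit_imgs):
    """Stack equally sized images of numbers 0..9 vertically (top = 0).

    The seams would additionally be blended by biharmonic inpainting;
    that step is omitted in this sketch.
    """
    assert len(digit_imgs) == 10
    return np.concatenate(digit_imgs, axis=0)

def rotating_digit(master, r):
    """Crop one digit-height window starting at row r * H, wrapping periodically."""
    H = master.shape[0]
    h = H // 10                                  # height of a single number
    start = int(r * H) % H                       # cropping position p_crop
    rows = (start + np.arange(h)) % H            # rows past the bottom come from the top
    label = int(np.floor(r * 10)) % 10           # l = floor(10 r); mod 10 maps r = 1.0 to 0
    return master[rows], label

def rotating_digit_generator(master, n, low=0.0, high=1.0, seed=None):
    """Generate n (image, label) pairs; narrow [low, high) to restrict the labels."""
    rng = np.random.default_rng(seed)
    return [rotating_digit(master, rng.uniform(low, high)) for _ in range(n)]
```

For example, with r = 0.25 the window covers the lower half of "2" and the upper half of "3" and is labeled 2, while drawing r from [0.3, 0.4) yields only images labeled 3, which is how the label range can be restricted.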

C. COUNTER IMAGE COMPOSITION
The final step of generative augmentation is to place the generated image of each digit into the counter image. We resize the cropped digit image to the size of the corresponding digit in the original counter image. Then, we apply seamless cloning [44], a technique for copying an image region from a foreground image onto a background image naturally, without visible seams. In this way, we can create a new training dataset of sufficient and well-balanced annotated meter images from a few sample images. Some generated meter images are shown in Fig. 6.
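Seamless cloning [44] is based on Poisson image editing: inside the pasted region, the result keeps the discrete Laplacian (gradients) of the source while matching the background on the region boundary. The grayscale NumPy sketch below solves this with plain Jacobi iterations. It is a simplified stand-in rather than the implementation used in our pipeline (OpenCV's cv2.seamlessClone provides an optimized color version); the function name, the iteration count, and the requirement that the mask not touch the image border are assumptions of the sketch, and the digit is assumed to have been resized and aligned beforehand.

```python
import numpy as np

def _neighbor_sum(a):
    """Sum of the 4-connected neighbors of every pixel (np.roll wraps at the border)."""
    return np.roll(a, 1, 0) + np.roll(a, -1, 0) + np.roll(a, 1, 1) + np.roll(a, -1, 1)

def poisson_clone(src, dst, mask, iters=500):
    """Blend src into dst inside mask by solving the discrete Poisson equation.

    src, dst: 2-D arrays of the same shape (src already resized and aligned).
    mask: boolean array, True where the digit is cloned; it must not touch the
    image border, so the wrap-around of np.roll never affects the solution.
    """
    src = src.astype(float)
    dst = dst.astype(float)
    guidance = 4.0 * src - _neighbor_sum(src)      # negative Laplacian of the source
    f = dst.copy()
    f[mask] = src[mask]                            # initial guess inside the region
    for _ in range(iters):
        # Jacobi update for 4 f_p - sum(f_q) = 4 g_p - sum(g_q); pixels outside
        # the mask stay equal to dst and act as Dirichlet boundary values.
        f_new = (_neighbor_sum(f) + guidance) / 4.0
        f[mask] = f_new[mask]
    return f
```

If the source differs from the background only by a constant brightness offset, its gradients match the background and the blended result converges back to the background, which illustrates why the pasted digit takes on the surrounding brightness without visible seams.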

D. GENERATING A BALANCED DATASET
We also present an algorithm to create a training dataset consisting of a well-balanced amount of all numbers for each digit, using the proposed generative augmentation. The basic idea is to use weighted random selection to select a number from the 10 numbers "0"-"9" based on a given probability distribution. We define the relative frequency $f_{k,i}$ of number $i$ ($i = 0, 1, \dots, 9$) at digit $k$ as

$$f_{k,i} = \frac{n_{k,i}}{\sum_{j=0}^{9} n_{k,j}}$$

where $n_{k,i}$ is the number of counter images in the current dataset whose digit $k$ shows number $i$. For each digit, a number is selected by weighted random selection and its image is generated. With the images of all digits obtained, we composite them into a new counter image and update the current relative frequency $f_{k,i}$ ($i = 0, 1, \dots, 9$) for all digits. To achieve balanced amounts among all numbers, the target frequency distribution $t_{k,i}$ ($i = 0, 1, \dots, 9$) should be box shaped (uniform, i.e., $t_{k,i} = 0.1$) for all digits. The generation process is repeated until the desired number of counter images has been generated. The detailed procedure is summarized in Algorithm 2. In the algorithm, get_relative_frequency is the function for computing the relative frequency $f_{k,i}$ of numbers "0"-"9" in all digits from the input data distribution, while calculate_probability is a function for calculating the probability distribution $p_{k,i}$ of the numbers at digit $k$ for weighted random selection. We then use these probabilities to select a number $i$ randomly from "0"-"9" with the weighted_random_choice function.
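The weighted selection loop of Algorithm 2 can be sketched as follows. The function names mirror those in the algorithm, but their bodies are hypothetical: in particular, the rule inside calculate_probability (weight each number by how far its relative frequency falls below the uniform target, plus a small eps so no number is excluded) is an assumption, since the exact formula is not given here. Rendering each selected label into an image is omitted.

```python
import random

N_DIGITS, N_NUMBERS = 5, 10
TARGET = 1.0 / N_NUMBERS          # box-shaped target: every number at 10%

def get_relative_frequency(counts_k):
    """f_{k,i} = n_{k,i} / sum_j n_{k,j} for one digit position k."""
    total = sum(counts_k)
    return [c / total if total else 0.0 for c in counts_k]

def calculate_probability(freq_k, eps=1e-3):
    """Hypothetical rule: weight each number by how far it falls below the target."""
    deficit = [max(TARGET - f, 0.0) + eps for f in freq_k]
    s = sum(deficit)
    return [d / s for d in deficit]

def weighted_random_choice(probs, rng):
    return rng.choices(range(N_NUMBERS), weights=probs, k=1)[0]

def update_frequency(counts, label):
    for k, i in enumerate(label):
        counts[k][i] += 1

def plan_balanced_labels(initial_counts, n_images, seed=0):
    """Choose per-digit labels for n_images new counter images.

    initial_counts[k][i] is the number of existing images whose digit k shows i.
    Returns the chosen labels and the updated counts. Rendering each label with
    the rotating digit generator and counter image composition is omitted.
    """
    rng = random.Random(seed)
    counts = [list(c) for c in initial_counts]
    labels = []
    for _ in range(n_images):
        label = [weighted_random_choice(calculate_probability(get_relative_frequency(counts[k])), rng)
                 for k in range(N_DIGITS)]
        update_frequency(counts, label)
        labels.append(label)
    return labels, counts
```

Starting from a set in which every digit has only ever shown "0", the loop first favors "1"-"9" almost exclusively and then, as those approach the 10% target, lets "0" catch up, so the final counts per digit end up close to uniform.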
The rotating_digit_generator and counter_image_composition functions of the generative augmentation generate the rotating digits and composite the digit images into a counter image, respectively, yielding a new counter image together with its labels. Finally, after adding a generated counter image to the dataset, we update the current frequency distribution for all digits using the update_frequency function.

IV. AUTOMATIC METER READING FRAMEWORK
Fig. 7 depicts our end-to-end automatic meter reading framework. We adopt a two-stage approach: the counter detection stage takes a whole meter image as input and locates the counter region, and the digit recognition stage performs digit segmentation and recognition simultaneously on the cropped counter image. CNN-based approaches are employed for both stages. For counter detection, we employ the YOLOv5 [22] model, a state-of-the-art deep learning-based object detection model that is suitable for detecting objects occupying a small portion of an image. To read the values from the meter image, we extend CR-NET [29], a recognition model based on the YOLO object detector [45]. The details of each part of the framework are given in the remainder of this section.

A. COUNTER DETECTION
YOLOv5 [22] was released in June 2020 with significant improvements over its predecessors in both accuracy and speed. Moreover, the weights trained with the YOLOv5 network are small, nearly 90% smaller than those of YOLOv4 [22], which makes it suitable for deployment on embedded devices for real-time detection [46]. The YOLOv5 structure consists of three components: the backbone network, the neck network, and the detection network. By performing detection at three detection layers for objects of different sizes, YOLOv5 yields remarkable performance on small object detection. In addition, Laroca et al. [15] stated that very deep models are not necessary for detecting only one class. Therefore, we use a smaller model, YOLOv5s (YOLOv5-small), at the detection stage.

B. DIGIT RECOGNITION
We employ CR-NET, proposed by S. M. Silva et al. [29], for the recognition stage. CR-NET is a YOLO-based model proposed for license plate character detection and recognition, and it achieves remarkable results not only in license plate recognition [29], [32] but also in AMR [15]. Laroca et al. [15] compared it with two other CNN-based approaches and showed that it achieves a promising recognition result. The network architecture is shown in Table 1. We adjusted the number of filters in the last convolutional layer of CR-NET to 75, as we predict only 10 classes (numbers "0"-"9").
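The value 75 is consistent with the usual Darknet rule for a YOLOv2-style region layer, assuming CR-NET keeps five anchor boxes (an assumption; the anchor count is not stated here): each anchor outputs four box coordinates, one objectness score, and one score per class, so 5 × (4 + 1 + 10) = 75. The helper name below is hypothetical.

```python
def region_layer_filters(num_classes, num_anchors=5, coords=4):
    """Filters of the last conv layer feeding a YOLOv2-style region layer.

    Each anchor predicts `coords` box offsets, one objectness score,
    and one score per class.
    """
    return num_anchors * (coords + 1 + num_classes)
```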
We apply a non-maximum suppression (NMS) algorithm to eliminate digits that might be detected more than once by the network. Furthermore, for each group of overlapping detections, we keep only the digit recognized with the highest confidence. In some cases, more than five characters are recognized.
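A minimal sketch of this post-processing is given below. It assumes boxes in (x1, y1, x2, y2) counter-image pixels, class-agnostic greedy NMS with an IoU threshold of 0.5, and that the reading is formed by ordering the surviving digits from left to right; the threshold, the ordering step, and the function names are assumptions, since the text does not specify them.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def nms(boxes, scores, iou_thr=0.5):
    """Greedy class-agnostic NMS: keep the most confident box of each overlapping group."""
    order = list(np.argsort(scores)[::-1])     # indices by descending confidence
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [j for j in order if box_iou(boxes[best], boxes[j]) <= iou_thr]
    return keep

def read_counter(boxes, scores, classes, iou_thr=0.5):
    """Return the reading as a string of digits ordered from left to right."""
    keep = nms(boxes, scores, iou_thr)
    keep.sort(key=lambda i: boxes[i][0])       # sort surviving digits by x1
    return "".join(str(classes[i]) for i in keep)
```

Because the suppression ignores the predicted class, two different digits predicted at the same position are resolved in favor of the more confident one, which matches keeping only the highest-confidence digit on each overlap.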

V. EXPERIMENT
In this section, we describe the experiments for evaluating the effectiveness of the proposed generative augmentation technique together with the performance of the automatic meter-reading framework. First, we investigate the performance of counter detection, as the regions used in the following stages are extracted from the detection results. Then, we present the results of digit recognition based on the proposed generative data augmentation techniques.

1) DATASET
As mentioned in the previous sections, only a few meter datasets are publicly available. The relatively large dataset from [28], which contains 6,000 water meter images with cropped counters, is not suitable for our framework, because our framework takes whole meter images as input. Therefore, we use the UFPR-AMR dataset [15], which consists of 2,000 fully annotated images of various meters. The images were captured in a warehouse of a service company under different conditions, with resolutions between 2,340 × 4,160 and 3,120 × 4,160 pixels, and the dataset comes with a well-defined evaluation protocol to assist in the development and evaluation of AMR methods: 800 images for training, 400 images for validation, and 800 images for testing.
To evaluate how well the new augmentation technique deals with the transitional states of rotating digits, we manually identified the testing images that contain transitional states of rotating digits. As shown in Table 2, such images account for a large proportion of the test set.

2) EXPERIMENTAL SETUP
We performed our experiments on a machine with a 12-core AMD Ryzen Threadripper 1920X 3.5 GHz processor, 64 GB of RAM, and an NVIDIA Titan RTX GPU. Both the detection model and the recognition model were trained using the Darknet framework [47]. The parameters for training the models for counter detection, digit recognition, and generative augmentation are shown in Table 3.

3) EVALUATION METRICS
For evaluating counter detection, we measure accuracy by computing the intersection over union (IoU) between the detected bounding box and the ground truth of the counter area. This measure is commonly used for evaluation in object detection challenges, such as the PASCAL VOC challenge [49] and the MS COCO challenge [50], and in other AMR works [15], [24], [25]. In our experiment, we evaluate detection accuracy following the MS COCO challenge [50], which measures mAP over different IoU thresholds from 0.5 to 0.95, to indicate how precisely the predicted bounding boxes align with the ground truth.
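The IoU computation and the threshold sweep can be illustrated as follows. This is a simplified proxy rather than COCO AP: it assumes exactly one ground-truth counter and one detection per image, in which case the fraction of images whose detection reaches a threshold equals both precision and recall at that threshold, whereas COCO-style AP also depends on how detections are ranked by confidence (pycocotools' COCOeval is the reference implementation). The function names are hypothetical.

```python
import numpy as np

def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def hit_rate_over_thresholds(preds, gts, thresholds=np.linspace(0.5, 0.95, 10)):
    """Fraction of images whose single predicted counter box reaches each IoU threshold."""
    ious = np.array([box_iou(p, g) for p, g in zip(preds, gts)])
    return {round(float(t), 2): float(np.mean(ious >= t)) for t in thresholds}
```

A detection that covers only the top half of a counter has IoU 0.5, so it counts as correct under the PASCAL VOC criterion but fails every stricter COCO threshold, which is why averaging over 0.5 to 0.95 rewards precise localization.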
To evaluate the accuracy of digit recognition, we use the following Digit Recognition Accuracy (DRA) metric:

$$\mathrm{DRA} = \frac{1}{N} \sum_{i=1}^{N} \frac{\text{number of correctly recognized digits of class } i}{\text{number of digits of class } i}$$

Here, N is the number of classes. We also introduce a metric called Average Maximum Structural Similarity Index Measure (AMSSIM), which is computed as the average of the maximum Structural Similarity Index Measure (SSIM) [51]:

$$\mathrm{AMSSIM} = \frac{1}{N} \sum_{i=1}^{N} \max_{y \in Y} \mathrm{SSIM}(x_i, y)$$

where N is the number of image pairs (x, y) for which the maximum SSIM is taken, $x_i$ is a test image, and $Y$ is the set of training images of the same number. SSIM is a well-known metric for measuring the perceptual similarity between two images. The SSIM result is a decimal value between 0 and 1; a value of 1 indicates perfect structural similarity between two images, while a value of 0 indicates no structural similarity.
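The AMSSIM computation can be sketched as below. For brevity, the sketch uses a single-window (global) SSIM computed over the whole image with the usual constants C1 = (0.01 L)^2 and C2 = (0.03 L)^2; the standard SSIM of [51] instead averages over sliding Gaussian windows (as implemented, e.g., by scikit-image's structural_similarity), so absolute values will differ. The function names and the 8-bit data range are assumptions.

```python
import numpy as np

def global_ssim(x, y, data_range=255.0):
    """SSIM computed once over the whole image (no sliding Gaussian window)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

def amssim(test_imgs, train_imgs):
    """Average over test images of the best SSIM against any training image of the same number."""
    return float(np.mean([max(global_ssim(x, y) for y in train_imgs) for x in test_imgs]))
```

Taking the maximum rather than the mean over training images reflects the intent of the metric: a training set is considered good for a test digit if at least one training image looks like it.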

B. EVALUATION
We first assess counter detection to verify the performance of the modified YOLOv5 in detecting counter regions. Afterward, the recognition stage, based on the generative data augmentation techniques, is evaluated through three tests. The first test focuses on a particular digit and validates the effectiveness of the proposed appearance preserving number generator in solving the lack of sufficient variation in numbers at particular digits. The second test focuses on the transitional state problem and validates the effectiveness of the proposed rotating digit generator in addressing it. The last test comprehensively validates whether our generative augmentation technique combined with the AMR framework can be used to create large, balanced quantities of training data and to achieve high recognition accuracy.
In all three tests, we compare our method with the existing character permutation method [15] and also with a standard augmentation technique in the third test.

1) COUNTER DETECTION
With the YOLOv5s model, an impressive detection result was achieved, even though some images are of quite low quality and the counters occupy a small portion of the image. Table 4 shows the mAP over different IoU thresholds (from 0.5 to 0.95), following COCO's standard metric [50]; the last column is the average. The YOLOv5s model performs well even at high IoU thresholds, which shows that the predicted bounding boxes are precisely aligned with the ground truth. Moreover, when a detection is considered correct if IoU > 0.5, following the Pascal VOC Challenge [49], the model correctly detected all counter regions in the UFPR-AMR dataset.

2) EVALUATION OF APPEARANCE PRESERVING NUMBER GENERATOR
We conducted an experiment to validate the effectiveness of our method when the training data lack sufficient variation in numbers at a particular digit position. Because the last (least significant) digit is likely to show the most variation in numbers in the test dataset, we created this scenario as follows: from the training dataset, we extracted the meter images whose last digit shows one single number, and used only these images to create a training dataset containing the other numbers at the last digit. We generated 400 training images with each of two augmentation methods: our method and the character permutation method [15]. Fig. 8 shows, for each of the 10 numbers "0"~"9", the number of extracted images and the DRA of the models trained with the 400 images generated from them by the two augmentation methods. Number "8" has the smallest number of images and number "3" the largest. The proposed method outperforms the existing permutation method regardless of which number is used for data augmentation, and the DRA reaches as high as 85.62% even when fewer than 100 images containing a single number are used for training. Table 5 shows the results when the images with number "1" at the last digit are used to generate the training dataset. Data augmentation greatly improves the recognition results: our method achieved a recognition accuracy of 85.47%, outperforming the existing method by 8.99%. Furthermore, when the number of generated images increased to 1,000, the accuracy of the permutation method dropped. Because the original dataset contains only 89 images, the permutation method has a limited variety of numbers available for swapping, which results in unbalanced numbers at each digit position. Fig. 9 shows the distribution of the 10 numbers over the 5 digit positions in the training set (1,000 images) generated by each method. It confirms that our method generates a more balanced dataset than the permutation method, especially when the original dataset is small, and hence yields better recognition results.
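The balance argument can be checked by counting how often each number appears at each digit position, as in Fig. 9. The sketch below uses two hypothetical label generators. `balanced_labels` lets every position cycle through "0"~"9" in an independently shuffled order. `pooled_swap_labels` is a simplified stand-in for character swapping that can only draw numbers present in the source images. Neither function is Algorithm II or the implementation of [15]. They only illustrate why a small source set limits the number variety available to swapping.

```python
import random

NUM_DIGITS = 5  # counters in the text have 5 digit positions


def digit_distribution(labels, num_digits=NUM_DIGITS):
    """counts[pos][n] = how often number n appears at digit position pos."""
    counts = [[0] * 10 for _ in range(num_digits)]
    for label in labels:
        for pos, ch in enumerate(label):
            counts[pos][int(ch)] += 1
    return counts


def balanced_labels(n_images, num_digits=NUM_DIGITS, seed=0):
    """Hypothetical balanced generator: each position cycles through 0-9,
    shuffled independently per position."""
    rng = random.Random(seed)
    columns = []
    for _ in range(num_digits):
        col = [i % 10 for i in range(n_images)]
        rng.shuffle(col)
        columns.append(col)
    return ["".join(str(columns[p][i]) for p in range(num_digits))
            for i in range(n_images)]


def pooled_swap_labels(source_labels, n_images, seed=0):
    """Simplified stand-in for character swapping: every digit is drawn
    from the pool of digits that occur in the source labels."""
    rng = random.Random(seed)
    pool = [ch for label in source_labels for ch in label]
    num_digits = len(source_labels[0])
    return ["".join(rng.choice(pool) for _ in range(num_digits))
            for _ in range(n_images)]


def imbalance(counts):
    """Max/min count ratio per position (inf if some number never appears)."""
    return [max(row) / min(row) if min(row) > 0 else float("inf")
            for row in counts]
```

With 1,000 balanced labels every number appears exactly 100 times at every position, whereas a pool drawn from a few labels such as "00121" and "00131" never produces the numbers 4 to 9.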
The accuracy of the proposed method saturates from about 1,000 training images and starts to drop at 2,500 training images. This is because all the training images are generated from only 89 images with limited style variation, which causes overfitting. Table 6 shows the AP (area under the PR curve) of each class by digit position when each of the two methods is used to generate 1,000 training images. Our method particularly improves the AP of the classes that receive fewer training images under the permutation method, such as the class of number "6". Overall, our method improves the mAP at all 5 digit positions. As expected, both methods obtain their lowest mAP at the last digit position, because only images with number "1" at that position are used for training or for generating the training dataset.
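Per-class AP and per-position mAP, as reported in Table 6, can be sketched as follows. The sketch assumes the recognizer outputs a score for each of the 10 numbers at each digit position. AP is computed as the step-wise area under the precision-recall curve, which sums precision at each rank where a true positive occurs, weighted by the recall increment 1/(number of positives). Tied scores are not grouped, unlike some library implementations. The function names are hypothetical.

```python
def average_precision(scores, is_positive):
    """Step-wise area under the PR curve: sum of precision at each true
    positive, times the recall step 1 / n_pos. Ties are not grouped."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    n_pos = sum(is_positive)
    if n_pos == 0:
        return 0.0
    tp, ap = 0, 0.0
    for rank, i in enumerate(order, start=1):
        if is_positive[i]:
            tp += 1
            ap += (tp / rank) / n_pos  # precision at this rank x recall step
    return ap


def mean_ap(prob_matrix, true_numbers):
    """mAP for one digit position. prob_matrix[i][c] is the score of
    number c for sample i; classes without positives are skipped."""
    aps = []
    for c in range(10):
        scores = [row[c] for row in prob_matrix]
        positives = [t == c for t in true_numbers]
        if any(positives):
            aps.append(average_precision(scores, positives))
    return sum(aps) / len(aps)
```

For example, scores (0.9, 0.8, 0.7) with labels (positive, negative, positive) give AP = (1/1 + 2/3) / 2, about 0.833.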
To further validate the effectiveness of the proposed appearance preserving number generator, we introduce a new metric called AMSSIM. The Structural Similarity Index Measure (SSIM) is a well-known metric of the perceptual similarity between two images. Because the proposed generator aims to generate images of new numbers that resemble the local appearance of a particular digit, a successful augmentation should make it possible to find, for each number in a test image, a training image with a similar appearance, that is, one with a high SSIM value. Therefore, to evaluate the quality of a training set generated from a particular number, for each of the remaining 9 numbers in the test dataset, we compute the SSIM between each test image and all training images of the same number and keep the maximum. The AMSSIM of each of the 9 numbers is then the average of these maximum SSIM values over the test images. Table 7 compares the AMSSIM of the proposed method and the character permutation method when the 89 images with "1" at the last digit are used to generate 400 training images. The proposed method achieves a much higher AMSSIM than the existing method for every generated number. Besides the data balance issue, this explains from another perspective why the proposed method achieves higher recognition accuracy than the existing method.
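The AMSSIM computation described above can be sketched as follows, assuming equally sized grayscale digit crops grouped by number. To stay self-contained, `global_ssim` evaluates the SSIM formula over the whole image as a single window (with the usual constants K1 = 0.01 and K2 = 0.03). This is a simplification of the standard sliding-Gaussian-window SSIM, and a library implementation such as scikit-image's `structural_similarity` could be passed in instead. All function names are hypothetical.

```python
import numpy as np


def global_ssim(x, y, data_range=255.0):
    """Single-window SSIM over the whole image (a simplification of the
    usual sliding-Gaussian-window SSIM)."""
    x = x.astype(np.float64)
    y = y.astype(np.float64)
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / (
        (mx ** 2 + my ** 2 + c1) * (vx + vy + c2))


def amssim(test_images, train_images, ssim=global_ssim):
    """test_images / train_images: dict number -> list of 2-D crops.
    For each number, average over test crops of the maximum SSIM against
    training crops of the same number. Numbers absent from training are skipped."""
    result = {}
    for number, tests in test_images.items():
        candidates = train_images.get(number, [])
        if not candidates:
            continue
        best = [max(ssim(t, c) for c in candidates) for t in tests]
        result[number] = float(np.mean(best))
    return result
```

To follow the protocol in the text, the caller would pass only the 9 numbers other than the source number "1" in `test_images`.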

3) EVALUATION OF ROTATING DIGIT GENERATOR
Another challenging issue in the reading stage is the recognition of transitional states between two consecutive numbers. To evaluate the proposed augmentation method on this problem, we selected 250 images with a transitional state from the test set. For comparison, we generated training images with the two augmentation methods and used each to train the recognition model. The recognition results in Table 8 show that the proposed generative augmentation yields the best recognition results, especially for the transitional-state digits. The average accuracy of our method is 1.76% higher than that of the existing method, and for the transitional-state digits, our method outperforms the existing method by 16.68%.
This evaluation demonstrates the effectiveness of the proposed rotating digit generator for automatically generating annotated images of transitional states.
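For intuition about what a transitional state looks like, the sketch below stacks the glyph of number n above the glyph of the next number, as on a counter drum, and crops a digit-sized window shifted by a fractional offset. This purely geometric illustration is not the proposed rotating digit generator, which relies on a generative network. The names `next_number` and `transitional_crop` are hypothetical.

```python
import numpy as np


def next_number(n):
    """The number that follows n on a rotating counter drum (9 -> 0)."""
    return (n + 1) % 10


def transitional_crop(glyph_n, glyph_next, offset):
    """Hypothetical illustration: stack glyph n above glyph (n+1) and crop
    a window shifted down by `offset` (0.0 = fully n, 1.0 = fully n+1)."""
    if glyph_n.shape != glyph_next.shape:
        raise ValueError("glyphs must have the same shape")
    if not 0.0 <= offset <= 1.0:
        raise ValueError("offset must be in [0, 1]")
    h = glyph_n.shape[0]
    strip = np.concatenate([glyph_n, glyph_next], axis=0)
    start = int(round(offset * h))
    return strip[start:start + h]
```

At offset 0.5, the upper half of the crop shows the lower part of n and the lower half shows the upper part of the next number, which is the kind of ambiguous image the transitional-state evaluation targets.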

4) COMPREHENSIVE EVALUATION
We first conducted an experiment on a small dataset to mimic the problem of data insufficiency. Half of the UFPR-AMR training dataset (400 images) was randomly selected, and from these 400 images we generated training images with three augmentation methods (our proposed method, the character permutation method, and the traditional augmentation method provided by [31]) to obtain a training dataset of 800 images.
The traditional image data augmentation technique adopted in [35] synthesizes various image degradation factors and has been widely used to enhance the robustness of detection and recognition models in unconstrained environments. We implemented this image synthesis method to mimic the following artifacts: 1) occluded digits caused by the camera's shooting angle, by cropping and padding each digit; 2) blur and noise; 3) uneven lighting, by randomly adjusting brightness and contrast; and 4) geometric transformations, such as stretching, affine transformation, rotation, and downsampling. The results obtained with the three methods are shown in Table 9. Our generative augmentation technique achieved the best result, outperforming character permutation [15] by 2.53% and the traditional augmentation method by 2.92%.
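A compact version of such a degradation pipeline is sketched below with NumPy only, assuming 2-D uint8 grayscale digit crops. It covers crop-and-pad occlusion, a 3x3 box blur with Gaussian noise, random brightness/contrast with a horizontal lighting gradient, and down/up-sampling. Stretching, affine transformation, and rotation are omitted for brevity (OpenCV's `cv2.warpAffine` is a common choice for them). All parameter ranges are illustrative assumptions, not the settings used in [35], and `traditional_augment` is a hypothetical name.

```python
import numpy as np


def traditional_augment(img, rng):
    """img: 2-D uint8 digit crop; rng: np.random.Generator.
    Returns a degraded uint8 crop of the same shape (illustrative ranges)."""
    out = img.astype(np.float64)
    h, w = out.shape
    # 1) Crop and pad: drop k rows at the top or bottom, pad with edge pixels
    k = int(rng.integers(0, max(1, h // 5) + 1))
    if k:
        if rng.random() < 0.5:
            out = np.pad(out[k:], ((0, k), (0, 0)), mode="edge")
        else:
            out = np.pad(out[:-k], ((k, 0), (0, 0)), mode="edge")
    # 2) Blur (3x3 box filter) and additive Gaussian noise
    padded = np.pad(out, 1, mode="edge")
    out = sum(padded[dy:dy + h, dx:dx + w]
              for dy in range(3) for dx in range(3)) / 9.0
    out = out + rng.normal(0.0, rng.uniform(0, 10), size=out.shape)
    # 3) Uneven lighting: random contrast/brightness plus a horizontal gradient
    alpha, beta = rng.uniform(0.7, 1.3), rng.uniform(-30, 30)
    gradient = np.linspace(-20, 20, w)[None, :] * rng.uniform(-1, 1)
    out = alpha * out + beta + gradient
    # 4) Low resolution: down-sample by 2, then nearest-neighbour up-sample
    if rng.random() < 0.5:
        out = np.repeat(np.repeat(out[::2, ::2], 2, axis=0), 2, axis=1)[:h, :w]
    return np.clip(out, 0, 255).astype(np.uint8)
```

Unlike generative augmentation, every step here only degrades an existing crop, so the number it shows never changes. This is consistent with the observation that traditional augmentation cannot generate new numbers.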
In the last test, we compared our augmentation method with the existing character permutation method and the traditional augmentation technique for generating a large quantity of training data, generating from 100,000 to 200,000 training images. Fig. 11 shows the recognition accuracy as the training data increase from 100,000 to 200,000 images. The accuracies of all methods saturate from about 110,000 images. The traditional augmentation, which cannot generate new numbers, has the lowest accuracy, while a gap remains between the permutation method and our proposed method. We attribute the superiority of our method over the permutation method mainly to two reasons. First, some of the remaining errors occur with intermediate-state digits; as the augmentation proceeds, our method generates more intermediate-state training data and hence reduces these errors. Second, the imbalance among the amounts of different numbers may cause the accuracy of the permutation method to drop in the last stage (when the data size increases from 150,000 to 200,000 images), whereas our algorithm (Algorithm II given in Section III) generates balanced numbers, which contributes to a stable accuracy. Although the permutation method was also implemented to generate meter images with as uniform a distribution of numbers as possible, it is inherently impossible for it to precisely control the amount of each number at each digit position. Nevertheless, the dataset used for this test consists of images taken from various meters in various environments. Even though the first digit position has unbalanced data, the other positions still have enough numbers for swapping among digits, and for such data the permutation method can usually benefit from increasing the data size. However, such public datasets are available for academic use only.
On the other hand, although it is easy to collect a large number of images in industrial applications, the number of images per number is usually severely imbalanced across digit positions. The significant digits take a long time to update, so it is difficult to obtain images of different numbers for these digits. This problem motivated us to develop the generative augmentation method, which can generate different numbers for those digits from just a few numbers.

C. DISCUSSION AND LIMITATION
The proposed generative augmentation, which mainly consists of three stages, can successfully transfer the local appearance of a digit to an arbitrary number by taking advantage of a generative network. It can also mimic a particular state of a rotating digit. The experiments show that the proposed method can generate up to 2,000 training images from only 89 imbalanced sample images, improving the recognition accuracy by about 26% over the non-augmented dataset. The accuracy starts to drop at 2,500 training images because of the overfitting caused by using only images with "1" as the last digit. However, in the comprehensive experiment, which expands the training dataset from the original 800 images to more than 100,000 images, the recognition accuracy of the proposed method continued to improve and reached 97.12% at 200,000 images, although it nearly saturated from 110,000 images. Fig. 12 shows some results of the existing method and our proposed method. The top row shows the ground truth, and the middle and bottom rows show the recognition results of the existing method and the proposed method, respectively; failed digits are shown in red. The proposed method succeeds even in challenging cases, such as rotating states and low image quality, and fails only in highly difficult cases that are hard to recognize even for a human, as shown in the last row of Fig. 12. Fig. 13 shows some cases in which the images generated by the proposed method do not perfectly resemble the original. In the first and second examples, dirt and light on the input images were not correctly transferred to the generated images. In the third and fourth examples, the generated images contain blur and noise. However, such failure cases account for only a small proportion.
In the fourth example, the font of the meter is also not successfully transferred. This is mainly because the training dataset of the generative network contains very few images (only 0.003%) with a font similar to that of the fourth meter image. The generative network therefore failed to learn this font and transfer it precisely to the generated image, and instead used the font of meters that make up a larger proportion of the dataset. This is currently the main limitation of the proposed data augmentation method. However, compared with capturing conditions, a slight difference in font style usually does not affect the recognition accuracy. Therefore, for the purpose of AMR, the proposed technique, which can transfer the appearance of numbers captured in unconstrained environments, should be of sufficient significance in real applications. In future work, we intend to improve the appearance preserving number generator by exploring new deep generative networks.

VI. CONCLUSION
In this paper, we presented a new automatic rotating meter-reading system featuring a novel data augmentation technique called generative augmentation. This technique takes advantage of a generative deep neural network to generate new annotated digit images while transferring the local appearance of a target digit, coping with the difficulty of collecting training data for digits that take a long time to update. With this augmentation algorithm, we can control the frequency of each number at each digit position and construct a balanced training dataset from a few sample images. Moreover, our data augmentation technique can automatically generate annotated images of transitional states between two consecutive numbers, which greatly improves the robustness of recognition for analog rotating meters.
The experiments were conducted in various scenarios considering real-world AMR applications. All of the evaluation experiments confirm that our method improves the reading accuracy compared with both the original dataset and data expanded with the existing augmentation techniques. The proposed augmentation technique can be combined with any state-of-the-art AMR framework for real rotating-meter reading applications, simply by collecting a few images from the target environment.
The proposed method can easily be applied to reading image sequences sent from a remotely installed camera. By exploiting the coherence of numbers between consecutive frames, the reading accuracy can be improved even further.
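One simple way to exploit frame-to-frame coherence is sketched below. It is a hypothetical post-processing step, not part of the evaluated system. Each digit of the current reading is replaced by the majority vote over the same digit in the last `window` frames, which suppresses isolated misreadings. As a trade-off, a genuine change in the meter value is reported only after it has been seen in most of the window. The function name and window size are assumptions.

```python
from collections import Counter


def smooth_readings(readings, window=5):
    """Hypothetical per-digit majority vote over the last `window` frames.
    readings: list of equal-length digit strings, one per frame."""
    smoothed = []
    for i in range(len(readings)):
        recent = readings[max(0, i - window + 1): i + 1]
        digits = []
        for pos in range(len(readings[i])):
            votes = Counter(r[pos] for r in recent)
            digits.append(votes.most_common(1)[0][0])
        smoothed.append("".join(digits))
    return smoothed


# Usage: a single misread "7" in the third frame is voted away
frames = ["00123", "00123", "00173", "00123", "00123"]
cleaned = smooth_readings(frames, window=3)
```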