DeblurGAN-CNN: Effective Image Denoising and Recognition for Noisy Handwritten Characters

Many problems can reduce handwritten character recognition performance, such as image degradation, light conditions, low-resolution images, and even the quality of the capture devices. In this research, however, we focus on the noise in character images that can decrease the accuracy of handwritten character recognition. Many types of noise influence the recognition performance, for example, low resolution, Gaussian noise, low contrast, and blur. First, this research proposes a method that learns from the noisy handwritten character images and synthesizes clean character images using the robust deblur generative adversarial network (DeblurGAN). Second, we combine the DeblurGAN architecture with a convolutional neural network (CNN), called DeblurGAN-CNN. Subsequently, two state-of-the-art CNN architectures are combined with DeblurGAN, namely DeblurGAN-DenseNet121 and DeblurGAN-MobileNetV2, to address many noise problems and enhance the recognition performance of the handwritten character images. Finally, the DeblurGAN-CNN can transform the noisy characters into new clean characters and recognize the clean characters simultaneously. We have evaluated and compared the experimental results of the proposed DeblurGAN-CNN architectures with the existing methods on four handwritten character datasets: n-THI-C68, n-MNIST, THI-C68, and THCC-67. For the n-THI-C68 dataset, the DeblurGAN-CNN achieved an accuracy above 98% and outperformed the other existing methods. For the n-MNIST dataset, the proposed DeblurGAN-CNN achieved an accuracy of 97.59% when the AWGN+Contrast noise method was applied to the handwritten digits. We also evaluated the DeblurGAN-CNN on the THCC-67 dataset. The result showed that the proposed DeblurGAN-CNN achieved an accuracy of 80.68%, which is approximately 10% higher than the existing method.


I. INTRODUCTION
Character recognition is a sub-process of text recognition systems used to recognize handwritten and printed texts within document images, such as historical documents, memoranda, and archival material. When the main objective is handwritten character recognition, the following factors affect performance. 1) Writing styles: the distinctions of writing in each era, the diversity of individual writing styles, and even the types of writing equipment [1], [2]. 2) Degradation of historical documents: this may be due to a lack of expert staff and the humidity of a storage location. 3) Digital transformation: blurred and noisy document images are created when using low-quality equipment or taking the picture with a camera without adequate lighting. 4) Limitations of data: an insufficient dataset with incomplete coverage of handwritten character images for the training process. These factors need to be considered when recognizing handwritten text images.
The associate editor coordinating the review of this manuscript and approving it for publication was Donato Impedovo.
The factors mentioned above directly affect machine learning, leading to decreased recognition performance. In the case of noise when digitizing ancient documents, Su et al. [3] experimented with noise generation using the differential evolution method to determine the optimal position for digitization. Adding one pixel to the original image (called a one-pixel attack) is a trick that causes convolutional neural network (CNN) models to misrecognize the image. Their experiments showed that adding one pixel could harm the CNN model by increasing the recognition errors. Mei et al. [4] demonstrated that blurred images affect the recognition rate. Subsequently, the DeepDeblur algorithm was invented to transform blurred images into sharp images before sending the sharp images for recognition. The sharp images, in turn, increased the recognition performance of the model.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Recently, CNN has replaced traditional machine learning [5] and is widely used in handwritten character recognition. Since the CNN method is an automatic algorithm that consists of feature extraction techniques and image recognition, it is currently used in character recognition in many languages, such as Latin, Arabic, Bangla, Korean, Chinese, and Thai [1], [2], [6], [7], resulting in increased character recognition efficiency. However, if the training images are low quality and noisy, they will significantly reduce recognition efficiency [3], [4].
Furthermore, deep learning techniques, including CNN, auto-encoder, and generative adversarial network (GAN), have also been proposed to improve image restoration and denoising. Dong et al. [8] proposed an image restoration technique using CNN. The objective of their study was to transform low-resolution images into high-resolution images. They proposed the super-resolution CNN method, a lightweight deep learning architecture that quickly restores and reconstructs quality images. Zhang et al. [9] presented feed-forward denoising CNNs, which integrate residual learning into the CNN architecture to denoise images and to handle blind Gaussian denoising with unknown noise levels. Further, Gondara [10] proposed a convolutional denoising autoencoder to denoise medical images, and Souibgui et al. [11] proposed an encoder-decoder architecture based on vision transformers, called DocEnTr, to enhance degraded document images.
The GAN architecture is widely used in many domains, especially for image restoration and deblurring [12], [13], [14]. The GAN architecture includes a generator that is capable of learning from many images and creating a new image. The adversarial loss function in the GAN architecture is used to create a robust model that aims to produce high-quality images during regeneration. DeblurGAN [14] uses the learning process of the WGAN-GP [15] and the perceptual loss [16], allowing the model to remove blind motion blur, which can be caused by camera movement while taking a photograph. GANs have also been designed to solve the problems of document images, such as cleaning noisy backgrounds, deblurring text in the documents, and regenerating damaged characters into complete characters [17], [18], [19], [20].

A. CONTRIBUTION
This research presents the DeblurGAN-CNN architecture that aims to solve the recognition problems of noisy handwritten character images. The proposed DeblurGAN-CNN architecture improved the image quality and resulted in higher performance of handwritten character recognition on various handwritten character and noisy character datasets. The contributions of our research are the following.
1) This paper proposes a new standard noisy Thai handwritten character dataset, called the n-THI-C68 dataset, to challenge other researchers to reconstruct sharp and clean handwritten characters. The noisy handwritten character images were synthesized by applying five noise methods: low resolution, low contrast, additive white Gaussian noise, motion blur, and mixed noise. The n-THI-C68 dataset includes 68 classes and contains 11,592 character images in the training set and 14,290 character images in the test set.
2) We propose the deblur generative adversarial networks (GANs) combined with the convolutional neural network (CNN) architectures, called the DeblurGAN-CNN architecture, to reconstruct high-quality handwritten characters from noisy handwritten characters and simultaneously enhance the accuracy of the handwritten character recognition systems. In the DeblurGAN-CNN architecture, DeblurGAN is proposed to learn from the noisy images and regenerate new sharp and clean handwritten character images. The reconstructed handwritten character images are then passed to the CNN architecture for recognition.

B. PAPER OUTLINE
This paper is organized as follows. Section II presents related work on handwritten character recognition, generative adversarial networks, and convolutional neural networks. The proposed DeblurGAN-CNN architectures are described in detail in Section III. The handwritten character (THI-C68 and THCC-67) and noisy handwritten character (n-THI-C68 and n-MNIST) datasets used in the experiments are described in Section IV. Section V reports the experimental results. The performance of the proposed method is discussed in Section VI. Finally, conclusions and future work are addressed in Section VII.

II. RELATED WORK
A. HANDWRITTEN CHARACTER RECOGNITION
In the last two decades, handwritten character recognition (HCR) has been well-studied and has become fundamental to research in image recognition. Many handwritten datasets have been collected from real-world data that aim to improve the quality of the characters and enhance the recognition performance. The most well-known dataset is the MNIST dataset [21], which collected many digits written on envelopes and has 70,000 handwritten digits in total.
To recognize the digit images from the MNIST dataset, LeCun et al. [21] proposed the first convolutional neural network that included five convolutional layers, called LeNet-5, to address problems of the MNIST dataset. Their method achieved an accuracy of 99.20%. Belongie et al. [22] proposed shape context to discover the correspondence points on the digit images and then match two shapes using the bipartite graph method. Hence, the minimum cost between the shape of the query image and training images was the best matching. As a result, an accuracy of 99.37% was achieved from their method. Surinta et al. [23] proposed the histograms of oriented gradients (HOG) and bag of visual words (BOW), called HOG-BOW, to first extract the local features from the sub-images. Second, local features were sent to the K-means clustering algorithm to construct the codebook and used as the BOW features. Finally, the L2-regularized support vector machine (L2-SVM) was proposed as the classifier. The HOG-BOW combined with L2-SVM achieved an accuracy of 99.43%. Maas et al. [24] proposed dual codebooks that were constructed from the features extracted using pixel intensity and HOG method, called dual-BOW. The dual-BOW method achieved an accuracy of 99.17%. Furthermore, Abdulhussain et al. [25] used orthogonal polynomials and moments to extract the gradient and smooth from the digit images. These features were sent to the SVM to classify the digit images of three datasets: Roman, Arabic, and Devanagari, achieving an accuracy of 100%, 99.32%, and 99.28%, respectively. For the Thai character dataset, Surinta et al. [1] collected isolated Thai handwritten characters that contained 68 classes and consisted of consonants, vowels, tones, and special symbols. They also proposed two local descriptors: scale-invariant feature transform descriptor (siftD) and HOG, to extract the robust features from the Thai character images.
The robust features were sent to classify using SVM and K-nearest neighbor (KNN) methods. The Thai handwritten character dataset was divided into training and test sets for evaluation. The best method was the siftD method which combined SVM with the radial basis function (RBF) kernel. The siftD+SVM method achieved 98.93% with 10-fold cross-validation and 94.34% on the test set. Furthermore, Inkeaw et al. [26] proposed the gradient features of discriminative regions (HOGfoDRs) and SVM to recognize Thai characters. The HOGfoDRs+SVM method achieved 98.76% with 5-fold cross-validation. For the updated Thai handwritten dataset, Onuean et al. [27] collected Thai handwritten characters, called Burapha-TH, that consisted of 10 digits, 68 characters, and 320 syllable classes. They also created a CNN model using a VGG architecture with a batch normalization layer containing 13 layers, called VGG-13, evaluated on the Burapha-TH dataset, and which achieved 92.29%, 95.00%, and 96.16% accuracy on the digit, character, and syllable classes, respectively.
In this section, we focused on approaches that use traditional methods, including feature extraction methods and machine learning techniques, for handwritten character recognition. For feature extraction, many state-of-the-art methods were investigated, such as siftD, HOG, HOG-BOW, dual-BOW, HOGfoDRs, orthogonal polynomials, moments, and shape context. Some state-of-the-art methods, including siftD and HOG, focus on extracting features from key points that remain invariant when the image is resized and rotated. Other methods, such as HOG-BOW and dual-BOW, cluster the features produced by the feature extraction methods into a codebook using clustering algorithms. The codebook is then used to encode the input images, and the encoded vectors are used as robust features. For the machine learning techniques, two techniques, SVM and KNN, are used to create a robust model from the robust features. We have seen that simple machine learning techniques, such as the KNN algorithm, can obtain a high recognition rate when robust features are extracted. However, complex computation processes are required when extracting the robust features.

B. GENERATIVE ADVERSARIAL NETWORK
The generative adversarial network (GAN) was first presented by Goodfellow et al. [12]. GAN is an unsupervised learning model that automatically learns the regularities of input images and is then capable of creating a new image that is similar to the original image. Therefore, GAN has been applied in a wide range of applications, such as style transfer, image super-resolution, face generation, image restoration, and even image deblurring [14], [28], [29], [30].
Since GANs have generative ability and style transformation, they were applied in the data augmentation technique [7], [29] to improve recognition performance for document images. Fogel et al. [31] proposed ScrabbleGAN, which is semi-supervised learning by using unlabeled and labeled samples during the training process, to synthesize different Latin and French handwritten text styles. In addition, Eltay et al. [7] proposed adaptive data augmentation based on the ScrabbleGAN architecture to recognize Arabic handwritten text. The adaptive method generated more balanced characters in training samples.
Moreover, many issues in documents, such as blurred images, noisy backgrounds, salt-and-pepper noise, and faded text, make the document unreadable to humans and significantly decrease the recognition performance of text algorithms [17], [18], [19], [20]. To solve these problems, Bhunia et al. [18] proposed two networks, a texture augmentation network and a binarization network, to binarize degraded document images. First, the texture augmentation network was designed to create multiple textual contents with diverse noisy textures to increase the size of the document binarization dataset. Second, the binarization network generated new images, which are the clean binary document images. Sharma et al. [17] used CycleGAN to remove the noise from the documents, resulting in cleaned documents. The CycleGAN model was employed to map noisy to clean documents and clean to noisy documents using the cycle consistency loss function. Their experiment showed that the CycleGAN provided acceptable results. In terms of document enhancement, Souibgui and Kessentini [20] applied conditional GAN, which is a single GAN network, to restore various problems of mixed document degradations, including tasks of document clean up, binarization, deblurring, and watermark removal.
Furthermore, Wu et al. [32] applied Wasserstein loss to the CycleGAN that improved the CycleGAN algorithm to deblur text images into clear text images. Also, Zhao et al. [33] used the GAN model to optimize the distortion of input images before feeding the rectified images to the text recognizer.
Since the first GAN architecture was presented in 2014 [12] to regenerate a new image similar to the original image, many GAN architectures have been proposed to solve the problems, for example, noisy images, degraded documents, and blur text, in the domain of document images. We then have the concept of using the GAN architecture to denoise the handwritten character images before recognizing them using the CNN architecture.

C. CONVOLUTIONAL NEURAL NETWORK
The convolutional neural network (CNN) architecture was first proposed in 1998 by LeCun et al. [21] to recognize handwritten digit images. In 2012, CNN began to gain attention and significant influence on image recognition research when Krizhevsky et al. [34] proposed a new CNN architecture that contains eight weighted layers, five convolutional layers and three fully-connected layers, to train on one million images from the ImageNet dataset with 1,000 classes [35] and win the ILSVRC competition. In 2014, Simonyan and Zisserman [36] presented a very deep CNN architecture with a depth of 16-19 weight layers, called VGGNets. These CNN architectures are called plain networks.
Subsequently, CNN architectures became very deep, such as GoogLeNet, which has 22 layers, and ResNet, which has more than 100 layers. However, when a network is designed with many weight layers, the number of weight parameters also increases and requires high computation. New architectures were therefore proposed with new convolution techniques. For example, GoogLeNet proposed an inception architecture [36] that computes with small filter sizes of 1 × 1, 3 × 3, and 5 × 5. A dimension reduction technique was employed in the inception modules to reduce the weight parameters, and the inception modules were then stacked on top of each other. The residual connection [37] and InceptionResNet [38] architectures were proposed to address the time-consuming training of plain networks such as VGGNets. The residual connections were added to the plain network and can be applied when the input and output have the same size. ResNets also have only one fully-connected layer with 1,000 nodes, while the VGGNets have three fully-connected layers with 4,096, 4,096, and 1,000 nodes, respectively.
Furthermore, the concept of connecting the current layer to other layers in a feed-forward direction was proposed and called DenseNet [39]. In the DenseNet architecture, the weight layers were then reused entirely in the network to make the model more compact. The DenseNet architecture could define the network with more than 200 weight layers. For MobileNetV1 [40], the lightweight architecture was proposed by using depthwise separable convolution, which is the operation of depthwise and pointwise convolutions proposed to reduce the dimension of the feature map. Subsequently, the inverted residual modules were proposed in MobileNetV2 [41]. The concept of the automatic discovery of the CNN architecture using reinforcement learning and recurrent neural networks (RNN) was invented and was called neural architecture search (NASNet) [42], which is a scalable architecture. Many convolution operations were selected using the controller RNN and recursively constructed convolutional cell blocks.
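To make the saving from depthwise separable convolution concrete, the following minimal sketch counts the weights of a standard convolution and of a depthwise plus pointwise pair. The kernel size and channel counts in the usage lines are illustrative assumptions and are not taken from MobileNetV1 [40]; biases are ignored.

```python
def standard_conv_params(k, c_in, c_out):
    """Weights in a standard k x k convolution (biases ignored)."""
    return k * k * c_in * c_out


def depthwise_separable_params(k, c_in, c_out):
    """Weights in a depthwise k x k convolution followed by a 1 x 1 pointwise one."""
    depthwise = k * k * c_in   # one k x k filter per input channel
    pointwise = c_in * c_out   # 1 x 1 convolution that mixes the channels
    return depthwise + pointwise


if __name__ == "__main__":
    # Hypothetical layer sizes chosen only for illustration.
    k, c_in, c_out = 3, 128, 256
    std = standard_conv_params(k, c_in, c_out)
    sep = depthwise_separable_params(k, c_in, c_out)
    print("standard:", std, "separable:", sep, "ratio:", round(std / sep, 2))
```

For a 3 × 3 kernel the separable form needs roughly eight to nine times fewer weights once the number of output channels is large, which is why it suits lightweight recognition networks.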
For the use of CNN in digit handwritten character recognition, Cireşan et al. [43] proposed multi-column deep neural networks (MCDNNs) in which the input image was first trained by different DNN blocks. Hence, the outputs of each DNN block were classified by averaging individual predictions. The MCDNN yields high performance with 99.77% accuracy on the MNIST dataset. Furthermore, Savita et al. [44] discovered the best hyperparameters of the CNN architecture, including the number of layers, kernel size, padding, stride, and receptive field. They also trained the CNN architecture with various optimization algorithms (SGDM, Adam, Adagrad, and Adadelta). The results showed that the CNN architecture with the Adam optimizer achieved an accuracy of 99.89% on the MNIST dataset.
Tang et al. [45] proposed two CNN architectures that included 6 layers (4 convolutional layers and 2 fully-connected layers) and 8 layers (5 convolutional layers and 3 fully-connected layers). The first CNN architecture was trained on printed Chinese characters. Hence, the pre-trained model of the first CNN architecture was used as a transfer learning to the second CNN architecture. The second CNN model was trained on historical Chinese characters. The accuracy was increased from 79.2% to 88.56% when using the transfer learning technique. Alom et al. [2] used various state-of-the-art CNN architectures: VGGNet, Network in Network, ResNet, and DenseNet, to recognize handwritten Bangla characters. The result showed that the DenseNet achieved the best accuracy with 98.31%. Gonwirat and Surinta [6] used the pre-trained model of the VGGNet instead of training from scratch. The result showed that the transfer learning of the VGGNet achieved 99.20% on the Thai handwritten character dataset.
Although the CNN architectures achieved high efficiency on image classification problems, Su et al. [3] demonstrated an image generation technique that only added one pixel into the target image based on the differential evolution technique. With only one attack pixel, the accuracy decreased significantly. For handwritten character recognition, many noise methods were applied to the character images, such as motion blur, low contrast, and additive white Gaussian noise (AWGN) [46], to demonstrate that noisy images could significantly reduce the recognition performance. Consequently, to increase the recognition efficiency, a synthesized image technique was introduced to remove noise before sending images to recognition.
CNN architectures have been proposed for image classification purposes. The CNN architectures combine two main tasks (feature extraction and machine learning) into one architecture, specifically to reduce the complex feature extraction processes. Many state-of-the-art CNN architectures have been proposed and have become successful in many domains, for example, AlexNet, GoogLeNet, VGGNets, MobileNets, ResNet, DenseNet, NASNet, and EfficientNet. The latest CNN architectures operate with deeper layers, more convolution operations (i.e., 1D, 2D, 3D convolution [47], [48], and depthwise separable convolution), and extra layers (i.e., global average pooling, inception module, reduction cell) [42] to compute robust spatial features from the image. Therefore, researchers can propose new CNN architectures, invent new operations, and combine them with the existing CNN architectures.
From the related work above, we found that the GAN architecture can be used to solve the problems of noisy images, while various CNN architectures can be used to recognize the noisy handwritten character images. The proposed denoising and recognition framework is described in depth in the following section.

III. THE PROPOSED DENOISING AND RECOGNITION FRAMEWORK
The performance of handwritten character recognition is always affected by noise. Consequently, we proposed the DeblurGAN-CNN architecture to address the noise problems. Although many robust CNN architectures achieve high accuracy in many domains, including handwritten character images, their accuracy decreases when the images are affected by noise such as blur, low resolution, and low contrast. In this research, we first studied the effect of noisy character images that harm the performance of handwritten character recognition. Second, data augmentation techniques were applied while training the CNN model to add new patterns of handwritten character images. Data augmentation helped the CNN model generalize when the noise level was not too high; however, the performance decreased after adding a high noise level. Third, we found that DeblurGAN could transform the noisy images into new clean handwritten characters. Finally, the DeblurGAN architecture and a robust CNN architecture were combined to enhance the recognition performance on handwritten character images, called DeblurGAN-CNN.
There are several methods for improving image quality, for example, super-resolution, image restoration, and image deblurring. However, some noise appears in the handwritten character images while transforming the paper documents into digital format. Consequently, we considered two GAN architectures (DeblurGAN and CycleGAN) to address our problems because both have been applied to deblurring images. However, the CycleGAN is mainly used for style transfer that transforms one style into another style. In contrast, many forms of noise occur in handwritten character images, which means CycleGAN is not appropriate for these problems. Therefore, we used the DeblurGAN architecture, which can deal with many-to-one style transfer.
In this paper, we proposed the DeblurGAN-CNN framework, which combines two state-of-the-art deep learning architectures into one architecture to denoise and recognize noisy handwritten characters. The proposed framework contains the generator of a generative adversarial network (GAN) and a convolutional neural network (CNN) architecture, as shown in Figure 1.
In the following subsections, the details of the DeblurGAN-CNN framework are described. 1) DeblurGAN is employed as a denoising network. 2) DenseNet121 is the convolutional neural network architecture used as a recognition network. 3) We describe the DeblurGAN-CNN architecture and the training strategy used for training the proposed framework.

A. DEBLURGAN
Kupyn et al. [14] proposed the GAN architecture to automatically deblur blurred images from any unknown blur function, called DeblurGAN, which can synthesize sharp images ($I_S$) from blurred images ($I_B$). The DeblurGAN uses the generator ($G_{\theta_G}$) to synthesize images and the discriminator ($D_{\theta_D}$) to distinguish between real and generated images.
The generator architecture of the DeblurGAN is shown in Figure 2. The beginning part of the network consists of three convolutional blocks that are designed to downsample the feature maps. The middle of the network is a sequence of nine residual blocks. In the last part of the network, the transposed convolution blocks are constructed to upsample feature maps to the original size as an input image. Moreover, the global skip connection is also proposed for this architecture by adding input to the output image. The global skip connection makes the network converge faster and yields better output results.
In the DeblurGAN, the PatchGAN architecture [13] is used as the discriminator. The PatchGAN architecture has downscale convolutional layers followed by instance normalization and leaky rectified linear unit (LeakyReLU) with α = 0.2.
As shown in Equation (1), the loss function of the DeblurGAN combines the adversarial loss ($\mathcal{L}_{GAN}$) and the content loss ($\mathcal{L}_{X}$), where $\lambda$ is a parameter that controls the relative weight of the two objectives:
$$\mathcal{L} = \mathcal{L}_{GAN} + \lambda \cdot \mathcal{L}_{X} \tag{1}$$
The WGAN-GP [15] critic, which scores the completeness of the generator result, is used as the adversarial loss, as shown in Equation (2). The content loss is the perceptual loss [16], which compares the generated image, called the reconstructed image, with the original image using the L2 loss function.
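For reference, the two loss terms named above are written as follows in the DeblurGAN paper [14]. This is a sketch that assumes the present framework keeps the same definitions; the symbols $N$ (number of images), $\phi_{i,j}$ (a feature map of a pre-trained VGG19 network), and $W_{i,j}$, $H_{i,j}$ (its width and height) come from [14] and [16] rather than from the text above.

```latex
\begin{align}
\mathcal{L}_{GAN} &= \sum_{n=1}^{N} -D_{\theta_D}\!\left(G_{\theta_G}(I_B)\right), \\
\mathcal{L}_{X} &= \frac{1}{W_{i,j} H_{i,j}} \sum_{x=1}^{W_{i,j}} \sum_{y=1}^{H_{i,j}}
  \left( \phi_{i,j}(I_S)_{x,y} - \phi_{i,j}\!\left(G_{\theta_G}(I_B)\right)_{x,y} \right)^{2}.
\end{align}
```

The adversarial term rewards the generator when the critic scores its output highly, while the content term penalizes the squared difference between feature maps of the sharp image and of the reconstructed image.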

B. DENSENET
In the earlier architecture, a residual connection that adds the input ($x$) element-wise to the output of a building block ($F(x)$) [37] was proposed, called ResNet. The benefit of the ResNet architecture was that the network could be constructed with deep convolutional layers and still obtain better results in terms of speed and performance. In contrast, the DenseNet architecture [39] was designed to maximize information flow by concatenating all feature maps from the previous convolutional layers, called a dense block. DenseNet was proposed to reuse features, reduce the architecture parameters, and alleviate gradient problems. The equation of the DenseNet is shown in Equation (3). An overview of the DenseNet is shown in Figure 3(a). The DenseNet architecture consists of three main parts:
1) A convolutional layer with a kernel size of 7 × 7. The convolutional block includes BN, ReLU, and Conv layers, with a stride of 2, and is followed by a 3 × 3 max pooling layer with a stride of 2.
2) Four dense blocks and transition layers.
3) The global average pooling (GAP) and classification layers with a softmax function.
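The dense connectivity described above is commonly written as below. This is the form given in the DenseNet paper [39]; the symbols $x_\ell$ (output of layer $\ell$), $H_\ell$ (the composite BN, ReLU, and Conv function), and the bracket notation for concatenation are assumed here and may differ from the notation of Equation (3).

```latex
\begin{equation}
x_{\ell} = H_{\ell}\left(\left[x_{0}, x_{1}, \ldots, x_{\ell-1}\right]\right)
\end{equation}
```

By comparison, a residual block computes $x_{\ell} = H_{\ell}(x_{\ell-1}) + x_{\ell-1}$, so the earlier feature maps are summed rather than concatenated.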
Details of the DenseNet architecture are shown in Figure 3(b). The dense block concatenates the outputs of the bottleneck layers, which are stacked N times, and is designed to decrease the parameters of the architecture. Each bottleneck layer consists of 1 × 1 Conv and 3 × 3 Conv layers, as shown in Figure 3(c). The transition layer (see Figure 3(d)) reduces the feature map width and height by 2 × 2 average pooling with a stride of 2, and a compression parameter θ, with 0 < θ ≤ 1, is applied to compress the network.
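The effect of the dense blocks and the compression parameter θ on the number of feature maps can be traced with simple arithmetic. The sketch below assumes the standard DenseNet121 settings from the DenseNet paper [39], which are not stated in this section: block sizes of 6, 12, 24, and 16 bottleneck layers, a growth rate of 32, 64 initial feature maps, and θ = 0.5.

```python
def densenet_channels(block_config=(6, 12, 24, 16), growth_rate=32,
                      init_channels=64, theta=0.5):
    """Return the feature-map count after each dense block and transition layer.

    All default values are assumed DenseNet121 settings, not values from this paper.
    """
    channels = init_channels
    trace = []
    for i, num_layers in enumerate(block_config):
        # Each bottleneck layer appends `growth_rate` new feature maps.
        channels += num_layers * growth_rate
        trace.append(("dense_block_%d" % (i + 1), channels))
        # A transition layer follows every dense block except the last one.
        if i < len(block_config) - 1:
            channels = int(channels * theta)
            trace.append(("transition_%d" % (i + 1), channels))
    return trace


if __name__ == "__main__":
    for name, channels in densenet_channels():
        print(name, channels)
```

Under these assumptions the last dense block outputs 1,024 feature maps, which is the vector passed to the global average pooling and classification layers.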
In this paper, we proposed to use DenseNet121 since it is the smallest size appropriate for handwritten character recognition.

C. DEBLURGAN-CNN SETTING AND TRAINING SCHEME
In this section, we provide the construction and training strategy of the proposed framework, as shown in Algorithm 1. Also, the details of the setting and training strategy of the DeblurGAN-CNN framework are described in the following.

1) DEBLURGAN TRAINING
DeblurGAN was designed for deblurring images. In our problem, however, DeblurGAN was applied to reconstruct sharp handwritten character images from various noisy styles, such as low contrast, motion blur, and white Gaussian noise. To train the DeblurGAN architecture, the dataset includes pairs of noisy and sharp handwritten character images. The generator is trained by evaluating the reconstructed handwritten character images with the discriminator and the loss function shown in Equation (1).
Algorithm 1
Step 3) Create a CNN with pre-trained weights from the ImageNet dataset.
Step 4) Train the CNN for P epochs with the dataset (X_D, Y_D) and save the best model based on the loss function in Equation (4).
Step 5) Construct a DeblurGAN-CNN network as follows:
- Load the G(x) network from Step 2.
- Load the CNN network from Step 4.
- Combine G(x) and CNN with the intermediate layer.
Step 6) Fine-tune the DeblurGAN-CNN network for P epochs with the dataset (X_D, Y_D) and the loss function in Equation (4). The training consists of two steps:
- Freeze the G(x) part of the network and train for P/2 epochs.
- Unfreeze and train all layers in the network for P/2 epochs.
Output: The DeblurGAN-CNN network.

2) CNN TRAINING
We employed the CNN architectures to train on a handwritten character dataset that consisted of the pairs $(x_i, y_i)$, where $i = 1, 2, \ldots, n$, $x_i$ is the $i$-th handwritten character image, and $y_i$ is its label. To improve performance, we used the transfer learning method [6], which starts from convolutional kernels with prior knowledge and converges within a few epochs. The classification layer of the pre-trained CNN model was modified, and the network was then fine-tuned. Furthermore, we trained the CNN models using data augmentation with noisy handwritten character images, where $f_{noisy}(x)$ is the function that synthesizes a noisy image from $x$. We trained the CNN model to classify images using the categorical cross-entropy loss function shown in Equation (4):
$$\mathcal{L}_{CE} = -\frac{1}{N} \sum_{i=1}^{N} y_i \log p_{CNN}(x_i; \theta) \tag{4}$$
where $N$ is the number of training images, $y_i$ is the one-hot label, and $p_{CNN}(x_i; \theta)$ is the probability distribution of the CNN output, where $x_i$ is an input image and $\theta$ denotes the weight parameters.
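As a worked illustration of the categorical cross-entropy loss, the sketch below computes the mean negative log-probability of the correct class from raw class scores. The function names and the use of integer class indices in place of one-hot labels are illustrative assumptions, not the authors' implementation.

```python
import math


def softmax(logits):
    """Convert raw class scores into a probability distribution."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def categorical_cross_entropy(batch_logits, labels):
    """Mean of -log p(correct class) over a batch.

    `labels` holds integer class indices, which is equivalent to
    multiplying the log-probabilities by a one-hot label vector.
    """
    loss = 0.0
    for logits, label in zip(batch_logits, labels):
        probs = softmax(logits)
        loss -= math.log(probs[label])
    return loss / len(batch_logits)


if __name__ == "__main__":
    # An untrained 68-class model with uniform scores gives a loss of log(68).
    print(categorical_cross_entropy([[0.0] * 68], [3]))
```

A model that assigns uniform probability over 68 classes has a loss of log 68, so values well below that level indicate that the classifier has started to separate the classes.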

4) DEBLURGAN-CNN FINE-TUNING
The DeblurGAN-CNN network is not yet a fully integrated network after merging, because the CNN part has not been trained on the generator output. Fine-tuning the DeblurGAN-CNN network addresses this. In the first step, we trained only the CNN while freezing the DeblurGAN generator, which stabilizes network training and retains sharp images as the generator output.
In the second step, we trained the DeblurGAN-CNN network with all layers unfrozen. The proposed DeblurGAN-CNN network needed only a few training epochs: we trained for ten epochs in the frozen step and ten epochs in the unfrozen step.
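The two-step fine-tuning can be sketched as a simple epoch schedule. This is a minimal illustration, not the training code used in the experiments; the function name fine_tuning_schedule and the flag generator_trainable are hypothetical names introduced here.

```python
from typing import List, Tuple


def fine_tuning_schedule(total_epochs: int) -> List[Tuple[int, bool]]:
    """Return (epoch, generator_trainable) pairs for two-step fine-tuning.

    First half: the generator is frozen and only the CNN is updated.
    Second half: all layers are trainable.
    """
    if total_epochs < 2 or total_epochs % 2 != 0:
        raise ValueError("total_epochs must be a positive even number")
    half = total_epochs // 2
    return [(epoch, epoch > half) for epoch in range(1, total_epochs + 1)]


# Ten frozen epochs followed by ten unfrozen epochs (P = 20).
schedule = fine_tuning_schedule(20)
frozen_epochs = [e for e, trainable in schedule if not trainable]
unfrozen_epochs = [e for e, trainable in schedule if trainable]
```

In a deep learning framework, the boolean would typically be used to switch the trainable state of the generator layers before each phase.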

IV. HANDWRITTEN CHARACTER DATASETS
In this section, we briefly describe the handwritten character datasets used in the experiments, including two Thai handwritten character datasets: THCC-67 [49] and THI-C68 [1], and two noisy handwritten character datasets: n-MNIST [46] and n-THI-C68. An overview of the handwritten character datasets is shown in Table 1.

A. THE NECTEC THAI HANDWRITTEN CHARACTER CORPUS (THCC-67)
The National Electronic and Computer Technology Center (NECTEC) presented a Thai handwritten character corpus (THCC) of consonants, vowels, and tones that contains 67 classes, called THCC-67. The THCC-67 dataset has 9,012 characters that were rescaled to 32 × 32 pixels. In this research, we used it as an independent test. The THCC-67 dataset is shown in Figure 4(a).

B. THE ALICE OFFLINE THAI HANDWRITTEN CHARACTER DATASET (THI-C68)
The THI-C68 dataset, containing 68 classes, was proposed by Surinta et al. [1]. The THI-C68 dataset was collected from 150 university students aged 20-23 years old. Students wrote the Thai characters on a form with a white background that was scanned at a resolution of 200 dpi. Images were rescaled with the aspect ratio preserved to avoid distortion and were stored in grayscale format. The THI-C68 dataset has 14,490 character images containing consonants, vowels, and tones. An example of the THI-C68 is shown in Figure 4(b).

C. NOISY THI-C68 (N-THI-C68)
In this research, we propose a new noisy Thai handwritten character dataset, called noisy THI-C68 (n-THI-C68). We synthesized new noisy character images using five different noise techniques: low resolution, additive white Gaussian noise (AWGN), low contrast, motion blur, and mixed noise. The THI-C68 dataset was split into 11,592 training images and 2,898 test images. For the training set, we randomly selected one noise technique, with various adjustment values, for each character image, which yielded 11,592 noisy character images. For the test set, we increased the size from 2,898 character images to 14,490 noisy character images (2,898 × 5) by randomly applying the five noise techniques to the original character images.
As shown in Figure 5(a), noisy Thai handwritten character images were synthesized as follows. 1) Low resolution at a level of 8-12 pixels. 2) AWGN with noise added at a peak signal-to-noise ratio (PSNR) of 9.5. 3) Low contrast with the color gradient reduced to the range of 0.15-0.5 relative to the original images. 4) Motion blur with two blur methods: directional motion blur [46], [50] and random motion blur [51]. 5) Mixed noise combining the four noise methods above.
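Two of the noise techniques listed above can be sketched in a few lines. This is a minimal illustration under stated assumptions: pixel values are taken to lie in [0, 1], the PSNR target is interpreted through the usual relation PSNR = 10 log10(MAX² / MSE), and the contrast range 0.15-0.5 is interpreted as a scale factor around mid-gray. All function names are hypothetical, and the exact procedures used to build n-THI-C68 may differ.

```python
import random
from typing import List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def awgn_sigma_for_psnr(psnr_db: float, max_value: float = 1.0) -> float:
    """Noise standard deviation whose expected MSE matches the target PSNR
    (before clipping): sigma = MAX / 10^(PSNR / 20)."""
    return max_value / (10.0 ** (psnr_db / 20.0))


def add_awgn(image: Image, psnr_db: float, rng: random.Random) -> Image:
    """Add zero-mean Gaussian noise and clip the result to [0, 1]."""
    sigma = awgn_sigma_for_psnr(psnr_db)
    return [[min(1.0, max(0.0, p + rng.gauss(0.0, sigma))) for p in row]
            for row in image]


def reduce_contrast(image: Image, factor: float) -> Image:
    """Shrink pixel deviations from mid-gray by the given factor (assumed meaning)."""
    return [[0.5 + factor * (p - 0.5) for p in row] for row in image]


def apply_random_noise(image: Image, rng: random.Random) -> Image:
    """Pick one technique at random, as done for each training image."""
    technique = rng.choice(["awgn", "low_contrast"])
    if technique == "awgn":
        return add_awgn(image, psnr_db=9.5, rng=rng)
    return reduce_contrast(image, factor=rng.uniform(0.15, 0.5))


rng = random.Random(0)
clean = [[0.0, 1.0], [1.0, 0.0]]
noisy = apply_random_noise(clean, rng)
```

Low resolution, motion blur, and mixed noise would be added to the list of techniques in the same way.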

D. NOISY MNIST (N-MNIST)
Basu et al. [50] proposed the noisy MNIST (n-MNIST), an extended version of the MNIST dataset [21] to which three noise methods were applied: AWGN, motion blur, and a combination of reduced contrast and AWGN. The n-MNIST dataset contains 10 classes (0-9) and has 180,000 training samples and 30,000 test samples as a result of applying the three noise techniques to the original images. Figure 5(b) shows the noisy digits, which were synthesized as follows. 1) AWGN with noise added at a PSNR of 9.5. 2) Motion blur using a linear motion filter with a size of 5 pixels and a rotation of 15 degrees. 3) A combination of reduced contrast and AWGN with a PSNR of 12.

V. EXPERIMENT RESULTS
In this section, we evaluate the performance of the proposed DeblurGAN-CNN architecture on the handwritten character datasets and noisy handwritten character datasets. We then investigate the recognition effectiveness of the CNNs and the quality of image restoration by the generative adversarial networks (GAN). In this study, we trained the CNN and GAN models on a Linux operating system with an Nvidia GeForce GTX1080ti 8G GPU, an Intel(R) Core i5-7400 Processor 3.00GHz CPU, and 32GB DDR4 RAM.

A. EVALUATION OF THE CNN ARCHITECTURES ON THI-C68 DATASET

1) COMPARISON OF STATE-OF-THE-ART CNNS
We evaluated four CNN architectures, VGG19, Inception-ResNet, MobileNetV2, and DenseNet121, on the Thai handwritten character dataset to find the best CNN architecture. We divided the THI-C68 dataset into a training set of 13,041 images and a test set of 1,449 images, that is, 90% and 10% of the 14,490 images. The training set was evaluated using 5-fold cross-validation, and the test set was an independent holdout set for final evaluation.
Furthermore, we focused on three training methods: 1) scratch learning (SL), 2) transfer learning (TL), and 3) transfer learning with noisy data augmentation techniques (TL-nDA). We applied four noisy data augmentations: low resolution, AWGN, motion blur, and mixed noise, which were generated in the same way as the training set of the n-THI-C68 dataset.
FIGURE 6. Illustration of the noisy images of (a) low resolution, (b) AWGN, (c) low contrast, (d) motion blur, and (e) mixed noise, as shown in the first row and reconstructed images using DeblurGAN architecture, as shown in the second row. Note that the high PSNR value presents better performance accuracy, and the high SSIM value presents the most similar character images between the reconstructed and original images.
The hyperparameters in CNN models were defined as follows: training epochs = 100 epochs, batch size = 32, stochastic gradient descent (SGD) optimizer, learning rate = 0.001, decay rate = 0.0001, momentum = 0.9, and image size = 128 × 128 pixels which is the smallest input of the Inception-ResNet architecture. In transfer learning, we also used the pre-trained CNN model that learned on the ImageNet Dataset [35].
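The hyperparameters above can be collected in a small configuration, together with one possible reading of the decay rate. The sketch assumes a time-based decay of the form lr / (1 + decay × iteration), which is a common convention for SGD with a decay parameter; the text does not state which decay rule was used, so this is an assumption. The names CNN_HYPERPARAMS and time_based_lr are hypothetical.

```python
# Values taken from the text; the dictionary keys are illustrative names.
CNN_HYPERPARAMS = {
    "epochs": 100,
    "batch_size": 32,
    "optimizer": "SGD",
    "learning_rate": 0.001,
    "decay": 0.0001,
    "momentum": 0.9,
    "image_size": (128, 128),
}


def time_based_lr(initial_lr: float, decay: float, iteration: int) -> float:
    """Learning rate after a number of update steps under time-based decay
    (assumed rule, not confirmed by the text)."""
    return initial_lr / (1.0 + decay * iteration)


lr_start = time_based_lr(CNN_HYPERPARAMS["learning_rate"],
                         CNN_HYPERPARAMS["decay"], 0)
lr_after_10k = time_based_lr(CNN_HYPERPARAMS["learning_rate"],
                             CNN_HYPERPARAMS["decay"], 10_000)
```

Under this assumed rule, the learning rate halves after 10,000 update steps.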
The accuracy results of the CNN architectures are shown in Table 2. All CNNs achieved an accuracy of about 97% or higher. The VGG19 architecture achieved the lowest performance on the THI-C68 dataset, with an accuracy of 96.93% when training from scratch. On the other hand, the DenseNet121 architecture achieved the best performance in all learning methods, with an accuracy of 99.48% when using transfer learning.
Furthermore, we demonstrated that noisy data can decrease the recognition performance of the CNN architectures. In this experiment, we applied four noisy data augmentation techniques while training the CNN model using the transfer learning method. The accuracy of DenseNet121 decreased slightly, from 99.48% to 99.28%, when training with noisy images. We therefore proposed the DeblurGAN-CNN architecture to address the problems of noisy images. The results of the DeblurGAN are shown in Section B.

2) COMPARISON OF THE CNNS AND OTHER STUDIES
Based on the previous experiments, we selected two CNN architectures, DenseNet121 and MobileNetV2. These two CNN architectures were evaluated and compared on the THI-C68 dataset with two methods that combine hand-crafted feature extraction with machine learning, namely SiftD-SVM [1] and HOGFoDRs-SVM [26].
To allow a fair comparison between the CNN architectures and previous studies, we created two randomly shuffled subsets of the THI-C68 dataset following the experiments of Surinta et al. [1] and Inkeaw et al. [26]. The first subset (Set-I) had 11,592 training samples and 2,898 test samples. The second subset (Set-II) had 13,041 training samples and 1,449 test samples. Note that Set-I and Set-II were used for comparison with the HOGFoDRs-SVM and the SiftD-SVM methods, respectively.
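The sizes of Set-I and Set-II correspond to 80/20 and 90/10 splits of the 14,490 THI-C68 images. The short check below makes the arithmetic explicit; the percentages are inferred from the sample counts rather than stated for these subsets, and split_sizes is a hypothetical helper.

```python
from typing import Tuple


def split_sizes(total: int, test_percent: int) -> Tuple[int, int]:
    """Return (train_size, test_size) for a holdout split, using integer arithmetic."""
    test = total * test_percent // 100
    return total - test, test


TOTAL_IMAGES = 14_490  # size of the THI-C68 dataset

set_i = split_sizes(TOTAL_IMAGES, 20)   # (11592, 2898)
set_ii = split_sizes(TOTAL_IMAGES, 10)  # (13041, 1449)
```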
The results reported in Table 3 show that the DenseNet121 architecture with transfer learning (DenseNet121-TL) outperformed every CNN architecture on both sets with 5-fold cross-validation. DenseNet121-TL outperformed the HOGFoDRs-SVM method by 0.51% on Set-I and outperformed the SiftD-SVM method by 0.37% on Set-II. Also, MobileNetV2 with transfer learning (MobileNetV2-TL) achieved the highest performance on the independent test set of Set-II with 99.31% accuracy. MobileNetV2-TL significantly outperformed the SiftD-SVM method by 4.97%.
These results show that the CNN architectures with transfer learning improve the performance of handwritten character recognition. The CNN models achieved better accuracy than the hand-crafted features [1], [26] on the THI-C68 dataset.

B. DENOISING PERFORMANCE OF DEBLURGAN ON THE N-THI-C68 DATASET
In this experiment, the input images were the noisy images of the n-THI-C68 dataset with 128 × 128 pixels. We first reconstructed the denoised character images with 128 × 128 pixels using the Wasserstein and content loss functions. The hyperparameters of DeblurGAN were set as follows: Adam optimizer, learning rate = 0.0001, momentum parameters = 0.9 and 0.999, training epochs = 200, and batch size = 32.
To study the reconstruction quality of the denoised images, we evaluated the DeblurGAN architecture on the n-THI-C68 dataset with two well-known image quality metrics, the peak signal-to-noise ratio (PSNR) and the structural similarity index (SSIM). The noisy images produced by the different noise methods and the reconstructed images are shown in Figure 6, together with the PSNR and SSIM values obtained for each noise method. Higher PSNR and SSIM values indicate better pixel-level accuracy and better structural similarity of the reconstructed image, respectively. The best PSNR and SSIM values were obtained when reconstructing character images degraded by low contrast, low resolution, and AWGN, in that order. However, motion blur and mixed noise were the most difficult to reconstruct.
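As a reminder of how the first metric is computed, the following is a minimal sketch of PSNR between a reference image and a reconstructed image. It assumes grayscale pixel values in [0, 1] stored as nested lists; SSIM is omitted because it requires local window statistics. The function names are illustrative.

```python
import math
from typing import List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def mean_squared_error(reference: Image, restored: Image) -> float:
    """Average squared pixel difference between two images of equal size."""
    diffs = [(a - b) ** 2
             for ref_row, res_row in zip(reference, restored)
             for a, b in zip(ref_row, res_row)]
    return sum(diffs) / len(diffs)


def psnr(reference: Image, restored: Image, max_value: float = 1.0) -> float:
    """PSNR in decibels: 10 * log10(MAX^2 / MSE); infinite for identical images."""
    error = mean_squared_error(reference, restored)
    if error == 0.0:
        return math.inf
    return 10.0 * math.log10(max_value ** 2 / error)


# A uniform error of 0.1 on a [0, 1] scale gives an MSE of 0.01, about 20 dB.
example_psnr = psnr([[0.0, 0.0]], [[0.1, 0.1]])
```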
The DeblurGAN architecture adds residual blocks and a global skip connection in the generator, so that the DeblurGAN only learns a residual correction to transform the noisy images. The DeblurGAN could therefore generalize better when reconstructing denoised images degraded by multiple noise types or by an unknown kernel. Importantly, the DeblurGAN [14] uses the WGAN-GP and perceptual loss when reconstructing denoised images, while traditional neural networks use L1 and L2 loss functions.
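The effect of the global skip connection can be illustrated with a toy example: the generator predicts only a residual, and the restored image is the noisy input plus that residual. This is a sketch under assumptions, with pixel values in [0, 1] and a clipping step added for illustration; the function name apply_global_skip is hypothetical, and the actual generator operates on learned feature maps.

```python
from typing import List

Image = List[List[float]]  # grayscale image, pixel values in [0, 1]


def apply_global_skip(noisy: Image, residual: Image) -> Image:
    """Restored image = noisy input + predicted residual, clipped to [0, 1]."""
    return [[min(1.0, max(0.0, n + r)) for n, r in zip(noisy_row, res_row)]
            for noisy_row, res_row in zip(noisy, residual)]


noisy_patch = [[0.2, 0.9]]
predicted_residual = [[0.3, 0.4]]  # stand-in for the generator's residual output
restored_patch = apply_global_skip(noisy_patch, predicted_residual)
```

With a zero residual the input passes through unchanged, which is why the network only has to learn the correction rather than the whole image.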

C. RECOGNITION PERFORMANCE OF DEBLURGAN-CNN ON THE N-THI-C68 DATASET
This section evaluates the DeblurGAN-CNN architectures on the n-THI-C68 dataset. Based on the experimental results in Section A, we selected two CNN architectures, DenseNet121 and MobileNetV2, as the CNN models. We then connected DeblurGAN with each CNN architecture, yielding DeblurGAN-DenseNet121 and DeblurGAN-MobileNetV2, and compared the DeblurGAN-CNN architectures with the traditional CNN architectures on recognizing the noisy character images, as shown in Table 4. Table 4 shows that the CNN architecture achieved low accuracy when using MobileNetV2-TL. It attained 77.24% accuracy when recognizing the noisy images with low resolution, and its worst performance, only 13.80% accuracy, occurred on low-contrast images. However, when the CNN model was trained using transfer learning with noisy data augmentation techniques (TL-nDA), the accuracy increased from 13.80% to 95.62% with MobileNetV2-TL-nDA. The overall accuracy of MobileNetV2-TL-nDA and DenseNet121-TL-nDA was 94.21% and 94.33%, respectively.
The results show that the DeblurGAN-CNN architectures could address the problems of the noisy character images, achieving above 97% accuracy on all noise methods. The DeblurGAN-DenseNet121 achieved 98.53% accuracy and slightly outperformed the DeblurGAN-MobileNetV2, which achieved an accuracy of 98.47%. Moreover, the DeblurGAN-CNN architectures significantly outperformed the DenseNet121-TL-nDA and MobileNetV2-TL-nDA (significant at p < .05). The misclassified characters are shown in Figure 7.
We concluded that training the CNN models using only transfer learning with noisy data augmentation techniques could achieve accuracy above 90% on the n-THI-C68 dataset. However, handwritten character tasks require very high accuracy to reduce errors when the output data are used. We therefore recommend the DeblurGAN-CNN architectures, which yielded the most promising results in this study.

D. COMPARISON OF THE DEBLURGAN-CNN ARCHITECTURE AND OTHER APPROACHES
We selected two DeblurGAN-CNN architectures, DeblurGAN-MobileNetV2 and DeblurGAN-DenseNet121, to evaluate their generalization ability on two other datasets, n-MNIST and THCC-67. Comparisons of the results on the n-MNIST and THCC-67 datasets with the GAN-CNNs and other approaches are presented in Table 5 and Table 6. Table 5 presents the comparison between the proposed DeblurGAN-CNN architectures and other approaches on the n-MNIST dataset. The DeblurGAN-MobileNetV2 slightly outperformed the DeblurGAN-DenseNet121 in accuracy. The DeblurGAN-MobileNetV2 achieved the best accuracy on the n-MNIST dataset using AWGN and AWGN+Contrast noise methods.
The experimental results on the n-MNIST dataset showed that the optimal CNN-Hopfield network achieved an accuracy of 99.18%, 99.74%, and 97.53% when the AWGN, motion blur, and AWGN+Contrast noises were applied, respectively.

On the other hand, the DeblurGAN-MobileNetV2 achieved accuracies of 98.93%, 99.36%, and 97.59% when the AWGN, motion blur, and AWGN+Contrast noises were applied, respectively. Thus, the DeblurGAN-MobileNetV2 architecture outperformed the optimal CNN-Hopfield network on the n-MNIST dataset when AWGN+Contrast noise was applied.
The DeblurGAN-CNN architectures demonstrated the highest accuracy compared with other methods on the n-MNIST dataset when AWGN+Contrast noise was applied. Table 6 shows the evaluation of the DeblurGAN-CNN architectures on the THCC-67 dataset and their comparison with the HOGFoDRs-SVM method. The proposed DeblurGAN-CNN architectures significantly outperformed the existing method by more than 10%. Nevertheless, the best accuracy, obtained with the DeblurGAN-DenseNet121, was only 80.68%.
The misclassified characters recognized using the DeblurGAN-DenseNet121 are shown in Figure 8.
There is therefore still scope to increase the performance on this dataset. Nonetheless, the proposed DeblurGAN-CNN architectures could be applied to classify noisy image datasets, even the THCC-67, an unseen noisy dataset.

VI. DISCUSSION
We compared the training loss of the DenseNet121-TL-nDA and the DeblurGAN-DenseNet121, as shown in Figures 9(a) and 9(b). The improvement of validation loss is shown in Figure 9(c). It can be seen that the training loss of the DeblurGAN-DenseNet121 is relatively low in the early epochs due to the transfer of pre-trained weights. The training loss of the DeblurGAN-DenseNet121 is always lower than that of the DenseNet121-TL-nDA.
As shown in Figure 10, the DenseNet121 model with TL (DenseNet121-TL) achieved unsatisfactory performance when evaluated on the noisy images. The accuracy of DenseNet121-TL dropped quickly when the PSNR value was increased. DenseNet121-TL-nDA obtained much better performance than DenseNet121-TL; however, its accuracy decreased quickly when the PSNR value was higher than 20. In contrast, the DeblurGAN-DenseNet121 trained using the TL-nDA method maintained an accuracy above 90% even when the PSNR value was increased to more than 26.
We also compared the proposed DeblurGAN-CNN architecture and the optimal CNN-Hopfield network on the n-MNIST dataset in terms of accuracy. The optimal CNN-Hopfield network [53] outperformed our proposed architecture on the AWGN and motion blur noises because it is an ensemble method that combines many CNN outputs to achieve better recognition. Ensemble methods have been reported to give better accuracy in much published research [54], [55], [56]. On the other hand, the DeblurGAN-CNN architecture is a single deep learning architecture that combines GAN and CNN architectures, so only one output is produced by the proposed architecture. The optimal CNN-Hopfield network achieved an accuracy of 62%, 92%, and 97.52% when using one, two, and three CNN models, respectively. In comparison, our proposed method achieved an accuracy of 98.93% using only one model, an accuracy more than 6% higher than that of the optimal CNN-Hopfield network that uses three CNN models.
Furthermore, finding texts that appear in natural scene images is challenging. To solve this challenge, object and scene text detection in the wild should be first applied to obtain the region of interest, which is the area of texts. Second, we could employ the DeblurGAN-CNN method to denoise and recognize the text in the natural scene images. This solution could enhance the recognition performance. In future work, we will concentrate on finding and recognizing text that appears in natural scene images.

VII. CONCLUSION
The performance of handwritten character recognition systems decreases as a consequence of many problems, such as handwriting styles, degradation of the documents, and noise introduced while transforming documents into a digital format. This research mainly focused on the denoising and recognition of noisy handwritten character images. To this end, the robust generative adversarial network (GAN) combined with the convolutional neural network (CNN) architecture, called DeblurGAN-CNN, was proposed to synthesize new clean handwritten characters from noisy handwritten characters and to recognize them with improved performance. For the CNN architecture, we combined two state-of-the-art CNNs, MobileNetV2 and DenseNet121, with the DeblurGAN, called DeblurGAN-MobileNetV2 and DeblurGAN-DenseNet121. The DeblurGAN-CNN architectures were trained using the transfer learning technique and the noisy data augmentation techniques to create a robust model. The most beneficial aspect of the DeblurGAN-CNN models was that they could learn and generalize from many noise methods, including low resolution, additive white Gaussian noise (AWGN), low contrast, motion blur, and mixed noise.
In the evaluation of the denoising model, the DeblurGAN produced output that achieved high peak signal-to-noise ratio (PSNR) and structural similarity index (SSIM) values. As a result, the DeblurGAN architecture could remove various noises from the noisy handwritten character images. In terms of accuracy, the results show that the DeblurGAN-CNN architectures generated clean handwritten character images and achieved the highest performance on the n-MNIST and n-THI-C68 datasets when compared with other existing methods. Also, both DeblurGAN-DenseNet121 and DeblurGAN-MobileNetV2 outperformed the HOGFoDRs-SVM on the THI-C68 and THCC-67 datasets. The DeblurGAN-CNN architectures achieved an accuracy above 98%, of 97.59%, and of 80.68% on the n-THI-C68, n-MNIST, and THCC-67 datasets, respectively. Thus, the DeblurGAN-CNN architectures, which used the DenseNet121 and MobileNetV2 as the CNN architectures, achieved high handwritten character recognition performance with and without noisy handwritten characters.
In the future, we plan to work on the ensemble CNNs technique and combine the DeblurGAN-CNN architecture as a part of the ensemble CNNs technique [54], [55] to achieve much higher accuracy. Another direction for future work is creating a new DeblurGAN-CNN architecture by searching for efficient CNN architectures with lightweight models. We will embed DeblurGAN-CNN with the recurrent neural networks (RNNs) [57] or vision transformers [11], [58] to recognize word and sentence images. Finally, finding text in natural scene images using object detection methods [59], [60] and recognizing it with our DeblurGAN-CNN is another direction we wish to pursue.