Channel Attention GAN Trained With Enhanced Dataset for Single-Image Shadow Removal

Even today, when many deep-learning-based methods have been published, single-image shadow removal remains a challenging task. This is because a shadow changes with conditions such as the target material and the light source, and it is difficult to estimate all of the physical parameters involved. In this paper, we propose a new single-image shadow removal method (Channel Attention GAN: CANet) that uses two networks, one for detecting shadows and one for removing them. The intensity change in a shadowed region has different characteristics depending on the wavelength of light. In addition, the image acquisition system of a camera produces images in which the RGB values influence each other. Our method therefore focuses on the physical properties of shadows and on the camera's image acquisition system, and the proposed network has a structure that considers the relationship between color channels. When training this network, we modified the colors of the training images and added artifacts to them in order to make the training dataset more complex. This image processing is based on the shadow model and takes the camera's image acquisition system into account. With these new proposals, our method removes shadows on the ISTD, ISTD+, SRD, and SRD+ datasets with higher accuracy than the state-of-the-art methods. The code is available on GitHub: https://github.com/ryo-abiko/CANet.


I. INTRODUCTION
When light is blocked by an object, the amount of light reflected from the target is reduced, and this forms a shadow. It is very difficult to estimate a shadow-free image from a shadowed image since the properties of the shadow change with various conditions such as the material of the object, the type of the light source, and the distance from the light source [1]. Images containing shadows degrade the accuracy of applications such as face recognition, segmentation, and tracking [2]-[5]. Thus, removing shadows from images is an important task in computer vision.
In early methods, shadow removal was based on a physical shadow generation model [8]-[11]. Those approaches require estimating various parameters such as the reflectance of different materials and colors, the light intensity at different locations, and the direct and indirect light intensities. It is difficult to estimate these parameters accurately from a single image. Moreover, the shadow models used in those papers are not perfect models of actual shadows. Therefore, even if the parameters could be estimated accurately, the performance of shadow removal based on a shadow model was limited.
The associate editor coordinating the review of this manuscript and approving it for publication was Qingli Li.
On the other hand, shadow removal methods based on deep neural networks have recently produced good results [6], [7], [12]-[17]. There are two types of deep-learning methods: one is trained with pairs of shadow images and ground truth images [6], [7], [12]-[14], [16], [17], and the other is trained with single images as a feature transformation task [15]. MaskShadowGAN [15] is a typical method that uses unpaired images for training, but the feature transformation blurs the images; this was mitigated by using a GAN, yet there is still room for improvement. For methods that use paired images for training, the publicly available datasets of paired images (ISTD [6], SRD [12]) make it easier to train on natural shadow features. In our method, we use these paired images for training.
In this paper, we propose a deep learning-based method that focuses on the correlation between the color channels of shadows. Shadows are strongly influenced by the wavelength of light [1], and the intensity of light acquired by the camera changes depending on the color of the target. Therefore, in general, it is meaningless to estimate how weak the light intensity of the shadow region is without considering the color of the target. Considering how shadows arise in nature, each channel should be trained individually so that the physical characteristics of each wavelength can be trained effectively. However, in the camera's image acquisition system, the image is processed using information from multiple channels when adjusting the white balance or changing the color space [18]. Since this processing mixes the information of the color channels, training each color channel separately is not effective; the correlation among the RGB channels must be considered and the channels trained together. Indeed, we experimentally found that estimating the RGB color channels separately decreases the accuracy of shadow removal. In our method, good results were obtained with the newly proposed network structure, which considers the correlation between channels.

FIGURE 1. Our network is divided into the shadow detection part and the shadow removal part. This structure is based on ST-CGAN [6] and is the same structure used in our previous method [7]. We train this network using our enhanced dataset, which includes training images with artifacts.
In the ISTD and SRD datasets, there is a bias in the situations in which the training images were collected, and the number of images is insufficient for general-purpose shadow removal. Therefore, we added artifacts to the training images based on the shadow model so that the training images have a more widespread distribution of features. By randomly changing the parameters of these artifacts during training, we are able to train the network on images with a wide range of features.
Furthermore, by training the network using a loss function focusing on the boundary between the shadow region and the non-shadow region, a natural image can be generated.
The proposed method is based on our previously published method [7]. An overview of our method is shown in Fig. 1. Like our previous method, the proposed method uses two networks, one for shadow detection and one for shadow removal. However, in this method, the training dataset is processed so that more diverse features can be trained, and we refined the network structure based on theory and experiment to obtain better results. As a result, the proposed method is more versatile than the previous method, which produced good results only for specific images. Training is performed using the ISTD+ dataset [14], and results from networks fine-tuned on the ISTD [6], SRD [12], and SRD+ [14] datasets are also provided for comparison with other methods. Our method produces better results in RMSE, PSNR, and SSIM than the state-of-the-art methods on all of these datasets. The main contributions of our method (Channel Attention GAN: CANet) are summarized below:
• We propose a new deep learning-based method for shadow removal, focusing on the correlation between color channels.
• In order to obtain more complex training images, artifacts are added to the training images based on the shadow model.
• By using a loss function that focuses on the boundary of the shadowed area, more natural images can be generated.
• On all of the ISTD, ISTD+, SRD, and SRD+ datasets, our method outperforms the state-of-the-art methods.
The studies and theories underlying our method are given in Sections II and III. Section IV describes the details of the proposed method, and Section V compares experimental results between our method and other methods.

II. RELATED WORK

A. PHYSICAL MODELING-BASED METHODS
In traditional methods, shadow removal is performed by formulating a physical model of the shadow and performing an inverse operation based on that model [8]-[11]. In these methods, to make the problem tractable, some parameters of the shadow model are treated as fixed and approximate expressions are used. Since the shadow model is not sufficient to represent natural shadows, or the processing relies on an approximated shadow model, it has been difficult to restore colors accurately even when the assumed parameters can be estimated accurately. A method that requires user-guided information has also been proposed [11], but prior information supplied by the user increases the user's time and effort and makes it impossible to process large numbers of images, so shadow removal from a single image without guidance is an important task.

B. DEEP LEARNING-BASED METHODS
Shadow removal methods based on deep learning have produced better results than traditional methods. The release of the large-scale datasets ISTD [6] and SRD [12], which include pairs of shadowed and shadow-free images, has made it easier to train networks, and many deep learning-based methods using these datasets have been proposed. A deep learning-based method trained with unpaired images has also been proposed [15], but it has problems such as textures being lost in the process of feature translation. Regarding methods that use paired images for training, Qu et al. [12] proposed a method that predicts the ratio of the change in pixel values in the shadow region (DeshadowNet). Along with this method, they released the SRD dataset, consisting of 3088 image pairs of shadowed and shadow-free images. Wang et al. [6] proposed a method that performs shadow detection and shadow removal in separate networks (ST-CGAN). They also released the ISTD dataset, consisting of 1870 image triplets of shadowed, shadow-free, and shadow mask images. Our proposed method is based on ST-CGAN. Hu et al. [19] proposed a shadow detection method (DSC) that focuses on the direction of shadows; in a later study [13], they also showed shadow removal results using this method. SID [14] was proposed by Le et al., whose method estimates the parameters of a shadow model using neural networks. In addition, they proposed a dataset called ISTD+ with adjusted color tones in the ground truth images. Cun et al. proposed a method called DHAN [16] that focuses on two layers, a feature layer and an attention layer. They also proposed a network (SMGAN) for generating artificial shadow images in order to train DHAN with more varied images. The shadow removal method proposed by Fu et al. (AEF [17]) generates fusion weight maps for multiple over-exposed images. They also proposed a post-processing network (boundary-aware RefineNet) to reduce artifacts at the boundary between the shadow area and the non-shadow area. We have also previously proposed a shadow removal method [7] based on ST-CGAN, and the method proposed in this paper builds on it.

III. SHADOW MODEL AND IMAGE ACQUISITION SYSTEM

A. SHADOW MODEL
In this paper, we use the same shadow model as Shor et al. [20]. The light intensity I at position x is described as:

I(x, λ) = L(x, λ) R(x, λ),   (1)

where L is the illumination and R is the reflectance. Note that since I, L, and R depend on the wavelength λ and the position x, they are expressed as functions of λ and x. Looking at this equation in more detail, it can be divided into two terms related to direct light and indirect light, as shown in the following equation:

I(x, λ) = (L_d(x, λ) + L_a(x, λ)) R(x, λ),   (2)

where L_d is the direct illumination and L_a is the indirect illumination. In the shadowed region, L_d is 0 because the direct light is blocked by some object. Therefore, the intensity I_shadow in the shadowed region can be expressed as:

I_shadow(x, λ) = L_a(x, λ) R(x, λ).   (3)

We use these equations to generate new training images and artifacts. The details of our method are explained in Section IV.
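As a sanity check, Eqs. 1-3 can be sketched numerically. The values of R, L_d, and L_a below are arbitrary stand-ins (the paper does not specify any), but the sketch shows the property used later in Section IV: the shadow/lit intensity ratio is independent of the reflectance.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical per-pixel reflectance R(x) and illumination terms L_d, L_a
# (names follow Eq. 1-3; the actual scene values are unknown, so random
# stand-ins are used here).
R = rng.uniform(0.1, 1.0, size=(4, 4))   # reflectance
L_d = 0.8                                # direct illumination (blocked in shadow)
L_a = 0.3                                # indirect (ambient) illumination

I_lit = (L_d + L_a) * R      # Eq. 2: lit region
I_shadow = L_a * R           # Eq. 3: direct light blocked

# The ratio I_shadow / I_lit does not depend on the reflectance R:
ratio = I_shadow / I_lit
print(np.allclose(ratio, L_a / (L_d + L_a)))  # True
```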

B. IMAGE PROCESSING IN IMAGE ACQUISITION SYSTEM
When a camera acquires an image, various processes are applied to the captured data: demosaicing, noise removal, white balancing, color space conversion, gamma curve application, etc. [18]. For example, the gamma curve application can be expressed as:

I_out = (I_in)^(1/γ),   (4)

where γ is the gamma value. Many of the processes performed in the image acquisition system are non-linear and can be a problem for image processing based on physical theory. Therefore, when artificial shadow images are generated based on a physical model, more realistic shadow images can be generated by considering the influence of these non-linear processes.
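A minimal sketch of why this matters, assuming a generic power-law gamma curve with γ = 2.2 (the camera's actual curve is not specified here): after the non-linear encoding, the shadow/lit ratio is no longer the reflectance-independent constant of the linear model.

```python
import numpy as np

def apply_gamma(intensity, gamma=2.2):
    # Display-referred encoding: out = in ** (1 / gamma).  The exact curve
    # used by a given camera is unknown; gamma = 2.2 is an assumed,
    # sRGB-like approximation.
    return np.clip(intensity, 0.0, 1.0) ** (1.0 / gamma)

L_a, L_d, R = 0.3, 0.8, 0.5
lit = apply_gamma((L_d + L_a) * R)
shadow = apply_gamma(L_a * R)

# After the non-linear curve, the shadow/lit ratio differs from
# L_a / (L_d + L_a), which is why linear-space processing matters.
print(abs(shadow / lit - L_a / (L_d + L_a)) > 1e-3)  # True
```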

IV. PROPOSED METHOD
Our method is a learning-based method that removes shadows from a single image. We show that shadows can be effectively removed by the proposed network structure and the extended training dataset.

A. OVERVIEW
Our method is divided into two parts: the shadow detection network G_D and the shadow removal network G_R. This structure is based on ST-CGAN [6], and an overview of our method during training is shown in Fig. 2. Since our method is GAN-based, discriminators D_D and D_R are defined for the generators G_D and G_R. The input to G_D is only the shadow image, and the output of G_D is the estimated shadow mask. For the discriminator D_D, the input is a total of 4 channels: the shadow image and the shadow mask. The input to G_R is a total of 6 channels: the shadow image concatenated with the element-wise product of the shadow image and the shadow mask generated by G_D. For the discriminator D_R, the input consists of the input data of G_R and the image generated by G_R. This arrangement is the same as in our previous method [7].

B. NETWORK STRUCTURE
The network structure of G_D and G_R is based on UNet++ [24]. It consists of multiple nodes with upsampling and downsampling. Bilinear upsampling with 2× magnification is used for upsampling, and max pooling is used for downsampling. Each node is a residual block consisting of convolutional layers, batch normalization layers [22], the Mish activation function [21], and an scSE block [23]. Details of the node structure are shown in Fig. 3b.
The shadow detection network G_D has an additional structure after the UNet++-based network to limit the output to the range 0 to 1 (Fig. 3a). In G_R, the shadow removal network, a ColorBlock (Fig. 3c) that weights each color channel is added after the UNet++-based network. G_R is designed to learn the difference between the input image and the ground truth image.
Since G_R estimates the difference between the input image and the ground truth image, the quantity that G_R should estimate can be written as follows based on Eqs. 2 and 3:

I_gt(x, λ) − I_input(x, λ) = P(L_d(x, λ) R(x, λ)),   (5)

where the function P denotes the image processing of the camera's image acquisition system. For simplicity, we do not consider P in this subsection. Since the image has only three color channels, R, G, and B, the L_d(x, λ) of Eq. 5 can be approximated not as a function of λ but as a function of the R, G, and B channels. In addition, assuming that the intensity of the direct light from the light source is constant over the entire image, L_d(x, λ) is no longer a function of x but a constant value in each color channel. From the above, Eq. 5 can be approximated as Eq. 6:

I_gt,c(x) − I_input,c(x) = L_d,c R_c(x),  c ∈ {R, G, B}.   (6)

Since the ColorBlock weights each color channel with a specific value as its final processing step, it can be interpreted as estimating L_d,R, L_d,G, and L_d,B for each image. By estimating the weight of each channel using a fully connected layer, the value of each L_d,c can be estimated while being influenced by the other channels. The ColorBlock is effective for training the relationships between color channels because convolutional layers cannot train these relationships efficiently. By training L_d,c and R_c(x) individually, our network is expected to become robust to changes in image tint.
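The exact layer sizes of the ColorBlock are not given above, but the idea as described (a fully connected layer that produces one weight per color channel from pooled channel statistics) can be sketched as follows; the global average pooling, the sigmoid, and the parameter shapes are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def color_block(feat, W, b):
    """Weight each channel of `feat` (C, H, W) by a scalar that depends on
    all channels, mimicking the role attributed to the ColorBlock.
    W and b are hypothetical fully connected parameters."""
    pooled = feat.mean(axis=(1, 2))            # (C,) global average pool
    logits = W @ pooled + b                    # FC layer mixes channel statistics
    weights = 1.0 / (1.0 + np.exp(-logits))    # sigmoid -> per-channel weight
    return feat * weights[:, None, None]

C, H, Wd = 3, 8, 8
feat = rng.normal(size=(C, H, Wd))
W = rng.normal(scale=0.1, size=(C, C))
b = np.zeros(C)

out = color_block(feat, W, b)
print(out.shape)  # (3, 8, 8)
```

Each output channel is scaled by a weight computed from all pooled channels, so the block can express the mutual influence between the estimated L_d,R, L_d,G, and L_d,B.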
Each discriminator consists of convolutional layers, batch normalization layers, and Mish layers connected in sequence. By setting the stride to 2 in every second convolutional layer, the size of the final output becomes 16 × 16. The L2 loss is used to calculate the loss of the discriminator.
C. LOSS FUNCTIONS
1) GENERATOR G_D
Generator G_D is the network for shadow detection. The shadowed input image I_input consists of 3 RGB channels. In each node of Fig. 3b, the number of input channels depends on the node position (x, y): when y = 0, Channels_in is 16 × 2^x, and, as a further special case, when x = 0 and y = 0, Channels_in depends on the number of channels of the input data.
The shadow mask M_est is estimated as:

M_est = G_D(I_input; θ_G_D),   (7)

where θ_G_D denotes the weights of the generator G_D. In order to train G_D, we minimize a loss function L_G_D(θ_G_D) consisting of three terms: L_MSE^G_D, L_adv^G_D, and L_feat^G_D.

L_MSE^G_D is the pixel loss, which calculates the L2 norm between the generated shadow mask M_est and the ground truth shadow mask M_gt on a pixel-by-pixel basis. Applying an MSE loss when using a GAN helps prevent the vanishing gradient problem [25]. L_MSE^G_D is expressed as:

L_MSE^G_D = (1/N) Σ_{n=1}^{N} ||M_est^(n) − M_gt^(n)||_2^2,   (8)

where N is the batch size.

L_adv^G_D is the adversarial loss calculated using the discriminator D_D. Networks trained using only a pixel-wise MSE loss tend toward an average solution over the dataset [26], which is likely to produce blurred or unnatural images. To solve this problem, the Generative Adversarial Network (GAN), consisting of a generator and a discriminator, was proposed [27], and an adversarial loss is used to train these networks. By applying this loss, the network learns to generate natural images that more closely match the image distribution of the training dataset. L_adv^G_D is expressed as:

L_adv^G_D = ||D_D(I_input, M_est) − V_real||_2^2,   (9)

where V_real is a random-valued matrix that follows a Gaussian distribution with a mean of 0.5.

L_feat^G_D is a feature loss calculated using our shadow removal network G_R. The purpose of this loss is to adjust the output of G_D so that G_R can generate more realistic images. The best shadow removal performance is not always achieved when the ground truth mask image is used as the input to G_R, because the ground truth mask does not always represent the shadow correctly and may be misaligned. Therefore, instead of training the two networks completely separately, we train them while letting them influence each other. L_feat^G_D is expressed as:

L_feat^G_D = ||G_R(I_input, I_input ⊙ M_est; θ_G_R) − I_gt||_2^2,   (10)

where θ_G_R denotes the weights of G_R, which are fixed while training G_D.
Using Eqs. 8-10, the loss function L_G_D(θ_G_D) is expressed as a weighted sum of the three terms, where the weight of each term depends on the number b of batches trained so far. If the influence of L_adv^G_D is too large, it can decrease the RMSE of the final output image, so the weight of L_adv^G_D is reduced as training progresses. These weights were determined experimentally.

2) GENERATOR G R
Generator G_R is the network for shadow removal. The input is 6-channel data obtained by concatenating the input image with the image given by multiplying the input image by the estimated shadow mask. The shadow-removed image is generated as:

I_est = G_R(I_input, I_input ⊙ M_est; θ_G_R).   (12)

As with G_D, the loss function L_G_R(θ_G_R) must be minimized to train the generator G_R. It consists of three loss functions: L_MSE^G_R, L_adv^G_R, and L_edge^G_R. L_MSE^G_R and L_adv^G_R are applied for the same reasons as described in Section IV-C1 and are expressed as:

L_MSE^G_R = (1/N) Σ_{n=1}^{N} ||I_est^(n) − I_gt^(n)||_2^2,   (13)

L_adv^G_R = ||D_R(I_input, I_input ⊙ M_est, I_est) − V_real||_2^2,   (14)

where D_R is the discriminator for G_R. The difference between L_G_R(θ_G_R) and L_G_D(θ_G_D) is the adoption of a new loss function, L_edge^G_R, which focuses on the boundary between the shadow and non-shadow regions. To calculate this loss, we prepare a boundary mask M_edge that emphasizes the boundary of the ground truth mask image M_gt:

M_edge = Gauss(Edge(M_gt)),   (15)

where Gauss() is a Gaussian filter (σ = 4) and Edge() is the Sobel filter. An example of M_gt and M_edge is shown in Fig. 4. L_edge^G_R is calculated using M_edge as:

L_edge^G_R = ||M_edge ⊙ (I_est − I_gt)||_2^2.   (16)

Applying L_edge^G_R makes the boundary between the shadow area and the non-shadow area natural, which greatly contributes to the improvement of shadow removal performance. The effectiveness of L_edge^G_R is described in Section V-C. Using Eqs. 12-16, the loss function L_G_R(θ_G_R) is expressed as a weighted sum of the three terms, where the weights depend on the number b of batches trained so far and were set experimentally.

3) DISCRIMINATORS
Discriminators D_D and D_R are the components of the GAN used to calculate the adversarial losses. As mentioned in Section IV-B, the L2 loss is used for training the discriminators. The loss function L_D_D(θ_D_D) for discriminator D_D measures the L2 distance between the output of D_D and V_real for real pairs, and between the output of D_D and V_fake for generated pairs; L_D_R(θ_D_R) for D_R is defined analogously. V_real is a random-valued matrix that follows a Gaussian distribution with a mean of 0.5, and V_fake is the same as V_real but with a mean of −0.5. We train the discriminators D_D and D_R by minimizing L_D_D(θ_D_D) and L_D_R(θ_D_R).

D. TRAINING DATASET
We used the ISTD+ dataset [14] to train the proposed network. The ISTD+ dataset is a color-corrected version of the ISTD dataset [6]. It includes 1870 image triplets, divided into 1330 training triplets and 540 test triplets. In our method, all images are randomly processed at each epoch in order to increase the complexity of the training dataset: flipping, rotating, scaling and clipping, changing colors, and inserting artifacts. For changing colors and inserting artifacts, we propose a new approach based on the camera's image acquisition system [18] and the shadow model so that more natural images can be generated. The result of each image processing step is shown in Fig. 5. The effectiveness of this enhanced training dataset is shown in Sec. V-C.
3) SCALING AND CLIPPING
The image is enlarged by a factor of 2 using nearest-neighbor interpolation and clipped at random positions to the same size as the original image. This process is applied with a probability of 33% during training.
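The step above can be sketched as follows; the crop-position bounds are an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

def scale_and_clip(img):
    """2x nearest-neighbour enlargement followed by a random crop back to
    the original size (applied with probability 0.33 during training)."""
    h, w = img.shape[:2]
    up = img.repeat(2, axis=0).repeat(2, axis=1)   # nearest-neighbour 2x
    top = rng.integers(0, h + 1)
    left = rng.integers(0, w + 1)
    return up[top:top + h, left:left + w]

img = rng.uniform(size=(64, 64, 3))
out = scale_and_clip(img)
print(out.shape)  # (64, 64, 3)
```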

4) CHANGING COLORS AND INSERTING ARTIFACTS
In order to increase the variety of the training dataset, it is necessary to acquire images covering various situations, such as many kinds of target objects and light sources. However, it is quite time-consuming and difficult to collect images while changing the target object and the light source in the real world. Therefore, in our method, we modify the features of the captured object based on the shadow model to increase the diversity of the training dataset. From Eq. 1, we can see that R(x, λ) changes when the target object changes. Since L(x, λ) is a parameter related to direct and indirect light, it is not affected by the target object. From Eqs. 2 and 3, the ratio of the light intensity in the shadow region between the shadowed image I_input and the shadow-free image I_gt can be written as:

I_input(x, λ) / I_gt(x, λ) = L_a(x, λ) / (L_d(x, λ) + L_a(x, λ)).   (22)

From this equation, we can see that the ratio of the light intensity in the shadow region does not depend on the type of target object, but only on the parameters of the light falling on the object. Therefore, if we consider the object to be changed while the light parameters remain the same, we can generate a new shadow region using the following equation:

I_input_new(x, λ) = I_gt_new(x, λ) · I_input(x, λ) / I_gt(x, λ).   (23)

In the ideal state, the non-shadow area has the same values in I_input and I_gt, so the whole image I_input_new can be expressed by applying Eq. 23 in the shadow region:

I_input_new(x, λ) = I_gt_new(x, λ) · I_input(x, λ) / I_gt(x, λ) in the shadow region, and I_input_new(x, λ) = I_gt_new(x, λ) otherwise,   (24)

where I_gt_new is the new ground truth light intensity. Our I_gt_new is based on I_gt: when creating it, each color channel is multiplied by a random coefficient, and the color of a part of the image is randomly changed to add an artifact. The artifacts are incorporated to train for situations in which there are many objects in the shadow region. It should be noted that Eq. 24 expresses the intensity of light, whereas the images we handle are encoded in a non-linear color space to make them look natural to the human eye [18]. Therefore, if image processing based on the shadow model is performed in a non-linear color space, an unnatural shadow image may be generated. In our method, to reduce the unnaturalness of synthetic images generated in an inappropriate color space, the color space is converted before the image processing is performed: before generating an image based on Eq. 24, the image is first converted to reduce the non-linearity, and after the image processing, the inverse conversion returns it to the original color space. This series of processing is applied with a probability of 60% during training.
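Eq. 24 can be sketched on a toy pair as follows; the gain range, the clipping, and working directly in a linear space (the color space conversion described above is omitted) are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 1e-6

def recolor_pair(i_input, i_gt, mask):
    """Generate a new (shadow, ground-truth) pair per Eq. 24.  The
    per-channel gain is random, as in the text; linear-space values are
    assumed (the color-space conversion step is omitted)."""
    gain = rng.uniform(0.6, 1.4, size=3)        # random channel coefficients
    i_gt_new = np.clip(i_gt * gain, 0.0, 1.0)
    ratio = i_input / np.maximum(i_gt, eps)      # shadow attenuation (Eq. 22)
    i_input_new = np.where(mask[..., None] > 0, i_gt_new * ratio, i_gt_new)
    return i_input_new, i_gt_new

h = w = 16
i_gt = rng.uniform(0.2, 1.0, size=(h, w, 3))
mask = np.zeros((h, w))
mask[4:12, 4:12] = 1.0
i_input = np.where(mask[..., None] > 0, 0.3 * i_gt, i_gt)  # toy shadow

new_in, new_gt = recolor_pair(i_input, i_gt, mask)
# Outside the shadow the new pair agrees; inside, the attenuation is kept.
print(np.allclose(new_in[mask == 0], new_gt[mask == 0]))  # True
```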

E. ROTATE AVERAGING PROCESS
Since our network is not rotation-invariant, the results change when rotated images are processed. Our method therefore uses a rotate averaging process, which averages the multiple output images generated from multiple rotated input images, to improve the results. We used the same process in our previous method [28], and it can be used as a general-purpose post-process.
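A minimal sketch of the rotate averaging process, restricted to 90-degree rotations so that no interpolation is needed (the rotation angles actually used are not specified above):

```python
import numpy as np

def rotate_average(img, net):
    """Run `net` on rotations of `img` and average the un-rotated outputs.
    `net` is any image-to-image function; 90-degree steps keep the sketch
    interpolation-free."""
    outs = []
    for k in range(4):
        out = net(np.rot90(img, k))
        outs.append(np.rot90(out, -k))   # rotate the output back
    return np.mean(outs, axis=0)

img = np.arange(16.0).reshape(4, 4)
identity = lambda x: x                    # stand-in for the trained network
res = rotate_average(img, identity)
print(np.allclose(res, img))  # True: averaging is exact for an ideal net
```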

F. IMPLEMENTATION AND TRAINING
Our method is implemented in Python with PyTorch [29]. Training is performed on the ISTD+ dataset [14], and the networks are optimized with the Adam optimizer [30]. The input images are resized to 256 × 256 with nearest-neighbor interpolation and randomly pre-processed as described in Sec. IV-D. The batch size is set to 4 and the initial learning rate of the Adam optimizer is set to 0.0002. Each network is trained for 3000 epochs, and the weights with the lowest RMSE on the test data are used for the results in this paper. In addition, fine-tuning [31] was performed using multiple datasets for comparison with other methods. The datasets used for fine-tuning are ISTD [6] (1330 training images), SRD [12] (2680 training images), and SRD+ (2680 training images); the image adjustment method [14] proposed by Le et al. was applied to the SRD dataset to create the SRD+ dataset. For fine-tuning, the dataset was switched and training continued for 100 additional epochs, and the weights with the best results were used as the results in this paper.

V. EXPERIMENTAL RESULTS

A. EVALUATION METRIC
We use the RMSE in the Lab color space for quantitative comparison, as in the conventional methods. As the SID author Le [14] notes on their project page, the evaluation code used by traditional methods including SID [14] actually calculates the MAE instead of the RMSE. Therefore, in this paper, instead of comparing against the numerical values reported in the papers of the conventional methods, the results were recalculated using the result images published by the authors of each method. For the color space conversion, we used the rgb2lab function in Python's scikit-image, which converts the sRGB color space (IEC 61966-2-1:1999) to the CIE Lab color space. Since each Lab channel carries a different dimension of information with a different value range, we calculate the RMSE for each channel and report the sum over the three channels. This paper also reports comparisons using PSNR [32] and SSIM [32] in the unconverted sRGB color space. The code for the quantitative comparison is available on our GitHub page (https://github.com/ryo-abiko/CANet).
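The metric can be sketched as follows; the inputs are assumed to already be Lab images, e.g. converted with skimage.color.rgb2lab (the conversion itself is omitted here):

```python
import numpy as np

def lab_rmse(pred, gt):
    """Per-channel RMSE summed over the three channels, as used for the
    Lab-space comparison.  `pred` and `gt` are (H, W, 3) arrays assumed to
    already be in the Lab color space."""
    per_channel = np.sqrt(((pred - gt) ** 2).mean(axis=(0, 1)))
    return float(per_channel.sum())

gt = np.zeros((8, 8, 3))
pred = np.ones((8, 8, 3))        # constant error of 1 in every channel
print(lab_rmse(pred, gt))  # 3.0
```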

B. SHADOW REMOVAL EVALUATIONS
In this section, we compare our proposed method (CANet) with other notable methods, including Yang [9], Guo [10], Gong [11], and recent deep learning-based methods [6], [13], [14], [16], [17]. Note that the author of AEF [17] did not publish results for the SRD+ dataset.
Quantitative evaluation is performed using the ISTD, ISTD+, SRD, and SRD+ datasets. As in the conventional methods [16], [17], individual network weights are prepared for each dataset to evaluate the shadow removal performance. In this paper, the comparison methods are quantitatively evaluated using the shadow removal result images published by the authors of each paper; therefore, results that are not publicly available are not compared. For the results of Yang [9], Guo [10], and Gong [11], we used the results published by the authors of ST-CGAN [6]. The results are shown in Table 3 for the ISTD+ dataset, Table 4 for the ISTD dataset, Table 5 for the SRD dataset, and Table 6 for the SRD+ dataset.
Since our method is trained on the ISTD+ dataset, we first examine the results in Table 3, which shows that our method performs best in RMSE in the Lab color space and in SSIM in the sRGB color space. Tables 4, 5, and 6 show that our method also outperforms the conventional methods in all categories on the ISTD, SRD, and SRD+ datasets. For easy comparison, Table 2 shows the average of the results on the ISTD, ISTD+, and SRD datasets; SRD+ is excluded from this average because the author of AEF [17] did not publish results for it. These results show that our method does not perform well only on a specific dataset but can be applied generically to various datasets.
For qualitative evaluation, we use the ISTD+ and SRD+ datasets. The results for ISTD+ are shown in Fig. 6 and the results for SRD+ in Fig. 7. From Fig. 6, we can see that our method can remove shadows even when the shadow region contains areas of monotonous color or of significant color change. This shows the effect of training on a dataset that has been enhanced to include a variety of colors and artifacts. In addition, even in smooth regions, the boundary between the shadow region and the non-shadow region is not noticeable; we attribute this to training with the edge loss, which focuses the network on the boundary of the shadow region. Since our method first generates a shadow mask, it can clearly delimit the area from which the shadow should be removed. Fig. 7 shows the advantage of our network structure: shadow removal is achieved with little color change in the non-shadow areas.

C. EFFECTIVENESS OF OUR NEWLY PROPOSED COMPONENTS
In this paper, we mainly proposed three new elements.
(1) We adopted a structure called ColorBlock to effectively train the correlation between color channels. (2) To increase the complexity of the training dataset, the colors of the training images were randomly changed and artifacts were added. (3) We introduced Edge Loss, a loss function focusing on the edges of the shadow region.
In order to confirm the effectiveness of these proposals, results obtained without applying them are shown in Table 7. The transition of the shadow removal performance on the test dataset during training is shown in Fig. 8: the horizontal axis shows the number of training epochs, and the vertical axis shows the minimum RMSE of the shadow removal results on the test dataset up to that point. As Table 7 shows, the results improve when each proposal is applied; in particular, the enhancement of the training dataset has a significant effect. Fig. 8 also shows that when the training dataset was not enhanced, the network overfitted and did not produce good results.

D. RUNNING TIME
We used different environments for network training and evaluation, because the GeForce RTX 2080 Ti GPU ran out of memory when training the proposed networks with a batch size of 4. Our training environment consists of an Intel Xeon CPU E5-1620 v4 @ 3.50 GHz, 64 GB RAM, and a GeForce GTX 1080 Ti; training the proposed networks for 3000 epochs takes about 320 hours. Our evaluation environment consists of an Intel Xeon Silver 4208 CPU @ 2.10 GHz, 384 GB RAM, and a GeForce RTX 2080 Ti; processing a 256 × 256 image takes about 0.13 seconds.

E. DATABASE DEPENDENCY
When we use the SRD and SRD+ datasets for training, our method sometimes produces blurred results. An example of a blurred result produced by our method is shown in Fig. 9. This tendency is not observed when using the ISTD and ISTD+ datasets, so these results may depend on the characteristics of the training data.
However, the quantitative evaluation shows that our method still performs better than the comparison methods, indicating that the shadow removal is sufficient.

VI. CONCLUSION
In this paper, we proposed a new method called CANet for single-image shadow removal. Our method combines a network that detects shadows with a network that removes them. By introducing the ColorBlock, which effectively learns the correlation between channels, the accuracy of the final shadow-removed images is improved. In addition, to increase the complexity of the training dataset, the colors of the training images were changed and artifacts were added while taking the shadow model and the camera's image acquisition system into account. By using Edge Loss, which focuses on the edges of the shadow region, we achieve natural shadow removal even in regions with little texture. Since our method uses two UNet++-based networks, the number of parameters is large; however, within the scope of our study, this is the network structure that gives the best performance. As a result, our method outperforms existing methods on multiple datasets, both quantitatively and qualitatively.