Single-Image Reflection Removal Using Deep Learning: A Systematic Review

Images captured through glass often contain undesirable specular reflections. These reflections, which come from objects in front of the glass, considerably reduce the quality and visibility of the scene behind it. Removing reflections from images taken through glass has many important applications in computer vision. Recently, deep learning-based methods have become widely used for reflection removal. In this article, we present a systematic literature review of single-image reflection removal methods based on deep learning that were published between 2015 and 2021. A total of 1600 research papers were extracted from five online databases and digital libraries (IEEE Xplore, Google Scholar, Science Direct, SpringerLink and ACM Digital Library). After the study selection procedure, 25 research papers were selected for this systematic review. The selected papers were then analyzed to answer seven key research questions that we formulated to comprehensively explore the use of deep learning and neural networks for single-image reflection removal. This article aims to give future researchers a solid overview of the field as a starting point for their own research. The results of this systematic review highlight the main challenges encountered by researchers in this field and recommend promising directions for future work. This review will also help researchers find accessible datasets that can serve as benchmarks for comparing their deep learning techniques with other studies in this area.


I. INTRODUCTION
When we capture images through transparent material, especially glass, the images frequently contain undesirable reflections. These reflections lower the quality and visibility of the images [1], [2]. Some photographers darken the scene or change the position of the camera, but these approaches are not effective for reflection removal because of space limitations. These reflections cause two main problems: first, they lower the quality of the scene behind the glass, and second, they affect the results of applications such as segmentation or classification [1].

The associate editor coordinating the review of this manuscript and approving it for publication was Tomasz Trzcinski.

Therefore, this problem may cause most computer vision algorithms to fail. The aim of reflection removal is to improve the visibility of the scene behind the glass while removing the reflections. Reflection removal is one of the most challenging tasks in computer vision because it is ill-posed. The features of the background and the reflections are often very similar, which makes it harder to remove the reflections and recover the scene behind the glass [2]. Although the human visual system (HVS) can easily identify and distinguish the transmitted scene from the reflection, this is a complicated task for computers, even when blurry and ghosting artifacts are easy to notice [5]. In recent years, many approaches have been proposed for reflection removal; as shown in figure 1, they fall into two groups: conventional mathematical methods (non-learning methods) and deep learning-based algorithms. Non-learning methods use handcrafted priors to solve this problem [3]. Many of these non-learning-based reflection removal algorithms work only under special circumstances; in other words, they are effective only when their assumptions are fully satisfied [2]. Most of these non-learning methods assume that the image taken through the glass is a linear combination of the transmitted scene and the reflection [5].
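To make the ill-posedness concrete, the toy NumPy example below (our own illustration, with random arrays standing in for images) uses the linear model I = B + R and shows that moving an arbitrary component D from the reflection to the background yields a second decomposition that reproduces the same observed image exactly and still lies in the valid [0, 1] range.

```python
import numpy as np

def compose(background: np.ndarray, reflection: np.ndarray) -> np.ndarray:
    """Linear image formation model: observed image I = B + R."""
    return background + reflection

rng = np.random.default_rng(0)
B = 0.5 * rng.random((4, 4))        # toy background, values in [0, 0.5)
R = 0.2 + 0.3 * rng.random((4, 4))  # toy reflection, values in [0.2, 0.5)
I = compose(B, R)

# Move an arbitrary component D from the reflection layer to the background layer.
D = 0.1 * rng.random((4, 4))        # values in [0, 0.1)
B_alt, R_alt = B + D, R - D

# Both decompositions reproduce the observation and stay in [0, 1],
# so I alone cannot tell us which (B, R) pair is the true one.
print(np.allclose(compose(B_alt, R_alt), I))  # True
```

Priors such as non-negativity narrow the set of admissible solutions but, as this example shows, do not make it unique; this is why both handcrafted priors and learned models are needed.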
In some conditions, when the structures, patterns and properties of the background scene resemble those of the reflections, it is difficult for non-learning-based algorithms to remove the reflections and recover the background at the same time [4]. Moreover, when a non-learning algorithm tries to solve the linear equation relating the background B and the reflection R without any prior knowledge, it faces an infinite number of solutions for a single image [5]. Reflection removal approaches can also be divided into two main categories: multiple-image reflection removal and single-image reflection removal. Multiple-image reflection removal approaches can be classified into four groups:
• Multiple polarized images
• Focus and defocus image pairs
• Flash and non-flash image pairs
• Video sequences
More details are shown in figure 2. Mobile phone users and photographers tend to take a single image instead of a series of images, and capture limitations may make multiple-image reflection removal impossible. Many different models have been proposed to remove reflections; although they are effective to some extent, they still have limitations [5]. Given the promising results of deep learning in high-level and low-level computer vision tasks, its detailed modeling capability also benefits reflection removal. Recently, deep learning-based algorithms have been proposed to capture reflection properties more efficiently and comprehensively; they show an enhanced modeling capability that detects a wide range of reflection image characteristics [4]. Deep learning algorithms map images to high-dimensional features, whereas conventional methods rely on handcrafted features, such as edges and gradients, and on complex mathematical analysis [5].
Fan et al. [6] presented the first approach that uses a convolutional neural network (CNN) for single-image reflection removal. Their method uses two cascaded networks: the first CNN predicts the edges of the background layer, and the predicted edges then guide the second CNN in reconstructing the background layer. Because they train with a pixel-wise loss function, semantic structure is not taken into account. Generative Adversarial Network (GAN)-based algorithms have also shown outstanding results on reflection removal. Zhang et al. proposed PL Net [7], which is trained with a feature loss, an adversarial loss and an exclusion loss; the network and its loss functions exploit both low-level and high-level image information. Yang et al. [8] proposed a method based on predicting both the reflection layer and the background layer. Wei et al. proposed ERRNet [9], which can be trained with misaligned data [1]. In addition to the papers mentioned in this paragraph, many other research papers are explored in this systematic review to answer our research questions comprehensively.
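The exclusion loss used to train PL Net [7] discourages the predicted transmission and reflection layers from having edges at the same locations. The NumPy sketch below is a simplified, single-scale version of that idea written under our own assumptions: grayscale 2-D layers, forward-difference gradients, tanh squashing with normalization factors, and a Frobenius norm. The loss in [7] is computed over several down-sampled scales, so its exact form and constants should be taken from that paper.

```python
import numpy as np

def _grads(x: np.ndarray):
    """Forward-difference gradients (x then y) of a 2-D grayscale layer."""
    return x[:, 1:] - x[:, :-1], x[1:, :] - x[:-1, :]

def exclusion_loss(T: np.ndarray, R: np.ndarray, eps: float = 1e-8) -> float:
    """Single-scale sketch of a gradient exclusion penalty between layers T and R.

    Large when both layers have strong gradients at the same pixels,
    zero when their gradients never overlap.
    """
    loss = 0.0
    for gT, gR in zip(_grads(T), _grads(R)):  # x-direction, then y-direction
        nT, nR = np.linalg.norm(gT), np.linalg.norm(gR)
        # Normalization factors so the two layers' gradient scales are comparable.
        lam_T = np.sqrt(nR / (nT + eps))
        lam_R = np.sqrt(nT / (nR + eps))
        psi = np.tanh(lam_T * np.abs(gT)) * np.tanh(lam_R * np.abs(gR))
        loss += float(np.linalg.norm(psi))  # Frobenius norm of the overlap map
    return loss

vertical_edge = np.zeros((8, 8))
vertical_edge[:, 4:] = 1.0
horizontal_edge = np.zeros((8, 8))
horizontal_edge[4:, :] = 1.0
print(exclusion_loss(vertical_edge, vertical_edge) > 0)  # True: edges coincide
print(exclusion_loss(vertical_edge, horizontal_edge))    # 0.0: edges never overlap
```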
Our Systematic Literature Review (SLR) aims to collect, analyze and present research papers on single-image reflection removal using deep learning. It also points out research directions for new researchers in the field by highlighting weak areas of existing deep learning-based reflection removal algorithms that need further exploration. In this SLR, we also examine the datasets/databases that researchers used to validate their algorithms, as well as the different deep learning algorithms used for single-image reflection removal. Because the number of published papers on reflection removal is very large, we limit this SLR to papers that use deep learning algorithms. To the best of our knowledge, this is the first systematic literature review that focuses on single-image reflection removal using deep learning techniques.
The rest of this systematic review is organized as follows. Section II discusses the research methodology used in this SLR, which consists of planning, conducting and reporting. Section III presents a comprehensive analysis of the considered studies and discusses our results and findings. Section IV concludes the SLR, section V contains the acknowledgment, and section VI lists the references.

II. RESEARCH METHODOLOGY
The review conducted in this article follows the systematic review method of Kitchenham and Charters [10]. A Systematic Literature Review (SLR) is a comprehensive method for evaluating and interpreting all available research related to a particular research question or field. It is a well-defined, trustworthy, rigorous and auditable way of presenting a fair assessment of a research topic [10]. As recommended in the Kitchenham and Charters guidelines [10], this SLR was conducted in three phases: planning, conducting and reporting. Each phase is divided into several subsections, explained below.

A. PLANNING
1) A NEED FOR SYSTEMATIC REVIEW
Our preliminary literature search showed that, to the best of our knowledge, no systematic literature review had yet been published on single-image reflection removal using deep learning. Our survey is therefore one of the earliest reviews in this field, and we believe it is the right time to collect and analyze the available research in this domain.

2) DEVELOPMENT OF THE RESEARCH PROTOCOL
The review protocol is the most significant difference between a systematic review and a traditional review [10]. To conduct this systematic review, we first searched several databases for related articles. Second, we reduced the number of articles by applying inclusion and exclusion criteria and quality assessment rules. Finally, we used a set of research questions to structure this SLR.

3) RESEARCH QUESTIONS
The main objective of this systematic review is to identify and examine research papers that use deep learning for single-image reflection removal. To reach this objective, we searched various databases for related research papers and formulated the research questions that are presented, together with their descriptions, in Table 1.

4) INCLUSION AND EXCLUSION CRITERIA
Defining inclusion and exclusion criteria ensures that only research papers related to our topic are studied. We set up five inclusion and five exclusion criteria, shown in Tables 2 and 3, respectively. Our keyword search returned 1600 research papers relevant to our topic. We first removed all duplicate papers. After a complete review of the remaining papers, we included papers according to the criteria in Table 2 and then excluded papers according to the criteria in Table 3. Finally, we removed some papers based on the quality assessment rules described in the next subsection.
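Removing duplicates across several libraries is usually done by comparing normalized titles. The sketch below is our own illustration of that step, not the exact procedure used in this review; the record format, the example titles and the title-normalization rule are hypothetical, and real pipelines often also compare DOIs.

```python
import re

def normalize_title(title: str) -> str:
    """Lower-case a title and collapse punctuation/whitespace runs into single
    spaces so that records exported by different libraries compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()

def deduplicate(records: list[dict]) -> list[dict]:
    """Keep the first record seen for each normalized title."""
    seen: set[str] = set()
    unique = []
    for record in records:
        key = normalize_title(record["title"])
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical search results: the same paper exported by two libraries.
records = [
    {"title": "Example Paper on Reflection Removal", "source": "IEEE Xplore"},
    {"title": "example paper on reflection removal.", "source": "Google Scholar"},
    {"title": "Another Example Paper", "source": "ACM Digital Library"},
]
print([r["source"] for r in deduplicate(records)])  # ['IEEE Xplore', 'ACM Digital Library']
```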

5) QUALITY ASSESSMENT RULES
Quality Assessment Rules (QARs) are used to judge the overall quality of the selected research articles [59]. Applying these rules helps to select the most relevant, high-quality articles. The 10 QARs applied to the selected articles are listed in Table 4. Each QAR is worth 1 point out of 10 and is scored as follows: 1 if the question is answered completely and comprehensively, 0.75 if the answer is above average, 0.5 if it is average, 0.25 if it is below average, and 0 if the question is not answered at all. The points are then summed to give each paper's total score, and a paper is included in this systematic review if its score is 7 or more.
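The scoring scheme above is simple enough to state as code. In this sketch (ours; the contents of the rules in Table 4 are not reproduced, so the rules are simply indexed 1 to 10, and the paper's scores are hypothetical), each score is validated against the five allowed levels before the inclusion threshold of 7 is applied.

```python
ALLOWED_SCORES = {0.0, 0.25, 0.5, 0.75, 1.0}  # not answered ... fully answered
NUM_RULES = 10
INCLUSION_THRESHOLD = 7.0

def qar_total(scores: list[float]) -> float:
    """Sum the per-rule scores of one paper after validating them."""
    if len(scores) != NUM_RULES:
        raise ValueError(f"expected {NUM_RULES} scores, got {len(scores)}")
    if any(s not in ALLOWED_SCORES for s in scores):
        raise ValueError("each score must be 0, 0.25, 0.5, 0.75 or 1")
    return sum(scores)

def is_included(scores: list[float]) -> bool:
    """A paper is kept when its total QAR score is at least 7 out of 10."""
    return qar_total(scores) >= INCLUSION_THRESHOLD

# Hypothetical scores for one paper on QAR1..QAR10.
paper = [1, 1, 0.75, 0.75, 0.5, 0.5, 1, 1, 0.5, 0]
print(qar_total(paper), is_included(paper))  # 7.0 True
```

Because every allowed score is a multiple of 0.25, the floating-point sums are exact and the comparison with the threshold is safe.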

B. CONDUCTING
In the second phase, we searched online databases and digital libraries using search strings and keywords to find the articles and research studies most relevant to this systematic review. This phase includes extracting research articles and selecting them according to the inclusion and exclusion criteria and the quality assessment rules. More details are illustrated in figure 3.
The first step of the conducting phase is to search digital libraries, databases and scientific search engines for the most relevant articles. For this purpose, we searched the most popular online databases and digital libraries: IEEE Xplore, Science Direct, SpringerLink, Google Scholar and ACM Digital Library. The terms selected for this search were: Using the keywords defined above, we constructed multiple search strings with logical operators to improve the search results.
The search strings used for this SLR are listed below:
• "reflection removal" AND "deep learning"
• "reflection removal" AND ("deep learning" OR "neural network")
• "reflection removal" AND "CNN"
• "reflection removal" AND "convolutional neural network"
• "reflection removal" AND "deep neural network"
• "reflection separation" AND "deep learning"
• "reflection separation" AND "deep neural network"
• "reflection separation" AND "convolutional neural network"
• "reflection separation" AND "CNN"
Moreover, we manually searched other sources, including the reference lists and citations of the selected studies. The searches were performed on December 9, 2021, and returned 1600 research papers in total. As illustrated in figure 3, after removing duplicates and applying the inclusion and exclusion criteria and the quality assessment rules, 25 research papers remained. These 25 papers are considered reliable because they are all indexed in high-impact digital libraries, their authors have a significant reputation in the reflection removal research field, and they have a good number of citations. The citation counts of the selected papers are shown in figure 4.
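Each search string above is an AND of OR-groups of quoted phrases. The sketch below is a hypothetical local re-check of retrieved titles or abstracts against such a string; it is not how IEEE Xplore, Google Scholar or the other libraries evaluate queries, since each has its own syntax and matching rules, and the example titles are invented.

```python
def matches(text: str, query: list[list[str]]) -> bool:
    """Return True if `text` satisfies a query in AND-of-ORs form.

    Each inner list is an OR-group of phrases; every group must match.
    Matching is a case-insensitive substring search.
    """
    lowered = text.lower()
    return all(any(phrase.lower() in lowered for phrase in group) for group in query)

# "reflection removal" AND ("deep learning" OR "neural network")
QUERY = [["reflection removal"], ["deep learning", "neural network"]]

print(matches("Single-image reflection removal with a deep neural network", QUERY))  # True
print(matches("Reflection removal using gradient sparsity priors", QUERY))           # False
```

Plain substring matching is deliberately simple; a short acronym such as "CNN" would also match inside longer words, which is one reason manual screening still follows the automated search.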

C. REPORTING
The final stage of our systematic literature review is reporting the results. In this step, we studied all the considered research articles comprehensively and analyzed them to answer the 7 key research questions listed in Table 1. We report our findings in section III (Results and Findings). The analyzed articles and their numbers of citations (according to Google Scholar) are listed in Table 5. The IDs given in Table 5 are used to refer to these papers throughout the rest of this systematic review.

III. RESULTS AND FINDINGS
After reviewing and examining all the related research articles, we provide the following answers to all research questions which were mentioned in Table 1.
To answer RQ1, we analyzed all 25 research articles, focusing on the distribution of publications per year. Figure 5 illustrates the number of research articles published between 2015 and 2021. According to Figure 5, the use of deep learning techniques for reflection removal has grown over the last seven years, with the highest number of articles published in 2019. Table 6 shows the publication year of each selected article.
Figure 6 presents the number of published research articles per country. Most of the studies came from China, with 11 published articles, followed by Singapore with 4, South Korea with 3 and Taiwan with 2 articles. We found only one article each from Australia, Japan, India and the United States. This suggests that researchers from the leading countries, particularly China, have focused more on single-image reflection removal using deep learning techniques. Figure 7 shows the distribution of studies by publication type: 28% of the articles were published in journals and 72% in conferences, so conference publications clearly dominate. Most of the considered articles were published in well-known venues such as CVPR, ICCV and ECCV. Given the quality of these venues, we believe that this systematic review can help researchers identify the latest trends for future research on reflection removal.

B. RQ2. WHAT DATASETS/DATABASES ARE USED TO TRAIN THE PROPOSED DEEP LEARNING MODEL IN CONSIDERED PUBLICATIONS?
To answer RQ2, this section summarizes the information extracted from the considered research papers about the datasets used to train the reflection removal algorithms. Because deep learning algorithms are data-hungry, a large dataset of reflection images is necessary for training [12]. The considered articles used many different datasets to train and test their deep learning techniques, and most researchers used publicly available datasets. Since collecting a large number of real images containing reflections is not easy, an alternative is to synthesize reflection images from two reflection-free images, one serving as the background and the other as the reflection [12]. This approach has also been used for quantitative evaluation in many earlier mathematical techniques [5], [40], [60], [61], [62]. For example, when synthesizing their dataset, the authors of [1] randomly flipped the images and used one image as the background layer and another as the reflection layer. After reviewing all of the considered articles, we summarized the datasets used to train the deep learning algorithm of each research paper in Table 7. The results of our survey show that researchers tend to train their algorithms on synthetic datasets rather than on real-world reflection removal datasets, possibly because of the shortage of real-world datasets in this field. Synthetic datasets were used for training in almost all of the selected papers. In most cases, the synthetic dataset is created by adding two random images from one or two publicly available datasets. More information is available in Table 7.
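As an illustration of the synthesis idea, the NumPy sketch below blends two clean images into a training triplet. The flip probability, the range of the reflection strength alpha, the purely additive model and the clipping are our own assumptions; individual papers use other recipes (for example, blurring or otherwise degrading the reflection layer), so Table 7 and the cited papers remain the reference.

```python
import numpy as np

def synthesize_triplet(background, reflection, rng, alpha_range=(0.2, 0.5)):
    """Create a (mixture, background, reflection) training triplet from two
    reflection-free float images in [0, 1] with the same shape
    (H x W or H x W x C).

    Assumptions: random horizontal flips, a scalar reflection strength alpha,
    additive blending and clipping; published pipelines differ in the details.
    """
    if rng.random() < 0.5:
        background = background[:, ::-1]   # random horizontal flip
    if rng.random() < 0.5:
        reflection = reflection[:, ::-1]
    alpha = rng.uniform(*alpha_range)      # reflection strength
    reflection_layer = alpha * reflection
    mixture = np.clip(background + reflection_layer, 0.0, 1.0)
    return mixture, background, reflection_layer

rng = np.random.default_rng(42)
B = rng.random((64, 64, 3))   # stand-ins for two images from public datasets
R = rng.random((64, 64, 3))
I, B_gt, R_gt = synthesize_triplet(B, R, rng)
print(I.shape)  # (64, 64, 3)
```

Returning the (possibly flipped) background and the scaled reflection alongside the mixture gives exact ground truth for both layers, which is the main practical advantage of synthetic data over captured images.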

C. RQ3. WHAT DATASETS/DATABASES ARE USED FOR CONDUCTING EXPERIMENTS ON SINGLE-IMAGE REFLECTION REMOVAL?
To answer RQ3, we extracted all the datasets on which the considered articles evaluated their deep learning techniques and conducted experiments, and summarized the results in Table 8, which lists the IDs of the publications and their benchmarking datasets. According to the extracted information, most experiments use the SIR² benchmark dataset to evaluate single-image reflection removal algorithms. The Single Image Reflection Removal dataset (SIR²), proposed by Wan et al. [38], contains a total of (20 + 20) × (7 + 3) × 3 + 100 × 3 = 1500 images. It contains 40 controlled indoor scenes with complex appearance and quality. Each of these 40 scenes is captured as triplets of images (mixture image, background and reflection) under seven different depth-of-field settings and three controlled thicknesses of transparent material, i.e., 7 + 3 = 10 settings per scene. SIR² also contains 100 wild scenes with various thicknesses of transparent material, uncontrolled lighting and different camera settings [38]. All SIR² images were captured with a Nikon D5300 camera and a 300 mm lens in manual mode, at a resolution of 1726 × 1234. To capture each triplet, Wan et al. [38] first captured the mixture image, which contains both the background and the reflection, through the transparent material. Second, they captured only the reflection by placing a piece of dark paper behind the transparent material. Third, they captured only the background by removing the glass [38]. According to Table 8, 64% of the considered research articles used the SIR² benchmark dataset as one of their main datasets for experiments and evaluation, whereas 36% did not use SIR² at all. About 16% of the articles used only the SIR² dataset. Nearly 92% of the articles used real-world datasets, and 8% used only synthetic datasets for experiments and evaluation. These results show that most researchers tend to use real-world benchmarking datasets for quantitative evaluation. More results are shown in figures 8 and 9.
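Because the review covers 25 papers, the percentages above correspond to whole paper counts. The short Python check below (our own arithmetic, not an additional result) converts them to counts, verifies that complementary groups add up to 25, and recomputes the SIR² image total from its formula.

```python
N_PAPERS = 25  # papers selected for this review

# Percentages reported for RQ3 (share of the 25 papers).
reported = {
    "used SIR2": 64,
    "did not use SIR2": 36,
    "used only SIR2": 16,
    "used real-world datasets": 92,
    "used only synthetic datasets": 8,
}
counts = {k: round(pct * N_PAPERS / 100) for k, pct in reported.items()}
print(counts)

# Complementary groups must add up to all 25 papers.
assert counts["used SIR2"] + counts["did not use SIR2"] == N_PAPERS
assert counts["used real-world datasets"] + counts["used only synthetic datasets"] == N_PAPERS

# Size of SIR2: (20 + 20) controlled scenes x (7 + 3) settings x 3 images per triplet,
# plus 100 wild scenes x 3 images.
sir2_images = (20 + 20) * (7 + 3) * 3 + 100 * 3
print(sir2_images)  # 1500
```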

D. RQ4. WHAT ARE THE ARCHITECTURES OF THE PROPOSED REFLECTION NETWORKS IN
This section is merely dedicated to answering the research question RQ4.According to different types of single-image reflection removal networks based on deep learning which are utilized in each publication, we analyzed all the considered research articles with a high focus on the proposed network architectures and deep learning algorithms.In the following, we summarized the proposed deep learning architectures which are utilized for single-image reflection removal.-In P1, Fan et al. [6] proposed the CEILNet, which has two cascaded sub-networks: E-CNN which is designed for predicting the edges and I-CNN which is designed for reconstructing the image.In the E-CNN subnetwork, they applied a CNN to the source image I s in order to learn an edge map E t of the target image I t .
To make the computations easier, they augment the source image I s with the edge map E s as a new input channel.Predicting the edges of the target image is the purpose.The proposed sub-section E-CNN estimates the following function f.This f function inputs I s and E s and outputs E t .I-CNN, as the second cascaded sub-network of the CEILNet approximates the following function g and reconstructs the target image I t by learning the procedure of processing the input image I s given the target image map E t which the first sub-network (E-CNN) predicts the g function.This g function inputs I s and E t , and then outputs I t [6].-In P2, Fu et al. [13] proposed an algorithm that solves the reflection removal issue in two phases.In the first phase, they used an instance segmentation network to obtain the region-aware map M from the input image.
In the first phase, they used the Mask R-CNN network [70].In the second phase, they input both the original image and also M together into a CNN which performs as an encoder-decoder in order to erase the reflection B from the original image [13].-In P3, Wan et al. [2] proposed CoRRN which contains three sub-networks for estimation of the background and reflections in a cooperative way.These three sub-networks are CencN, idecN and GdecN.Context encoder network (CencN) is designed based on VGG16 [37] and is responsible for suppressing the sparse reflection residues and then extracting information related to the scenes from different layers of images.Another responsibility of CencN is facilitating the training networks.Gradient decoder network (GdecN) is also responsible for learning a mapping from I to ∇B. and finally, image decoder network (IdecN) which is a multi-tasking learning network is responsible for learning a mapping from I to B and R. In the framework of CoRRN, the related gradient features from GdecN sub-network guides the IdecN sub-network [2].-In P4, Wan et al. 
[4] proposed the CRRN with a multitask learning approach that evaluates the background and the reflection of the image using ∇B.The proposed framework of the CRRN contains two cooperative blocks: first, the gradient interference network (GiN) which is designed in order to estimate and evaluate gradients of the background image and second, the image interference network (IiN) which is designed to estimate and evaluate the background and reflection layers of the image.The GiN is fed with the blend image and corresponding gradients as a tensor with four channels.IiN is also fed with a mixture of images that consists of reflections.The phase of up-sampling in the IiN is guided by some gradient features that are provided in GiN with the same resolution.The IiN block has two feature extraction layers for extracting the features of scale-invariant correlated with the background image.The output of the IiN block will be the estimated background and reflection image and the output of the GiN block will be the estimated gradient of the background image [4].-In P5, Zheng et al. [11] proposed a network that aims to remove droplets of rain and reflections in the glass or transparent material from a single-image.The two sub-networks g and h are both created of two CNN layers with the size of two for the process of downsampling, 6 residual units [31] and also two interpolating convolutional neural layers with the size of two for the process of up-sampling that has instance normalization [32].Each of the g and h sub-networks consists of 1 and 3 output channels [11].-In P6.Zhang et al. [12] proposed an algorithm that contains two cascaded subnetworks: decomposition network (D-Net) and refinement network (R-Net).
The D-Net sub-network is decomposing the input image which is called I into the background layer and the reflection layer.R-Net sub-network the output image.This procedure in the R-Net is done by adding back the details that are ignored by the D-Net sub-network with the help of an attention strategy.Also, Zhang et al. add a former sub-network that involves human interaction into the proposed cascaded network.They request the user to show the reflection and background areas in the considered image.The guidance input of the user is then transformed to a pyramid of edge maps to recognize the difference between the reflection and background areas of the image [12].-In P7, Chou et al. [15] proposed a network which is a GAN with two generator blocks to model the reflection areas separately that has different intensity [15].-In P8, Chang et al. [16] proposed a network that con- , [72] in the discriminator unit to apply the algorithm in arbitrary-sized images [17].-In P10, Dong et al. [29] proposed an algorithm that consists of two phases.In the first phase, the algorithm predicts the RCMap and the reflection layer.In the second phase, the algorithm predicts the transmission layer CBAM [29].-In P11, Yang et al. [8] proposed BDN network architecture that has three sub-networks G0(•), H(•) and G1(•).Ronneberge et al. 
used a kind of U-net [33], [34] which has an encoder and a decoder part for the implementation of these three sub-networks.In this network architecture, the second and third sub-networks contain the same network structure but with different parameters.The first sub-network has 14 convolutional layers, in contrast, the second and third sub-networks have 10 convolutional layers.Note that G0(•), H(•) and G1(•) are all cascaded sub-networks [8].-In P12, Abiko and Ikehara [1] proposed an algorithm based on GAN with gradient constraint.This algorithm has three sub-networks Generator, discriminator, and feature extractor that are operating based on CNN.The proposed sub-network generator is performing based on UNet++L 4 [35] that is a mixture of CNN layers, leaky ReLU layers, max-pooling layers, batch normalization layers and bilinear interpolation layers.Due to the fact that the authors implement a deep supervision network, four outputs are available in this network.
In this paper, Abiko and Ikehara [1] utilized Bˆfor the main output of the proposed network and the other three outputs as a tool for computing pixel loss.The discriminator block includes some CNNs, batch normalization and leaky ReLU layer.And eventually, they used the L 2 difference for the output of the discriminator block which has a size of 16× 16 in order to calculate the adversarial loss [1].-In P13, Wen et al. [19] proposed SynNet which is operating based on an encoder-decoder architecture and has six channels for image input.The encoder and decoder, both consist of three CNN layers, and between these two encoders and decoders, nine residual blocks are added to improve the illustration of reflection properties.InstanceNorm layer [36] and ReLU activation function are included in all CNN layers except the last one which is followed by the Sigmoid activation function in order to scale the output.Eventually, the proposed network contains a three-channel output with an alpha blending mask [19].-In P14, Wei et al. [9] proposed a network for singleimage reflection removal.They trained a feed-forward CNN G θ G .In the first step, a VGG-19 [37]  -In P16, Chang and Jung [5] proposed a network based on an encoder-decoder architecture for reflection removal tasks.encoder sub-network dimensional features via CNN layers and the pointreversed procedure begins to operate.The decoder subnetwork is connected to the E sub-network in order to map the high dimensional features reversely to natural images using some deconvolutions.Also, three skips are appended between E and D sub-networks to save the resolution of the outputs [5].-In P17 [81], the proposed network first separates the extracted features into two branches using feature learning: background constituent and reflection constituent.The contrastive feature decomposition subnetwork achieves more accurate feature decomposition by their proposed contrastive supervision algorithm.
Finally, the dense feature refinement sub-network refines the details of the restored images to produce high-quality background and reflection images [81].
- In P18, Han and Sim [23] proposed a reflection removal architecture that contains an interpreter and three multiscale generators. Each generator predicts both the transmission image T̂_k and the reflection image R̂_k. These predictions are up-sampled and concatenated to the input image to form the input of the generator at the next finer scale k+1. The three generators have exactly the same structure and produce output images at scales of 0.25, 0.5, and 1.0 relative to the original input image with reflections. The network also uses an encoder-decoder structure and two convolutional layers followed by ReLU activation and batch normalization [23].
- In P19, Cheng et al. [24] presented a network with two parts: a generator and a 5-layer discriminator. The generative sub-network itself contains two sub-parts, an encoder-decoder block and an FCN block, and the discriminator is trained with the SNGAN loss [24].
- In P20, Chang et al. [25] proposed a single-image reflection removal method in which an edge estimator ε is trained first: it takes the reflection-contaminated image I as input and estimates the edge map of the transmission layer T. Next, a reflection classifier C is trained to recognize whether reflection is present. The classifier takes the reflection layer R, concatenated with either the reflection-contaminated image I or the transmission layer T, as an input pair and outputs the matching label of the pair. In other words, the network takes I as input, produces an estimated transmission, and then, in the second stage, obtains the final prediction of the transmission layer by recurrent decomposition [25].
- The network architecture proposed in P21 [26] has two sub-networks: SP-net, the main separation sub-network, and BT-net, a backtrack network for the reflection of the back scene. SP-net separates the input image (the image with reflection) into T* and R̃*, where R̃* still contains glass effects. In the second phase, R̃* is fed to BT-net to remove glass/lens effects such as ghosting, distortion, defocusing, and attenuation. The output of BT-net, R*, should be free of glass/lens effects; R* is also used to provide an additional error signal for SP-net. All inputs of the sub-networks are augmented with features extracted by the VGG-19 network [37] [26].
- In P22, Zhang et al. [7] proposed a single-image reflection separation network. To give the network a semantic understanding of the input image, they used hypercolumn features [74] extracted from a VGG-19 network [37] pre-trained on the ImageNet dataset [75]. The network f is an FCN that closely resembles context aggregation networks [76], [77]. In other words, the method trains an FCN with losses that exploit both low-level and high-level information in the image [7].
- In P23, Heo and Choe [27] proposed a single-image reflection removal method based on conditional GANs. The network mainly adopts pix2pix [34] as its base: the generator uses U-Net [33], and the discriminator is an FCN-based classifier. Training uses the Adam optimizer [73] with a learning rate of 2 × 10^−4 and momentum parameters β1 = 0.5 and β2 = 0.999 [27].
- In P24, an H × W image with reflection is given to the network as input, and the network determines the number of scales N. Next, an N-scale space representation of the input image is constructed and processed in two phases: (a) a Low Scale Sub-network (LSSNet) and (b) Progressive Inference (PI) of the higher scales using Convolutional Guided Filter (CGF) up-sampling and a High Scale Sub-network (HSSNet) [79].
- In P25, I and O_T are given to sub-network g as inputs, and g outputs e_pre and e′_pre. In the second phase, sub-network h takes the concatenation of I and the spatially replicated e_pre as input and outputs T_pre [80].
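The coarse-to-fine data flow described for P18 can be sketched with NumPy at the level of array shapes. This is a sketch under stated assumptions, not the method of [23]: `toy_generator` is a hypothetical placeholder for the trained encoder-decoder generator, downsampling uses block averaging, and upsampling uses nearest-neighbour repetition. Only the scales 0.25, 0.5 and 1.0 and the "upsample, then concatenate to the next-scale input" pattern come from the text.

```python
import numpy as np

SCALES = (0.25, 0.5, 1.0)  # relative to the input image, as described for P18

def downsample(img, factor):
    """Block-average an H x W x C image by an integer factor (assumed resampling)."""
    h, w, c = img.shape
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

def upsample2x(img):
    """Nearest-neighbour 2x upsampling (assumed; [23] may use another scheme)."""
    return img.repeat(2, axis=0).repeat(2, axis=1)

def toy_generator(x):
    """Hypothetical stand-in for a trained generator: splits the first three
    channels (the RGB input) into a fake (T_hat, R_hat) pair."""
    rgb = x[..., :3]
    return 0.7 * rgb, 0.3 * rgb

def coarse_to_fine(image):
    """Run the three scales in order; returns a list of (T_hat, R_hat) per scale."""
    preds = None
    outputs = []
    for s in SCALES:
        factor = int(round(1 / s))
        x = downsample(image, factor) if factor > 1 else image
        if preds is not None:
            # Upsample the coarser T_hat_k and R_hat_k and concatenate them
            # to the input image at the finer scale k+1.
            t_up, r_up = (upsample2x(p) for p in preds)
            x = np.concatenate([x, t_up, r_up], axis=-1)
        preds = toy_generator(x)
        outputs.append(preds)
    return outputs

image = np.random.rand(64, 64, 3)
for s, (t, r) in zip(SCALES, coarse_to_fine(image)):
    print(s, t.shape, r.shape)
```

With a 64 × 64 input, the three scales produce 16 × 16, 32 × 32 and 64 × 64 predictions; from the second scale on, the generator input has 9 channels (RGB plus the upsampled T̂ and R̂).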
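Hypercolumn features, as used in P22, stack feature maps from several layers of a pre-trained network after resizing them to the input resolution, so every pixel carries both low-level and high-level descriptors. In the NumPy sketch below, random arrays stand in for real VGG-19 activations; the resolutions (1, 1/2, 1/4, 1/8, 1/16) and channel counts (64 to 512) follow the usual VGG-19 convolutional blocks, and nearest-neighbour resizing is an assumption (the interpolation used in [7] may differ).

```python
import numpy as np

def upsample_nearest(fmap, out_h, out_w):
    """Nearest-neighbour resize of an h x w x c map to out_h x out_w x c."""
    h, w, _ = fmap.shape
    rows = np.arange(out_h) * h // out_h
    cols = np.arange(out_w) * w // out_w
    return fmap[rows][:, cols]

def hypercolumn(image, feature_maps):
    """Concatenate the image with every feature map resized to its resolution."""
    h, w, _ = image.shape
    resized = [upsample_nearest(f, h, w) for f in feature_maps]
    return np.concatenate([image] + resized, axis=-1)

H = W = 64
image = np.random.rand(H, W, 3)
# Placeholder activations shaped like VGG-19 conv blocks 1-5 (assumed layout).
channels = [64, 128, 256, 512, 512]
feats = [np.random.rand(H // 2**i, W // 2**i, c) for i, c in enumerate(channels)]
hc = hypercolumn(image, feats)
print(hc.shape)  # (64, 64, 1475)
```

The resulting per-pixel descriptor has 3 + 64 + 128 + 256 + 512 + 512 = 1475 channels, which is what the downstream FCN would consume in this sketch.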
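P23 trains with the Adam optimizer using a learning rate of 2 × 10^−4, β1 = 0.5 and β2 = 0.999. The NumPy sketch below shows one Adam update with these hyperparameters; ε = 1e-8 (the usual default) and the toy quadratic objective used to exercise it are assumptions, not details taken from [27].

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=2e-4, beta1=0.5, beta2=0.999, eps=1e-8):
    """One Adam update; returns the new parameter and moment estimates."""
    m = beta1 * m + (1 - beta1) * grad          # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad**2       # second-moment (uncentred) estimate
    m_hat = m / (1 - beta1**t)                  # bias correction; t starts at 1
    v_hat = v / (1 - beta2**t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

# Toy objective f(w) = ||w||^2 / 2, whose gradient is w itself.
w = np.array([1.0, -2.0])
m = np.zeros_like(w)
v = np.zeros_like(w)
for t in range(1, 1001):
    w, m, v = adam_step(w, w, m, v, t)
print(w)
```

Because Adam normalizes by the running gradient magnitude, each step moves a parameter by roughly the learning rate regardless of gradient scale, so 1000 steps move each coordinate about 0.2 toward the minimum here. The low β1 = 0.5 (versus the common 0.9) is the setting pix2pix-style GANs typically use.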

E. RQ5. WHAT QUANTITATIVE METRICS HAVE BEEN USED TO ACCREDIT THE EFFICIENCY OF THE PROPOSED DEEP LEARNING TECHNIQUES?
To answer RQ5, we gathered all relevant information about the quantitative metrics used in each selected research article. Many evaluation metrics were used to assess the performance of the proposed methods. Figure 10 presents the number of studies using each quantitative metric, including FSIM, LMSE, NCC, sLMSE, PSNR, SI_r, SSIM_r, SI, and SSIM, to evaluate their proposed method. All of the papers used multiple quantitative metrics to validate performance, report their results, and compare their method with other research works. According to Figure 10, 24 of the 25 selected articles (96%) used SSIM to validate the performance of their work, and 22 (88%) used PSNR, making SSIM and PSNR by far the most widely used metrics in the selected articles. Of the remaining metrics, SI was used by 16% of the selected articles, SSIM_r by 12%, FSIM, LMSE, NCC, and SI_r by 8% each, and sLMSE by 4%. Because SSIM and PSNR are so widely used, they provide a common basis for comparing each newly proposed technique with previous research works. Finally, Table 9 presents the metrics used by each paper in more detail.
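The most common metrics can be sketched in a few lines of NumPy. This sketch assumes images scaled to [0, 1] and, for brevity, evaluates SSIM over a single global window; the standard SSIM (and most published numbers) averages the index over local Gaussian-weighted windows, so this simplified version will not reproduce reported values. NCC, another metric from the list above, is included for comparison.

```python
import numpy as np

def psnr(ref, est, data_range=1.0):
    """Peak signal-to-noise ratio in dB."""
    mse = np.mean((ref - est) ** 2)
    return float("inf") if mse == 0 else 10 * np.log10(data_range**2 / mse)

def ssim_global(ref, est, data_range=1.0):
    """SSIM over one global window (simplified; not the windowed standard)."""
    c1 = (0.01 * data_range) ** 2   # usual stabilising constants K1 = 0.01, K2 = 0.03
    c2 = (0.03 * data_range) ** 2
    mu_x, mu_y = ref.mean(), est.mean()
    var_x, var_y = ref.var(), est.var()
    cov = np.mean((ref - mu_x) * (est - mu_y))
    return ((2 * mu_x * mu_y + c1) * (2 * cov + c2)) / (
        (mu_x**2 + mu_y**2 + c1) * (var_x + var_y + c2))

def ncc(ref, est):
    """Normalized cross-correlation of zero-mean, unit-variance images."""
    a = (ref - ref.mean()) / ref.std()
    b = (est - est.mean()) / est.std()
    return float(np.mean(a * b))

rng = np.random.default_rng(0)
gt = rng.random((64, 64, 3))
noisy = np.clip(gt + rng.normal(0, 0.05, gt.shape), 0, 1)
print(psnr(gt, noisy), ssim_global(gt, noisy), ncc(gt, noisy))
```

PSNR only measures pixel-wise error, whereas SSIM also compares means, variances and covariance, which is why papers usually report both.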

F. RQ6. WHAT ARE THE CURRENT LIMITATIONS AND CHALLENGES IN THIS RESEARCH FIELD?
In this section, we summarize the weaknesses and limitations of deep learning-based single-image reflection removal algorithms, as reported by the researchers, that remain to be resolved in future work. In this SLR, we extracted all relevant information about limitations, challenges, and weaknesses from the limitations sections of the selected research articles.
TABLE 9. Analysis of each selected paper according to the utilized quantitative metrics for performance validation.
According to the selected papers, the limitations and challenges are as follows:
• In image regions with saturated reflections, background information is lost, which turns the task into a more challenging and complicated image inpainting problem. Because of this, almost none of the proposed techniques can effectively remove reflections from such images [2].
• Because the proposed networks contain many convolutional layers, color shifting is one of the challenges faced by the proposed reflection removal networks [2], [4].
• Some of the proposed networks did not consider viewpoints perpendicular to the glass, nor did they take into account curved glass or glass with a particular shape. In other networks, the diversity of scenarios and capture settings of the images in the synthetic dataset needs to be improved. These data generation problems may limit the generalization ability of networks trained on these datasets [2], [26].
• In some of the proposed algorithms, the dataset images are synthesized and are all indoor scenes, which may not be suitable for outdoor scenes; these networks were not very effective and almost failed on outdoor scenes [16].
• When the whole image is dominated by reflection, or by ghosting reflection that makes it hazy and blurry, or when the reflection layer and background overlap, the effectiveness of the proposed networks may drop: they may not completely remove the reflections, and visible residual edges remain in the estimated background. The proposed techniques may also suffer from vanishing gradients when the deep network is trained directly on the images [4], [12], [16].
• Because some of the presented networks rely on extracted edges, these algorithms may not work properly when edge information is lost or has low confidence [12].
• Some of the proposed single-image reflection removal networks have limitations when applied to real-world images. This limitation stems from the lack of a real-world training dataset [27].

G. RQ7. WHAT ARE THE POSSIBLE FUTURE WORKS AND DIRECTIONS IN THE SINGLE-IMAGE REFLECTION REMOVAL FIELD VIA DEEP LEARNING TECHNIQUES?
Based on this review, the potential directions and future work recommended by the researchers are as follows:
• Applying the proposed networks to more image processing and image layer decomposition tasks.
• Improving the proposed networks for cases in which the whole image is dominated by reflections, and continuing to enhance their capability to deal with challenging images with reflections [4], [7], [11], [15], [24].
• Some of the proposed networks rely on hand-crafted features. Designing a more automated reflection removal algorithm that frees users from providing guidance and suppresses reflections with high quality is a future direction that is expected to address the limitations in challenging reflection removal tasks [12], [23].
• The datasets used to train the proposed networks are mostly built with the screen blending method, as the researchers assert that it depicts real reflections more precisely than the commonly used additive (summative) method [15].
• Simplifying the proposed networks to reduce the number of parameters and improve inference speed for use in mobile computing applications [29], [80].
• Other kinds of networks, including GAN-inspired generative models, can be applied to the single-image reflection removal task to achieve better results. Moreover, because the reflection layer is inherently uncertain, generative models based on an uncertainty map can be explored in future work [27].
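The two synthesis models contrasted in [15] can be compared directly. Assuming the transmission T and reflection R are images with values in [0, 1], the additive (summative) model is I = T + R, which can exceed 1 and is usually clipped, whereas screen blending is I = 1 − (1 − T)(1 − R) = T + R − TR, which always stays in [0, 1]. The NumPy sketch below is illustrative only; any further steps in the synthesis pipeline of [15] (for example blurring or attenuating R) are not reproduced.

```python
import numpy as np

def additive_blend(t, r):
    """Summative model I = T + R, clipped to the valid range [0, 1]."""
    return np.clip(t + r, 0.0, 1.0)

def screen_blend(t, r):
    """Screen model I = 1 - (1 - T)(1 - R); stays in [0, 1] without clipping."""
    return 1.0 - (1.0 - t) * (1.0 - r)

t = np.array([0.2, 0.6, 0.9])   # example transmission intensities
r = np.array([0.3, 0.5, 0.4])   # example reflection intensities
print(additive_blend(t, r))
print(screen_blend(t, r))
```

Here the additive model saturates two of the three pixels at 1.0, while screen blending yields 0.44, 0.8 and 0.94: bright regions are brightened smoothly instead of being clipped, which is one reason screen blending can look closer to real reflections.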

IV. CONCLUSION
In this systematic review, we aimed to explore and analyze academic research papers in the field of single-image reflection removal using deep learning and neural network techniques. For this purpose, we performed a comprehensive statistical analysis of this topic by answering the 7 research questions listed below:
• RQ1: What is the distribution of the selected research articles over the last seven years and their types?
• RQ2: What datasets/databases are used to train the proposed deep learning model in considered publications?
• RQ3: What datasets/databases are used for conducting experiments on single-image reflection removal?
• RQ4: What are the architectures of the proposed reflection removal networks in each paper?
• RQ5: What quantitative metrics have been used to accredit the efficiency of the proposed deep learning techniques?
• RQ6: What are the current limitations and challenges in this research field?
• RQ7: What are the possible future works and directions in the single-image reflection removal field via deep learning techniques?
Using our research methodology, we investigated almost 1600 research articles published between 2015 and 2021, of which twenty-five were selected for discussion in this SLR. To answer the research questions, we identified the distribution of the selected articles (RQ1), the datasets and databases used for training, testing, and experiments (RQ2 and RQ3), the structures and architectures of the proposed algorithms (RQ4), the quantitative metrics used for performance validation (RQ5), and finally the limitations, challenges, and future work of each study (RQ6 and RQ7).
In the process of conducting this SLR, we observed that the current trend in the reflection removal field is to use convolutional neural networks and deep learning techniques. Advances in this field mostly come from using different CNN structures and architectures and datasets with more images. One of the findings of this SLR is that, despite the advanced deep learning algorithms applied to the reflection problem, much work remains to be done in this research field; possible future work is detailed under RQ7. Another observation is that there are few studies on single-image reflection removal using deep learning (25 studies were identified). We also observed a lack of real-world training and benchmark datasets for training and evaluating deep learning algorithms for single-image reflection removal. Future studies could focus on building a comprehensive training and benchmark dataset.
In conclusion, our research team hopes that, given its wide topic coverage, this systematic review will be helpful for the many AI researchers and engineers who are beginning research in single-image reflection removal and deep learning. Although reflection removal using deep learning techniques has only recently become a trend among researchers, we believe that fast progress in this field can be very important for reaching a higher level of service in computer vision and image processing applications.

FIGURE 1. Classification of reflection removal algorithms (learning and non-learning methods).

FIGURE 3. The process of searching and selecting research articles.

FIGURE 4. Citation count of selected research papers.

FIGURE 5. Distribution of publications per year.

FIGURE 6. Distribution of publications per country.

FIGURE 7. Distribution of publications per type.

FIGURE 8. Number of papers that conducted experiments on the SIR² dataset.

FIGURE 9. Percentage of papers that conducted experiments on real-world datasets.
sists of 3 sub-networks. They used the first four units of VGG-19 as the encoder. RRN and DEN are the two sub-networks designed as decoders. The image with reflection I is given to the encoder, and the encoder's output is fed to the RRN and DEN sub-networks. RRN is the sub-network for the reflection removal task; it outputs the estimated transmission layer O, and the estimated reflection R is obtained as I − O. DEN is designed for depth estimation and outputs the transmission depth D_T; rD_T is the refined depth map [16].
- In P9, Ma et al. [17] proposed a network consisting of two mapping functions: a generator G: (B, R) → M and a separator S: M → (B, R, E), where M is a real-world mixed image, B is the background, and R is the reflection. G and S share similar architectures. Both have a down-sampling unit with two convolutional layers to increase the receptive field size, a feature extraction unit with 9 residual blocks for robust feature extraction, and an up-sampling unit consisting of two transposed convolutional layers. The generator G has two down-sampling units to take the reflection and background as inputs, and the separator S has a multitasking unit and three up-sampling units to produce more accurate results. They utilized 70 × 70 PatchGANs [71] … sub-network is used. Then the output of VGG-19 is up-sampled and fed to the CNN G_θG, which contains 3 Conv-ReLU pairs, 13 residual blocks, a Conv-ReLU pair, and pyramid pooling [9].
- In P15, Li et al. [21] proposed IBCLN, which contains two blocks: a transmission-prediction block G_T and a reflection-prediction block G_R. Both blocks are designed based on convolutional LSTM networks. G_T learns the transmission T, and G_R learns the residual reflection R̃. Both blocks use an encoder with 11 convolutional LSTM layers to extract features from the input, and a decoder with 8 convolutional layers that generates the predicted transmission layer or residual reflection layer. The LSTM layers use sigmoid or tanh activation functions, and all other convolutional layers are followed by a ReLU activation [21].
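The "70 × 70" in "70 × 70 PatchGAN" is the receptive field of each discriminator output unit: every output scores one 70 × 70 patch of the input. Assuming the layer layout of the original pix2pix PatchGAN (five 4 × 4 convolutions with strides 2, 2, 2, 1, 1), which the text above does not specify, the figure can be checked with the standard recurrence r_in = (r_out − 1) · s + k, applied from the output layer back to the input.

```python
def receptive_field(layers):
    """Receptive field of one output unit for a stack of (kernel, stride) convs."""
    r = 1
    for kernel, stride in reversed(layers):   # walk from the output back to the input
        r = (r - 1) * stride + kernel
    return r

# Assumed 70x70 PatchGAN layout: C64-C128-C256-C512 plus a final 1-channel conv.
patchgan = [(4, 2), (4, 2), (4, 2), (4, 1), (4, 1)]
print(receptive_field(patchgan))  # 70
```

Walking backwards gives 4, 7, 16, 34 and finally 70 pixels, so a patch-level discriminator of this depth judges local texture rather than whole-image structure.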
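P15's blocks are built from convolutional LSTM layers, in which the LSTM gates are computed by convolutions over the input and previous hidden state, with sigmoid and tanh activations as mentioned above. The NumPy sketch below shows one ConvLSTM step in a common formulation without peephole connections; the 3 × 3 kernel, channel counts, random weights and number of recurrent steps are illustrative assumptions, not IBCLN's configuration.

```python
import numpy as np

def conv2d_same(x, w):
    """'Same'-padded 2-D convolution (cross-correlation).
    x: (H, W, Cin); w: (k, k, Cin, Cout) with odd k; returns (H, W, Cout)."""
    k = w.shape[0]
    p = k // 2
    xp = np.pad(x, ((p, p), (p, p), (0, 0)))
    h, wd, _ = x.shape
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(k):
        for j in range(k):
            out += xp[i:i + h, j:j + wd] @ w[i, j]   # (H, W, Cin) @ (Cin, Cout)
    return out

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def conv_lstm_step(x, h, c, weights, bias):
    """One ConvLSTM step: all four gates come from one convolution of [x, h]."""
    z = conv2d_same(np.concatenate([x, h], axis=-1), weights) + bias
    i, f, o, g = np.split(z, 4, axis=-1)               # input, forget, output, candidate
    c_new = sigmoid(f) * c + sigmoid(i) * np.tanh(g)   # update the cell state
    h_new = sigmoid(o) * np.tanh(c_new)                # gated hidden state
    return h_new, c_new

rng = np.random.default_rng(0)
H, W, Cin, Ch = 16, 16, 3, 8
weights = rng.normal(0, 0.1, (3, 3, Cin + Ch, 4 * Ch))
bias = np.zeros(4 * Ch)
h = np.zeros((H, W, Ch))
c = np.zeros((H, W, Ch))
x = rng.random((H, W, Cin))
for _ in range(3):   # arbitrary number of recurrent iterations for illustration
    h, c = conv_lstm_step(x, h, c, weights, bias)
print(h.shape, c.shape)
```

Unlike a fully connected LSTM, the convolutional gates keep the spatial layout of the feature maps, so the recurrent state can carry per-pixel estimates from one iteration to the next.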

FIGURE 10. Number of research papers using FSIM, LMSE, NCC, sLMSE, PSNR, SI_r, SSIM_r, SI and SSIM as a quantitative metric for evaluating their proposed deep learning technique.

TABLE 6. Publishing year of each selected article.