An Effective Two-Branch Model-Based Deep Network for Single Image Deraining

Removing rain effects from an image automatically has many applications, such as autonomous driving, drone piloting and photo editing, and still draws considerable attention. Traditional methods use heuristics to handcraft various priors to remove or separate the rain effects from an image. Recently, end-to-end deep learning based deraining methods have been proposed to offer more flexibility and effectiveness. However, they tend not to obtain good visual results on images with heavy rain. Heavy rain brings not only rain streaks but also a haze-like effect caused by the accumulation of tiny raindrops. Different from previous deraining methods, in this paper we model rainy images with a new rain model to remove not only rain streaks but also the haze-like effect. Guided by our model, we design a two-branch network to learn its parameters. Then, an SPP structure is jointly trained to refine the results of our model and to flexibly control the degree of removing the haze-like effect. Besides, a subnetwork that localizes the rainy pixels is proposed to guide the training of our network. Extensive experiments on several datasets show that our method outperforms the state-of-the-art in both objective assessments and visual quality.


Introduction
The prevalence of rain, particularly in some locations, not only seriously reduces the quality of images captured by cameras but, more importantly, negatively impacts the robustness of devices and/or algorithms that must operate continuously irrespective of the weather. For example, the inability of driverless cars to operate in the rain has become a notorious issue (see the Bloomberg Businessweek article 'Self-Driving Cars Can Handle Neither Rain nor Sleet nor Snow', 17 Sept. 2018). Most of the early attempts use videos [11,36,4,3,1,2,29], utilising temporal correlation.
Figure 1. An example of a real-world rainy image and the deraining results: (a) input, (b) [9], (c) [34], (d) [35], (e) [25], (f) ours. Our method removes the obvious heavy rain streaks and recovers the colors of the scene by removing the haze-like effect.
We mainly focus on single-image deraining. Conventional methods fall into three categories. The first is filtering-based, where a nonlocal mean filter or guided filter [14] is used [7]. One limitation is that the filter can adapt to neither the sizes of the raindrops nor their locations. The second category uses dictionary learning to decompose a rain-affected image so that the dictionary elements corresponding to the rain might be separable from those associated with the image content, and their effects can be removed accordingly [10,20,5,18,17,32,31]. This kind of approach is suitable for removing rain streaks with clear edges, but for heavy rain it often leads to undesirable visual artifacts. The third category builds models for rain streaks [6,27,26]. These models attempt to discriminate rain streaks from the background. However, these methods tend to misidentify fine image details as rain streaks, thus falsely removing desirable fine details.
Very recently, deep learning has also been applied to rain removal and has achieved remarkable results [9,34,35,8,22]. These methods either train a network to estimate a clear (rain-free) image B from a rainy image I directly, or estimate a residual R that models the rain layer, so that B can be obtained by B = I − R. Learning to estimate the clear image directly often ignores the formation process of the rainy image, and thus requires a large dataset to learn the impact of every possible rain type, at every scale, orientation, and illumination in nearly all scenes, to obtain good rain-removed results. Learning the residual treats the rainy image as a simple summation of the background image and a rain layer. As the rain layer is easier to learn than a background with complex textures, these networks converge faster and also obtain remarkable rain-removal results.
However, neither of these two types of networks can remove the rain effect completely, as Figure 1 shows. Under rainy conditions, not only are rain streaks imaged, but the colors of objects also become pale or gray, as if they were covered by a layer of haze, which is more apparent in heavy rain (e.g. Figure 1). This phenomenon is caused by the scattering of light by accumulated tiny raindrops, which leads to a color shift [28]. There is no standard term for this phenomenon, so we call it the haze-like effect for convenience in our paper, although the term may not be entirely accurate. Besides, the edges of rain streaks tend to be blurry in this situation, which further increases the difficulty of the rain-removal task.
To deal with these problems and remove the rain effect more thoroughly, we take the scattering of light by accumulated tiny raindrops into consideration and model a rainy image I as I = T • B + R (• is pixel-wise multiplication) to describe the relationship between I and B. As in the commonly-used linear rain model I = B + R, we use R to model the apparent rain streaks, but an additional coefficient T models the transmission of the accumulated tiny raindrops (the haze-like effect), which is ignored when only R is used. Instead of solely training an end-to-end network for rain removal, and considering that the rain model itself captures a more specific structure of the deraining task, we integrate the rain model into the design of the deep neural network. Specifically, we propose a two-branch model-based deep network for single image deraining via rAin Model Parameter Estimation, referred to as AMPE-Net. The network framework is shown in Figure 2.
Rain streaks and the haze-like effect can be removed cleanly by the trained two-branch subnetworks. However, some people may think that the haze-like effect is over-removed by our method (the colors of the rain-removed results are brighter than the colors in the original rainy images). This is, after all, a subjective assessment, and different people have different opinions. We will analyze the colors of our results in a later section. To offer more flexible results that meet the requirements of different people, we utilize a spatial pyramid pooling (SPP) module [16] in our network to refine our results by extracting multi-scale features. Besides, a parameter α is set to control the degree of removing the haze-like effect, so that users of our code can tune the results to their preference. Because the network processes rainy pixels differently from non-rainy pixels, the location of rain pixels, which is ignored by many other methods, can play a positive role in rain-removal performance. We design a LocNet to learn the location of rain to guide the training of the subsequent networks. Our main contributions are:
• We use a new rain model which takes the scattering of accumulated tiny raindrops into consideration to model rainy images more completely, so that not only rain streaks but also the haze-like effect can be removed to obtain a clearer image.
• We build a two-branch network which is guided by our rain model to jointly learn the model parameters. An SPP module is used to refine our rain-removed results and to control the degree of removing the haze-like effect. Besides, we design a simple but effective deep convolutional network to identify the location of rain. The rain location provides more information about rainy images, which may also benefit other rain-removal methods.
• Our method removes the full range of rain effects, from large raindrops to the haze-like effect. It can also flexibly control the haze-like effect to produce different visual effects. Hence, we provide a more complete solution to the rain-removal task. We also briefly show the potential of our model and network in the dehazing task.

Related Work
Conventional Methods Rain streaks were initially detected and removed in videos. For example, temporal correlation and motion blur were explored to model the photometry of rain in order to detect and remove the rain effect from videos [11]. The same authors extended the work by considering how to render rain streaks as realistically as possible and then using the rendering to remove the rain effect [12]. Similarly, temporal and chromatic characteristics of rain streaks were both explored to detect and remove rain streaks in [36]. The distribution of the orientation of streaks was exploited in [3] to remove rain streaks. Apart from exploring the properties in the temporal and spatial domains, the characteristics of rain in the frequency domain were also investigated, and rain streaks were removed based on frequency [1,2]. Single-image rain removal has gained much success and popularity recently. There are many attempts using dictionary learning to remove rain streaks by their shape or color characteristics [10,20,5,18,17,32,27]. However, this type of method tends to work well only on rainy images with apparent streaks, and dictionary learning is often time-consuming. To avoid rain pixel detection and the time-consuming dictionary learning stage, a low-rank rain appearance model was proposed in [6] to capture the spatio-temporally correlated rain streaks and remove them in images (and videos). Instead of learning a dictionary or imposing a low-rank structure, simple patch-based priors (using Gaussian mixture models) were proposed to model both the background and rain layers [26]. These priors accommodate multiple orientations and scales of the rain streaks and remove rain streaks well in some cases. However, the resulting images sometimes lose fine details.
Figure 2. The architecture of AMPE-Net. LocNet captures the location of rain-affected pixels, while EstNet-T and EstNet-R estimate the parameters T and R in our rain model; we call them the two-branch unit for convenience. In RefNet, we use an SPP structure with the factors 6, 8, 16 and 32 respectively. The operation α means the average weighted combination operation with coefficient α.
Deep Learning Based Methods Very recently, deep learning has been used in rain removal. Inspired by the deep residual network (ResNet) [15], a deep detail network was proposed to reduce the mapping range from input to output [9] and so make the learning process easier. Moreover, the authors use a priori image domain knowledge by focusing on high-frequency detail during training, which can remove the interference from the background to a degree. They extended the work by decomposing the rain image into low- and high-frequency components and extracting image details from the high-frequency component [8]. These two methods are particularly good at removing light rain, but have difficulty removing bright or blurry rain streaks. To handle bright rain streaks better, Yang et al. add a binary map to locate rain streaks. They create a new model to represent rain streak accumulation (where individual streaks cannot be seen and appear as mist or haze instead) and the various shapes and directions of overlapping rain streaks [34]. Their method is very good at removing bright rain streaks, but often fails to remove blurry rain streaks. To handle diverse rain effects, Zhang et al. propose a multi-stream dense network that automatically determines the rain-density information and thus efficiently removes the corresponding rain streaks according to the estimated rain-density label [35]. This method can handle a diverse range of rainy images, but sometimes blurs image details. To model and remove rain streaks of various sizes (and distances among them) and the veiling effect, a multi-stage network consisting of several parallel sub-networks was designed, each of which models a different scale of rain streaks [23]. Li et al. remove the rain streaks via multiple stages and use a recurrent neural network to exchange information across stages [25]. Unlike the cascaded multi-stage learning scheme, a non-locally enhanced encoder-decoder network framework was proposed, which captures long-range spatial dependencies via skip connections and learns increasingly abstract feature representations while preserving the image detail by pooling-indices-guided decoding [22].

The Proposed Method
Given an observed image I contaminated by rain, the goal of deraining is to recover a clean image B. Our target is to train a neural network (i.e. AMPE-Net) to estimate B from I by:
$$\hat{B} = U(I), \tag{1}$$
where U(·) denotes the proposed AMPE-Net and $\hat{B}$ is the estimate of B. Unlike methods based on the commonly-used rain model I = B + R, we propose to integrate our rain model into the neural network to guide the estimation of parameters. Before introducing the network U(·), we will first remodel a rainy image I to express the relationship between I and B more completely (in Section 3.1).

Rainy Image Modeling
Existing deep learning based rain-removal works perform well on many rainy images [35,9]. The majority of them model a rainy image I as a simple summation of the background B and the rain layer R:
$$I = B + R. \tag{2}$$
However, in some rainy scenes, not only large raindrops but also accumulated tiny raindrops are imaged by a camera [34]. Large raindrops tend to be imaged as apparent rain streaks, which are considered to be sparse and to have similar falling directions and shapes [25]. Single tiny raindrops cannot be seen, but they can impair an image by accumulating together and occluding the propagation of light through scattering. When imaged, they look like a layer of haze (the haze-like effect), which washes out the colors of the background and leads to low image contrast. In this case, the edges of rain streaks also become blurry and merge into the haze-like effect, which further increases the difficulty of the deraining task (e.g. Figure 1). On rainy images with an apparent haze-like effect, especially under heavy rain, model (2) tends not to obtain satisfactory results: the haze-like effect and some rain streaks with blurry edges remain in the deraining results (e.g. Figure 1). To handle these problems, we take the scattering of light by accumulated tiny raindrops into consideration and add a variable T to describe the influence of accumulated tiny raindrops on the background:
$$I = T \bullet B + R, \tag{3}$$
where T is the transmission of the accumulated tiny raindrops, which models the haze-like effect and has values greater than 1 because the haze-like effect always enhances the intensity of pixels [13], and R models the apparent rain streaks with values in the range [0, 1]. However, the ground truth of T and R is difficult to acquire. That is why we design a two-branch network which is trained jointly to estimate T and R together. We let the network itself determine the optimal parameters to fit our rain model and remove the rain effect (including rain streaks and the haze-like effect).
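To make the difference between the linear model and the proposed model concrete, the following minimal sketch composes a rainy patch under both. It is only an illustration under stated assumptions: the helper name `compose_rainy`, the 2 × 2 grayscale patch and all numeric values are hypothetical, and plain Python lists stand in for image tensors.

```python
def compose_rainy(background, transmission, streaks):
    """Apply I = T * B + R pixel-wise to 2-D grayscale images (nested lists)."""
    return [
        [t * b + r for b, t, r in zip(b_row, t_row, r_row)]
        for b_row, t_row, r_row in zip(background, transmission, streaks)
    ]

# Hypothetical 2x2 patch; values are illustrative only.
B = [[0.25, 0.5], [0.5, 0.25]]   # clean background
T = [[1.5, 1.5], [1.5, 1.5]]     # T > 1: haze-like effect brightens every pixel
R = [[0.0, 0.25], [0.0, 0.0]]    # one sparse rain-streak pixel

ones = [[1.0, 1.0], [1.0, 1.0]]
I_linear = compose_rainy(B, ones, R)  # T = 1 reduces to the linear model I = B + R
I_full = compose_rainy(B, T, R)       # proposed model I = T * B + R

print(I_linear)
print(I_full)
```

With T fixed to 1 the sketch reduces to the linear model, which is why a network built on I = B + R has no term that can absorb the global brightening.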

The Proposed AMPE-Net for Deraining
According to our rain model in Eq. (3), given a rainy image I, if we can obtain the corresponding parameters T and R, the clean image B can be predicted through:
$$B_m = (I - R) \oslash T, \tag{4}$$
where $B_m$ denotes the estimation of the background B by our rain model, and $\oslash$ is the point-wise division. However, estimating T and R from I is non-trivial, and different rainy images have different values. We thus consider estimating T and R with jointly-trained networks.
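The inversion in Eq. (4) can be sketched in a few lines. This is an illustration rather than the authors' implementation: the function name `derain_by_model`, the `eps` guard against division by zero and the numeric values are assumptions introduced here.

```python
def derain_by_model(rainy, transmission, streaks, eps=1e-6):
    """Invert the rain model pixel-wise: B_m = (I - R) / T.

    eps is an assumed safeguard against a zero transmission value;
    the paper states T > 1, so it should never trigger in practice.
    """
    return [
        [(i - r) / max(t, eps) for i, t, r in zip(i_row, t_row, r_row)]
        for i_row, t_row, r_row in zip(rainy, transmission, streaks)
    ]

# Hypothetical rainy patch generated by I = T * B + R with B = [[0.25, 0.5]].
I = [[0.375, 1.0]]
T = [[1.5, 1.5]]
R = [[0.0, 0.25]]

B_m = derain_by_model(I, T, R)                   # full model: removes streaks and haze
B_linear = derain_by_model(I, [[1.0, 1.0]], R)   # linear model I - R: haze remains

print(B_m)
print(B_linear)
```

Subtracting R alone leaves every pixel brighter than the true background, which mirrors the residual haze-like effect discussed above.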
There is no ground truth for T and R, hence we cannot apply fully supervised training: the network must estimate the two unknown variables T and R simultaneously under the supervision of the ground truth for the background B only. In our work, we utilize incomplete supervision to train a parallel two-branch network which is guided by our new rain model to estimate T and R simultaneously. Note that this is not semi-supervision; in our paper, 'incomplete supervision' means that the number of unknown variables is larger than the number of variables which have ground truth during training.
As shown in Figure 2, the two subnetworks EstNet-T and EstNet-R are placed in parallel to form a two-branch network that estimates T and R; the estimation $B_m$ of the background B is then calculated by Eq. (4). As Eq. (4) is differentiable with respect to T and R, EstNet-T and EstNet-R can be updated simultaneously to find the optimal T and R to remove rain more completely, which is also the way our model guides the training of the two-branch network. Compared with fully supervised training and the rain model I = B + R, our network and model have more flexibility and capability to remove the rain effect.
After using Eq. (4), we obtain a clear image in which rain streaks are removed and the colors of objects are also recovered (e.g. Figure 1(f)). However, some people may think that the haze-like effect is over-removed (the recovered colors are too bright). We will analyze the color of our results in the experiment section. As the visual effect of an image is, after all, a subjective assessment, different people have different views. In our work, we design a network (RefNet) which is trained jointly with the two-branch EstNet-T and EstNet-R to refine our results, and we use an average weighted combination to control the degree of removing the haze-like effect. We will show the results later.
Considering that the network treats rainy and non-rainy pixels differently, we propose to estimate a rain location map as a guide. The proposed AMPE-Net consists of three major components: a subnetwork for estimating a rough location map (LocNet), a two-branch subnetwork for estimating T and R (EstNet-T and EstNet-R) and an SPP module (RefNet) to refine our rain-removed result. In the following parts, we refer to the two-branch subnetwork of EstNet-T and EstNet-R as the two-branch unit for convenience.
LocNet It takes the rainy image I as input and estimates the location information L of rain pixels in I:
$$\hat{L} = H(I),$$
where H(·) denotes the mapping of LocNet and $\hat{L}$ is the estimation of L. Note that we utilize a Softmax layer to approximate the binary location map in the training process, so $\hat{L}$ is no longer binary. The detected rain pixels have high values in $\hat{L}$ and vice versa.
EstNet-T One input of this subnetwork is the image I. Because it is used to estimate T, which is related to the background, we use the non-rain location information $1 - \hat{L}$ and the non-rain information $I \bullet (1 - \hat{L})$ as another two inputs, which is the way that $\hat{L}$ guides the training. Here, every value in $\hat{L}$ can be treated as the probability of the corresponding pixel being a rain pixel. If we use F(·) to denote the mapping of EstNet-T, then T can be calculated by:
$$\hat{T} = F\big(I,\ 1 - \hat{L},\ I \bullet (1 - \hat{L})\big).$$
EstNet-R Similarly, the inputs of EstNet-R are I, $\hat{L}$ and $I \bullet \hat{L}$, and R is obtained by the mapping G(·) of EstNet-R:
$$\hat{R} = G\big(I,\ \hat{L},\ I \bullet \hat{L}\big).$$
Then the rain-removed result by our model is:
$$B_m = (I - \hat{R}) \oslash \hat{T}.$$
AMPE-Net If R(·) is the mapping of RefNet (written $\mathcal{R}$ in the equation to distinguish it from the rain layer R), based on the above definitions, the rain-removed result $\hat{B}$ can be calculated by:
$$\hat{B} = \alpha B_m + (1 - \alpha)\,\mathcal{R}(B_m),$$
which only takes the observed rainy image I as input. We use the average weighted combination operation to tune the degree of removing the haze-like effect, and α is the combination coefficient. During training we set α = 0.9, but α can be any number in the range [0, 1] at test time to tune the degree of removing the haze-like effect.
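The data flow between the four components can be summarised by the sketch below. It is a hedged illustration, not the released code: the sub-networks are passed in as plain callables, the `toy_*` stand-ins are hypothetical placeholders for trained networks, and the blending rule assumes that α weights the model output B_m (consistent with α = 1 corresponding to the result without RefNet).

```python
def ew(f, *imgs):
    """Apply f element-wise across same-shaped 2-D nested lists."""
    return [[f(*px) for px in zip(*rows)] for rows in zip(*imgs)]

def ampe_forward(I, loc_net, est_t, est_r, ref_net, alpha=0.9):
    """Sketch of the AMPE-Net data flow; every *_net argument is a callable."""
    L = loc_net(I)                                           # soft rain-location map
    not_L = ew(lambda p: 1.0 - p, L)                         # non-rain probability
    T = est_t(I, not_L, ew(lambda i, n: i * n, I, not_L))    # background-guided inputs
    R = est_r(I, L, ew(lambda i, p: i * p, I, L))            # rain-guided inputs
    B_m = ew(lambda i, r, t: (i - r) / t, I, R, T)           # invert the rain model
    refined = ref_net(B_m)
    # Assumed blending: alpha = 1 keeps only the model output B_m.
    return ew(lambda m, f: alpha * m + (1.0 - alpha) * f, B_m, refined)

# Hypothetical stand-ins for the trained sub-networks (illustration only).
def toy_loc_net(I):
    return ew(lambda i: 1.0 if i > 0.75 else 0.0, I)

def toy_est_t(I, not_L, background_hint):
    return ew(lambda i: 2.0, I)

def toy_est_r(I, L, rain_hint):
    return ew(lambda r: 0.5 * r, rain_hint)

def toy_ref_net(B_m):
    return ew(lambda b: 0.5 * b, B_m)

I = [[0.5, 1.0]]
nets = (toy_loc_net, toy_est_t, toy_est_r, toy_ref_net)
out_model_only = ampe_forward(I, *nets, alpha=1.0)
out_blended = ampe_forward(I, *nets, alpha=0.5)
print(out_model_only, out_blended)
```

Because α only enters the final blend, it can be changed at test time without retraining, which is what makes the degree of haze removal adjustable.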

Network Structure of LocNet H(·)
Raindrops have different sizes, so convolutional kernels of a single size cannot always extract the features of rain completely. Inspired by [35], three densely connected convolutional networks [19] are first utilized in LocNet (Figure 2) to extract multi-scale shallow features of I. The kernel sizes of the three densely-connected blocks are 7 × 7, 5 × 5 and 3 × 3 respectively. We concatenate the obtained features with I to form the shallow-layer feature $f_l$ after a Conv layer.
The core part of our LocNet is composed of down-sampling, detail extraction and up-sampling operations with four different scale factors (16, 8, 4 and 2 respectively), which is somewhat similar to the pyramid pooling module in [37]. We utilize four scale factors to extract the deep features because rain also varies in size and shape. We use 5 ResNet blocks [15] to deepen the features, then up-sample the obtained features to the original size to form the deep-layer features $f_d$. After concatenating the shallow and deep features, a Conv is used to fuse the combined features. At last, we utilize Softmax to estimate the location map $\hat{L}$. Some location maps of real-world rainy images are shown in Figure 3. Not only rain streaks but also pixels covered by the haze-like effect are localised; hence for some images, like Figure 1, the majority of pixels are identified as rain and have high values in $\hat{L}$.

Network Structure of EstNet-T F(·) and EstNet-R G(·)
In the two-branch unit, a 9 × 9 Conv is first utilized to extract the features of the guided input. To eliminate inaccuracy near image edges, the input is reflection-padded. As rain streaks vary in shape and size, two down-samplings are then applied to accommodate this variety. Furthermore, we use five ResBlocks to extract deeper features. Then, two up-samplings are used to recover the original input size. To avoid checkerboard artifacts, we up-sample directly and follow with a Conv, instead of using the traditional deconvolution (Dconv). After up-sampling the feature map, we concatenate it with the output of the first Conv. At last, T is estimated by a composition of Conv and ReLU operations. Except for its final Softmax layer, EstNet-R is the same as EstNet-T.

Network Structure of RefNet R(·)
As the rain-removed result of Eq. (4) is already very close to the ground truth, we do not need high-level features to refine its colors. In R(·), only two Conv layers are first used to extract low-level features. Then SPP, which was originally used to improve recognition accuracy [16], is utilized to obtain multi-scale low-level features. The scale factors are 4, 8, 16 and 32 respectively. For each feature of a different size, we adopt pointwise convolution [30] to reduce its channels and up-sample it to the original size by nearest-neighbour interpolation. The refined result is obtained by applying a Conv and a Tanh activation function successively to the concatenated multi-scale features.
At last, the average weighted combination is utilized to integrate the refined result and the result before refinement to obtain our final rain-removed results.
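The pooling and up-sampling steps of the SPP structure can be illustrated on a toy feature map. This is a sketch under assumptions: average pooling is assumed as the pooling operator, the factors 2 and 4 are chosen only so that a 4 × 4 example fits (the paper uses larger factors), and the pointwise convolution and final Conv/Tanh are omitted.

```python
def avg_pool(img, k):
    """Non-overlapping k x k average pooling; assumes both sides are divisible by k."""
    h, w = len(img), len(img[0])
    return [
        [sum(img[y + dy][x + dx] for dy in range(k) for dx in range(k)) / (k * k)
         for x in range(0, w, k)]
        for y in range(0, h, k)
    ]

def upsample_nearest(img, k):
    """Nearest-neighbour up-sampling by an integer factor k."""
    return [[v for v in row for _ in range(k)] for row in img for _ in range(k)]

def pyramid_features(img, factors):
    """One full-resolution map per scale; the list stands in for channel concatenation."""
    return [upsample_nearest(avg_pool(img, k), k) for k in factors]

# Hypothetical 4x4 single-channel feature map.
feature = [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9, 10, 11], [12, 13, 14, 15]]
levels = pyramid_features(feature, factors=(2, 4))
for level in levels:
    print(level)
```

Each level has the same spatial size as the input, so the levels can be stacked along the channel axis before the final convolution.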

Training Loss
Our AMPE-Net is trained in two steps. First, given the training samples $\{(I_t, L_t)\}_{t=1}^{N}$, we learn the mapping H(·) from I to L. Then the three sub-networks, F(·) and G(·) in the two-branch unit and R(·), are trained jointly on the training samples $\{(I_t, B_t)\}_{t=1}^{M}$.
Training loss on location map LocNet is trained to provide more information for the subsequent networks. Because we apply a Softmax activation to approximate the binary location map, the MSE loss between the estimate $H(I_t)$ and the location map $L_t$ is used.
Training loss for rain-removal To fully use the constraints of our rain model to optimize the parameters of the two-branch unit, we minimize two MSE loss functions, L1 and L2, which are related to Eq. (3) and Eq. (4) respectively:
$$\mathcal{L}_1 = \mathrm{MSE}\big(\hat{T}_t \bullet B_t + \hat{R}_t,\ I_t\big), \qquad \mathcal{L}_2 = \mathrm{MSE}\big((I_t - \hat{R}_t) \oslash \hat{T}_t,\ B_t\big),$$
where $\hat{T}_t$ and $\hat{R}_t$ are the parameters estimated for the sample $I_t$. With these two losses, we can obtain a good estimation of $I_t$ from $B_t$, and vice versa. Hence, the trained two-branch unit removes rain more robustly than one trained with L2 only; the results will be shown in Section 4.5. In the training process, these two loss functions are used alternately on different batches of training samples. Hence, the loss function Lm, which optimizes T and R and yields the rain-removed result $B_m$ of our model, switches between L1 and L2 according to the batch index i during training. $B_m$ is refined by R(·), and the average weighted combination is utilized to control the degree of removing the haze-like effect and obtain the final rain-removed result $\hat{B}$. From $\hat{B}$ and the ground truth B, we also obtain a loss Lr. Our final loss function combines Lm and Lr, where α is the parameter that controls the degree of removing the haze-like effect. A larger α removes more of the haze-like effect.
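The alternating use of the two model losses can be sketched as follows. Several details are assumptions made for illustration: the function names, the choice that even batch indices use the Eq. (3) loss and odd ones the Eq. (4) loss (the text only says the losses alternate), and the toy values.

```python
def mse(a, b):
    """Mean squared error between two same-shaped 2-D nested lists."""
    diffs = [(x - y) ** 2 for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b)]
    return sum(diffs) / len(diffs)

def model_loss(I, B, T, R, batch_index):
    """Alternate between the two model-based losses by batch index.

    Even batches: reconstruct I from B via I = T * B + R   (Eq. (3) loss).
    Odd batches:  reconstruct B from I via B = (I - R) / T (Eq. (4) loss).
    The parity convention is an assumption of this sketch.
    """
    if batch_index % 2 == 0:
        recon_I = [[t * b + r for b, t, r in zip(bs, ts, rs)]
                   for bs, ts, rs in zip(B, T, R)]
        return mse(recon_I, I)
    recon_B = [[(i - r) / t for i, t, r in zip(ims, ts, rs)]
               for ims, ts, rs in zip(I, T, R)]
    return mse(recon_B, B)

# Hypothetical sample: I was generated with T = 2 and R = [[0.0, 0.25]].
B = [[0.25, 0.5]]
I = [[0.5, 1.25]]
R = [[0.0, 0.25]]
good_T = [[2.0, 2.0]]
bad_T = [[1.0, 1.0]]   # ignoring the haze-like effect

loss_good = [model_loss(I, B, good_T, R, i) for i in (0, 1)]
loss_bad = [model_loss(I, B, bad_T, R, i) for i in (0, 1)]
print(loss_good, loss_bad)
```

Both losses vanish for the correct parameters and both penalise an estimate that ignores T, so alternating them constrains the same pair (T, R) from the two directions of the model.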

Experiments
To assess the performance of our method quantitatively, we utilize PSNR and SSIM [33] as evaluation metrics. For real-world images, we only evaluate the performance visually. As the code of [22] has not been released by its authors, another four state-of-the-art works [9,34,35,25] are selected for comparison.
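For reference, PSNR follows directly from the mean squared error between a derained result and its ground truth. The sketch below uses the standard definition; the function name, the assumption that intensities lie in [0, 1] (`peak=1.0`) and the sample values are illustrative, and SSIM is omitted because it requires windowed statistics.

```python
import math

def psnr(reference, estimate, peak=1.0):
    """Peak signal-to-noise ratio in dB for 2-D nested lists with values in [0, peak]."""
    diffs = [(x - y) ** 2
             for ref_row, est_row in zip(reference, estimate)
             for x, y in zip(ref_row, est_row)]
    mean_sq_err = sum(diffs) / len(diffs)
    if mean_sq_err == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(peak ** 2 / mean_sq_err)

# Hypothetical ground truth and derained estimate.
ground_truth = [[0.0, 1.0]]
derained = [[0.1, 0.9]]

score = psnr(ground_truth, derained)
perfect = psnr(ground_truth, ground_truth)
print(round(score, 2), perfect)
```

A higher PSNR means a smaller pixel-wise error, which is why removing haze that is present in the ground truth itself can lower the score even when the result looks clearer.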

Datasets
Training and testing dataset Li et al. [24] synthesized a rain dataset that includes 20800 training pairs, and Zhang et al. [35] synthesized 12000 pairs of training samples. We randomly select half of each dataset to constitute our training dataset. For our LocNet, we utilize the dataset of [34], which includes 2000 pairs of samples. Besides, we randomly select 100 testing sample pairs from each of the testing datasets of [9,25,35] to constitute a 300-image dataset, Rain-I, as one of our testing datasets, so that we can make a fair comparison with the selected methods. We also synthesize another 400 images with an apparent haze-like effect as dataset Rain-II to test the selected methods and ours. We do not choose the dataset by Yang et al. [34] for testing, because the rain streaks in this dataset do not resemble real-world rain streaks. Peer reviewers of our previous works also advised against using this dataset. However, in Figure 5, we still show a synthetic rainy image from Yang et al. to show our performance on this dataset.
Real-world dataset Some real-world images are downloaded from Google and others are taken from the selected works [9,25,34,35]. Our real-world images include both light and heavy rain, and their contents are varied, including people, landscapes, cities, etc.

Studies of the Behaviors of the Proposed Model
As stated in Section 3.1, we jointly train the two-branch unit and let the network itself learn the most suitable parameters T and R to remove the rain effect. In Figure 4, we show the learned parameters T and R for two randomly selected real-world rainy images. We can see that nearly all rain streaks are in the parameter R, and the haze-like effect is included in T. From the results of I − R in Figure 4(d), we can see that nearly all rain streaks disappear, but some slight haze-like effect and some traces of rain streaks remain. In Figure 4(e), better deraining results are obtained through the correction by T.
Figure 7. Ablation studies on synthetic rainy images: (a) ground truth, (b) synthetic rainy image, (c-f) results of H + G (L1 + L2), F + G (L1 + L2), H + F + G (L2), H + F + G (L1 + L2). The first part of the notation gives the sub-networks used, e.g., H + G means that H(·) and G(·) are used; the second part, in parentheses, gives the loss used, e.g., L1 + L2 means that both L1 and L2 are used in the training.

Quantitative Evaluation on Synthetic Datasets
(without RefNet), our method has comparable PSNR/SSIM with the other methods. After RefNet is added, our PSNR/SSIM surpass theirs. The relatively lower scores without RefNet arise because some ground truths were taken in a slightly hazy environment: our method (α = 1) regards this haze as haze-like effect and removes it to obtain a clearer image (e.g., the first one in Figure 5), which causes a difference between the ground truths and our results and leads to a relatively lower objective index. Please take a close look at our results in the first row of Figure 5, obtained with different α controlling the degree of removing the haze-like effect.

Qualitative Evaluation on Real-World Images
In this section, we show some results on real-world images in Figure 6. We can see that our method outperforms the state-of-the-art: rain streaks and the haze-like effect are both removed. Figure 6(f) is the result of our two-branch unit without R(·) (α = 1), in which object colors become brighter than in the rainy images. This is the result of removing the haze-like effect, not an unnatural hue. The color of the letters in the second-row images is gray, and it remains gray in our result in Figure 6(f); hence our method does not introduce an abnormal hue in the rain-removed results. The results of tuning the degree of removing the haze-like effect are shown in Figure 6(g)(h)(i) with selected α. We can see that the degree of removing the haze-like effect can be controlled flexibly by α. Our results are closer to reality, and the image details are also better preserved. The selected methods cannot remove the haze-like effect well, and for some images some apparent rain streaks with blurry edges remain in their final results.

Ablation Studies
To verify the roles of the different parts of our AMPE-Net, we conduct ablation experiments. PSNR/SSIM of the different settings are shown in Table 3. Some visual results on synthetic and real-world images are shown in Figures 7 and 8 respectively. In our ablation study, we do not include R(·), whose role has been shown above, and all the experiments in this subsection are done with α = 1. We can see that the guidance of H(·) (LocNet) to the subsequent networks is important and boosts performance noticeably. When F(·) (EstNet-T) is removed, our model degrades into I = B + R, and the PSNR/SSIM decrease most seriously. Besides, the performance of removing the haze-like effect is also lower than in the other cases, which further confirms the role of the rain model (Figure 8). The loss L1 makes only a small difference.

Potentials of our Model and Network
Our model can be easily extended to other weather conditions, such as haze and snow. For haze, we randomly select 5000 training samples from [21] to dehaze with our model and networks (note that LocNet is not used in the haze condition). Here, we only show some dehazing results for real-world hazy images in Figure 9. More complete dehazing results, including comparisons with state-of-the-art works, will be shown in our extended journal paper. We will try to deal with snowy images in the future. Besides, our model has the potential to deal with the deblurring task, because a blurry image can be modelled by convolutions with different masks, and the convolution operation can be rewritten as a linear model.

Conclusion
In this paper, we utilized a new model to describe rainy images more completely. In order to remove the rain effect more completely, we proposed a two-branch network to jointly learn the parameters in our rain model. Two invertible loss functions are utilized alternately to optimize the two-branch unit to better fit our model. To control the degree of removing the haze-like effect, an average weighted combination and an SPP structure were utilized to refine our rain-removed results. Besides, a location map of rain was also learned to guide the training of our network. Compared with several state-of-the-art deep learning works, our method outperforms them objectively and subjectively, and our work can handle more kinds of rainy images, including removing the haze-like effect to recover the original color of degraded images.