Saliency Consistency-Based Image Re-Colorization for Color Blindness



I. INTRODUCTION
People with normal eyesight can correctly perceive the frequency of light reflected from the surface of an object. However, people with color vision deficiency (CVD), or colorblindness, cannot perceive colors correctly owing to genetic factors [1] or physical damage. Because color vision defects are not usually associated with fatal diseases and the number of reported cases is relatively small, colorblindness has not attracted much attention, and no effective cure has been developed. The Ishihara test [2] is used to detect color vision defects, which are of three main types: red-green colorblindness, blue-yellow colorblindness, and full colorblindness. Colorblindness can severely affect patients' daily lives and restrict their careers. For example, people with colorblindness may be unable to obtain a driver's license, and many professions in related fields, such as engineering and medicine, impose requirements on color perception ability. Although the inability to distinguish colors does not affect learning and cognition, working in color-related industries is a challenge for people with colorblindness. Moreover, the on-screen displays of most devices do not fully take colorblind users into account.
Human color vision is based on three light-sensitive pigments [3], [4], which span a three-dimensional color space. Three types of cone cells on the retina (L-cones, M-cones, and S-cones) underlie trichromatic vision. When any one type of cone cell is damaged or loses function, a person can experience only part of the spectrum and necessarily misses spectral information. People with protanopia lack L-cones, making it impossible for them to distinguish between red and green; those with deuteranopia lack M-cones and cannot distinguish between pale red and pale green; and people with tritanopia lack S-cones and cannot distinguish between light yellow and light blue. Approximately 8% of the population worldwide suffers from color vision defects. Therefore, helping colorblind patients perceive color more effectively is an urgent problem.
One approach to improving patients' ability to differentiate color is to recolor the image; another is to improve color contrast through auxiliary equipment. However, assistive tools must be tailored to the individual, and not every patient can access them. Image recoloring therefore has wider applicability. Several recoloring methods have been proposed: [5] colors images for dichromats according to the priority of key colors; Lin [6] uses the directional feature vector of the CVD simulation result to reverse the color distribution, converting the RGB color space to a λ, Y-B, R-G color space for recoloring; Poret [7] developed a filter based on the Ishihara test for color correction of colorblind images; Huang [8] proposed a rapid recoloring method based on the best mapping between colorblind images and standard images; and Dody [9] presented a color correction algorithm for colorblind images based on RGB color clustering and graph coloring. In addition to the color correction of colorblind images, color correction in multi-viewpoint imaging systems has also been extensively studied [10]-[12].
Most recoloring algorithms operate on standard images and rearrange the colors so that patients with colorblindness can distinguish them [13]. However, one drawback of this kind of color correction is that the color distribution of the recolored image can look unnatural, for example producing blue apples and cherries. Moreover, such recoloring aims only to help patients distinguish colors and does not consider the problem of saliency changes.
To solve these problems, we propose a saliency consistency-based image re-colorization method. The method uses image retrieval to filter a large number of similar images into a collection. CVD simulation is performed on the images in the collection, and co-saliency detection is performed on the resulting colorblind images. Each detection result is compared with the ground-truth saliency map of the image, and the image whose salient area remains unchanged is selected as the reference image. The reference image is then used to recolor the grayscale target image so that its color distribution is similar to that of the reference image. Because the salient area of the reference image is essentially unchanged, little color information is lost when the image is converted into a colorblind image; therefore, a colorblind patient's perception of the image is closer to that of a person with normal vision.
The proposed method makes the following contributions:
• We propose saliency consistency-based image re-colorization, which not only realizes color discrimination but also realizes saliency correction of colorblind images.
• We use image retrieval to collect images. The collected images are similar in semantics and content, so the coloring effect is better.
• We recolor according to the color scheme of the reference image instead of randomly changing the colors, so the color of the recolored image is natural.

II. RELATED WORK
A. COLOR BLINDNESS SIMULATION
The human retina contains two typical types of cells: cone cells and rod cells. Cone cells are active under normal illumination and are mainly concentrated in the depressed area of the retina facing the lens axis, called the fovea centralis. There are 5 to 7 million cone cells in this region, and each cone cell is connected to one optic nerve fiber; therefore, cone cells can sense brightness and color information and have high resolving power. Outside the fovea centralis there are between 75 and 150 million rod cells. Because rod cells are more sensitive to light, they are mainly active in low light. As many rod cells share a single optic nerve fiber, their color resolution is low; the eye's perception of color therefore depends mainly on the cone cells.
Human color perception is based on three light-sensitive pigments that span a three-dimensional color space, and the power of each wavelength determines the degree of the color stimulus. Trichromatic accuracy is determined by the three types of light-sensitive cone cells (L-, M-, and S-cones) in the retina. Different wavelengths of light stimulate the receptors differently; for example, yellow-green light stimulates L-cones and M-cones to the same degree but stimulates S-cones only weakly. The nerve center combines information from the various cone cells in response to light of different wavelengths. Therefore, when any type of cone cell is destroyed or loses function, color vision defects occur.
In order to understand how patients with colorblindness perceive their surroundings, colorblind image simulation is needed (see Figure 1). Many colorblindness simulation methods have been proposed, and some rules have been established. For example, Brettel [14] performed red-green colorblindness simulation via a spatial transformation, converting the RGB color space to the LMS color space and simulating on the basis of the responses of the three cone types. Machado [15] implemented colorblindness simulation based on human color vision theory; from electrophysiological studies, a simulation matrix was developed that can handle normal trichromatic vision, anomalous trichromacy, and dichromacy. Okajima [16] used a personal color vision model to improve the brightness difference. Rivera [17] and coworkers used a convolutional neural network, taking Ishihara plates as input and adjusting the brightness through the network to achieve real-time colorblind simulation. Aytac [18] designed an integrated colorblindness simulation user interface. Ready-made applications such as ColorDoctor are also available.
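As an illustration of matrix-based simulation in the spirit of Machado [15], the following minimal Python sketch applies a protanopia (severity 1.0) matrix from Machado et al.'s published tables in linear RGB; the sRGB linearization round-trip is a standard implementation choice rather than a detail given in the text.

```python
import numpy as np

# Protanopia (severity 1.0) matrix from Machado et al.'s published tables;
# it operates on linear RGB values.
PROTANOPIA = np.array([
    [ 0.152286, 1.052583, -0.204868],
    [ 0.114503, 0.786281,  0.099216],
    [-0.003882, -0.048116, 1.051998],
])

def srgb_to_linear(c):
    """Undo the sRGB gamma so the matrix acts on linear intensities."""
    return np.where(c <= 0.04045, c / 12.92, ((c + 0.055) / 1.055) ** 2.4)

def linear_to_srgb(c):
    return np.where(c <= 0.0031308, 12.92 * c, 1.055 * c ** (1 / 2.4) - 0.055)

def simulate_cvd(image, matrix=PROTANOPIA):
    """image: float array in [0, 1], shape (H, W, 3); returns the simulation."""
    lin = np.clip(srgb_to_linear(image) @ matrix.T, 0.0, 1.0)
    return linear_to_srgb(lin)
```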

B. IMAGE RETRIEVAL
With the development of screen display technology, users can view many different types of images and videos. Image retrieval was developed to enable effective search over multimedia data, letting users quickly and accurately retrieve images according to their needs. With the development of deep learning, image retrieval technology has advanced significantly [19], and many applications such as product search [20], face recognition [21], and image geolocation [22] have been developed. Beginning with text-based image retrieval (TBIR), such as the PageRank [23] method, image retrieval has undergone a complex development process. Although TBIR can locate images quickly and accurately, it has disadvantages: manual labeling is time-consuming and labor-intensive, text annotations are subjective, and they describe the objects in an image insufficiently. Around 2000, content-based image retrieval (CBIR) appeared, in which low-level features of the image content, such as color, texture, and shape, are described mathematically; by computing feature similarity, the retrieved images are filtered to obtain those that best meet the user's needs. Examples include the dictionary learning, Fisher vector (FV), and VLAD algorithms mentioned in [24]. To overcome the limitations of simple visual features, semantic-based image retrieval (SBIR) was proposed, which combines natural language processing with traditional image retrieval techniques. Using convolutional neural networks to jointly consider the low-level and high-level features of images bridges the semantic gap; an image retrieval method based on a capsule network is given in [25].
Image retrieval has thus developed from text-based to visual content-based to semantic-based retrieval, wherein the idea is to extract the relevant attributes [26], relative attributes [27], and absolute attributes [28] of images and compare these features to retrieve the desired image.
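As a minimal sketch of the retrieval step itself, the ranking below orders a gallery by cosine similarity between feature vectors; the feature extractor (e.g., the saliency structure histogram of Section III. A or a CNN embedding) is left abstract, and cosine similarity is an illustrative choice.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two 1-D feature vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def retrieve(query_feat, gallery_feats, top_k=10):
    """Rank gallery images by similarity to the query; return top_k indices."""
    scores = np.array([cosine_similarity(query_feat, g) for g in gallery_feats])
    return np.argsort(scores)[::-1][:top_k]
```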

C. SALIENCY DETECTION
The development of multimedia technology has produced a large number of images of varying quality, and a key issue is to quickly filter useful images and discard unnecessary ones. Computers can simulate human vision, extracting a small amount of significant information from a large amount of image data for analysis and helping people solve detection problems. Visual saliency has become a topic of great interest in multimedia research; its goal is to simulate the human visual perception and attention mechanism and to automatically predict, localize, and extract important information. Saliency detection draws on psychology, neuroscience, and computer vision [29] and can both predict the gaze points of the human eye and detect salient objects.
With the development of deep learning, attention models were introduced into neural networks to mimic human attention mechanisms in visual scenes. A recurrent attention model with hard alignment was proposed by Mnih [30] and coworkers, but the model was difficult to train. Later, recurrent attention models for image captioning that align words with image regions were studied in [31]. Sermanet [32] used a recurrent attention model for fine-grained classification by focusing on discriminative regions. Attention models have also been applied to other computer vision tasks [33], [34], where they effectively exploit context information.
Saliency detection models rely on cues such as local contrast [35], global contrast [36], and background priors [37] to detect salient objects. CNNs have also achieved great success in saliency detection. In [38], an FCN-based saliency model is combined with a saliency model over multiscale image regions. Wang [39] repeatedly applied an FCN to refine the saliency map. The Amulet network [40] combines multiscale context information, and Hu [41] used guided filtering to refine the saliency map.
For complex images, frequently occurring patterns, or main foregrounds, are used to characterize the main content of the images and enable batch processing; this has driven the development of co-saliency detection methods. Co-saliency detection finds the common salient regions across multiple related images. An image collection can be divided into four parts: common foreground (CF), common background (CB), uncommon foreground (unCF), and uncommon background (unCB). Co-saliency methods typically proceed in three steps: extracting effective features, characterizing saliency with information cues, and computing the common salient areas. The extracted features are mainly low-level, such as color histograms, Gabor filters, or SIFT descriptors [42]. In addition, some methods take the saliency regions detected in single images as intermediate features [43] and then detect the co-salient regions across the entire image collection.

D. GRAYSCALE IMAGE COLORIZATION
Grayscale image colorization is the process of recoloring a grayscale image to enhance its color perception [44]; it is widely used for refurbishing black-and-white photos and recoloring medical images. Colorization methods can be divided into three categories: user-assisted coloring, automatic coloring, and coloring based on reference images. In user-assisted coloring, the user manually marks colors [45] or the regions to which colors are migrated [46]; a least-squares optimization based on the relationship between colored pixels and their neighbors then propagates the colors to the whole image. However, labeling samples is time-consuming and laborious, and the coloring effect depends heavily on the labeled colors. To simplify the process, reference image-based coloring was proposed: the target image is converted to grayscale and recolored using a reference image. For example, the method in [47] finds pixels in the reference image that match the target image and recolors the target according to the color distribution of the reference. The recoloring results differ according to the color migration method used.
Automatic coloring realizes color migration through machine learning or statistical methods, such as histogram statistics [48] and multimodel prediction [49]. Increasingly, convolutional neural networks are used to learn prior information from large-scale data to achieve automatic coloring. However, coloring is an ill-posed problem with multimodal uncertainty; automatic coloring may therefore produce artifacts such as color bleeding and blurring in edge areas.
Reference-based image coloring provides a basis for coloring by supplying a reference image similar in content to the grayscale image. Bugeau [50] super-pixels the image and determines the similarity between the target and reference images based on pixel-level attributes. However, when the target and reference images differ, coloring based on feature similarity is prone to errors.

III. PROPOSED METHOD
Owing to the destruction of cone cells, patients with colorblindness cannot recognize certain colors. If the color scheme of an image contains colors that a colorblind patient cannot recognize, then even after the image is converted to a colorblind image, the CVD patient still cannot perceive it effectively. Different color schemes lose different amounts of color in the conversion. To make a colorblind patient's perception of an image as close as possible to that of a normal person, a color scheme with less color loss should be chosen. The algorithm pipeline is shown in Figure 2 and sketched below. First, images with similar semantics are found using image retrieval (see III. A). Then, co-saliency detection is conducted on the semantically similar images (see III. B), and a suitable reference image is selected by comparing the saliency maps of the colorblind image and the standard image. Finally, the reference image is used to color the target image (see III. C) so that the color distribution of the colorized image is similar to that of the reference image.
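The following high-level sketch summarizes the pipeline of Figure 2, with each stage passed in as a callable; all names here are hypothetical placeholders for the components described in Sections III. A through III. C, not the authors' actual interfaces.

```python
def recolor_for_cvd(target, database, *, retrieve, simulate_cvd,
                    co_saliency, saliency_change, to_gray, colorize):
    """Saliency consistency-based re-colorization, stage by stage."""
    collection = retrieve(target, database)            # III. A: similar images
    cvd_images = [simulate_cvd(img) for img in collection]
    cosal_maps = co_saliency(cvd_images)               # III. B: co-saliency
    # Reference = image whose salient region survives CVD simulation best.
    reference = min(collection,
                    key=lambda img: saliency_change(img, cosal_maps))
    return colorize(to_gray(target), reference)        # III. C: recoloring
```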

A. CONTENT-BASED IMAGE RETRIEVAL
Extracting meaningful features from an image is difficult in content-based image retrieval. The human visual system is selective for visual attributes such as color, orientation, and intensity; integrating this effective information over a whole image is therefore challenging. In this paper, a saliency structure model is used, and a saliency histogram represents the effective features of the image, with the salient area serving as a visual cue. The image retrieval pipeline is shown in Figure 3.
It has been experimentally shown [35] that the human visual system is most sensitive to color, orientation, and intensity. To represent the main features of an image compactly, the features are quantized in HSV space, and a limited number of colors is selected to describe the true-color image as faithfully as possible [51]. To obtain the color map, the H, S, and V channels are uniformly quantized into 6, 3, and 3 bins, giving $6 \times 3 \times 3 = 54$ color combinations. The color map is written as $M_C(x, y) = w$, $w \in \{0, 1, \ldots, N_C - 1\}$ with $N_C = 54$. The intensity information is represented by the V channel; after quantization, it is written as $M_I(x, y) = s$, $s \in \{0, 1, \ldots, N_I - 1\}$ with $N_I = 16$. In addition, edge information $g(x, y)$ is detected from the intensity information $O(x, y)$ using the Sobel operator and is uniformly quantized to obtain the edge pattern map.

The main visual features of the image are thus extracted in HSV space, but not all features are effective. A color volume is therefore defined in HSV space to describe the salient features. Since the HSV space can be modeled as a cylindrical coordinate system, the color volume of a point $(h, s, v)$ is defined from the volume of the corresponding cylinder; converting the three-dimensional (3-D) coordinates to two-dimensional (2-D) coordinates gives a second color volume, and the two are combined as $cv = \{cv_1, cv_2\}$. Gaussian pyramids $cv(\sigma)$ and $g(\sigma)$ are generated from $cv$ and the edge information $g(x, y)$, with scales $\sigma = 0, 1, 2, 3, 4$. The pyramid information represents center-surround receptive fields through an across-scale difference between two scales $c$ and $s$. The feature map obtained after the center-surround operation is modeled with a 2-D Gabor function: the model uses Gabor energy and selects appropriate orientations to detect salient structure; if the orientation of a feature matches the orientation of the Gabor filter, the feature is considered a salient structure. To avoid redundancy in the salient-structure space, the content-based saliency structure histogram (SSH) [52] is used for retrieval.
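A minimal sketch of the 6 × 3 × 3 quantization above, assuming HSV channels normalized to [0, 1]; the particular bin-indexing convention is an illustrative choice.

```python
import numpy as np

def quantize_hsv(hsv):
    """hsv: float array in [0, 1], shape (H, W, 3). Returns (M_C, M_I)."""
    h = np.minimum((hsv[..., 0] * 6).astype(int), 5)   # 6 hue bins
    s = np.minimum((hsv[..., 1] * 3).astype(int), 2)   # 3 saturation bins
    v = np.minimum((hsv[..., 2] * 3).astype(int), 2)   # 3 value bins
    color_map = h * 9 + s * 3 + v                      # w in {0, ..., 53}
    intensity_map = np.minimum((hsv[..., 2] * 16).astype(int), 15)  # 16 bins
    return color_map, intensity_map
```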

B. CO-SALIENCY DETECTION
Co-saliency detection identifies common salient objects in a group of related images and is widely used in image retrieval, image annotation, and video foreground extraction. To extract the salient features of an image effectively, not only the images in the collection but also visually similar images outside the collection are used. First, the BING [53] method extracts 256 object-proposal (OP) windows per image, and a CNN with restricted Boltzmann machines (RBMs) builds higher-level features [54]. To use group-wide information, K-means classifies the proposal features $\{x_{m,p}\}_{p=1}^{256}$ into clusters $\{C_k\}_{k=1}^{K}$ with centers $\{c_k\}_{k=1}^{K}$, and the inter-image consistency $\Pr(x_{m,p} \mid y_{m,p} = 1)$ is computed from the Euclidean distance $Ed(\cdot)$ between a proposal feature and its cluster center. Visually similar neighborhoods in the image group widen the field of view: if the neighborhood backgrounds are similar, the risk of mis-segmenting salient areas is reduced and background regions are suppressed.
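The inter-image consistency term can be sketched as follows: proposal features pooled over the whole group are clustered with K-means, and each proposal is scored by its Euclidean distance to its cluster center. Mapping distances through exp(−d) is an illustrative assumption, not the paper's exact formula.

```python
import numpy as np
from sklearn.cluster import KMeans

def inter_image_consistency(proposal_feats, k=8):
    """proposal_feats: (num_proposals, dim) features pooled over the group."""
    km = KMeans(n_clusters=k, n_init=10).fit(proposal_feats)
    centers = km.cluster_centers_[km.labels_]   # c_k assigned to each proposal
    dists = np.linalg.norm(proposal_feats - centers, axis=1)  # Ed(x, c_k)
    return np.exp(-dists)   # high when a proposal agrees with a group cluster
```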
To find significant features, the features are screened by scoring. A Bayesian framework scores the consistency between the intra-image contrast and the inter-image contrast to judge significance. Let $\{x_{m,p}\}_{p=1}^{256}$ denote the OPs of image $I_m$ in the current image group, and let the binary random variable $y_{m,p}$ indicate whether $x_{m,p}$ belongs to the co-salient region. The significance of $x_{m,p}$ combines the intra-image contrast $1/\Pr(x_{m,p})$ with the inter-image consistency $\Pr(x_{m,p} \mid y_{m,p} = 1)$:
$$\mathrm{Sal}(x_{m,p}) = \frac{\Pr(x_{m,p} \mid y_{m,p} = 1)}{\Pr(x_{m,p})},$$
which, by Bayes' rule, is proportional to $\Pr(y_{m,p} = 1 \mid x_{m,p})$. In information theory, the logarithmic form of $1/\Pr(x_{m,p})$ is $-\log \Pr(x_{m,p})$, the self-information of the random variable [55]: as the probability of an OP decreases and its self-information increases, it is more likely to be a foreground object.
To obtain clear boundaries, the co-saliency scores of the OPs are converted into pixel-level saliency maps. Following [56], a foreground region agreement (FRA) is applied in two phases, intra-image and inter-image. For the intra-image FRA, the image is over-segmented into superpixels, and image $I_m$ is represented as $I_m = \{sp_i\}_{i=1}^{N_m}$, where $sp_i$ is a superpixel and $N_m$ is the number of superpixels in the image. Using a classification-pooling scheme, the co-saliency score of a superpixel $sp_i$ is the area-normalized sum of the scores of all OPs that cover it:
$$Cosal(sp_i) = \sum_{p\,:\; sp_i \subset x_{m,p}} \frac{Cosal(x_{m,p})}{Area(x_{m,p})},$$
where $Cosal(x_{m,p})$ is the co-saliency score of window $x_{m,p}$ and $Area(\cdot)$ denotes its area. Based on these superpixel scores, an adaptive threshold selects the foreground superpixels. After obtaining the intra-image FRA, the manifold ranking algorithm [57] is applied to the selected foreground superpixels to compute the co-saliency score of each pixel.
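The pooling from proposal windows to superpixels can be sketched as follows; the box format and the final normalization are illustrative assumptions.

```python
import numpy as np

def pool_to_superpixels(sp_labels, windows, scores):
    """sp_labels: (H, W) superpixel index map; windows: (x0, y0, x1, y1)
    proposal boxes; scores: per-window co-saliency scores."""
    sp_score = np.zeros(sp_labels.max() + 1)
    for (x0, y0, x1, y1), s in zip(windows, scores):
        area = max((x1 - x0) * (y1 - y0), 1)
        covered = np.unique(sp_labels[y0:y1, x0:x1])   # superpixels in the box
        sp_score[covered] += s / area                  # area-normalized sum
    return sp_score / (sp_score.max() + 1e-12)
```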
For a superpixel $sp_i$ in the inter-group FRA of $I_m$, the Euclidean distances between its features and those of superpixels in the other images are computed, and the most similar superpixels $\{sp_n\}_{n=1}^{(M-1)N}$ from the other $M-1$ images in the group are obtained, where $\varphi(sp_i)$ denotes the set of superpixels similar to $sp_i$ and $\exp(-\varphi(sp_i))$ measures the similarity between superpixels.
Based on the consistency of superpixels across images, the intra-image and inter-image saliency cues are combined to obtain the saliency maps for the image group.

C. IMAGE COLORING
A grayscale image retains only the brightness information of an image and discards the color information; the gray values alone are not sufficient to colorize the image.
A grayscale image can be colorized using prior knowledge or manually specified rules. In reference image-based coloring, the similarity between the target image and the reference image is found, and coloring proceeds by pixel matching; when the pixels of the two images do not match well, the coloring effect is poor. In this paper, we use a support vector machine (SVM) to compute initial class probabilities and perform an initial classification. To ensure classification accuracy, the image is segmented by simple linear iterative clustering (SLIC) to improve spatial consistency: pixels with similar brightness, color, and texture form a superpixel region, which removes duplicate information and speeds up processing. Using brightness and texture features combined with the classification information, pixels of the reference image and the target image are matched; the pixels are then colored according to the matching result, and the grayscale image is finally colored.
Owing to differences in brightness between the target image and the reference image, direct region matching does not give satisfactory results; therefore, the pixels are first classified. The reference image is remapped so that its brightness values are closer to those of the target image, and then region matching is performed. The brightness remapping is given in equation (11):
$$\bar{Y}_S = \frac{\sigma_T}{\sigma_S}\left(Y_S - \mu_S\right) + \mu_T, \qquad (11)$$
where $\bar{Y}_S$ and $Y_S$ are the brightness values of the reference image pixels after and before remapping, respectively; $\mu_T$ and $\mu_S$ are the average brightness values of the target image and the reference image; and $\sigma_T$, $\sigma_S$ are the corresponding standard deviations. The features used for region matching are brightness features and local binary patterns (LBP) for texture, where the mean brightness reflects the global information of the image.

The target and reference images are first matched using the brightness information, yielding a set of reference-image subregions whose brightness characteristics are similar to those of the target-image subregions. To make the matching more accurate, the LBP descriptor completes the matching within pairs of subregions with similar brightness. The LBP operator describes the local neighborhood texture of an image: for any pixel $g(x_c, y_c)$, let $g_c$ be the gray value of the center of the local neighborhood and $g_p$ the gray values of the $P$ pixels evenly distributed in the neighborhood, so that the local texture is $T_c = (g_c, g_0, \ldots, g_{P-1})$, as in equation (12).
The LBP value of the local region centered at $(x_c, y_c)$ is given by
$$LBP_{P,R}(x_c, y_c) = \sum_{p=0}^{P-1} s(g_p - g_c)\, 2^p, \qquad s(z) = \begin{cases} 1, & z \geq 0, \\ 0, & z < 0, \end{cases}$$
with $P = 8$ and $R = 1$. From the brightness features $R_L$ and $T_L$ and the LBP texture features of the reference image and the target image, the similarity $sim_{T \leftrightarrow R}$ between the color reference image $R$ and the target grayscale image $T$ is obtained, and the target grayscale image is colored according to this similarity. When a region of the target image has no similar counterpart in the reference image, how should it be colored? Following the method in [58], we use an end-to-end network to color the grayscale image. For the chroma branch, $T_L$, $sim_{T \leftrightarrow R}$, and $T_{ab}$ are taken as the input of the network, and the colored result $P_{T_{ab}}$ is output, where $T_{ab}(p) = T_{ab}(\varphi_{R \to T}(\varphi_{T \to R}(p)))$ is the chromaticity warped from the ground-truth image. An $L_1$ loss is computed between the network output $P_{T_{ab}}$ and the ground-truth chroma $T_{ab}$, making the colored image more realistic.
Here $p$ denotes a pixel of the image. For regions with no similar counterpart in the reference image, a perceptual loss is used for training:
$$\mathcal{L}_{perc} = \left\| F_P(P) - F_T(P) \right\|_2^2,$$
where $F_P(P)$ denotes the features of the colored output $P_{Lab}$ and $F_T(P)$ denotes the features of the target $T_{Lab}$. Traditional grayscale colorization colors matched areas well but performs poorly on unmatched areas; the perceptual loss allows the unmatched areas to be colored effectively.
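The luminance remapping of equation (11) and the LBP descriptor with P = 8, R = 1 described earlier in this subsection can be sketched as follows; the neighbor ordering in the LBP code is an arbitrary illustrative choice.

```python
import numpy as np

def remap_luminance(y_ref, mu_t, sigma_t):
    """Equation (11): shift/scale reference luminance to target statistics."""
    return (sigma_t / y_ref.std()) * (y_ref - y_ref.mean()) + mu_t

def lbp_8_1(gray):
    """LBP with P = 8, R = 1: threshold 8 neighbors against the center."""
    g = gray.astype(float)
    code = np.zeros((g.shape[0] - 2, g.shape[1] - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        nb = g[1 + dy:g.shape[0] - 1 + dy, 1 + dx:g.shape[1] - 1 + dx]
        code |= (nb >= g[1:-1, 1:-1]).astype(np.uint8) << bit
    return code
```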

IV. EXPERIMENT
In this section, we first introduce some details of the algorithm implementation (see IV. A). We then walk through the experimental process with several examples and analyze the results (see IV. B). Next, we quantitatively compare the detection results of the recolored images with those of the unprocessed images (see IV. C). Finally, we compare our method with other methods and perform a human subjective evaluation (see IV. D).

A. IMPLEMENTATION DETAILS
During the experiments, image retrieval technology was used to establish a suitable image collection. The computer used was configured with an i7 3.6 GHz CPU, 16 GB RAM, and a GTX 980 GPU, running MATLAB R2018a (64-bit).
We use the area under the curve (AUC), average precision (AP), F-measure, mean absolute error (MAE), and root mean square (RMS) error to evaluate co-saliency detection on different images. The AUC is the area under the ROC curve; AP is the area enclosed by the precision-recall curve and the axes. Because the PR curve alone often cannot evaluate a saliency map comprehensively, a non-negative parameter β is used to weigh precision against recall; we set β to 0.3 to emphasize precision, as sketched below.
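A minimal sketch of two of these metrics, assuming saliency maps and ground truth normalized to [0, 1]. Reading the weight as β² = 0.3 (the usual convention in the saliency literature) and binarizing at twice the mean saliency are illustrative assumptions.

```python
import numpy as np

def mae(sal, gt):
    """Mean absolute error between a saliency map and ground truth in [0, 1]."""
    return float(np.mean(np.abs(sal - gt)))

def f_measure(sal, gt, beta_sq=0.3):
    """F-measure after binarizing the saliency map at an adaptive threshold."""
    binary = sal >= 2 * sal.mean()
    gt_bin = gt > 0.5
    tp = np.logical_and(binary, gt_bin).sum()
    precision = tp / (binary.sum() + 1e-12)
    recall = tp / (gt_bin.sum() + 1e-12)
    return (1 + beta_sq) * precision * recall / (beta_sq * precision + recall + 1e-12)
```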

B. RESULTS AND ANALYSIS
To demonstrate the effectiveness of the proposed method, several images were selected as experimental objects: Examples 1, 2, and 3 are experiments on flowers, puppets, and natural landscapes, respectively. In each experiment, saliency detection was first performed on the standard images and the colorblind images, and the results were analyzed. An image with little saliency change was then selected as the reference image. After converting the remaining images to grayscale, the reference image was used to recolor them. The color distribution of the recolored images is then consistent with the reference image, so the saliency of the images is better retained and the goal of saliency correction of the colorblind images is achieved.

1) EXAMPLE 1
The saliency detection results for the original and colorblind images of Example 1 are shown in Figure 4; the images in Example 1 are numbered 1-7 from left to right. The first row shows the original images; the second row, their saliency detection results; the third row, the CVD simulations of the first row; and the fourth row, the saliency detection results of the third row. Figure 4 shows that after a normal image is converted to a colorblind image, the salient areas of some images shrink or even change entirely, because saliency can change along with the color distribution of the image. However, in a few images the saliency is well preserved. This indicates that the color scheme of such an image loses little color information during colorblindness simulation and largely retains the original appearance of the image, making it suitable for colorblind viewing: a colorblind patient's perception of this image is close to normal vision.
The saliency detection results of the colorblind and normal images were evaluated by AUC, AP, and F-measure; higher scores indicate more accurately detected salient areas. The results for the colorblind images of Example 1 are shown in Table 1. Image 6 performs best overall, with an AUC of 0.764, an AP of 0.634, and an F-measure of 0.986. Image 4 achieves an AP of 0.677, outperforming image 6 by 0.043, but its AUC and F-measure are lower. After comparing the saliency maps of colorblind images 4 and 6, we finally select image 6 as the reference image.
The reference image was used to color the target images whose saliency changed. Images 1, 2, 3, 4, 5, and 7 of Example 1 were converted to grayscale, as shown in the first row of Figure 5; the second row shows the coloring results. The color scheme of the colored images is concentrated in purple and green, similar to the reference image. The third row shows a deuteranopia simulation of the recolored images, and the fourth row shows saliency detection on the simulated images. The fifth row shows the ground-truth saliency maps of images 1, 2, 3, 4, 5, and 7. Figure 5 presents some qualitative results: the proposed method extracts the co-salient regions of the different images well. In the unprocessed images, the leaves in the background share the same color as the flowers, so co-saliency detection fails to detect the flowers fully; our method achieves satisfying results because the recolorization of the target images discriminates the target from the background. Comparing the fourth and fifth rows shows that the saliency of a color-corrected colorblind image is closer to the ground truth.

2) EXAMPLE 2
Example 1 used images with a simple background; this experiment uses images with a complex background. In Example 2, images of a teddy bear wearing a red top were used as the experimental objects, as shown in Figure 6. The first row shows the original images; the second row, their saliency detection results; the third row, the deuteranopia simulations of the first row; and the fourth row, saliency detection on the simulated images. A colorblind patient whose vision is insensitive to red cannot effectively identify this color. Comparing the fourth row with the second row shows a change in the salient area: after CVD simulation, the vivid red of the original image is no longer salient and is replaced by the yellow parts. For the image of the all-yellow bear, however, the salient area is still detected accurately.
The saliency detection results of the colorblind images in Example 2 are analyzed in Table 2. Image 7 performs best among all images, with an AUC of 0.875, an AP of 0.756, and an F-measure of 0.669. Considering the detection results comprehensively, image 7 was chosen as the reference image, and the images with saliency changes were recolored.
Images 1, 2, 3, 4, 5, and 6 of Example 2 were converted to grayscale and recolored using reference image 7; the results are shown in Figure 7. The second row shows the result of coloring the grayscale images in the first row. The color scheme of the colored images is concentrated in yellow, similar to the reference image. The third row shows a deuteranopia simulation of the recolored images; the fourth row, saliency detection on the simulated images; and the fifth row, the ground-truth saliency map of each image.

3) EXAMPLE 3
In Example 3, challenging natural landscape images were used as experimental objects, including red maple leaves, pink cherry blossoms, and purple lavender. In Figure 8, the standard images were subjected to deuteranopia simulation: the first row shows the original images; the second row, their saliency detection results; the third row, the CVD simulations of the first row; and the fourth row, saliency detection on the third row. The saliency detection results of the colorblind images in Example 3 are given in Table 3. Image 7 performs best overall, with an AUC of 0.815, an AP of 0.758, and an F-measure of 0.568, while image 6 achieves the best F-measure; image 7 outperforms image 6 in AUC and AP by 12.8% and 34.6%, respectively. After analyzing the data in the table, image 7, whose saliency is better preserved, was selected as the reference image to correct the remaining images whose saliency changed.
Using reference image 7 to recolor the remaining images of Example 3, the coloring results are shown in Figure 9. The first row shows the grayscale target images; the second row, the coloring results. The color scheme of the colored images is concentrated in yellow and green. The third row shows the deuteranopia simulation of the images in the second row; the fourth row, saliency detection on the third row; and the fifth row, the ground-truth saliency maps of images 1, 2, 3, 4, 5, and 6. These results verify the effectiveness of our saliency correction for colorblind images.

C. COMPARISONS WITH UNPROCESSED IMAGE
The previous section presented a qualitative analysis of the experimental results. Here, the saliency detection results after converting the original images into colorblind images are compared with those of the recolored images using RMS, MAE, and F-measure. Figure 10 compares the RMS values of the two: the RMS of the color-corrected images is significantly lower, showing that the saliency detection results of the recolored images are more accurate and demonstrating the effectiveness of the proposed method.
In addition, the MAE values of the recolored and unprocessed images were compared, as shown in Figure 11. The MAE curve of the recolored images lies below that of the unprocessed images, and the MAE values are significantly reduced; hence, the detection results are more accurate.
To analyze the results more comprehensively, the F-measure values of the recolored and unprocessed images are compared in the histogram of Figure 12. The F-measure values of the recolored images are higher than those of the unprocessed images: after an image is recolored, its saliency detection result is more accurate, achieving the goal of saliency correction of the colorblind image. These three evaluation metrics confirm the effectiveness and feasibility of the proposed method.

D. HUMAN SUBJECTIVE EVALUATION
Colorblind image correction should follow certain principles beyond merely making colors distinguishable. For example, oranges could be recolored blue to make them easier to distinguish, but the perceptual experience of patients with color vision deficiency tells them that oranges are not a blue fruit. Therefore, color correction should preserve perceptual learning [12], which consists of the following parts:
1) Color naturalness. The perceived difference between the recolored image and the source image should be as small as possible, so that the colors of the image look natural and unobtrusive.
2) Color consistency. The color mapping of the recolored image should be regular, so that its color distribution is predictable.
3) Color contrast. The contrast between different colors in the image should be enhanced, helping CVD patients distinguish colors.
Based on the above principles, we compared the proposed method with previous ones, as shown in Figures 13 and 14. The first row shows images recolored by the different methods; the second row, the CVD simulations of the first row; and the third row, saliency detection on the second row. Figure 13 shows that the color distributions produced by the different methods differ. Some methods help colorblind patients distinguish colors but pay no attention to saliency. After CVD simulation of the LMS-recolored image, the petals become darker and similar in color to the leaves, making them hard to distinguish. The color-contrast method and Huang's method [8] produce petals that contrast with the green leaves after CVD simulation, but the saliency is not consistent. Lin's algorithm [6] recolors the petals sky blue and enhances the color contrast of the image; however, as Figure 13 shows, the salient regions detected for Lin's method change: besides the flowers, some green leaves are also treated as salient. Unlike Lin's recoloring, our method recolors the petals purple. This color is consistent with the actual scene, and after CVD simulation the salient area of the image is substantially unchanged. Figure 14 shows a landscape with more detail and more complicated content, which is challenging to recolor. The recoloring of the LMS method is incomplete, and the visual experience is poor. The color-contrast method recolors the red maple leaves pink, which is undesirable in perceptual cognition and violates color naturalness. The image recolored by Huang's method still looks standard, but after colorblindness simulation its saliency changes. Lin's approach recolors the maple tree blue-purple, which is extremely rare in nature, and the salient areas of the recolored image change. In the proposed method, we select the image whose saliency does not change from the standard color image collection as the reference image and recolor the colorblind image according to its color scheme, which satisfies the principles of color naturalness and color consistency.
Color perception is largely subjective. To verify the effectiveness of the proposed method, we invited CVD patients to evaluate the recolored images. In the evaluation, we tested images recolored by five methods, named M1 through M5, and recorded the subjects' judgments. The five methods are as follows.
M1: LMS method. This method converts RGB space to LMS space; in LMS space, the color information lost during colorblindness simulation is supplemented, and the lost information is mapped to visible long (red) and short (blue) wavelengths.
M2: RGB color-contrast method. This algorithm halves the RGB pixel values to provide headroom for increasing the red and green pixel values, thereby achieving contrast enhancement (see the sketch after this list).
M3: Huang's colorblind image recoloring method [8]. This method uses a Gaussian mixture model (GMM) to represent color information and weights key colors to ensure the smoothness of the recolored image's colors.
M4: Lin's colorblind image recoloring method [6]. This method uses the directional feature vector of the CVD simulation result to reverse the color distribution, converting the RGB color space to a λ, Y-B, R-G color space for recoloring.
M5: The method proposed in this paper. We use a saliency-unchanged image as the reference image to recolor the saliency-changed images.
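For reference, the M2 contrast idea can be sketched as follows; the description above fixes only the halving step, so the gain applied to the red and green channels is an illustrative assumption.

```python
import numpy as np

def contrast_recolor(rgb, gain=0.5):
    """rgb: float array in [0, 1]. Halve values, then boost red and green."""
    out = rgb * 0.5                      # halve to create headroom
    out[..., 0] += gain * rgb[..., 0]    # boost red channel
    out[..., 1] += gain * rgb[..., 1]    # boost green channel
    return np.clip(out, 0.0, 1.0)
```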
After explaining the purpose and procedure of the test, we recruited 7 deuteranopic subjects at our university who agreed to participate. The participants were teachers and students, 5 male and 2 female, aged 36.4 ± 15.2 years. We also recruited 7 subjects with standard vision, 4 male and 3 female, aged 37.6 ± 14.3 years. During the test, subjects were asked to point out the salient areas of each image within a specified time. We compared the area indicated by a subject with the true salient area; if the indicated area lay within the valid region, the indication was considered correct, scoring 1 for a correct indication and 0 for an incorrect one.
We selected 14 images and recolored the colorblind images with the different methods. The recolored images were divided into two groups, I and II: the backgrounds in group I are relatively simple, and those in group II are relatively complicated. We recorded the judgment of each image under each method and counted the percentage of correct answers, using the accuracy rate to evaluate the correction of a single image. We also aggregated the judgments of all subjects over each group of images and report MEAN ± SEM as the accuracy of each method Mi. The more images judged correctly, the higher the percentage and the better the correction effect.
To reduce the influence of other factors, the indoor lighting was kept constant during the experiment. Each subject first entered their gender and age into the computer; an instruction message then appeared on the screen to ensure that the subject understood the procedure correctly. To ensure the accuracy of the results, subjects observed each image for 3 seconds. The monitor was color-corrected, and the screen resolution was set to 1366 × 768 pixels. Tables 4 and 5 record the tests: Table 4 gives the statistics for group I and Table 5 for group II, where Mi, Imagej denotes the accuracy of the jth image under the ith method, and MEAN ± SEM summarizes each method. The data in Table 4 show that the accuracy of judging the salient area under standard vision is high, though misjudgments still occur. The recoloring effect of M1 and M2 is relatively weak: the highest accuracy for a single image does not exceed 50%, and their MEAN ± SEM values are 0.2449 ± 0.0408 and 0.3061 ± 0.0373, respectively. M3 and M4 improve the accuracy: every single image exceeds 20%, and some exceed 50%. M5 has high accuracy and MEAN ± SEM, second only to standard vision; compared with the other four methods, M5 performs best, with MEAN ± SEM of 0.8163 ± 0.0263. Table 5 shows the statistics for group II. Comparing Tables 4 and 5, the accuracy under standard vision is much lower in Table 5, because the backgrounds of the second group of images are more complicated. The MEAN ± SEM values show that M1 and M2 perform relatively poorly, while M3 and M4 perform better, with M4 higher than M3. M5 achieves excellent performance in both groups, with high single-image accuracy and MEAN ± SEM. Ranking the five methods from best to worst gives M5, M4, M3, M2, M1.

V. CONCLUSION
To enable patients with color vision deficiency to detect salient areas at a level comparable to people with normal vision, this paper proposes saliency consistency-based image re-colorization. First, content-based image retrieval finds many images to form a collection. CVD simulation is performed on the images in the collection, and co-saliency detection is performed on the initial images and the CVD-simulated images so that their saliency maps can be compared. The image whose salient area remains almost unchanged is then selected as the reference image and used to color the images whose salient areas changed, so that the color distribution of the colorized image is similar to that of the reference image. Because the color scheme of the recolored image loses little under CVD simulation, a colorblind patient's perception of the image is closer to that of a person with normal vision. The results not only achieve the goal of saliency correction but also meet the requirements of color correction, and different evaluation metrics show promising detection performance on various images. In future work, we plan to explore the recolorization of colorblind images in videos.