Boundary Preserved Salient Object Detection Using Guided Filter Based Hybridization Approach of Transformation and Spatial Domain Analysis

Salient object detection techniques mimic human behavior by recognizing the most noticeable parts of an image as objects. Salient object detection has attracted considerable research attention for a variety of computer vision and pattern recognition applications. In this paper, a novel approach is proposed that performs global and local saliency detection using the wavelet transform and hybridizes it with learning-based saliency detection using a guided filter. First, the input image is subjected to superpixel segmentation to obtain visually uniform regions and to reduce the computational cost. The global and local saliency maps are then generated from global and local features extracted by the wavelet transform of the segmented image, since the wavelet transform provides a multiscale analysis of images in both the frequency and spatial domains. The learning-based saliency map is generated using random forest regression, which considers the location, color, and textural features of the segmented image. The global and local saliency maps are fused to generate the wavelet-based saliency map, which is further hybridized with the saliency map generated using random forest regression. The paper presents a novel technique for hybridizing the wavelet-based and learning-based saliency maps using guided filter-based attention map generation. Experiments are conducted on five saliency datasets containing images with complex backgrounds, multiple objects, and low contrast. To evaluate the efficacy of the proposed method, extensive qualitative and quantitative performance analysis is carried out. Experimental results show a significant improvement in the detection of salient regions compared with state-of-the-art methods.


I. INTRODUCTION
Salient object detection is the task of finding the objects that draw more visual attention than their surroundings. The human visual system (HVS) is very complex and detects salient objects easily, but it is very difficult to reproduce the HVS in computer vision applications. Salient object detection methods are mostly modeled on human attention mechanisms. The main aim of salient object detection is to localize the objects of interest as per the HVS and suppress the surrounding regions. It measures the extent to which objects are distinct from their surroundings, and the saliency maps are calculated on that basis. Salient object detection is considered a very effective preprocessing step, as it saves considerable computational time in computer vision applications such as image segmentation, object recognition, image compression [1], image thumbnailing, image quality assessment, image retrieval, video summarization, object tracking, and medical imaging [2]. Zhu et al. [3] used perceptual information obtained through visual saliency detection for image style transfer, which is an artwork application of saliency detection. Many deep learning techniques for salient object detection (SOD) have been proposed in recent years [4]; they give very good results at the cost of large data requirements and very high computational and architectural complexity. Also, their results are quantitatively good but fall short in preserving the complete boundaries and edges of the objects. The work in [5] used contour information to obtain boundary preserved saliency maps through a twice learning strategy. This method gives good results; however, the CNN model has to be trained twice: first to obtain the contour map, and then with this contour map as an input to obtain the final saliency map. This twice learning strategy may result in a complex architecture. As a result, SOD employing handcrafted features still plays an important role in applications where data availability is limited and low computing complexity is preferred, along with better preservation of details.

The associate editor coordinating the review of this manuscript and approving it for publication was Gangyi Jiang.
In the proposed work, superpixel segmentation is used to convert an image into visually consistent regions called superpixels, which improves the computational efficiency of the proposed method. Superpixels are non-overlapping and perceptually meaningful sets of pixels that can be used in place of individual pixels. The HVS is more attracted to image regions than to individual pixels, as image regions give more information. In recent years, there has been a rise in the development of saliency detection algorithms in both the spatial and frequency domains. It is very important to consider an acceptable range of frequency components to detect the entire salient region consistently. In most spatial domain approaches, the high-frequency components are discarded, which blurs the edges and boundaries of the objects, so the complete salient region in an image cannot be highlighted uniformly. This limitation can be addressed by using frequency-domain techniques. The wavelet transform provides a multiresolution analysis of an image, i.e., it uses both spatial and frequency domain knowledge to generate the saliency maps. The majority of saliency detection algorithms consider only local saliency features, which are based on the image's fundamental characteristics, and frequently overlook global saliency. The global saliency map depicts global information based on low-level characteristics and recognizes the complete salient object accurately and consistently. As a result, combining global and local saliency maps can improve saliency detection performance. In the proposed work, the local and global saliency maps based on wavelet features are fused to generate the final wavelet transform-based saliency map. The learning-based method uses the spatial distance, the local and global color contrast, and the textural distance between superpixels to obtain the saliency map using random forest regression. The wavelet transform-based method thus concentrates on textural information, while the learning-based method considers location, color, and textural contrast-based features.
For creating the final saliency maps, most state-of-the-art algorithms integrate previously produced baseline saliency maps [6]-[8]. For integrating baseline saliency maps, many methods [9], [10] use simple pixel-wise addition or multiplication. Because these methods ignore the intensity variations of nearby pixels, edge blurring or artifacts around the object boundaries may emerge. Some methods integrate global and local saliency maps using a weighted average; however, the weights are generally selected by trial and error. Meta-heuristic optimization approaches have also been proposed, although they come at a higher computational cost. In the proposed work, a novel hybridization technique using guided filters is presented to intelligently integrate the more visually prominent regions from the wavelet transform-based saliency map and the learning-based saliency map. The guided filter-based model extracts the prominent visual features by generating decision maps from the wavelet-based and learning-based saliency maps and gives a more informative and perceptually appealing final saliency map. Video saliency detection in particular requires computationally efficient methods that can be applied to real-time video salient object detection tasks. The proposed method could also be applied to video saliency detection [11] in the future, as it is computationally efficient and has a simple architecture.
In the era of deep learning-based methods, handcrafted feature-based algorithms and conventional computer vision techniques are still highly significant for solving SOD tasks [12], [13]. Traditional computer vision methods can solve some problems faster and with fewer lines of code than deep learning-based methods. Handcrafted feature-based methods, such as simple color thresholding and pixel counting techniques, are not class-specific, which means they work with any image and are independent of the training data. Deep learning features, on the other hand, are specific to the training dataset and are unlikely to work well for images outside the training set if the dataset is poorly constructed. If the training dataset is too small, the model may fail to generalize due to over-fitting. Manually tuning the model's parameters is impractical because a deep neural network has millions of parameters with complicated inter-relationships. Most recent deep learning algorithms may achieve far higher accuracy, but at the cost of billions of additional arithmetic operations and a higher computing power demand. For applications that require low computational complexity and less data dependency, machine learning-based algorithms play a very important role, as they are computationally less complex and can work with less data than deep learning methods. The handcrafted feature-based and machine learning-based salient object detection techniques are therefore not obsolete. In the proposed work, a machine learning-based salient object detection technique is used, which makes the proposed method computationally efficient.
The following are the major contributions of the proposed work:
• A novel technique is used to obtain both local and global saliency maps using the wavelet transform, whereas earlier research used the wavelet transform only to generate global saliency maps.
• The learning-based saliency map is predicted by a random forest regression model using location, color, and textural features of superpixels.
• A novel hybridization method is proposed to merge the wavelet-based and learning-based saliency maps using a guided filter. The guided filters generate decision maps by refining the attention maps, with the wavelet transform-based and learning-based saliency maps serving as guidance images.
• The proposed method detects more than one salient object accurately and preserves the boundaries and edges of salient objects better than other state-of-the-art methods.
The remainder of the paper is laid out as follows. Section II briefly discusses salient object detection techniques based on the wavelet transform as well as learning-based approaches. Section III presents in detail the implementation steps of the proposed method. Section IV presents the experiments and discussions. The run time analysis is presented in Section V. Finally, the limitations and conclusion are presented in Sections VI and VII, respectively.
VOLUME 10, 2022

II. RELATED WORK
Saliency methods are mainly based on two approaches: the generation of eye fixation maps and the generation of saliency maps. The works in [14] and [15] indicated saliency by capturing eye gaze points. Eye fixation point prediction is used in various applications, such as advertisement placement in a video and finding the fixation scan paths of an eye. However, tasks in computer vision and pattern recognition mainly deal with locating and segmenting salient objects, for which locating eye fixation points is not sufficient. Other methods such as [7], [16] therefore provide the result as a saliency map, which indicates the probability of each pixel being salient. Further, saliency detection is classified into bottom-up and top-down approaches. Bottom-up techniques are data-driven and typically rely on handcrafted low-level features. The bottom-up saliency approaches [2], [17], [18] employ self-information, histogram and region-based characteristics, dissimilarities measured in local areas, weighting based on information content, and a frequency refined approach to compute the saliency. Many saliency detection techniques [6], [17], [19] use graph-based approaches with handcrafted low-level features and different saliency priors, which has improved the saliency detection results to a great extent. All the above-discussed methods use saliency models that identify the most important object based on color contrast. As a result, objects with colors similar to the background will not be identified by these methods, and textures and edges will not be retained. Bottom-up techniques are simple to develop and converge faster, but they have limitations when dealing with low contrast and dynamic patterned images. Top-down saliency detection methods, on the other hand, are task-driven and use convolutional and deep neural networks to recognize task-specific high-level features [4], [20]. The use of such techniques improves performance, but at the cost of a significant amount of data and computing power. Recently, many researchers have also focused on depth enhanced saliency detection models [8], [12], [20], which use information from depth cues to enhance the performance of saliency detection.
Salient object detection techniques can also be categorized as supervised, semi-supervised, and unsupervised, as suggested by Ji et al. in [4]. The proposed method can be treated as a hybridization of supervised and unsupervised saliency detection techniques. The transform-based saliency detection and the guided filter-based hybridization do not involve any training with labeled data, so they are categorized as unsupervised saliency detection. The learning-based saliency map generation involves training a random forest regression model on labeled data to predict the result, so it is categorized as a supervised saliency detection technique.
Wavelet transform-based methods are used in the salient object detection literature for extracting image features in the frequency domain [21], [22]. The works in [23], [24] utilize the wavelet transform for detecting salient objects. The main advantage of the wavelet transform for salient object detection is that it provides a multi-scale analysis of the image that considers frequency domain and spatial domain information at the same time. The wavelet transform-based saliency model in [25] detects salient regions by considering only the detail coefficients at different scales to generate multi-scale feature maps. The method proposed in [24] extracts visual data by combining the wavelet transform with a contrast mechanism. Here, the difference of Gaussian function is used as the basis function for the wavelet transform, and wavelet decomposition is applied to the multiple channels of the human visual system. Wavelet-based saliency detection techniques help to preserve the details of objects to a great extent, but the entire object still cannot be detected using the wavelet transform alone.
In recent research, machine learning-based bottom-up saliency detection methods have also been widely adopted. For saliency identification, Pang et al. [16] employed a bagging-based distributed learning strategy that uses training samples based on center prior and background prior information. Singh and Kumar [26] and Lei et al. [27] used frameworks based on bagging and Bayesian decision, respectively, to enhance a rough initial saliency map extracted using several saliency techniques. Tong et al. [9] used a bootstrap learning strategy to create a strong classifier that can differentiate between salient and background objects. Yang and Yang [28] used a conditional random field (CRF) and a visual dictionary to quantify saliency. Huang et al. [29] proposed a saliency measure that takes into account both object proposals and multiple instance learning. The learning-based saliency detection in the proposed method is likewise a machine learning-based bottom-up approach. The above-discussed models are accurate and simple, but they require more computational time. In this paper, we therefore use superpixel-wise computation to overcome the limitation of high computational time. Superpixel segmentation is widely used in salient object detection research to reduce the computational cost of algorithms. It converts an image into homogeneous regions, each of which is treated as a single image element in place of its group of pixels, which reduces the computational overhead.
The wavelet transform-based salient object detection method helps to preserve the edges and boundaries of objects but cannot highlight the object as a whole. Learning-based techniques, in contrast, consider various visual features to detect complete salient objects but cannot preserve the object boundaries. To utilize the advantages of both, the proposed method merges the wavelet-based and learning-based saliency maps using edge-preserving guided filters. Many methods have been proposed to merge baseline saliency maps. To incorporate several existing saliency approaches, Xu et al. [30] presented an arbitrator model. The method creates a reference map by combining the results of numerous existing saliency methods with external knowledge. It then learns the expertise of the existing saliency methods. Finally, a unique integration framework based on Bayesian inference merges the saliency methods of varying expertise with the reference map. Qin et al. [31] proposed a cellular automata aggregation approach for incorporating diverse saliency maps created by existing saliency methods. These integration methods fail to preserve the details of the existing saliency maps. The proposed method therefore uses boundary preserving guided filter-based integration, which improves salient object detection by preserving the edges and boundaries of objects and by detecting more than one salient object with better accuracy.

III. PROPOSED METHOD
The workflow of the proposed method, based on the hybridization of wavelet-based and learning-based saliency maps using guided filters, is shown in Fig. 1. The implementation details of the proposed method are given below.

A. SUPERPIXEL SEGMENTATION
Single pixels are less appealing to the human visual system than image regions. In the proposed method, the input image is first transformed to the CIELAB color space, as CIELAB is considered to closely match human perception. The image is then split into superpixels. Many studies [12], [32]-[35] have shown that superpixel segmented images are computationally efficient and particularly successful for salient object detection. We obtain the superpixels SP from an input image I by over-segmenting the image, where SP is given by Eq. 1. In the proposed work, the number of superpixels P is set to P = 500. The SLIC superpixel algorithm [36] is used for segmenting an image into superpixels. It is computationally efficient because the search space in SLIC superpixel segmentation is a region comparable in size to the superpixel rather than the whole image.

B. SALIENCY MAP GENERATION USING WAVELET TRANSFORM
The proposed method obtains local and global feature maps using the wavelet transform. The local saliency map captures the saliency of pixels within a fixed neighborhood, and the global saliency map captures the saliency of pixels with respect to the entire image. The wavelet transform is preferred over other transform domain techniques because it provides a multiresolution analysis that is localized in both space and frequency, whereas transforms such as the Fourier transform describe the frequency content of the whole image without spatial localization. In the proposed method, the l, a, and b values of all the pixels belonging to one superpixel are replaced by the l, a, and b values of the center of that superpixel. The most distinctive superpixels are located by applying the wavelet transform to the l, a, and b channels of the superpixel segmented image. The application of the wavelet transform to the superpixel segmented image is given by Eq. 2.
$SP^C$ denotes the superpixel segmented image in each color channel. $DWT(\cdot)$ denotes the discrete wavelet transform, and $A_S^C$ and $D_S^C$ represent the approximation coefficients and the detail coefficients (containing the horizontal, vertical, and diagonal details of the image) at different scales for each color channel $C \in \{l, a, b\}$. $S \in \{1, 2, \ldots, N\}$ indicates the decomposition level of the wavelet transform, where $N$ is the maximum decomposition level, set to 8 here. Because the wavelet transform is well suited to multiscale decomposition, the wavelet coefficients at each scale are used to extract the local and global features, which are then used to generate the local and global feature maps for salient object detection. The local feature maps are obtained by applying a statistical measure to the low-pass sub-band, and the global feature maps are obtained from the high-pass sub-bands of the wavelet transform.
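The decomposition into approximation and detail sub-bands can be illustrated with a minimal sketch. The paper does not state which wavelet is used, so the Haar wavelet here is an assumption, and the function names `haar_dwt2` and `haar_decompose` are hypothetical. Each level halves the resolution, so the eight levels used in the paper require an input of at least 256 × 256 coefficients.

```python
import numpy as np

def haar_dwt2(channel):
    """One level of a 2-D orthonormal Haar DWT (assumed wavelet).

    Returns the approximation sub-band and a tuple of three detail
    sub-bands (horizontal, vertical, diagonal).
    """
    h, w = channel.shape
    x = channel[: h // 2 * 2, : w // 2 * 2].astype(float)  # crop to even size
    # Low-pass and high-pass filtering along each row (adjacent column pairs).
    lo = (x[:, 0::2] + x[:, 1::2]) / np.sqrt(2)
    hi = (x[:, 0::2] - x[:, 1::2]) / np.sqrt(2)
    # Repeat along each column (adjacent row pairs).
    approx = (lo[0::2] + lo[1::2]) / np.sqrt(2)
    d_h = (lo[0::2] - lo[1::2]) / np.sqrt(2)  # responds to horizontal edges
    d_v = (hi[0::2] + hi[1::2]) / np.sqrt(2)  # responds to vertical edges
    d_d = (hi[0::2] - hi[1::2]) / np.sqrt(2)  # responds to diagonal structure
    return approx, (d_h, d_v, d_d)

def haar_decompose(channel, levels):
    """Apply haar_dwt2 repeatedly to the approximation sub-band."""
    coeffs = []
    approx = channel
    for _ in range(levels):
        approx, details = haar_dwt2(approx)
        coeffs.append((approx, details))
    return coeffs
```

In this sketch, `haar_decompose(channel, 8)` would be called once per color channel l, a, and b of the superpixel segmented image.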

1) LOCAL SALIENCY MAP GENERATION USING WAVELET BASED FEATURES
The local feature maps are obtained by extracting features from the low-pass sub-bands of the wavelet transform. After each decomposition level, the low-pass sub-band is partitioned into blocks of $p \times q$ superpixels, and the local variance of each block is taken as the block's local feature map value, as indicated in Eq. 3.
These local feature maps describe the image's textural features at different decomposition levels. To generate the local saliency map, the local feature maps are linearly combined using a low entropy criterion. The local saliency map is given by Eq. 4.
$w_S^C$ denotes the weight of the corresponding feature map at each decomposition level for each color channel. The weights are decided based on the entropy of the local feature maps. A low entropy value generally indicates that the data is concentrated around one value, i.e., the spread is small, which is a useful criterion for saliency detection. The weights are therefore assigned according to Eq. 5.
$M$ is a two-dimensional centered Gaussian mask whose maximum element value is 1 and whose size equals that of the channel feature map. A centered Gaussian mask is used because salient objects are generally assumed to lie near the center of an image. $Nr(\cdot)$ indicates the normalization function applied to the local feature maps. $H(\cdot)$ refers to the entropy of the smoothed local feature maps, which are filtered with the Gaussian kernel $G$.
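The two ingredients described above, block-wise variance of the low-pass sub-band and the entropy of a feature map, can be sketched as follows. This is only an illustration under assumptions: the blocks here are $p \times q$ coefficients of a plain array rather than superpixels, the entropy is estimated from a histogram with an arbitrarily chosen number of bins, and the weighting rule of Eq. 5 is not reproduced. The function names are hypothetical.

```python
import numpy as np

def block_variance(subband, p, q):
    """Variance of each non-overlapping p x q block of a low-pass sub-band."""
    h, w = subband.shape
    h2, w2 = h // p * p, w // q * q  # drop incomplete border blocks
    blocks = subband[:h2, :w2].reshape(h2 // p, p, w2 // q, q)
    return blocks.var(axis=(1, 3))

def shannon_entropy(feature_map, bins=32):
    """Histogram-based Shannon entropy in bits (bin count is an assumption)."""
    hist, _ = np.histogram(feature_map, bins=bins)
    prob = hist[hist > 0] / hist.sum()
    return float(-(prob * np.log2(prob)).sum())
```

A feature map whose values are concentrated in a few histogram bins yields a low entropy, which is the property the low entropy criterion rewards.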

2) GLOBAL SALIENCY MAP GENERATION USING WAVELET BASED FEATURES
The global saliency map is generated from the detail coefficients of the wavelet transform at each scale. These wavelet coefficients are used to generate feature maps with multiresolution analysis using superpixels.
To extract the global features of an image from the three segmented color channels, the inverse wavelet transform is used, where the reconstruction is done using only the detail wavelet coefficients from finer to coarser scales, as given by Eq. 6.
where $Global_S^C$ denotes the global feature maps for each color channel at the various decomposition levels and $IDWT(\cdot)$ denotes the inverse discrete wavelet transform, which is used for reconstruction. The feature maps are scaled using the scaling factor $k$, which is set to $k = 10^4$. Scaling the feature maps is important to minimize large fluctuations in the covariance matrix while calculating the inverse wavelet transform. The feature maps at different decomposition levels represent the features of the l, a, and b color channels from finer to coarser levels, i.e., from edges to the different textures of an image. To obtain the global saliency map from the extracted features, only the maximum value across the three color channels {l, a, b} is considered, and these maximum-valued features are then summed over all the decomposition levels. The Gaussian low-pass filter $G$ is applied to this global saliency map to obtain the smooth saliency map $Smap_G(x, y)$, which is given by Eq. 7.
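The aggregation described above (channel-wise maximum, sum over decomposition levels, then Gaussian smoothing) can be sketched as follows. The sketch assumes the global feature maps have already been reconstructed to a common size and stacked into an array of shape (levels, 3, H, W). The filter width `sigma` and the function names are assumptions, since the paper does not give the parameters of $G$ in this passage.

```python
import numpy as np

def gaussian_smooth(img, sigma=2.0):
    """Separable Gaussian low-pass filter with edge padding."""
    radius = int(3 * sigma)
    t = np.arange(-radius, radius + 1)
    kernel = np.exp(-(t ** 2) / (2 * sigma ** 2))
    kernel /= kernel.sum()
    padded = np.pad(img, radius, mode="edge")
    # Filter along rows, then along columns; "valid" removes the padding.
    rows = np.apply_along_axis(lambda r: np.convolve(r, kernel, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, kernel, mode="valid"), 0, rows)

def global_saliency(feature_maps, sigma=2.0):
    """feature_maps: array of shape (levels, 3, H, W) for channels l, a, b."""
    per_level_max = feature_maps.max(axis=1)  # maximum over the color channels
    summed = per_level_max.sum(axis=0)        # sum over the decomposition levels
    return gaussian_smooth(summed, sigma)
```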
The saliency map based on wavelet transform is obtained by merging the local and global wavelet-based saliency maps. The saliency map using wavelet transform is given by Eq. 8.
$Smap_W$ represents the final wavelet transform-based saliency map, obtained by fusing the local and global saliency maps. A salient region is the region of an image that stands out from the rest of the image. Global saliency helps to preserve the edges of objects, while local saliency takes neighborhoods into account to detect the objects accurately. Merging local and global saliency therefore gives better results in salient object detection tasks.

C. SALIENCY MAP GENERATION USING LEARNING BASED METHOD
The learning-based saliency map relies on the spatial, color, and texture differences between a superpixel and its $K$ neighboring superpixels. The features used are the Euclidean distance of each superpixel from its $K$ nearest superpixels, the global and local color distances of each superpixel from its $K$ nearest superpixels, and the textural distance of each superpixel from its $K$ nearest superpixels. The value of $K$ is set to 25, as it gave the best results in terms of the F-measure. For every $l$-th superpixel $SP_l$, we first obtain its $K$ nearest superpixels $SP_{l_1}, SP_{l_2}, SP_{l_3}, \ldots, SP_{l_K}$.

1) SPATIAL DISTANCE FEATURES
The Euclidean distance feature vector of the $l$-th superpixel from its $K$ nearest superpixels, $D_l \in \mathbb{R}^{K \times 1}$, is given by Eq. 9.
$$D_l = \left\{ d\left(y_l, y_{l_i}\right) \right\}_{i \in \{1, 2, \ldots, K\}} \tag{9}$$
Here, $y_l$ and $y_{l_i}$ denote the center locations of the $l$-th superpixel and of its $K$ nearest superpixels, and $d(y_l, y_{l_i})$ denotes the Euclidean distance of the $l$-th superpixel from its $i$-th nearest superpixel.
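A minimal sketch of the spatial distance feature of Eq. 9: given the center coordinates of all superpixels, it finds the $K$ nearest superpixels of each one and returns the corresponding Euclidean distances. The function name `knn_spatial_distances` and the brute-force pairwise search are choices made for this illustration, not details taken from the paper.

```python
import numpy as np

def knn_spatial_distances(centers, k):
    """centers: (P, 2) array of superpixel center coordinates.

    Returns (idx, dist), both of shape (P, k): the indices of the k nearest
    superpixels of each superpixel and the Euclidean distances to them.
    """
    diff = centers[:, None, :] - centers[None, :, :]
    pairwise = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(pairwise, np.inf)  # a superpixel is not its own neighbor
    idx = np.argsort(pairwise, axis=1)[:, :k]
    dist = np.take_along_axis(pairwise, idx, axis=1)
    return idx, dist
```

With P = 500 superpixels and K = 25 as in the paper, `dist` would have shape (500, 25), one row $D_l$ per superpixel.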

2) GLOBAL COLOR DISTANCE FEATURES
The global color contrast feature vector of the $l$-th superpixel from its $K$ nearest superpixels, $D_l^{GC} \in \mathbb{R}^{8K \times 1}$, is given by Eq. 10.
Here, 8 color channels, i.e., CIELAB, RGB, hue, and saturation, are used to obtain the color contrast features. $c_l$ and $c_{l_i}$ are 8-dimensional color vectors, and $d(c_l, c_{l_i})$ denotes the color distance of the $l$-th superpixel from its $i$-th nearest superpixel.

3) LOCAL COLOR DISTANCE FEATURES
The local color contrast feature vector of the $l$-th superpixel from its $K$ nearest superpixels, $D_l^{LC} \in \mathbb{R}^{8K \times 1}$, is given by Eq. 11. Here also, 8 color channels, i.e., CIELAB, RGB, hue, and saturation, are used to obtain the color contrast features.
where $w_{l_i}$ is given by Eq. 12.
$p_l$ indicates the normalized position of the $l$-th superpixel and $Z_l$ is the normalization term. This weight function is adopted to give more weight to the immediately neighboring superpixels. We have set $\sigma^2 = 0.25$. The dimension of the feature vector for each superpixel is $K$ for the spatial distance and $8K$ for the color contrast.

4) TEXTURAL DISTANCE FEATURES
The texture distance features measure the distance between the feature vectors that represent region textures and are used to calculate the texture deviation. The texture feature vector of the $l$-th superpixel from its $K$ nearest superpixels, $D_l^{T} \in \mathbb{R}^{10K \times 1}$, is given by Eq. 13.
where $t(\cdot)$ indicates the textural attributes of the superpixel region, such as the gradient mean, gradient direction, and histogram of the gradient. $d(t_l, t_{l_i})$ indicates the Euclidean distance between the textural attributes of the $l$-th superpixel and those of its $i$-th nearest superpixel. The dimension of the feature vector for each superpixel is $10K$ for the textural distance.
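As a quick check of the feature dimensions stated above, the short sketch below adds up the four feature groups for K = 25. It assumes that the per-superpixel feature vector is simply the concatenation of these four groups; any further features listed in Table 1 are not accounted for here.

```python
K = 25  # number of nearest superpixels used in the paper

# Dimensions stated in the text for each feature group.
dims = {
    "spatial_distance": K,           # D_l
    "global_color_contrast": 8 * K,  # D_l^GC, 8 color channels
    "local_color_contrast": 8 * K,   # D_l^LC, 8 color channels
    "textural_distance": 10 * K,     # D_l^T
}

# Assumed total if the four groups are concatenated: 27K.
total = sum(dims.values())
```

Under this assumption, each superpixel is described by 27K = 675 values.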

5) SALIENCY MAP GENERATION USING RANDOM FOREST REGRESSION
The feature vectors for all the superpixels are obtained as per Table 1, and the saliency map $Smap_{RF}$ is generated from these feature vectors using the random forest regression algorithm [37], as it is very effective for high-dimensional feature vectors. For training the random forest, we used 3000 images from the MSRA-B dataset [38], with the annotated ground truth images as labels. From the MSRA-B dataset, 1500 images are used for testing and the remaining images are used for validation. The random forest regression model uses 200 trees with a maximum tree depth of 10 [39]. The random forest regression output decides the saliency of each superpixel according to the extent to which it belongs to the foreground or background region, and a corresponding saliency map is produced for the image.

D. FINAL SALIENCY MAP GENERATION USING GUIDED FILTER BASED INTEGRATION
The final saliency map is created by fusing the attentive regions of the wavelet domain-based saliency map $Smap_W$ and the learning-based saliency map $Smap_{RF}$ using guided filters [40]. The stepwise description of attention-based saliency map fusion using guided filters is as follows.
Step 1: The saliency maps $Smap_W$ and $Smap_{RF}$ are taken as the source images for obtaining the final saliency map using guided filter-based fusion. To detect the attention regions, the source images are first blurred using mean filters. The textural or edge information is then obtained as the difference between the source images and their mean filtered versions. $f_a$ is the mean filter, with a window size $w$ of 3 × 3. The high-pass edge maps of the corresponding source images are given in Eq. 14 and Eq. 15.
Step 2: The high-pass information maps are further refined using guided filters [40], with $Smap_W$ and $Smap_{RF}$ as guidance images. The refined high-pass maps are given by Eq. 16 and Eq. 17, where $GF_{R,\theta}(\cdot)$ represents the guided filter with local window radius $R$ and degree of blur $\theta$ of the kernel function. In the proposed work, $R = 5$ and $\theta = 0.3$ were chosen by trial and error.
Step 3: The attention measures of the source images $Smap_W$ and $Smap_{RF}$ are represented by $REmap_1$ and $REmap_2$, respectively. The pixel-wise maximum rule is applied to the corresponding refined attention maps $REmap_1$ and $REmap_2$ to generate an initial decision map. The initial decision map is given by Eq. 18.
Step 4: The non-attentive regions of the initial decision map $IDmap_1$ contain some unwanted spots or burrs, which can be removed using morphological opening and closing operations.
Eq. 19 indicates the morphological opening operation on the initial decision map using a disk structuring element $D$ of radius 10. The attentive regions in $O$ still contain some burrs, which can be eliminated by applying the morphological closing operation with the disk structuring element $D$. Eq. 20 generates the decision map $IDmap_2$ using the morphological opening and closing operations.
Step 5: Following the elimination of tiny regions using the morphological opening and closing operations, an initial fused saliency map $Smap_I$ is obtained as stated in Eq. 21 using the decision map $IDmap_2$ and the source images $Smap_W$ and $Smap_{RF}$.
Step 6: With the processed decision map $IDmap_2$ alone, the edges between the attention and non-attention regions of the source images cannot be retained, which may result in distortions in the final saliency map. The guided filter is therefore employed once again to obtain a boundary preserved final decision map, with the initial fused saliency map $Smap_I$ as the guidance image, which helps to improve the weights of the final decision map. The final decision map $FDmap$ is obtained by Eq. 22 by filtering the decision map $IDmap_2$ with the initial fused saliency map $Smap_I$ as the guidance image.
$$FDmap(x, y) = GF_{R,\theta}\left(Smap_I(x, y),\ IDmap_2(x, y)\right) \tag{22}$$
Step 7: The final fused saliency map $Smap_F$ is thus obtained by pixel-wise weighted averaging of the source images using the final decision map, as stated in Eq. 23.
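Steps 5 to 7 can be sketched with a plain NumPy guided filter. This is an illustration under several assumptions rather than the authors' implementation. The blur parameter $\theta$ is mapped to the regularization term `eps` of the standard guided filter. The decision map is assumed to weight $Smap_W$ and its complement to weight $Smap_{RF}$, as the assumed forms of Eq. 21 and Eq. 23. The morphological cleanup of Step 4 is omitted. The function names are hypothetical, and all maps are assumed to lie in [0, 1].

```python
import numpy as np

def box_mean(img, r):
    """Mean over a (2r+1) x (2r+1) window, using an integral image."""
    k = 2 * r + 1
    h, w = img.shape
    padded = np.pad(img, r, mode="edge")
    ii = np.zeros((padded.shape[0] + 1, padded.shape[1] + 1))
    ii[1:, 1:] = padded.cumsum(axis=0).cumsum(axis=1)
    total = ii[k:k + h, k:k + w] - ii[:h, k:k + w] - ii[k:k + h, :w] + ii[:h, :w]
    return total / (k * k)

def guided_filter(guide, src, r=5, eps=0.3):
    """Standard guided filter: filters `src` using `guide` as guidance image."""
    mean_i = box_mean(guide, r)
    mean_p = box_mean(src, r)
    cov_ip = box_mean(guide * src, r) - mean_i * mean_p
    var_i = box_mean(guide * guide, r) - mean_i ** 2
    a = cov_ip / (var_i + eps)  # eps plays the role of theta (assumption)
    b = mean_p - a * mean_i
    return box_mean(a, r) * guide + box_mean(b, r)

def fuse_saliency(smap_w, smap_rf, decision, r=5, eps=0.3):
    """decision: map in [0, 1], 1 where smap_w is preferred (assumption)."""
    initial = decision * smap_w + (1 - decision) * smap_rf  # assumed Eq. 21
    final_decision = np.clip(guided_filter(initial, decision, r, eps), 0, 1)  # Eq. 22
    return final_decision * smap_w + (1 - final_decision) * smap_rf  # assumed Eq. 23
```

Because the final decision map is filtered with the initial fused map as guidance, its transitions follow the edges of the fused map instead of the hard boundaries of the binary decision map.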

A. DATASETS FOR SALIENT OBJECT DETECTION
For the performance evaluation of the proposed technique, five salient object detection datasets are used, as shown in Table 2. The datasets contain images with complicated and cluttered backgrounds, poor contrast, and multiple objects. The performance of the proposed method is assessed across the full datasets to demonstrate its capacity to operate efficiently and reliably on a diverse group of images.

B. EVALUATION METRICS
The performance of the proposed technique is evaluated using the assessment metrics listed in Table 3: the receiver operating characteristic (ROC) curve, precision-recall (PR) curve, area under the ROC curve (AUC) score, mean absolute error (MAE) score, F-measure (Fm) score, enhanced-alignment measure, i.e., E-measure (Em) score, and structure measure, i.e., S-measure (Sm) score. For measuring the overall performance of salient object detection algorithms, F-measure, recall, and precision are commonly utilized [46]. In Table 3, Smap FT x,y and Smap FN x,y are the thresholded saliency map and the normalized saliency map, respectively, and GT x,y is the ground truth of the image. The F-measure is a comprehensive metric for saliency detection tasks since it combines the recall and precision values. Evaluating these metrics requires thresholded versions of the grey-level saliency maps. For a grayscale saliency map with pixel values in the range [0, 255], the threshold is varied from 0 to 255 to create binary saliency maps, and the resulting saliency map Smap F x,y is binarised in this way to evaluate the PR curve. The recall and precision scores are computed at every threshold value and then plotted on the precision-recall curve. The false positive rate (FPR) and true positive rate (TPR) are also computed at each threshold to plot the ROC curve. The ROC curve provides a two-dimensional description of the effectiveness of the proposed model, whereas the AUC value, calculated as the area under the ROC curve, condenses this description into a single number. The overlap-based performance measures do not take into consideration the true negative assignment of saliency. These measures favor methods that give high saliency to prominent pixels while failing to detect non-salient areas.
Continuous saliency maps are more important than thresholded binary saliency maps in some applications, such as content-aware image scaling. In such cases, the MAE provides a thorough comparison of the saliency map and the ground truth. The MAE is calculated as the mean absolute pixel-wise difference between the ground truth and the final saliency map Smap FN x,y , which is normalized to the range [0, 1]. In Table 3, the terms S o and S r in the Sm calculation are the object-aware and region-aware structural similarity, which can be obtained from [48]. The term (x, y) in the Em calculation is the enhanced alignment matrix, which can be obtained using [49]. The Sm score analyses the structural similarity between the real-valued saliency map and the binary ground truth, instead of just pixel-wise errors. The Em score takes into account the global average of the image as well as local pixel matching simultaneously.
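The threshold sweep and the pixel-level scores described above can be sketched as follows. This is a minimal plain-Python illustration, not the evaluation code used in the paper. The function names are hypothetical. A pixel is assumed to be salient when its value is greater than or equal to the threshold. The F-measure uses the weight β² = 0.3 that is common in the saliency literature, although Table 3 may define it differently. The AUC is approximated by the trapezoidal rule with the (0, 0) end point added explicitly.

```python
def confusion(smap, gt, t):
    """Count TP, FP, FN, TN after binarising a [0, 255] saliency map at t."""
    tp = fp = fn = tn = 0
    for srow, grow in zip(smap, gt):
        for s, g in zip(srow, grow):
            pred = s >= t  # assumption: ">=" marks a pixel as salient
            if pred and g:
                tp += 1
            elif pred:
                fp += 1
            elif g:
                fn += 1
            else:
                tn += 1
    return tp, fp, fn, tn

def precision_recall(smap, gt, t):
    """Precision and recall of the map binarised at threshold t."""
    tp, fp, fn, _ = confusion(smap, gt, t)
    # Convention (assumed): precision is 1.0 when no pixel is predicted salient.
    precision = tp / (tp + fp) if tp + fp else 1.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def f_measure(precision, recall, beta2=0.3):
    """Weighted harmonic mean of precision and recall (beta2 is assumed)."""
    denom = beta2 * precision + recall
    return (1 + beta2) * precision * recall / denom if denom else 0.0

def roc_auc(smap, gt):
    """Trapezoidal area under the ROC curve over thresholds 0..255."""
    points = {(0.0, 0.0)}
    for t in range(256):
        tp, fp, fn, tn = confusion(smap, gt, t)
        tpr = tp / (tp + fn) if tp + fn else 0.0
        fpr = fp / (fp + tn) if fp + tn else 0.0
        points.add((fpr, tpr))
    pts = sorted(points)
    return sum((x1 - x0) * (y0 + y1) / 2
               for (x0, y0), (x1, y1) in zip(pts, pts[1:]))

def mae(smap, gt):
    """Mean absolute error between the [0, 1]-normalised map and binary GT."""
    diffs = [abs(s / 255.0 - g)
             for srow, grow in zip(smap, gt) for s, g in zip(srow, grow)]
    return sum(diffs) / len(diffs)
```

Calling `precision_recall(smap, gt, t)` for every `t` in `range(256)` yields the points of the PR curve, while `roc_auc` and `mae` each reduce a map to a single score.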

TABLE 4. Quantitative analysis of the proposed algorithm with other saliency detection techniques based on the MAE, F-measure and AUC values on five benchmark saliency detection datasets. (Red, green, and blue highlight the three leading models, respectively. Higher AUC and Fm scores are better. A lower MAE score is better.)
suggested method detects numerous salient objects with boundary preservation. These approaches can preserve borders to some extent but fail to recognize many salient objects, as can be seen in Fig. 4 - Image 1, where only the proposed method detects all the deer present in the image with correct boundary details. The SMD [60] technique can detect numerous salient objects in Fig. 4 - Image 2, but not in the image displayed in Fig. 4 - Image 1. In the HKUIS dataset, the proposed technique detects several salient objects for all types of images. Fig. 5 - Image 1 indicates that the proposed technique correctly recognizes the entire salient object, unlike the other methods, which miss the tail section of the object. Fig. 5 - Image 2 shows an image with 5 salient objects, all of which are accurately detected only by the proposed method.
Fig. 6 - Image 1 and Image 2 show images with complex backgrounds. It can be seen from these figures that only the proposed method and the SMD [60] method provide qualitatively good results by detecting the objects to a great extent. From Fig. 6 - Image 2 it can be seen that the proposed method preserves the boundaries of the object better than the other salient object detection techniques, specifically SMD [60], which gives competitive results.
According to the qualitative study, the proposed method surpasses existing methods on all datasets and under all conditions. In comparison to other saliency methods, it generates a high-resolution saliency output on various difficult natural images, and its saliency map uniformly highlights salient regions while effectively suppressing background regions. The proposed method preserves the boundary and sharp details of images. Even in images with complicated backgrounds, it accurately separates background and foreground and detects the salient sections while preserving the edges and boundaries of the objects. It also detects multiple important objects more accurately than other approaches.

D. QUANTITATIVE ANALYSIS
A quantitative evaluation of the proposed method and of other existing salient object detection techniques is also carried out on the five benchmark salient object detection datasets listed in Table 2. Figs. 7 and 8 show the performance of the proposed method relative to other saliency detection methods using the PR curve and the ROC curve, respectively. The proposed technique has the best performance in terms of the PR curve on the DUT-OMRON, HKUIS, MSRA10K and SOD datasets and comparable performance on the ECSSD dataset, as shown in Fig. 7. Furthermore, in terms of the ROC curves presented in Fig. 8, the proposed method outperforms the other methods and provides the best performance on practically all datasets. Table 4 presents a comparative analysis based on MAE, F-measure (Fm), and AUC scores. The proposed technique achieves the highest AUC score on all the datasets except ECSSD, where it ranks second, demonstrating its ability to reliably differentiate the background and foreground areas in salient object detection tasks. The proposed technique also obtains MAE and F-measure scores in the top three positions as compared to other techniques, as shown in Table 4. Overall, Table 4 shows that the proposed method outperforms most of the compared methods in terms of AUC value and obtains MAE and F-measure values comparable to those of the best-performing methods. The following points can be drawn from the results in Table 4:
• In comparison to the proposed method, the methods CA [50], SEG [52], and GR [51] perform poorly in terms of AUC, MAE and Fm scores.
• The proposed method gives the highest AUC score on all the datasets except the ECSSD dataset, where it ranks second, behind the CDHL [7] technique.
• The proposed method achieves the first, second, or third highest Fm score among the existing salient object detection methods, while its MAE score is comparable to that of the top-performing methods. The overall performance of the proposed method is better than or comparable to that of all the mentioned salient object detection techniques.
• For the SOD dataset, the proposed method improves the AUC score by ≈1.21% over BSDL [16] and by 17% over FCB [61] [53], RR [55] and LGF [56] methods.
The metrics reported in Table 4 are evaluated only at the pixel level and do not account for the structural similarity of the objects. Table 5 presents a comparative analysis based on the S-measure (Sm) and E-measure (Em) scores, which take into account structural details as well as global shape. It can be observed from Table 5 that the proposed method gives the highest Em and Sm values for the HKUIS and SOD datasets. For the DUT-OMRON, ECSSD and MSRA10K datasets, the proposed method ranks in the top three positions among the mentioned salient object detection methods. This shows the strength of the proposed method in retaining the structural details of the objects as compared to other salient object detection techniques.
TABLE 5. Quantitative analysis of the proposed algorithm with other saliency detection techniques based on the S-measure and E-measure values on five benchmark saliency detection datasets. (Red, green, and blue highlight the three leading models, respectively. Higher Sm and Em scores are better.)
, HRSOD-DH [72], JDFPR [73] and SCRN [74]. The comparison of the mentioned deep learning-based techniques with the proposed technique based on MAE and AUC scores is presented in Table 6. The proposed approach achieves equal or better results than deep learning-based techniques, as shown in Table 6. Considering the dataset's reputation for challenging images with crowded backgrounds, low contrast images, and images with many objects, the proposed method can give significantly better results than some of the deep learning-based techniques such as HRSOD-DH [72] and SSNet [69]. It can be seen from Fig. 9 that the proposed method is more efficient in detecting complete objects in images with cluttered backgrounds, preserving the fine details and boundaries of objects better than many of the deep learning-based methods. From Fig. 9 it can also be observed that, although the deep learning-based methods give good MAE and AUC scores, in some cases they fail to detect the complete object in an image, whereas the proposed method preserves the boundaries and fine details of the complete object. The proposed method is therefore well suited to tasks in which a less complex saliency architecture and accurate object detection with complete boundary preservation are more important. Also, the computational time required to train a deep learning-based model is very high as compared to a machine learning-based model. The method proposed by Qin et al. [75], which is based on saliency detection with boundary preservation, takes 145 hours to train the model, whereas the model based on random forest regression takes only 3 to 4 hours for training. The proposed boundary preserved saliency detection model is thus more computationally efficient to train than deep learning-based techniques.

V. RUN TIME
The proposed method generates a saliency map in 1.92 seconds for a 300 × 400 image on a 64-bit PC with an i7-4770 3.40 GHz processor and 32.0 GB of RAM. All of the procedures are executed in MATLAB 2017a. The integration of the transformation-based and learning-based saliency maps using guided filters takes around 0.06 seconds. The saliency integration time (which includes only the time needed to integrate the different saliency maps) is presented in Table 7, where the proposed method reduces the average integration time by ≈76.92% and 94.39% as compared to the SIHCA [31] and SIAM [30] methods, respectively. Table 7 thus indicates that the proposed method integrates transformation-based and learning-based saliency maps faster than all the integration methods compared in Table 7. The overall time needed to generate the final saliency map with the proposed method is also low.

VI. LIMITATIONS
In this paper, the saliency detection task is formulated as saliency integration, in which attention map-based guided filters integrate the transformation-based and learning-based saliency maps. When the transformation-based and learning-based saliency maps both fail to detect a particular salient region, the proposed model fails to detect that region as well, as illustrated in Fig. 10. It can also be observed from Fig. 10 that the proposed method sometimes detects the background as an object, which affects its performance. These limitations could be mitigated by adding class-specific object information, such as humans or vehicles, to the saliency models to improve the accuracy of human or vehicle detection tasks. Also, concatenating different deep features with the features used in the proposed method to train the random forest regression can improve the performance of the proposed technique.
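The first failure case follows from the fusion being a weighted average. As a hedged illustration, assume the final fusion has the convex form below, with the final decision map acting as a weight w(x, y) in [0, 1]. The symbol w and this exact form are assumptions made here for illustration, not a restatement of Eq. 23.

```latex
S_F(x,y) = w(x,y)\, S_W(x,y) + \bigl(1 - w(x,y)\bigr)\, S_{RF}(x,y),
\qquad 0 \le w(x,y) \le 1
\;\Longrightarrow\;
\min\bigl(S_W(x,y),\, S_{RF}(x,y)\bigr) \;\le\; S_F(x,y) \;\le\; \max\bigl(S_W(x,y),\, S_{RF}(x,y)\bigr)
```

Under this assumption, a pixel that receives low saliency in both source maps cannot receive high saliency in the fused map, whatever the weight. If the filtered weight leaves the range [0, 1] slightly, the bound holds only approximately.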

VII. CONCLUSION
This paper presents a novel salient object detection technique that integrates the wavelet-based and learning-based saliency maps using the edge-preserving guided filter model. SLIC superpixel segmentation is first applied to the input image to speed up the computations. The local and global wavelet features are used to generate local and global saliency maps, which are then integrated to produce the wavelet-based saliency map, while color, spatial distance, and textural distance features are used to generate the learning-based saliency map. The proposed method effectively merges the wavelet-based and learning-based saliency maps, taking into account the features of both maps that are consistent with human vision while preserving the boundaries of objects. The use of guided filters takes into account the neighborhood pixel variations and helps to preserve the object boundaries without introducing blurring and artifacts. The outcome of substantial experiments carried out on a variety of datasets demonstrates that the suggested combination of wavelet-based and learning-based saliency maps outperforms other existing salient object detection approaches.