NSCT-Based Robust and Perceptual Watermarking for DIBR 3D Images

Depth-image-based rendering (DIBR), where arbitrary views are synthesized from a center image and a depth image, has received much attention in the three-dimensional (3D) research field. With advances in depth-acquisition techniques and the proliferation of 3D glasses and 3D display devices, there is a growing demand for schemes to protect the copyrights of DIBR 3D images. Digital watermarking is a typical protection technology, and designing a watermarking method for DIBR 3D images is a challenging task because the synchronization of watermarks can easily be broken in the process of generating synthetic images. To address this issue, we propose a non-subsampled contourlet transform (NSCT)-based blind watermarking method for DIBR 3D images. To ensure that the proposed method is robust against the DIBR process, we analyze the robustness of the NSCT subbands against DIBR attacks. Based on the analysis results, we select subbands that are robust against DIBR attacks and embed the watermark in the row coefficients of the NSCT subbands using quantization-based embedding. While ensuring robustness, the proposed method also improves the imperceptibility of watermarks by adjusting their embedding strength according to computed perceptual masking values. Through experiments, we show that the proposed method is robust against both the desynchronization attacks of the DIBR process and common attacks, including signal processing operations and geometric distortions. The high imperceptibility of our method is also verified by several evaluation metrics in a subjective and objective manner.


I. INTRODUCTION
Three-dimensional (3D) content has been attracting a great deal of attention from various industries and viewers. Owing to the immersive experience provided by 3D content and advances in 3D displays, the 3D content market has continued to grow, as has the demand for 3D content. Such 3D content is generated by two methods: stereo image recording (SIR) and depth-image-based rendering (DIBR) [1], [2]. SIR, the conventional approach to generating stereoscopic images, simultaneously captures the same scene using two horizontally separated cameras. Since the stereoscopic images captured by SIR show a real scene, they allow viewers to enjoy a high-quality viewing experience on a 3D display. However, the cameras' fixed positions mean that the parallax of the stereoscopic images is fixed, so SIR has the disadvantage that it cannot adjust the depth condition to suit the viewer's preference [3].

(The associate editor coordinating the review of this manuscript and approving it for publication was Charith Abhayaratne.)
Meanwhile, DIBR generates synthetic images using a center image, also referred to as a reference image, and its associated depth image [4]–[6]. Since 3D display devices (e.g. 3D TVs, auto-stereoscopic displays, and free-viewpoint TVs) require both views actually captured by a camera and views from arbitrary virtual viewpoints, DIBR is an evolutionary, optimized approach for a 3D broadcasting system [7], [8]. Since the multiview-video-plus-depth (MVD) format has been specified as a standard input format for efficient compression and transmission, DIBR has become more widely used [9]. Recently, as techniques for acquiring depth images from a single image and inpainting techniques have advanced, various applications based on DIBR have been actively developed [10], [11]. From the perspective of generating stereoscopic images, DIBR generates both the left and right images using a color center image, a grayscale depth image, and a depth condition (baseline distance). DIBR has the advantage that it allows viewers to adjust the parallax of the two synthesized views to achieve depth perception that takes user preference into consideration [4], [5]. In addition, the DIBR method has lower network bandwidth and storage costs than the SIR method [6].
As interest in DIBR has increased, the issue of copyright protection for DIBR-based synthesized images, referred to as DIBR 3D images, has emerged. In the past, numerous watermarking methods were proposed to solve the copyright problem of two-dimensional (2D) images. However, 2D image watermarking methods are not directly applicable to DIBR 3D images because they were not designed to handle the inherent desynchronization attacks that can occur in the process of generating synthetic images [1], [2]. A desynchronization attack is among the most difficult attacks to resist because it desynchronizes the location of a watermark embedded in an image, causing incorrect watermark extraction. The desynchronization attacks that result from DIBR are as follows:
• 3D image warping: Pixels in the center image are moved horizontally to generate a virtual image, which desynchronizes the watermark embedded in the center image.
• Preprocessing of depth image: This step reduces the sharp depth discontinuities in the depth image, and the preprocessed depth information affects the process of generating synthetic images.
• Baseline distance adjustment: By adjusting the baseline distance, the parallax of the synthesized left and right views can be controlled; this affects the distance that pixels move horizontally.
This implies that robust watermarking is in high demand for DIBR 3D images; such watermarking should be robust against the above three attack types, referred to as DIBR attacks in this paper. Making watermarking robust against DIBR requires considering what illegal redistribution may occur in a DIBR-based 3D broadcasting system [2]. As shown in Fig. 1, a content distributor transmits the watermarked center image and its corresponding depth image to a customer via a DIBR-based 3D broadcasting system. On the receiver side, a malicious customer could generate virtual images using the DIBR process and then illegally redistribute not only the center image but also the synthesized left and right images. Thus, watermarking for DIBR 3D images should be able to extract watermarks from the following content: the center image and the synthesized left and right images [1].
In this paper, we focus on designing a watermarking method that is robust against the desynchronization attacks that occur in the DIBR process, taking into consideration the fundamental requirements of watermarking [15], [16]: robustness, imperceptibility, blind extraction, and sufficient watermark capacity. As mentioned above, since the synthetic-image generation of DIBR involves a partial translation along the horizontal direction, which can be considered a desynchronization attack, watermarking for DIBR 3D images must be robust against DIBR attacks. In our work, we utilize the non-subsampled contourlet transform (NSCT) domain, which is the first attempt to do so in the field of watermarking for DIBR 3D images. We believe that the NSCT domain, which has not previously been used for DIBR 3D image watermarking, is advantageous for securing invisibility and robustness compared to other domains such as the discrete cosine transform (DCT) and dual-tree complex wavelet transform (DTCWT) domains. The DCT and DTCWT are the domains used in the watermarking methods [12], [13] selected as baselines in the DIBR watermarking field, but these two domains are accompanied by degradation of image quality in the watermark embedding process.
As reported in [14], DCT domain-based watermark embedding triggers block artifacts, and in the case of the DTCWT domain, imperceptibility decreases due to watermark insertion into subbands generated through sampling in the decomposition process. Compared to the DTCWT domain, the NSCT domain, which involves no subsampling in the decomposition process, suffers little degradation in image quality due to watermark insertion. In addition, because the NSCT domain has the property of shift invariance, it is suitable for DIBR 3D image watermarking, which is confirmed by the results of the robustness analysis and the various experiments in Sections IV and VI. Through a robustness analysis of the NSCT subbands against DIBR attacks, we select NSCT subbands that are robust against horizontal pixel shift, and then we adaptively quantize the row coefficients of the NSCT subbands in the watermark embedding phase.
In the field of watermarking, it is important not only to select a domain for watermarking but also to design the framework well by combining the watermark insertion and extraction modules with consideration of robustness and imperceptibility. That is, both robustness and imperceptibility can be achieved by carefully selecting the modules for the insertion and extraction of the watermark through analysis of the domain and the DIBR process, rather than simply utilizing the properties obtainable in the NSCT domain. For robustness, quantization-based watermark embedding is performed on row coefficients, not column coefficients, so as to be robust against the horizontal pixel shift occurring in DIBR attacks. In the watermark extraction phase, the watermarks are extracted using the statistical difference between the coefficients caused by the watermark embedding phase. In consideration of imperceptibility and robustness against common attacks, including signal processing operations and geometric distortions, the embedding strength of the watermarks is controlled by the perceptual masking value and experimentally determined parameters, thereby minimizing the degradation of visual quality caused by the watermark embedding.
As we intended, our method shows higher watermark extraction performance against DIBR attacks, signal processing operations, and geometric distortions while maintaining higher imperceptibility than the comparative methods [12]–[14]. In addition, the proposed method can extract embedded watermarks in a blind fashion while ensuring sufficient watermark capacity. The main contributions of the proposed method are as follows.
• Our work is the first study to apply the NSCT domain, with its shift invariance property, to watermarking for DIBR 3D images, and we show that the NSCT domain is suitable for this task through the results of the robustness analysis and the various experiments in Sections IV and VI.
• Our goal was to design a method that meets the fundamental requirements of DIBR 3D image watermarking. To do this, we conducted extensive experiments in the process of applying quantization-based embedding and perceptual masking to NSCT subbands and specifying parameters that yielded the best performance.
To verify the effectiveness of the proposed method, extensive experiments were designed based on existing comparative methods [12]–[14] with unique strengths (see Table 1) and larger datasets [4], [17]–[20], and the experimental results showed the superiority of the proposed method in terms of the fundamental requirements of DIBR 3D image watermarking. The remainder of this paper is organized as follows. Section II reviews related work, Section III presents the DIBR system, and Section IV briefly reviews the NSCT domain. The proposed watermarking method is presented in Section V, and its performance is then demonstrated in Section VI. Finally, Section VII concludes this paper.

II. RELATED WORK
Over the past decade, several watermarking methods have been introduced for DIBR 3D images. Depending on whether blind extraction (the ability to extract the watermark from a given work without extra information) is possible, these methods can be classified as non-blind, semi-blind, or blind watermarking [15]. Initially, non-blind watermarking methods were proposed, in which the original image is necessary in the watermark extraction process. In [21], Halici et al. proposed a watermarking method based on estimating the projection matrix between the center image and a synthesized image. In [22], Lee et al. proposed perceptual watermarking based on the human visual system (HVS) that predicted the regions occluded by adjacent pixels after view synthesis.
In [23]–[25], several local feature descriptor-based semi-blind watermarking methods that require side information in the watermark extraction process were proposed. Under a predefined rendering condition, the method in [23] utilized matched common areas between the center image and the synthesized left and right images based on descriptor matching. In [24], Miao et al. proposed a resynchronization scheme that uses descriptors to estimate the disparity map between the center image and a synthesized image; this approach could extract a watermark from any synthesized image with disparity map-based compensation. In [25], Cui et al. presented discrete wavelet transform-based watermarking with geometric rectification. Since this method rectified a geometrically distorted view, it was robust against geometric distortions and the desynchronization attacks of the DIBR process.
Unlike the above-mentioned methods [21]–[25], which have limitations in terms of practical applications, watermarking methods considering blind extraction and the fundamental requirements of watermarking have been proposed. In [26], Lee et al. proposed horizontal noise mean shifting-based watermarking that exploits the mean of the horizontal noise histogram of the center image. In [27], Rana et al. proposed depth map-based dependent region detection and block partitioning. The DCT is applied to blocks of the dependent regions that are robust against view synthesis, and then a watermark bit is inserted by modifying the DC coefficients. The authors in [28] proposed a histogram shape-based blind watermarking method that exploited a pixel mean value-based pixel group selection method. In [29], Kim et al. presented a template-based approach for robustness against geometric distortions; this approach inserts the template and watermark into the curvelet domain and the one-dimensional (1D)-DCT domain, respectively.
In [12], Lin and Wu proposed blind watermarking based on the DCT domain and the inverse rendering (IR) of the DIBR process. Under a predefined rendering condition, three watermarks for the center, left, and right images are embedded in the DCT domain of the center image. In [13], Kim et al. presented a DTCWT-based blind watermarking method. To make the method robust against DIBR attacks, the approximate shift invariance of the DTCWT domain is employed. These two methods [12], [13] are the most actively compared baselines. Recently, the authors in [14] proposed a scale-invariant feature transform (SIFT)-based method that utilizes the invariability of the parameters of SIFT keypoints to deal with the issue of view synthesis. Unlike the approaches in [23]–[25], this method can extract watermarks blindly without side information, but suffers from limited watermark capacity.
To design a robust and perceptual watermarking method for DIBR 3D images that outperforms existing methods, we conducted a performance analysis of the existing comparative methods [12]–[14], each of which has strengths in terms of the fundamental requirements. Table 1 presents the results of analyzing each watermarking method's strengths and weaknesses. As listed in Table 1, the IR-based method [12] has advantages in terms of watermark capacity, and the SIFT-based method [14] performs well in objective fidelity tests and in robustness against geometric attacks such as translation and cropping because it inserts watermarks around specific keypoints. The DTCWT-based method [13] performed generally well against various attacks, including DIBR attacks, signal processing operations, and geometric distortions. As listed in Table 1, it is difficult to achieve high robustness and invisibility simultaneously due to tradeoffs between the fundamental requirements of watermarking. To differentiate our approach from the methods in [12]–[14], we aim to design a watermarking method that is robust against DIBR attacks and common attacks while also maintaining high imperceptibility. In addition, the proposed method is designed to extract watermarks directly from a given image without additional information and to ensure sufficient watermark capacity.

III. DEPTH-IMAGE-BASED RENDERING
Depth-image-based rendering (DIBR) generates synthesized images, as captured from virtual viewpoints, using a center image and its associated depth image [4], [5]. Figs. 2(a) and 2(b) show examples of a color center image and the associated grayscale depth image, respectively. In the depth image, a higher intensity value denotes that an object is closer to the shooting camera. As illustrated in Fig. 3, the DIBR system consists of the following steps: preprocessing of the depth image, depth normalization and 3D image warping, and hole-filling [30]. For natural synthesized-image generation, preprocessing of the depth image is needed before virtual view rendering [3]. 3D image warping generates synthesized images by partially moving some pixels of the center image horizontally, and baseline distance adjustment is employed to control the rendering condition [4], [5]. The goal of hole-filling is to fill in holes, referred to as disocclusions, generated by the 3D image warping [31].

A. PREPROCESSING OF DEPTH IMAGE
Preprocessing of the depth image is employed to reduce hole occurrences in view synthesis [6]. At this stage, the depth image is usually smoothed by a Gaussian filter to mitigate sharp depth discontinuities at objects' edges and borders in the depth image [3]. A 1D Gaussian filter is defined as follows:

G(x, σ) = (1 / (√(2π) σ)) exp(−x² / (2σ²)),  −w/2 ≤ x ≤ w/2.

Here, σ and w are the standard deviation and window size of the filter, respectively. Let D(x, y) be the depth value of the depth image at pixel coordinates (x, y), where D(x, y) ranges over 0-255. As described in [6], the depth value D̂(x, y) in the preprocessed depth image after Gaussian filter-based smoothing equals

D̂(x, y) = (D(x, y) ∗ G(h, σ_h)) ∗ G(v, σ_v),

where G(h, σ_h) and G(v, σ_v) are the Gaussian filters for the horizontal and vertical directions and ∗ denotes convolution. Here, σ_h and σ_v denote the horizontal and vertical standard deviations, respectively. Symmetric filter-based preprocessing is presented in [32], and Fig. 2(c) shows the depth image after smoothing with a symmetric Gaussian filter where σ_h = σ_v = 20. The authors in [3], [6] showed that symmetric smoothing can generate a distortion that causes vertical boundaries to become curved; they proposed asymmetric filter-based preprocessing of the depth image to generate more natural views. Fig. 2(d) shows the depth image after smoothing with an asymmetric Gaussian filter with σ_h = 10 and σ_v = 30. The effectiveness of preprocessing the depth image is addressed in the last subsection.
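The separable asymmetric smoothing above can be sketched in a few lines. This is a minimal illustration rather than the authors' code; the window size `w = 61` is an illustrative choice, and the kernels are normalized to sum to one.

```python
import numpy as np

def gaussian_kernel(sigma, w):
    """1D Gaussian kernel of (odd) window size w, normalized to sum to 1."""
    x = np.arange(w) - w // 2
    g = np.exp(-x**2 / (2.0 * sigma**2))
    return g / g.sum()

def preprocess_depth(depth, sigma_h=10, sigma_v=30, w=61):
    """Asymmetric separable Gaussian smoothing of a depth image (values 0-255)."""
    kh = gaussian_kernel(sigma_h, w)
    kv = gaussian_kernel(sigma_v, w)
    # Convolve each row with the horizontal kernel, then each column with the
    # vertical kernel (separable 2D smoothing, stronger along the vertical axis).
    out = np.apply_along_axis(lambda r: np.convolve(r, kh, mode="same"), 1,
                              depth.astype(float))
    out = np.apply_along_axis(lambda c: np.convolve(c, kv, mode="same"), 0, out)
    return out
```

With σ_v > σ_h, depth discontinuities are blurred more strongly in the vertical direction, which is what suppresses the curved-boundary distortion of symmetric smoothing.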

B. 3D IMAGE WARPING AND DEPTH NORMALIZATION
3D image warping is the process of generating virtual views from a center image and its corresponding per-pixel depth information. This process consists of the following two steps: 1) with the depth values, the points in the center image are re-projected into 3D space; 2) these points in 3D space are projected onto the image planes of the virtual left and right cameras [4]. The parallel camera configuration is typically utilized for view synthesis because, unlike the convergent camera configuration, it does not generate vertical disparities [30]. Fig. 4 illustrates 3D image generation in a parallel camera configuration, where c_l, c_c, and c_r denote the left, center, and right cameras, respectively. Before view generation, the depth value D̂ of the preprocessed depth image is normalized linearly such that the depth values lie within the new range from the farthest clipping plane Z_f to the nearest clipping plane Z_n [1], [4]. Normalization is performed according to the following equation:

Z = Z_f − (D̂ / 255)(Z_f − Z_n),

where Z denotes the normalized depth value. In 3D image warping, pixels in the center image are horizontally moved according to the corresponding normalized depth value. Under a parallel camera configuration, the y-coordinate of the projection of a point P with depth Z is identical on each image plane. Based on this camera configuration, virtual views can be generated from the following functions [3], [4]:

x_l = x_c + (t_x / 2)(f / Z),  x_r = x_c − (t_x / 2)(f / Z),

where t_x and f denote the baseline distance between the two virtual cameras c_l and c_r and the cameras' focal length, respectively; t_x is used to control the depth condition. x_l, x_c, and x_r denote the pixel x-coordinates in the virtual left view, the center view, and the virtual right view, respectively. Figs. 2(e) and 2(f) show the synthesized left and right images, respectively; the synthesized images are generated with baseline distance t_x = 5% of the image width.
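The normalization and warping equations above can be combined into a simple rendering sketch. This is a simplified illustration, not the paper's implementation: holes are left as zeros, `t_x` is given directly in pixels, and the clipping planes `z_n` and `z_f` are illustrative values.

```python
import numpy as np

def warp_center_view(center, depth_pp, t_x, f=1.0, z_n=1.0, z_f=10.0):
    """Generate virtual left/right views by per-pixel horizontal shifts.

    center   : grayscale center image (H x W)
    depth_pp : preprocessed depth image (H x W), values in 0-255
    t_x      : baseline distance, expressed here in pixels for simplicity
    """
    h, w = depth_pp.shape
    # Linear depth normalization: intensity 255 -> nearest plane, 0 -> farthest.
    z = z_f - (depth_pp.astype(float) / 255.0) * (z_f - z_n)
    # Per-pixel disparity: x_l = x_c + (t_x/2)(f/Z), x_r = x_c - (t_x/2)(f/Z).
    shift = np.rint((t_x / 2.0) * f / z).astype(int)
    left = np.zeros_like(center)   # unfilled positions remain 0 (holes)
    right = np.zeros_like(center)
    for y in range(h):
        for x in range(w):
            xl, xr = x + shift[y, x], x - shift[y, x]
            if 0 <= xl < w:
                left[y, xl] = center[y, x]
            if 0 <= xr < w:
                right[y, xr] = center[y, x]
    return left, right
```

Note how nearer pixels (larger depth intensity, smaller Z) receive larger shifts, which is exactly the partial horizontal translation later treated as a desynchronization attack.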

C. HOLE-FILLING
The hole-filling process is employed to fill in newly exposed areas, referred to as holes, that appear in virtual views [31]. The yellow pixels in Figs. 2(e) and 2(f) represent the holes that occur due to 3D image warping. One solution is to replace the holes by linear interpolation with adjacent pixels. Here, we adopt linear interpolation as the hole-filling method because of its efficiency. Compared to Fig. 2(e), we can see that the quality of Fig. 2(g), with interpolation-based hole-filling, is improved. Furthermore, the effectiveness of the depth-image preprocessing is verified by comparing the two magnified figures, Figs. 2(h) and 2(j). The perceptible distortions in synthesized images are mitigated by depth-image smoothing and hole-filling.
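Row-wise linear-interpolation hole-filling can be sketched as follows, assuming a boolean hole mask is available from the warping step (an assumption of this sketch; the paper does not specify its bookkeeping).

```python
import numpy as np

def fill_holes(view, hole_mask):
    """Fill holes in each row by linear interpolation from non-hole neighbors.

    view      : synthesized view (H x W), holes may contain arbitrary values
    hole_mask : boolean array (H x W), True where a disocclusion occurred
    """
    out = view.astype(float).copy()
    cols = np.arange(view.shape[1])
    for y in range(view.shape[0]):
        holes = hole_mask[y]
        if holes.any() and (~holes).any():
            # Interpolate hole columns from the surrounding known columns.
            out[y, holes] = np.interp(cols[holes], cols[~holes], out[y, ~holes])
    return out
```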

IV. NON-SUBSAMPLED CONTOURLET TRANSFORM
The non-subsampled contourlet transform (NSCT) domain has shift invariance, multi-scale, and multi-directional selectivity properties; hence, it is robust against DIBR attacks, which resemble geometric distortions, including 3D image warping, depth-image preprocessing, and baseline distance adjustment.
Here, we present a brief review of NSCT and an analysis of the NSCT subbands that are particularly robust to the desynchronization attacks resulting from DIBR.

A. BRIEF REVIEW OF NSCT
The contourlet transform (CT) is a multi-scale and multi-directional transform constructed from the Laplacian pyramid and a directional filter bank (DFB) [33]. The CT performs well in image processing applications, but it is not shift-invariant because downsampling and upsampling are needed in both the DFB and the Laplacian pyramid.
To overcome this drawback of the CT, the authors in [34] proposed the NSCT, a shift-invariant version of the CT. The NSCT is established as a combination of two structures: a non-subsampled pyramid (NSP) and a non-subsampled directional filter bank (NSDFB). The left part of Fig. 5 illustrates the NSP-based multi-scale decomposition. The NSP is a two-channel non-subsampled filter bank that ensures the multi-scale property. At each NSP decomposition level, one low-pass subband and one high-pass subband are generated, each having the same size as the input [35]. Given a decomposition level l_d, the NSP generates l_d + 1 subbands, and Fig. 5 gives an example of multi-scale decomposition with l_d = 3 (one low-pass subband and three high-pass subbands). The directional decomposition is derived from the NSDFB, which decomposes each high-pass subband into several directional high-pass subbands. The NSDFB is constructed from a two-channel fan filter bank and resampling; this filter bank splits the 2D frequency plane into directional wedges [35]. The NSDFB is iteratively applied to the high-pass subbands to provide directional decomposition [34]. As shown in the right part of Fig. 5, each high-pass subband generated by the NSP-based multi-scale decomposition is filtered by the NSDFB to construct 2^1, 2^2, and 2^3 directional subbands from coarse scale 1 to fine scale 3. In this work, the NSCT decomposition level l_d is set to 3, and the coefficients of the low-pass subband and the directional high-pass subbands in the NSCT domain are represented by C^LP and C^HP_{s,d}, respectively. Here, s and d denote the scale and direction index, respectively.
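The subband bookkeeping implied above (one NSP low-pass subband plus 2^k directional subbands at each scale) can be checked with a one-liner; `levels`, listing the NSDFB direction levels per scale, is a hypothetical parameter of this sketch.

```python
def nsct_subband_count(levels=(1, 2, 3)):
    """Total NSCT subbands: one low-pass subband plus 2**k directional
    high-pass subbands for each per-scale NSDFB direction level k."""
    return 1 + sum(2**k for k in levels)
```

With the paper's setting (direction levels 1, 2, 3 over three scales), this yields 1 low-pass + 14 directional high-pass subbands, matching the decomposition used in Section V.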
Based on the properties of the CT and NSCT domains, watermarking methods for 2D images have been steadily presented [36]–[41]. In particular, NSCT-based watermarking methods for 2D medical images in [36], [37] showed outstanding performance by exploiting NSCT and DCT in a hybrid method. Unlike existing NSCT-based approaches for 2D images in [36]–[39], we propose DIBR 3D image watermarking exploiting the NSCT domain. As mentioned in the previous sections, the DIBR process can be regarded as a desynchronization attack in that it affects the distance of pixels moving in the horizontal direction for view synthesis. Since the approaches in [36]–[39] are not designed for DIBR attacks, such techniques cannot be applied directly to DIBR 3D images. Therefore, in order to design a watermarking method for DIBR 3D images, domains that are robust to DIBR attacks must be analyzed, followed by research on watermark insertion and extraction algorithms. The following subsection provides an analysis of the robustness of the directional high-pass subbands against DIBR attacks.

B. ANALYSIS OF ROBUSTNESS FOR NSCT SUBBANDS AGAINST DIBR ATTACKS
NSCT is a shift-invariant transform with multi-scale and multi-directional selectivity based on the NSP and NSDFB structures [42]. As we saw in Section III, DIBR attacks can be regarded as a partial translation of some pixels in the center image that causes watermark desynchronization [14]. Hence, the watermarking scheme for DIBR 3D images should be designed to be robust to the watermark desynchronization that results from DIBR attacks. The NSCT domain has the advantage of being more resistant to DIBR attacks than other transform domains due to its shift invariance property. With its multi-directional property, NSCT decomposition also provides precise directional details for a given image [42]. NSCT high-pass subbands can effectively capture edges with the shape of smooth contours in various directions, which are the dominant features of a given image.
The representations of the directional subbands are depicted in Fig. 6. Taking Fig. 6(c) as a representative, the NSCT decomposition at scale 2 has four directional high-pass subbands, from C^HP_{2,0} to C^HP_{2,3}. Each subband captures the characteristics of the Zoneplate image around the edges aligned with the directional wedge of its filter; evidently, each subband has strong energy in a specific direction. This means that each directional high-pass subband may exhibit a different robustness against DIBR attacks.
To determine which high-pass subbands in the NSCT domain are suitable for watermarking and robust against DIBR attacks, similarity measurements between the high-pass subbands obtained from the center and synthesized images were conducted. If the similarity of a specific subband is high, that subband is resistant to the partial translation of pixels in the DIBR process. In this experiment, we employed center and depth images from the following datasets [4], [17]–[20]: Microsoft Research 3D Video Datasets, Heinrich-Hertz-Institut, and Middlebury Stereo Datasets. In the process of generating synthesized left and right images using DIBR, the focal length f is set to 1 and the value of t_x is set to 2%-5% of the width of the center image to vary the depth condition, and synthesized images are generated for each t_x. As a metric for measuring the similarity between subband coefficients, the mean square error (MSE) was employed; the MSE between the subbands is computed as

MSE = (1 / (W · H)) Σ_{i=1}^{W} Σ_{j=1}^{H} (C^HP_{s,d}(i, j) − Ĉ^HP_{s,d,t_x}(i, j))²,

where C^HP_{s,d} and Ĉ^HP_{s,d,t_x} indicate the high-pass subband decomposed from the original center image and the high-pass subband decomposed from the DIBR-based synthesized image with baseline distance t_x, respectively. W and H denote the width and height of the given image, and i and j denote the coordinates of the subband coefficients. Lower MSE values indicate a smaller change in the subband coefficients of the NSCT domain after the DIBR process. Fig. 7 shows the average MSE between the subband coefficients of the center and synthesized images. Each legend entry in the upper-right-hand corner of Fig. 7 represents the t_x used to generate the DIBR-based synthesized images. First, we see that the MSE increases as the value of t_x increases. This is because the DIBR attacks applied to the center image become stronger as the t_x value increases.
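The MSE metric above amounts to a mean of squared coefficient differences between corresponding subbands (possible because NSCT subbands have the same size as the image), e.g.:

```python
import numpy as np

def subband_mse(c, c_hat):
    """MSE between corresponding NSCT subband coefficient arrays (W x H)."""
    c, c_hat = np.asarray(c, float), np.asarray(c_hat, float)
    return float(np.mean((c - c_hat) ** 2))
```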
In addition, we confirmed that the subbands for scale 1 had higher MSE values than those of the other scales; hence, the subbands belonging to scales 2 and 3 are suitable for DIBR watermarking. In our work, the subbands for scale 2 are used for robustness against common attacks, and we used half of the subbands of scales 2 and 3 for watermarking. For scales 2 and 3, the high-pass subbands showing greater robustness against DIBR attacks than the other subbands at the same scale were as follows: C^HP_{2,0}, C^HP_{2,1}, C^HP_{3,0}, C^HP_{3,1}, C^HP_{3,2}, and C^HP_{3,3}. These six subbands show much lower MSE values than the other subbands at the same scale, indicating that their coefficients change little even when the partial horizontal translation of pixels in the DIBR process is applied. As we clearly see from Fig. 6, each directional high-pass subband has more energy around the edges aligned with its directional filter bank. As mentioned previously, since the pixels of the center image move horizontally according to the corresponding depth information in the DIBR process, vertical edges are more degraded or distorted than horizontal edges. Thus, directional high-pass subbands that have more energy along vertical edges than horizontal edges are relatively more vulnerable to DIBR attacks. Examining the six selected subbands in Fig. 6 reveals that they have more energy around the horizontal edges, which explains why these six subbands have lower MSE values than the other subbands even after the DIBR process is applied. From this, we conclude that the six subbands are suitable for the watermarking of DIBR 3D images; the algorithm using the selected subbands is introduced in the following section.

V. PROPOSED WATERMARKING METHOD
This section describes the proposed NSCT-based watermarking for DIBR 3D images. Our goal is to design a watermarking method that meets the following fundamental requirements: blind extraction, sufficient watermark capacity, robustness, and imperceptibility. The proposed method is a multi-bit watermarking method, in that watermarks composed of multi-bit messages can be inserted and extracted, and it can extract embedded watermarks in a blind fashion. To ensure robustness against DIBR attacks, the robust subbands analyzed in Section IV are employed for watermarking. To obtain additional robustness, we insert watermarks into the coefficients of the NSCT subbands by quantizing each row. In addition, adjusting the strength of watermark embedding based on perceptual masking is utilized to improve imperceptibility without compromising robustness. First, we introduce the process of watermark embedding, and then we describe the process of watermark extraction based on the statistical differences between the subband coefficients caused by the quantization-based watermark embedding.
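As a generic, standalone illustration of quantization-based embedding (not the paper's exact quantization rule), standard quantization index modulation (QIM) embeds a bit by snapping a coefficient to one of two interleaved lattices; the step size `delta` is a hypothetical parameter here.

```python
import numpy as np

def qim_embed(v, bit, delta=8.0):
    """Embed one bit into a coefficient: quantize to the lattice with
    offset 0 (bit 0) or delta/2 (bit 1)."""
    offset = bit * delta / 2.0
    return delta * np.round((v - offset) / delta) + offset

def qim_extract(v, delta=8.0):
    """Extract the bit as the lattice nearest to the received coefficient."""
    d0 = abs(v - qim_embed(v, 0, delta))
    d1 = abs(v - qim_embed(v, 1, delta))
    return 0 if d0 <= d1 else 1
```

Extraction is blind (no original image needed) and tolerates any perturbation smaller than delta/4, which is the basic reason quantization-based embedding trades imperceptibility (small delta) against robustness (large delta).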
A. WATERMARK EMBEDDING
Fig. 8 illustrates the overall process of watermark embedding, which consists of the following steps.
1) Y-channel acquisition and subblock division
2) Shuffling the message and watermark bit assignment
3) Three-level NSCT decomposition
4) Selection of high-pass subbands based on the assigned watermark bit
5) Perceptual masking value computation
6) Quantization-based watermark embedding
7) Inverse NSCT
8) Repetition of steps 3-7 for every subblock
9) Merging subblocks and watermarked image generation
The proposed method embeds multi-bit watermarks only in the center image I of size W × H, not in the corresponding depth image. The YUV representation can be obtained from the RGB representation using the following equations:

Y_c = 0.299 R_c + 0.587 G_c + 0.114 B_c,
U_c = −0.147 R_c − 0.289 G_c + 0.436 B_c,
V_c = 0.615 R_c − 0.515 G_c − 0.100 B_c,

where R_c, G_c, and B_c denote the channels of the RGB representation and Y_c, U_c, and V_c denote the channels of the YUV representation. In general watermarking research, the U- and V-channels (chrominance channels) are used to improve visual quality in consideration of the HVS, while the Y-channel (luminance channel) is employed for high robustness [43], [44]. In our method, watermarks are inserted into the luminance channel of the center image, represented by Y_c, for better robustness. The proposed method achieves high imperceptibility by utilizing perceptual masking-based watermark embedding and only the NSCT high-pass subbands, which are less noticeable to the HVS. A detailed description of how high imperceptibility is achieved is given later in this section.
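The luminance extraction can be sketched as follows, assuming the common BT.601 weights (the paper's exact conversion coefficients are not restated here):

```python
import numpy as np

def rgb_to_y(rgb):
    """Luminance channel Y_c from an H x W x 3 RGB image (BT.601 weights)."""
    r = rgb[..., 0].astype(float)
    g = rgb[..., 1].astype(float)
    b = rgb[..., 2].astype(float)
    return 0.299 * r + 0.587 * g + 0.114 * b
```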
To give the proposed method sufficient watermark capacity, the Y-channel image is divided into N × M subblocks, each denoted B_{u,v}. Since the proposed method allocates a watermark bit (0 or 1) to each subblock, our approach has a watermark capacity of N × M bits. Before assigning a watermark bit to each subblock, the original message M_o of size N × M is shuffled using a secret key to improve security (see the lower left in Fig. 8); our work uses a Knuth shuffle. The shuffled message has the same size as M_o, and the watermark bit b_{u,v} located at the (u, v) coordinate of the shuffled message is allocated to B_{u,v}. As shown in Fig. 8, the watermark bit represented by yellow shading is assigned to the corresponding subblock, also represented by yellow shading. During the watermark embedding process, the b_{u,v} assigned to B_{u,v} is used to select a specific group of NSCT high-pass subbands. For the sake of clarity, B and b denote the subblock and watermark bit for arbitrary u and v values from now on.
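The key-dependent message shuffling can be sketched as below; seeding the generator with the secret key and the function names are illustrative assumptions, not the paper's exact procedure.

```python
import random

def knuth_shuffle_message(bits, secret_key):
    """Shuffle a flat list of message bits with a key-seeded Knuth
    (Fisher-Yates) shuffle; the permutation is reproducible from the key."""
    rng = random.Random(secret_key)
    perm = list(range(len(bits)))
    for i in range(len(perm) - 1, 0, -1):
        j = rng.randint(0, i)
        perm[i], perm[j] = perm[j], perm[i]
    return [bits[p] for p in perm], perm

def unshuffle_message(shuffled, perm):
    """Invert the permutation at the extractor side (same secret key)."""
    original = [0] * len(shuffled)
    for i, p in enumerate(perm):
        original[p] = shuffled[i]
    return original
```

Because the permutation is derived only from the key, the extractor can regenerate it and unshuffle without any side information.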
As mentioned previously, we exploit the NSCT domain with its property of shift invariance to make the proposed method robust to the watermark desynchronization caused by DIBR attacks. Here, three-level NSCT decomposition is applied to each B. As a result of the NSCT decomposition of B, one low-pass subband C^LP and 14 directional high-pass subbands C^HP_{s,d} are generated, where s and d denote the scale and direction index, respectively. When the watermark signal is inserted into the coefficients of C^LP, it gives the watermarking method high robustness, but the visual quality of the watermarked image is greatly degraded. Hence, only the NSCT high-pass subbands are utilized for watermarking, to improve imperceptibility. Based on the robustness analysis of the high-pass subbands against DIBR attacks, we carefully selected six high-pass subbands that have better robustness than the other high-pass subbands for the watermark embedding and extraction process. As mentioned in Section IV, the high-pass subbands for scale 1 (i.e. C^HP_{1,0} and C^HP_{1,1}) showed lower robustness against DIBR attacks than the high-pass subbands where s = 2, 3; hence, we used the high-pass subbands corresponding to scales 2 and 3 for watermarking. The details of the selected subbands for watermarking are as follows:
• For s = 2, the high-pass subbands where d = 0, 1 (i.e. C^HP_{2,0}, C^HP_{2,1}) were determined as an embedding domain for watermarking.
• For s = 3, the high-pass subbands where d = 0, 1, 2, 3 (i.e. C^HP_{3,0}, C^HP_{3,1}, C^HP_{3,2}, C^HP_{3,3}) were determined as an embedding domain for watermarking.
Next, the six selected high-pass subbands are grouped into two groups, denoted G_0 and G_1, taking into account their directional characteristics (see the energy representation of the specific direction for each subband depicted in Fig. 6) and their robustness against DIBR attacks. The high-pass subbands corresponding to G_0 and G_1 are as follows:

$$G_0 = \{C_{2,0}^{HP},\, C_{3,0}^{HP},\, C_{3,1}^{HP}\}, \qquad G_1 = \{C_{2,1}^{HP},\, C_{3,2}^{HP},\, C_{3,3}^{HP}\}$$

We embed the watermarks into the subbands of just one group according to the assigned watermark bit b. In other words, if the value of a given b is 0, watermark embedding is performed in the three high-pass subbands that correspond to G_0. If the value of a given b is 1, watermarking is applied to the three high-pass subbands that correspond to G_1. These can be expressed as follows:
• If the assigned watermark bit b is 0, C̄^HP_{s,d} = WE(C^HP_{s,d}) for all C^HP_{s,d} ∈ G_0.
• If the assigned watermark bit b is 1, C̄^HP_{s,d} = WE(C^HP_{s,d}) for all C^HP_{s,d} ∈ G_1.
where WE(·) denotes the proposed quantization-based watermark embedding applied to the high-pass subbands that correspond to the group. As depicted in the two expressions above, depending on the value of a given b, the high-pass subbands belonging to one group are watermarked and the remaining group retains its original coefficients.
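The bit-to-group selection logic can be sketched as follows; the exact G_0/G_1 membership shown is an assumption consistent with the subband pairing used at extraction, and embed_fn stands in for the quantization-based embedding WE(·).

```python
# Hypothetical subband bookkeeping: each subband is keyed by (scale, direction).
# The G0/G1 split below is assumed from the pairing ((2,0)-(2,1), (3,0)-(3,3),
# (3,1)-(3,2)) used at extraction; the paper's exact assignment is not quoted.
G0 = [(2, 0), (3, 0), (3, 1)]
G1 = [(2, 1), (3, 3), (3, 2)]

def embed_bit(subbands, bit, embed_fn):
    """Apply the quantization-based embedding embed_fn (a stand-in for WE(.))
    only to the group selected by the watermark bit; the other group keeps
    its original coefficients."""
    target = G0 if bit == 0 else G1
    return {key: (embed_fn(coeffs) if key in target else coeffs)
            for key, coeffs in subbands.items()}
```

Only three of the six subbands are ever modified for a given subblock, which is what later creates the measurable statistical asymmetry between the two groups.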
Prior to the detailed description of the quantization-based watermark embedding, we describe the perceptual masking value-based adjustment of the embedding strength, which improves imperceptibility. The embedded watermark should not be noticeable to the human eye and should not degrade the original image's perceptual quality [14]. Perceptual masking makes it possible to embed a watermark with an appropriate strength in consideration of the HVS and perceptual quality; hence, we control the embedding strength of the watermark using a perceptual masking value. The proposed method utilizes two perceptual masking schemes designed to consider the texture and brightness characteristics inherent in the given content. We then integrate the two computed values to generate a combined perceptual masking value λ^PM_B for a given B. During watermark embedding, the embedding strength of the watermark for B is adjusted using the computed value of λ^PM_B. In our study, if the value of λ^PM_B for B is greater than a predefined threshold, the watermark can be inserted with a strong intensity, because B then has the property that the artifacts created by watermarking are less noticeable to the human eye. Therefore, the watermark is inserted into the selected high-pass subbands of B with a strong intensity in this case, and with a more moderate intensity in the opposite case.
In detail, we first adopt the noise visibility function (NVF)-based perceptual masking in [45]-[47] to analyze the textured areas of a given B. The NVF is exploited to identify high-frequency information, such as textured areas, that is visually less perceptible. In other words, the NVF focuses on the fact that the human eye cannot easily recognize a noise signal in edge regions; hence, we strongly insert a watermark into the particular regions identified by the NVF. As depicted in [14], [47], the NVF values of B are calculated with the following equation:

$$NVF(i, j) = \frac{1}{1 + \xi \, \sigma_x^2(i, j) / \sigma_{x_{max}}^2}$$

where σ²_x(i, j) indicates the local variance in a kernel (size 3 × 3) centered on the pixel with coordinates (i, j) and σ²_{x_max} denotes the maximum local variance of the given B. Here, i ∈ {0, 1, ..., W/N − 1} and j ∈ {0, 1, ..., H/M − 1}. ξ is an empirically determined scaling constant, set to 150, and the obtained NVF values are scaled so that they fit in the range 0-1. Fig. 9 shows the visualization results of the representations employed to calculate the perceptual masking value; we can find that the NVF values are close to 0 and 1 in textured and flat regions, respectively (see Fig. 9(c); the results are scaled to 0-255 for visualization). To control the embedding strength of the watermark block-wise rather than pixel-wise, we calculate the average of the computed NVF values of the given B. The average NVF value of B, represented by NVF_B, is calculated as follows:

$$NVF_B = \frac{1}{(W/N)(H/M)} \sum_{i=0}^{W/N-1} \sum_{j=0}^{H/M-1} NVF(i, j)$$

The NVF-based perceptual masking value λ^NVF_B for adjusting the embedding strength is computed as follows:

$$\lambda_B^{NVF} = \alpha - \beta \cdot NVF_B$$

where α and β are parameters for adjusting the embedding strength according to the value of NVF_B. Based on the computed λ^NVF_B, the proposed method can adjust the level of the embedding strength to take into consideration the texture properties inherent in a given B.
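A minimal sketch of the NVF computation, assuming the classic form NVF = 1/(1 + ξσ²/σ²_max); the 3 × 3 local-variance window and ξ = 150 follow the text, while the function name is illustrative.

```python
import numpy as np

def nvf_mask(block, xi=150):
    """NVF-style perceptual mask on a Y-channel subblock: local variance in a
    3x3 window, mapped so values lie in [0, 1] (near 0 in textured regions,
    near 1 in flat regions)."""
    h, w = block.shape
    pad = np.pad(block.astype(float), 1, mode='edge')
    var = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            win = pad[i:i + 3, j:j + 3]     # 3x3 neighborhood
            var[i, j] = win.var()
    vmax = var.max()
    if vmax == 0:                           # perfectly flat block
        return np.ones((h, w))
    return 1.0 / (1.0 + xi * var / vmax)
```

On an image with a vertical edge, pixels near the edge receive low NVF (strong embedding allowed), while flat pixels stay near 1.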
In addition, we adopt brightness map (BM)-based perceptual masking. According to [43], [44], the human eye is less perceptive of modifications (e.g. watermark embedding in our work) that occur in very bright and very dark areas; however, when the brightness values of a specific area are in the middle grayscale range, the HVS is more sensitive to watermark embedding. Considering this, the proposed method increases the embedding strength of watermarks when the brightness of a given B is either very high or very low. For this, we generate the BM using B and C^LP to analyze the brightness property inherent in the content of B. Since B is the Y-channel representation of the original data illustrated in Fig. 9(a), it itself represents the luminance information, which helps in the brightness analysis. In addition, C^LP is used to generate the BM because it denotes the low-pass subband obtained by applying NSCT decomposition to B. As shown in Figs. 9(b) and 9(d), the visualizations of B and C^LP have similar properties; hence, we combine the two representations for a comprehensive brightness analysis:

$$BM(i, j) = \theta \cdot B(i, j) + (1 - \theta) \cdot C^{LP}(i, j)$$

where θ denotes the compositing factor for generating the BM, and the values of the generated map are scaled from 0 to 1.
To design the proposed method to embed the watermark block-wise based on the brightness characteristics of B, we calculate the average value of the obtained map as follows:

$$BM_B = \frac{1}{(W/N)(H/M)} \sum_{i=0}^{W/N-1} \sum_{j=0}^{H/M-1} BM(i, j)$$

The BM-based perceptual masking value λ^BM_B, which controls the embedding strength in consideration of brightness sensitivity, is obtained with the following equation:

$$\lambda_B^{BM} = \begin{cases} \eta_1 + \eta_2, & \text{if } BM_B \le \epsilon_1 \text{ or } BM_B \ge \epsilon_2 \\ \eta_1, & \text{otherwise} \end{cases}$$

where ε_1 and ε_2 denote the brightness control thresholds; with these two thresholds, the proposed method can increase the embedding strength when the computed brightness is either very high or very low. η_1 and η_2 indicate the base constant factor and the scaling factor, respectively. Finally, we jointly combine the obtained λ^NVF_B and λ^BM_B, and then use the combined perceptual masking value λ^PM_B to adjust the degree of quantization during watermark embedding with consideration of perceptual masking for block-wise texture and brightness sensitivity. Designed with the characteristics of the HVS in mind, λ^PM_B can be computed as follows:

$$\lambda_B^{PM} = \lambda_B^{NVF} \cdot \lambda_B^{BM}$$

The use of λ^PM_B to control the strength of the watermark embedding applied to the coefficients of the high-pass subbands is described later in this section.
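One possible reading of the block-wise brightness masking is sketched below; the piecewise rule (boost by η_2 when the average brightness falls outside [ε_1, ε_2]) is an assumption, not a quotation of the paper's equation, and the default values mirror the experimental settings.

```python
import numpy as np

def normalize(x):
    """Scale an array to the range [0, 1]."""
    x = x.astype(float)
    lo, hi = x.min(), x.max()
    return np.zeros_like(x) if hi == lo else (x - lo) / (hi - lo)

def brightness_mask_value(block, c_lp, theta=0.5, eps1=0.2, eps2=0.8,
                          eta1=0.6, eta2=0.4):
    """Block-wise brightness masking value: combine the Y-channel subblock and
    its NSCT low-pass subband, average, and boost the embedding strength when
    the block is very bright or very dark (assumed piecewise rule)."""
    bm = theta * normalize(block) + (1 - theta) * normalize(c_lp)
    bm_avg = bm.mean()
    if bm_avg <= eps1 or bm_avg >= eps2:      # very dark or very bright block
        return eta1 + eta2
    return eta1
```

A mostly dark block triggers the boosted value, while a mid-gray block keeps the base factor.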
To summarize the contents so far, the proposed watermarking method is based on a perceptual masking value, an assigned watermark bit, and six selected high-pass subbands of the NSCT domain, as shown in the middle part of Fig. 8. The proposed method proceeds with three-level NSCT decomposition and then inserts a watermark into the coefficients of the high-pass subbands that correspond to G_0 or G_1 according to the value of the b allocated to B. In this step, our method employs NVF- and BM-based perceptual masking values to analyze the inherent visual characteristics of B, and then computes the combined perceptual masking value λ^PM_B for adjusting the embedding strength of the watermark in consideration of the HVS. Next, the methodology of inserting a watermark bit into the coefficients of the high-pass subbands corresponding to the group determined by the watermark bit b, which is the core of the proposed method, is introduced. To make the proposed method robust against various attacks, we adopted a quantization-based watermark embedding approach as presented in [13], [48], [49].
As analyzed in Section III, the DIBR system moves pixels in the center image horizontally according to the corresponding depth image to generate a new synthesized view. This process causes fewer statistical changes in the rows of coefficients than in the columns of coefficients of the NSCT subbands [13]. In other words, if watermarks are inserted and extracted in units of columns, the method is not robust to a DIBR attack, which causes changes in the horizontal direction. Therefore, the proposed method embeds and extracts watermarks using the rows of coefficients of the selected subbands. Fig. 10 illustrates the detailed procedure of the quantization-based watermark embedding applied to a specific high-pass subband. In detail, if the value of b assigned to B is 0, the quantization procedure is applied to all high-pass subbands belonging to G_0. In this case, the subbands corresponding to G_1 keep their original values without quantization.
From now on, an arbitrarily selected high-pass subband to be watermarked is represented by C^HP_{s_w,d_w}, and the watermarked high-pass subband is represented by C̄^HP_{s_w,d_w}, where s_w and d_w denote the scale and direction index of any high-pass subband that corresponds to G_0 or G_1, respectively. Here, C^HP_{s_w,d_w}(i, j) denotes the coefficient value at the (i, j) coordinate, where i ∈ {0, 1, ..., W/N − 1} and j ∈ {0, 1, ..., H/M − 1}. As shown in the part depicted by the bold line in Fig. 10, quantization-based watermark embedding is performed on the coefficients that correspond to the k-th row of the subband, where j = k. The white, gray, and yellow areas indicate the original coefficients, quantized coefficients, and temporarily quantized coefficients, respectively, and the area in blue denotes the coefficient currently being quantized. Fig. 10 shows the state where the coefficients for i from 0 to k − 1 are temporarily quantized and the coefficient where i = j = k is currently being quantized.
When the quantization of the current coefficient C^HP_{s_w,d_w}(k, k) is temporarily completed, the value of i is increased by 1 to quantize the next coefficient C^HP_{s_w,d_w}(k + 1, k) (see the middle part of Fig. 10). If this process proceeds until i becomes W/N − 1, the temporary quantization of the coefficients that correspond to the k-th row is completed. Here, by checking the quantization error of the k-th row of coefficients and the quantization level, we can examine whether the quantization-based watermark embedding has secured sufficient robustness and invisibility. If the experimentally determined threshold is satisfied, the value of j is increased by 1 to proceed with the quantization of the coefficients corresponding to the (k + 1)-th row. In contrast, if the condition is not satisfied, the quantization step size is increased by increasing the quantization level, and the quantization proceeds again from the coefficient where i = 0, j = k. If the procedure depicted in Fig. 10 proceeds for the entire row of coefficients, we can obtain a watermarked high-pass subband C̄^HP_{s_w,d_w}.

Algorithm 1 Quantization-Based Watermark Embedding on a Row of Coefficients in a High-Pass Subband
1: for j = 0 to H/M − 1 do
2:    E_{s_w,d_w}(j) ← 0
3:    {Quantize the j-th row of coefficients}
4:    for l = 1 to L do
5:       for i = 0 to W/N − 1 do
6:          Temporarily quantize S · C^HP_{s_w,d_w}(i, j) with quantization level l
7:          Update E_{s_w,d_w}(j)
8:          if (E_{s_w,d_w}(j) > E) or (l == L) then
9:             for i = 0 to W/N − 1 do
10:               C̄^HP_{s_w,d_w}(i, j) ← (temporarily quantized coefficient)/S
11:            end for
12:            goto 1: {Branch for quantization of (j + 1)-th row}
13:         end if
14:      end for
15:   end for
16: end for

Algorithm 1 describes the details of the quantization-based watermark embedding. As inspired by [13], [49], the j-th row is quantized by quantizing each coefficient while changing the i index. The quantized coefficient can be obtained from the following equation:

$$\tilde{C}_{s_w,d_w}^{HP}(i, j) = \mathrm{round}\!\left(\frac{S \cdot C_{s_w,d_w}^{HP}(i, j)}{l \cdot \Delta}\right) \cdot l \cdot \Delta$$

where S and Δ denote the scaling factor of the coefficients for finer quantization and the size of the base quantization step of the quantizer, respectively, and l denotes the current quantization level. If the value of l is increased, the quantization step is increased, and thus the embedding strength of the watermark is increased. After the coefficient at the (i, j) coordinate is quantized, a checking process for securing the watermark extraction performance, robustness, and invisibility follows. For this, we compare the quantization error and the quantization level with experimentally determined thresholds (see line 8 in Algorithm 1). First, we check the degree of quantization by calculating the quantization error, which is the sum of the differences between the original coefficients and the quantized coefficients [48], [49]. Meanwhile, when the quantization level l is equal to the maximum quantization level, denoted as L, sufficient watermark embedding is judged to have proceeded, and the quantization of the current row of coefficients is then completed. The quantization error of the j-th row of coefficients is denoted as E_{s_w,d_w}(j) and can be computed by:

$$E_{s_w,d_w}(j) = \sum_{i=0}^{W/N-1} \left| S \cdot C_{s_w,d_w}^{HP}(i, j) - \tilde{C}_{s_w,d_w}^{HP}(i, j) \right|$$

The computed E_{s_w,d_w}(j) is compared to the value of the threshold E, which is determined based on the λ^PM_B generated by taking the visual characteristics into account, as follows:

$$E = \begin{cases} \lambda_B^{PM} \cdot \omega_1, & \text{if } \lambda_B^{PM} \ge 0.6 \\ \lambda_B^{PM} \cdot \omega_2, & \text{otherwise} \end{cases}$$

where ω_1 and ω_2 denote the thresholds to control the quantization error; these two parameters are determined considering imperceptibility and robustness.
Since a B with λ^PM_B ≥ 0.6 has the inherent properties of content that is less sensitive to watermark embedding, we made the proposed method strongly quantize the coefficients using ω_1. Given E, our method can create a sufficient statistical difference between the high-pass subbands corresponding to G_0 and G_1; a detailed description of this follows in the next subsection on watermark extraction. If the calculated E_{s_w,d_w}(j) is greater than E, the quantization for the current row is completed and the result of dividing the temporarily watermarked coefficient value by S is determined as C̄^HP_{s_w,d_w}(i, j). As illustrated in line 12 of Algorithm 1, the value of j is then incremented by 1, followed by quantization of the (j + 1)-th row of coefficients. In the opposite case (i.e. when the condition of line 8 in Algorithm 1 is not satisfied), the value of l is increased and the j-th row of coefficients is quantized again.
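The row-wise quantization loop can be sketched as follows; the threshold rule combining λ^PM_B with ω_1/ω_2 is an assumed reading of the paper's equation, and the parameter defaults mirror the experimental settings (L = 8, S = 2, Δ = 2, ω_1 = 400, ω_2 = 340).

```python
import numpy as np

def quantize_row(row, lam_pm, S=2, delta=2, L=8, omega1=400, omega2=340):
    """Sketch of the row-wise quantization embedding: raise the quantization
    level until the accumulated quantization error exceeds a perceptually
    adapted threshold (or the maximum level L is reached)."""
    omega = omega1 if lam_pm >= 0.6 else omega2   # assumed threshold rule
    threshold = lam_pm * omega
    scaled = S * np.asarray(row, dtype=float)
    for l in range(1, L + 1):
        step = l * delta                          # quantization step grows with l
        quantized = np.round(scaled / step) * step
        error = np.abs(scaled - quantized).sum()  # row quantization error E(j)
        if error > threshold or l == L:
            return quantized / S                  # final watermarked row
```

The per-coefficient distortion is bounded by L·Δ/(2S), which is what keeps the embedding imperceptible even at the maximum level.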
Applying the procedures described in Algorithm 1 to all high-pass subbands corresponding to the group determined by b completes the watermark embedding for a given B. The watermarked subblock B̄ is reconstructed by inverse NSCT from the watermarked and original subbands. After watermarking the N × M subblocks, the reconstructed subblocks are merged to generate a watermarked Y-channel image. Finally, the watermarked center image, represented by Ī, can be obtained by converting the YUV representation into an RGB representation.

B. WATERMARK EXTRACTION
Fig. 11 depicts the overall process of watermark extraction; it consists of the following steps.
1) Y-channel acquisition and subblock division
2) Conduction of three-level NSCT decomposition
3) Pairing of high-pass subbands and applying quantization to selected subbands
4) Statistical difference computation on paired high-pass subbands
5) Watermark bit extraction
6) Repetition of steps 2-5 for the entire subblock
7) Merging the extracted watermark bits
8) Unshuffling and multi-bit message extraction
As summarized above, the first two steps are the same as those of the watermark embedding. Since the proposed watermarking method is robust against DIBR attacks, it can extract embedded watermarks from a watermarked center image as well as from watermarked left or right images. This subsection describes extraction on the assumption that a given image, denoted I, is a watermarked center, left, or right image. As in the embedding process, the Y-channel is acquired and divided into N × M subblocks. From now on, for the sake of simplicity, we describe the process of extracting the watermark bit from an arbitrary watermarked subblock, represented by B̄. The three-level NSCT decomposition is applied to B̄, and watermark bit extraction is then conducted on the carefully selected high-pass subbands utilized in the watermarking. Here, each high-pass subband generated from B̄ is represented by C̃^HP_{s_w,d_w}, where s_w and d_w denote the scale and direction index.
Since quantization is performed on the high-pass subbands of only one group among G_0 and G_1, a statistical difference arises between the two groups after watermarking is complete. We analyze the statistical difference between the two groups by classifying the high-pass subbands that correspond to G_0 or G_1 into pairs: (C̃^HP_{2,0}, C̃^HP_{2,1}), (C̃^HP_{3,0}, C̃^HP_{3,3}), and (C̃^HP_{3,1}, C̃^HP_{3,2}), considering that the paired subbands have similar scale and direction indices. Then, we apply the quantization described in the previous subsection to the coefficients of the six selected high-pass subbands, thereby analyzing the differences between the groups to extract the embedded watermark bit. Here, the result of applying the quantization of Algorithm 1 to C̃^HP_{s_w,d_w} is represented by C̆^HP_{s_w,d_w}. The difference between the coefficients corresponding to the j-th row of C̃^HP_{s_w,d_w} and C̆^HP_{s_w,d_w} can be computed as follows:

$$D_{s_w,d_w}(j) = \sum_{i=0}^{W/N-1} \left| \tilde{C}_{s_w,d_w}^{HP}(i, j) - \breve{C}_{s_w,d_w}^{HP}(i, j) \right| \qquad (16)$$

Here, D_{s_{w0},d_{w0}} and D_{s_{w1},d_{w1}} denote the computed results for the subbands corresponding to G_0 and G_1, respectively. We estimate which group was quantized by analyzing the statistical differences based on the calculated D_{s_{w0},d_{w0}} and D_{s_{w1},d_{w1}}, and then extract an estimated watermark bit b̂ by aggregating the analyzed results using a majority voting system. Algorithm 2 describes the watermark extraction process in detail. Each pair described above is represented by P_p = (C̃^HP_{s_{w0},d_{w0}}, C̃^HP_{s_{w1},d_{w1}}), where p is the pair index.

Algorithm 2 Majority Voting-Based Watermark Bit Extraction
1: Λ_0 ← 0, Λ_1 ← 0
2: for p = 0 to 2 do
3:    Obtain (C̃^HP_{s_{w0},d_{w0}}, C̃^HP_{s_{w1},d_{w1}}) from P_p
4:    Apply the quantization of Algorithm 1 to both subbands
5:    for j = 0 to H/M − 1 do
6:       Compute D_{s_{w0},d_{w0}}(j) and D_{s_{w1},d_{w1}}(j) using Eq. (16)
7:       if D_{s_{w1},d_{w1}}(j) − D_{s_{w0},d_{w0}}(j) > µ then
8:          Λ_0 ← Λ_0 + 1
9:       else if D_{s_{w0},d_{w0}}(j) − D_{s_{w1},d_{w1}}(j) > µ then
10:         Λ_1 ← Λ_1 + 1
11:      end if
12:   end for
13: end for
14: if Λ_0 > Λ_1 then
15:    b̂ ← 0
16: else
17:    b̂ ← 1
18: end if

In this step, the watermarked and unwatermarked subbands have a low value and a high value for D_{s_w,d_w}, respectively, and we estimate the watermark bit by focusing on this statistical difference in the paired high-pass subbands. The conditions of lines 7 and 9 in Algorithm 2 are used to estimate which group of G_0 or G_1 was quantized, where µ represents a threshold for estimating the energy difference due to quantization.
If it is estimated that the row of coefficients of the subbands corresponding to G_0 was quantized (i.e. if the condition of line 7 is satisfied), Λ_0 is incremented by 1; in the opposite case, Λ_1 is incremented by 1. In addition, if the difference between the two calculated statistical differences is greater than 0 but no greater than µ, the row is excluded from counting, because there is not enough evidence to determine which group was quantized. The processes described so far are repeated for all rows of coefficients and all pairs. As depicted in lines 14 to 18 of Algorithm 2, we employ the majority voting system to extract b̂. If the value of Λ_0 is greater than the value of Λ_1, the watermark bit b̂ is estimated to be 0; in the opposite case, b̂ is estimated to be 1. As can be seen in the right part of Fig. 11, watermark bit estimation is applied to all subblocks B̄_{u,v}, where u ∈ {0, 1, ..., N − 1} and v ∈ {0, 1, ..., M − 1}, and a total of N × M bits b̂ are thus estimated. Finally, the extracted message M_e can be obtained after unshuffling.
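The pairwise re-quantization and majority voting can be sketched as below; quantize_fn stands in for Algorithm 1 applied at the extractor, and the vote conditions mirror lines 7 and 9 of Algorithm 2.

```python
import numpy as np

def extract_bit(pairs, quantize_fn, mu=0.1522):
    """Estimate the watermark bit from paired high-pass subbands: re-quantize
    both members of each pair, accumulate the row-wise differences D, and
    vote. A subband that was quantized at embedding changes little when
    re-quantized, so its D values stay low."""
    votes = [0, 0]                              # counters for bit 0 and bit 1
    for sub_g0, sub_g1 in pairs:                # subbands indexed [i, j]
        d0 = np.abs(sub_g0 - quantize_fn(sub_g0)).sum(axis=0)  # D per row j
        d1 = np.abs(sub_g1 - quantize_fn(sub_g1)).sum(axis=0)
        for j in range(len(d0)):
            if d1[j] - d0[j] > mu:              # G0 member looks quantized
                votes[0] += 1
            elif d0[j] - d1[j] > mu:            # G1 member looks quantized
                votes[1] += 1
            # otherwise: too close to call, excluded from the vote
    return 0 if votes[0] > votes[1] else 1
```

With a toy quantizer, feeding an already-quantized array as the G_0 member yields bit 0, and swapping the pair yields bit 1.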

VI. EXPERIMENTAL RESULTS
We conducted a series of experiments to evaluate the performance of the proposed watermarking method. The experimental data and evaluation criteria are introduced first, and then details of the parameter determination for the proposed and comparative methods are presented. Finally, we present the experimental results and a comparative analysis with the existing methods in terms of fidelity and robustness. For fidelity, we performed various metric-based objective fidelity tests and subjective fidelity tests in which human subjects participated. For robustness, after applying signal processing operations, geometric distortions, and DIBR attacks to the watermarked images, we evaluated the degree to which the extracted message was similar to the original message.

A. EXPERIMENTAL DATA
To demonstrate the effectiveness of the proposed method, a series of experiments were conducted on DIBR datasets consisting of pairs of center and corresponding depth images. The performance evaluation was conducted using the following datasets: the Microsoft Research 3D Video Datasets [17], the Heinrich-Hertz-Institut data [4], and the Middlebury Stereo Datasets [18]-[20]. Six hundred pairs of color center and grayscale depth images were employed in the experiments, with resolutions ranging from 720 × 576 to 1800 × 1500. We conducted experiments on 200 pairs of images obtained from each of the three dataset sources. Fig. 12 shows examples of the pairs of center and depth images in the datasets. For the Microsoft Research 3D Video Datasets containing 'Ballet' and 'Breakdancers' [17], the resolution of the center and depth images was 1024 × 768. The dataset consists of color and depth images captured by eight cameras for two scenes, and we used 200 randomly selected pairs of images for the experiments.
The experimental data 'Orbi' and 'Interview', with 720 × 576 resolution, can be obtained from the Heinrich-Hertz-Institut [4]; we randomly extracted 200 pairs from the two sequences and used them in the experiments. The pairs shown in Fig. 12(e) to Fig. 12(l) were obtained from the Middlebury Stereo Datasets [18]-[20], with resolutions ranging from 1240 × 1110 to 1800 × 1500. We only used pairs of views for which a depth image existed among the multiple views taken for a particular scene. For the 2005 and 2006 datasets [19], [20], there were multiple versions with different combinations of illumination and exposure parameters for each pair. In the experiments, we used a total of 200 pairs from the Middlebury Stereo Datasets, excluding parameter combinations that produced excessively bright or dark images. As shown in Fig. 12, the experimental images obtained from the datasets [4], [17]-[20] varied in resolution, number of objects, texture, and color.

B. PARAMETER DETERMINATION
In this subsection, details of the parameter settings for the DIBR process and the comparative experiments are presented. First, the parameters of the DIBR process for generating synthesized left and right images are described. Because the recommended disparity between the synthesized left and right images ranges from 3% to 5% of the width W of the center image to offer a comfortable viewing experience [12]-[14], the baseline distance t_x for adjusting the parallax of the two synthesized views was set to 5% of W. The parameters Z_n, Z_f, and f used in the DIBR process were set to 1, t_x/2, and 1, respectively. For natural synthesized image generation, an asymmetric Gaussian filter with σ_h = 10 and σ_v = 30 was utilized for preprocessing the depth image. This preprocessing was used only for the robustness experiment on smoothed depth image-based view generation; we used the original depth image for the other experiments. In addition, linear interpolation was used in the hole-filling process to fill newly exposed holes. Based on these parameters, left and right images were synthesized for the experiments, and the synthesized images were then used for the fidelity and robustness tests. In particular, for the robustness test on baseline distance adjustment, we conducted an experiment varying t_x from 3% to 7% of W.
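A minimal DIBR warping sketch follows, assuming a common depth-to-disparity convention (8-bit depth mapped to inverse depth between Z_near and Z_far); hole filling is omitted and holes are marked with −1.

```python
import numpy as np

def synthesize_view(center, depth, t_x, f=1.0, z_near=1.0, z_far=10.0,
                    sign=+1):
    """Minimal DIBR sketch: shift each pixel horizontally by a disparity
    derived from the 8-bit depth map, leaving unfilled positions as holes
    (set to -1) for a later hole-filling step. The depth-to-disparity
    mapping below is a common convention, assumed rather than quoted."""
    h, w = center.shape
    out = np.full((h, w), -1.0)
    inv_z = (depth / 255.0) * (1.0 / z_near - 1.0 / z_far) + 1.0 / z_far
    shift = np.rint(sign * (t_x / 2.0) * f * inv_z).astype(int)
    for y in range(h):
        for x in range(w):
            nx = x + shift[y, x]
            if 0 <= nx < w:
                out[y, nx] = center[y, x]
    return out
```

A uniform-depth scene reduces to a pure horizontal translation, which is exactly the desynchronization the row-wise embedding is designed to survive.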
To show the effectiveness of the proposed method, we designed comparative experiments to analyze the performance of our work versus the IR-based [12], DTCWT-based [13], and SIFT-based [14] methods in terms of fidelity and robustness. The proposed method and the comparative methods in [12]-[14] are blind watermarking algorithms that can extract watermarks without pre-saved side information or the original cover work. To conduct the experiments under the same conditions, the parameters of the IR-based, DTCWT-based, and proposed methods were determined to provide sufficient watermark capacity (32 bits). The SIFT-based method, which has a limitation in watermark capacity, was exceptionally set to have a capacity of 12 bits. For all comparative methods, watermarks were embedded into the Y-channel of the center images. The details of the parameter settings for the proposed and comparative methods are given in Table 2. As described in [12], there are two different settings for the IR-based method, depending on the block size B_size employed for watermarking. In the experiments, we used the setting of the IR-based method with a B_size of 16 × 16, which guarantees higher robustness than a B_size of 8 × 8. The IR-based method extracts a watermark bit for each block and then generates a 32-bit message by applying a majority voting system on the basis of predefined intervals.
In the case of the DTCWT-based and SIFT-based methods, the key parameters were set based on the descriptions in [13], [14]. In particular, the SIFT-based method, which specializes in imperceptibility and robustness against geometric attacks, was parameterized to have a watermark capacity of 12 bits in consideration of the trade-off between capacity and performance. As stated in [14], the SIFT-based method has optimal performance under the parameter conditions listed in Table 2. By comparing the invisibility performance of the proposed method with that of the SIFT-based method, we attempted to verify the effectiveness of the perceptual masking methods exploited in the watermark embedding process. For the proposed method, the parameter values were empirically determined with consideration of robustness and imperceptibility. In detail, the given Y-channel image was divided into 8 × 4 subblocks, each of size W/8 × H/4. The parameters for generating the perceptual masking value λ^PM_B were set as α = 1, β = 0.7, θ = 0.5, ε_1 = 0.2, ε_2 = 0.8, η_1 = 0.6, and η_2 = 0.4. The parameters for quantization-based watermark embedding and extraction were set as L = 8, S = 2, Δ = 2, ω_1 = 400, ω_2 = 340, and µ = 0.1522. For the comparative experiments based on the described parameters, we used a desktop computer with an Intel(R) i7-4790K CPU and 16 GB main memory. Each algorithm was implemented in MATLAB R2014a, and the criteria for evaluating the performance in terms of fidelity and robustness are introduced in the following subsection.

C. EVALUATION CRITERIA
We conducted comparative experiments in terms of fidelity and robustness. The fidelity test evaluates the quality of the watermarked image, and we conducted two such tests: an objective fidelity test and a subjective fidelity test. For the objective fidelity test, image quality assessment (IQA) metric-based quantitative analysis was performed. In addition to the peak signal-to-noise ratio (PSNR) and structural similarity (SSIM) [50], which are the most widely used criteria for evaluating image quality, we analyzed the quality of the watermarked images using the following IQA metrics: feature similarity (FSIM) [51] and multi-scale SSIM (MS-SSIM) [52]. Based on these various IQA metrics, an objective analysis between the watermarked and original images was performed.
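For reference, the PSNR used in the objective fidelity test can be computed as follows (standard definition; the function name is illustrative):

```python
import numpy as np

def psnr(original, distorted, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images of the same shape."""
    diff = np.asarray(original, float) - np.asarray(distorted, float)
    mse = np.mean(diff ** 2)
    if mse == 0:
        return float('inf')                  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)
```

Identical images give infinite PSNR, and a maximal per-pixel error of 255 gives 0 dB, bracketing the scale on which the later results (e.g. 45.02 dB) are reported.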
Next, the subjective fidelity test is a qualitative analysis based on human judgment, and the mean opinion score (MOS), a popular indicator of perceived media quality, is used for subjective quality evaluation [53], [54]. In this experiment, a modified version of the double-stimulus continuous quality scale (DSCQS) presentation structure based on [55] was utilized to evaluate the subtle watermark signal. As illustrated in Fig. 13, the original and watermarked images were repeatedly displayed according to the DSCQS presentation structure. Here, O_o and W_o denote the original and watermarked images displayed at the observing time, and O_v and W_v denote the original and watermarked images displayed at the voting time, respectively. At voting time, the subjects assessed the quality of the image on the screen according to the MOS rating scale depicted in the right part of Fig. 13. Judged on a scale of 1 (Very annoying) to 5 (Imperceptible), the MOS is computed as the average of the ratings given by the human subjects for a given stimulus. In summary, the subjects assessed the relative degradation of the watermarked image compared with the original image. In analyzing the subjective fidelity, we focused on the fact that when the MOS value is high, the difference between the original and watermarked images recognized by the HVS is small. For the robustness test, we measured the bit error rate (BER), defined as the ratio of the number of erroneous bits to the total number of message bits, between M_o and M_e, which denote the original and extracted messages, respectively. Therefore, the more robust the watermarking technique is against an attack, the closer the calculated BER is to 0. To analyze the robustness of the proposed and comparative methods, we conducted robustness tests by applying various attacks, such as DIBR attacks, signal processing operations, and geometric distortions, to the watermarked images and then measuring the BERs.
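The BER measurement described above can be sketched as:

```python
def bit_error_rate(m_o, m_e):
    """BER between the original and extracted multi-bit messages: the
    fraction of mismatched bits (0 means perfect extraction)."""
    assert len(m_o) == len(m_e)
    errors = sum(1 for a, b in zip(m_o, m_e) if a != b)
    return errors / len(m_o)
```

A BER of 0 indicates that every embedded bit survived the attack, while 0.5 is what random guessing would produce on average.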

D. FIDELITY TEST
In this subsection, the objective and subjective quality evaluation results for the watermarked images generated by the proposed and comparable methods are introduced.

1) OBJECTIVE FIDELITY TEST
To evaluate the objective perceptual quality of the watermarking methods, we exploited various IQA metrics, including PSNR, SSIM, FSIM, and MS-SSIM [50]-[52], where PSNR is usually expressed on the logarithmic decibel (dB) scale. We conducted objective fidelity measurements between the watermarked center and original center images based on the IQA metrics. Table 3 shows the experimental results of the objective fidelity test; the first row of the measures in Table 3 shows the results between two original center images, which were added for the numerical comparison of each metric. The SIFT-based method showed the best performance because it embeds watermarks in partial regions based on refined SIFT keypoints. In addition, since the SIFT-based method has a lower watermark capacity of 12 bits, the degree of degradation due to watermark embedding was relatively small compared with that of the other methods. The DTCWT-based method, which strongly inserts watermarks into coefficients to achieve high robustness, had the lowest performance on all metrics. On the other hand, the proposed method aims to improve invisibility through the adjustment of the embedding strength based on the perceptual masking value, the use of selected high-pass subbands, and empirically optimized parameters. As shown in Table 3, the PSNR, SSIM, FSIM, and MS-SSIM values of the proposed method were 45.02 dB, 0.9969, 0.9986, and 0.9977, respectively. The proposed method achieved performance close to that of the SIFT-based method, which specializes in objective quality, even with a higher capacity. In addition, our work showed higher quality measures in the IQA metric-based objective fidelity test than the IR-based and DTCWT-based methods, which have the same watermark capacity of 32 bits. In terms of PSNR, the proposed method achieved 1.26 dB and 3.51 dB higher values than the IR-based and DTCWT-based methods, respectively.
From these results, we conclude that the proposed method performs strongly in terms of objective quality evaluation.

2) SUBJECTIVE FIDELITY TEST
We performed a subjective fidelity test in addition to the metric-based objective evaluation. Fig. 14 shows the watermarked versions of the Books image produced by the proposed and comparative methods. From top to bottom of Fig. 14, the results are the watermarked center images of the IR-based, DTCWT-based, SIFT-based, and proposed methods, respectively. The second and third columns show enlarged partial regions of the watermarked images in the first column so that the degradation caused by watermark embedding can be examined in detail; the areas in the yellow and red boxes were enlarged and listed in the second and third columns, respectively. Since the subjective fidelity test is performed with human eyes, it can reveal distortions caused by watermarking that cannot be measured by objective metrics.
In the case of the IR-based method, block artifacts due to the watermark embedding in 16 × 16 DCT coefficients can be seen throughout the watermarked image. For the DTCWT-based method, there was distortion, such as smearing around the letters, in the enlarged view of the yellow-boxed area. The SIFT-based method achieved the best performance in the objective fidelity test, but a close look at the enlarged image in the third column reveals noticeable block artifacts. Since the watermark embedding of this technique is performed only on selected blocks around keypoints rather than on the whole image, the method scored highly in the objective metric-based evaluation. However, depending on the inherent characteristics of the image content, block artifacts can stand out, as in the example of Fig. 14. On the other hand, there were no noticeable distortions or artifacts in the watermarked results of the proposed method compared with the results of the comparative methods (see the bottom row of Fig. 14).
For a more detailed analysis, we conducted a subjective fidelity test based on computed MOS values. In this experiment, ten subjects participated in evaluating the fidelity of the watermarked images, and a Samsung S27E500C curved LED 27-inch monitor was employed as the display device. The twelve center images depicted in Fig. 12 were used as the original images, and the four watermarking algorithms were applied to generate the watermarked images. As depicted in Fig. 13, the DSCQS presentation structure was constructed by placing the original and watermarked images at the intersection, and each subject scored the quality of each displayed image. The results of the subjective fidelity test are listed in Table 4. In the MOS-based subjective fidelity test, which rates the quality of watermarked images on a scale of 1 to 5, the proposed and IR-based methods achieved high MOS values above 4.3. The SIFT-based method, which showed the best performance in the objective fidelity test, achieved a MOS value of 4.1416, and the DTCWT-based method showed the worst performance with a MOS value of 3.1. From the objective and subjective fidelity tests, we demonstrated that the proposed method has higher imperceptibility than the comparative methods.

E. ROBUSTNESS TEST
In this subsection, we present the results of the robustness test against various attacks, including desynchronization attacks from the DIBR process, signal processing operations, and geometric distortions.

1) ROBUSTNESS TO DIBR ATTACKS
For the proposed and comparative methods, watermarks were embedded into a center image, and then the left and right images were synthesized through the DIBR process. To deal with the illegal redistribution illustrated in Fig. 1, watermarking methods for DIBR 3D images should be designed to extract embedded watermarks with a low BER from center, left, and right images. In the robustness test, we first measured the watermark extraction performance on the watermarked center images. Table 5 shows the average BER values of the watermarking methods for watermarked center images in each dataset, where MSR, HHI, and MSD denote the Microsoft Research 3D Video Datasets [17], the Heinrich-Hertz-Institut dataset [4], and the Middlebury Stereo Datasets [18]–[20], respectively. As listed in Table 5, all methods showed a low BER below 0.03 for each dataset, and the proposed method achieved BER values of 0, 0, and 0.0127 for MSR, HHI, and MSD, respectively. For the center view, the DTCWT-based method showed the worst performance with an average BER of 0.0129. Next, we evaluated the watermark extraction performance for synthesized left and right images generated from the 3D image warping of the DIBR process with predefined parameters. As listed in Table 6, the IR-based method showed the lowest BER for the center view but worse performance for the left and right views. The average BER values for the left and right images of the proposed method were 0.0052 and 0.0056, respectively. For the 3D image warping that generates the left and right views, the proposed method achieved the best performance. Compared with the comparative methods, whose performance degraded under view synthesis, the proposed method showed little difference in watermark extraction performance between the center and synthesized views, and it can be concluded that the proposed method is robust to the desynchronization that occurs during 3D image warping with a predefined t_x.
This strength derives from the fact that the proposed method applies quantization-based watermark embedding to row coefficients of the NSCT high-pass subbands selected through the robustness analysis in Section IV.
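The BER figures above are simply the fraction of mismatched bits between the embedded and extracted payloads. A minimal sketch (the function name `bit_error_rate` is ours; the 32-bit payload mirrors the capacity used in the experiments):

```python
import numpy as np

def bit_error_rate(embedded_bits, extracted_bits):
    """Fraction of watermark bits that differ after extraction."""
    embedded = np.asarray(embedded_bits)
    extracted = np.asarray(extracted_bits)
    return float(np.mean(embedded != extracted))

# One flipped bit in a 32-bit payload -> BER = 1/32 = 0.03125
rng = np.random.default_rng(0)
payload = rng.integers(0, 2, size=32)
recovered = payload.copy()
recovered[5] ^= 1  # simulate a single extraction error
print(bit_error_rate(payload, recovered))  # → 0.03125
```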
In addition, we conducted a robustness test for the following two types of DIBR attacks: baseline distance adjustment and depth image preprocessing. Baseline distance adjustment is used to control the depth condition of synthesized views, and it can be regarded as a desynchronization attack in that it affects the distance that pixels move in the horizontal direction during view synthesis. In this experiment, the baseline distance ratio, defined as (t_x / W) × 100 where W is the width of the center image, was varied from 3% to 7%. As listed in Table 7, the proposed method achieved the best performance (BER values below 0.006) for various t_x ranging from 3% to 7%. Although the DTCWT-based and SIFT-based methods had higher BER values than the proposed method, both achieved acceptable performance because their watermarking algorithms were designed with baseline distance adjustment in mind. In contrast, the IR-based method can extract the watermark only from a virtual view generated under the predefined condition in [12]. Therefore, for the IR-based method, the BER value increases except when the predefined condition (i.e., t_x = 5%) is met. Preprocessing of the depth image is used to reduce sharp depth discontinuities, and it also affects the synchronization of watermarks in the process of smoothing the depth image. To test the robustness of the watermarking methods, we preprocessed the depth image using an asymmetric filter and then generated synthesized images under the same conditions as in the previous experiment on baseline distance adjustment. As listed in Table 8, the proposed approach achieved a lower BER than the comparative methods. Comparing the results of Tables 7 and 8, it can be seen that the preprocessing of the depth image affected watermark extraction.
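The horizontal desynchronization at the heart of these attacks can be sketched as follows: the baseline ratio fixes t_x in pixels, and a simplified 3D image warp shifts each pixel by a depth-proportional disparity. This is only a toy model; the linear depth-to-disparity mapping and the function names are our own assumptions, and real DIBR warping additionally handles occlusions and hole filling.

```python
import numpy as np

def baseline_to_pixels(ratio_percent, width):
    """Baseline distance t_x in pixels from the ratio (t_x / W) * 100."""
    return ratio_percent / 100.0 * width

def warp_left(center, depth, t_x):
    """Toy horizontal warp: shift each pixel by a disparity proportional
    to its 8-bit depth value (occlusion/hole handling omitted)."""
    h, w = center.shape
    left = np.zeros_like(center)
    disparity = np.round(depth / 255.0 * t_x).astype(int)
    for y in range(h):
        for x in range(w):
            xs = x - disparity[y, x]
            if 0 <= xs < w:
                left[y, xs] = center[y, x]
    return left

print(baseline_to_pixels(5, 1024))  # → 51.2 (a 5% ratio on a 1024-wide image)
```

In this simplified model, a larger baseline ratio moves high-depth pixels further, which is exactly the pixel-level desynchronization that row-based embedding is designed to survive.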
In some cases, the change in the depth condition adversely affected watermark extraction, while in other cases, the BER values slightly decreased owing to the generation of more natural views. For a combined attack of depth image preprocessing and baseline distance adjustment, our method showed lower BER values than the comparative methods. From the robustness test in this subsection, we demonstrated that the proposed method achieves more stable and accurate watermark extraction against the three types of DIBR attacks than the comparative methods.

2) ROBUSTNESS TO SIGNAL PROCESSING OPERATIONS AND GEOMETRIC DISTORTIONS
Watermarked content undergoes various attacks in the distribution process, and these attacks adversely affect watermark extraction. In this subsection, we present the robustness test results of the watermarking methods for signal processing operations and geometric distortions, namely JPEG compression, white Gaussian noise addition, salt and pepper noise addition, cropping, translation, and scaling. To demonstrate the effectiveness of the proposed method, we attempted to extract watermarks from synthesized left images after applying these attacks. The results of the robustness test for signal processing operations and geometric attacks are depicted in Figs. 15 and 16, respectively. Each attack was implemented using MATLAB functions; translation denotes a geometric attack in which the pixels of an image are shifted by a translation factor and zero padding is applied to the vacated area.
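The translation attack described above can be sketched as a pixel shift with zero padding of the vacated area. This is a minimal NumPy illustration; the function name `translate_attack` is ours, and the original experiments used MATLAB functions rather than this code.

```python
import numpy as np

def translate_attack(image, dx, dy):
    """Shift an image by (dx, dy) pixels; the vacated area is zero-padded."""
    h, w = image.shape[:2]
    out = np.zeros_like(image)
    # clip source and destination windows to the image bounds
    src_y = slice(max(0, -dy), min(h, h - dy))
    src_x = slice(max(0, -dx), min(w, w - dx))
    dst_y = slice(max(0, dy), min(h, h + dy))
    dst_x = slice(max(0, dx), min(w, w + dx))
    out[dst_y, dst_x] = image[src_y, src_x]
    return out

# Shifting right by one column zero-pads the first column
img = np.arange(16).reshape(4, 4)
shifted = translate_attack(img, 1, 0)
print(shifted[:, 0])  # → [0 0 0 0]
```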
We first conducted the robustness test against signal processing operations. When the strongest parameters for JPEG compression, white Gaussian noise addition, and salt and pepper noise addition were applied, the average PSNR values were 35.71 dB, 30.79 dB, and 28.92 dB, respectively, indicating severe degradation of the watermarked image. For JPEG compression, all four watermarking methods had low BER values below 0.055 when the JPEG quality was within the range of 80 to 100. For JPEG quality below 80, the BER value of the IR-based method increased rapidly. The proposed method showed robustness close to the best results of the SIFT-based method. For white Gaussian noise addition, the proposed and IR-based methods outperformed the DTCWT-based and SIFT-based methods. When the variance of the Gaussian noise was 8.0 × 10⁻⁴, the BER values of the proposed and IR-based methods were 0.0524 and 0.0637, respectively. On the other hand, for salt and pepper noise addition, the proposed method outperformed the IR-based method for noise densities in the range of 1 to 4. When the noise density was 4, the BER value of the proposed method was 0.0453. For the three types of signal processing operations, the proposed method showed sufficient robustness at attack levels that can occur in real environments.
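The two noise attacks can be reproduced roughly as follows, mirroring the conventions of MATLAB's `imnoise` on a [0, 1]-scaled image. This is a hedged sketch: the function names are ours, and the interpretation of the noise-density values above as percentages (e.g. a density of 4 meaning 4% of pixels corrupted) is our assumption, since the text does not spell out the unit.

```python
import numpy as np

def gaussian_noise(image, variance, rng):
    """White Gaussian noise on a [0, 1]-scaled image, in the style of
    imnoise(I, 'gaussian', 0, variance); result clipped back to [0, 255]."""
    noisy = image / 255.0 + rng.normal(0.0, np.sqrt(variance), image.shape)
    return np.clip(noisy, 0.0, 1.0) * 255.0

def salt_pepper_noise(image, density, rng):
    """Corrupt a `density` fraction of pixels to 0 or 255, in the style of
    imnoise(I, 'salt & pepper', density)."""
    noisy = image.astype(np.float64)
    mask = rng.random(image.shape)
    noisy[mask < density / 2] = 0.0                          # pepper
    noisy[(mask >= density / 2) & (mask < density)] = 255.0  # salt
    return noisy
```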
For geometric distortions, the proposed method had stable and low BER values for the three types of geometric attacks, as shown in Fig. 16. For cropping and translation, the BER values of the proposed method increased slightly as the factor value increased, but the BERs remained below 0.05. Furthermore, the proposed method outperformed all comparative methods for scaling attacks ranging from −20% to 20%. While the proposed method maintained a low BER for various scaling factors, the comparative methods showed increases in BER as the scaling factor deviated from 0%. The IR-based method achieved the worst performance against geometric attacks; the DTCWT-based method had a higher BER than the proposed method but showed overall stable and acceptable performance. While the SIFT-based method achieved the best performance against cropping and translation, it was not robust against scaling, which adversely affects the extraction of SIFT keypoints. From the test results in Fig. 16, we confirmed that the proposed method outperforms the comparative methods in watermark extraction under geometric distortions.

VII. CONCLUSION
In this paper, we proposed an NSCT-based robust and perceptual watermarking framework for DIBR 3D images. We believe that the NSCT domain is advantageous for securing both invisibility and robustness compared with other domains. Compared with the DTCWT domain, the NSCT domain, which involves no subsampling in the decomposition process, suffers little degradation in image quality due to watermark insertion. In addition, because the NSCT domain has the property of shift invariance, it is well suited for DIBR 3D image watermarking, which was verified through the robustness analysis and the various robustness experiments in the previous sections. The proposed watermarking framework is the first attempt to utilize NSCT for DIBR 3D image watermarking, and we expect that this study will lead to various follow-up studies on NSCT domain-based DIBR 3D image watermarking. In the field of watermarking, it is important not only to select a suitable domain but also to design the framework well by combining the watermark insertion and extraction modules with consideration of robustness and imperceptibility.
For robustness, we selected directional high-pass subbands that are robust against DIBR attacks, and the proposed watermarking framework inserts and extracts watermarks in the row coefficients of the selected subbands to withstand the horizontal desynchronization of DIBR attacks. For imperceptibility, watermark embedding was performed on high-pass subbands rather than the low-frequency subband, and the embedding strength was adjusted by considering texture and brightness characteristics through perceptual masking. To achieve robustness and invisibility at the same time, we carefully selected the parameters of the watermark insertion and extraction modules and integrated them into one framework. Through the comparative experiments, the superiority of the proposed watermarking framework was experimentally demonstrated in terms of robustness and imperceptibility. As intended, our method showed stable and high watermark extraction performance against DIBR attacks, signal processing operations, and geometric distortions, while maintaining high imperceptibility. Additionally, the proposed method achieved better performance than the comparative methods in the objective and subjective fidelity tests. In future work, we will extend the proposed watermarking framework for DIBR 3D images so that it is applicable to video in the MVD format.