Blind Camcording-Resistant Video Watermarking in the DTCWT and SVD Domain

Video watermarking techniques can be used to prevent unauthorized users from illegally distributing videos across (social) media networks. However, current watermarking solutions are unable to embed a perceptually invisible watermark which is robust to the distortions introduced by camcording. These watermark-disrupting distortions include lossy compression, the addition of noise, frame-rate conversion and geometric distortions. In this paper, we present a novel video watermarking technique that is blind and robust to camcording attacks. The proposed approach uses the integration of the dual-tree complex wavelet transform (DTCWT) and singular value decomposition (SVD) to achieve robustness against geometric attacks. The experimental results validate our technique’s superior imperceptibility and robustness to several attacks when compared to existing peer mechanisms. In conclusion, the proposed technique can be used to protect against illegal distribution of video content.


I. INTRODUCTION
Illegal video redistribution by digital pirates causes financial harm to the original copyright owners or producers of films and television series [1]. For example, approximately $1.37 billion was lost to the Australian economy due to movie theft in 2010 [2]. Consequently, the security of video applications against digital piracy has become one of the most important issues for both the industry and the research community.
Digital watermarking has been broadly used for a variety of applications, including the tracking of digital pirates, preservation of copyright and playback control [3]-[9]. This paper focuses on the latter. For example, an Internet gateway could scan for the presence of a watermark and filter users' requests accordingly. That is, a user's request to download a video can be cancelled if a watermark is detected, or served if no watermark is detected. For such applications, robust watermarking is required, meaning that the watermarks can survive signal processing attacks. This is in contrast to fragile watermarks, which are often used for data-integrity and tamper-detection applications; these should not survive attacks, so that manipulations can instead be located from the destroyed watermark regions.
The associate editor coordinating the review of this manuscript and approving it for publication was Yun Zhang.
The development of watermarking schemes that are robust to common attacks has remained a significant challenge to overcome [6], [10]. For example, attacks that cause de-synchronization between a watermark encoder and decoder such as camcording attacks are easily performed by digital pirates. Due to these attacks, existing blind watermark decoders are either completely unable to extract the watermark, or can only detect it with a large error [8]. Blind watermarking means that the original (unwatermarked) video is not required during watermark detection, nor any other information extracted from the original video [3].
In conventional watermarking techniques [11]-[16], the watermark decoders either require the original video for correct detection, or fail to detect the watermark when a combination of temporal synchronization, signal processing and geometric attacks is applied. They are vulnerable to frame-rate conversion, a very popular signal processing attack which causes temporal distortion. Additionally, they suffer from camcording, which causes a combination of temporal, geometric and color distortions. Techniques using the complex wavelet transform (CWT) can overcome the lack of shift invariance and the poor directional selectivity by including limited redundancy in the transform, but cannot achieve efficient reconstruction higher than level 1. The dual-tree complex wavelet transform (DTCWT) can overcome these limitations as it is approximately shift invariant [17]. In addition, the use of singular value decomposition (SVD) can improve the stability and performance of a DTCWT-based scheme, because small perturbations in the spatial domain do not change the singular values (SVs) significantly [8], [18]. In our previous work [19], we proposed a watermarking approach in the SVD and DTCWT domain. If all of the frames in a video sequence are temporally synchronized, this scheme achieves robustness to geometric attacks, H.264/AVC compression and noise addition. However, it fails to detect the watermark when a temporal synchronization attack such as frame dropping, frame insertion or frame averaging is applied. As a result, it cannot tackle frame-rate conversion and camcording. In this study, a novel video watermarking technique is proposed, which is an extension of our preliminary work and is robust to such types of temporal synchronization attacks.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The proposed technique is developed by integrating the SVD and DTCWT approaches. The main contributions of this study are provided below:
• The proposed technique applies the integrated SVD and DTCWT to a chrominance (U) component of the video to achieve imperceptibility of the watermark and robustness to geometric distortion attacks.
• The proposed watermark extraction is blind because it requires neither the original video nor any other information extracted from the original video (such as the original SVs). This is in contrast to conventional SVD-based schemes.
• The extraction is robust against temporal attacks such as frame-rate conversion and camcording. That is because the extraction of the watermark from a frame depends only on that frame rather than on multiple frames of a video sequence.
• The imperceptibility is thoroughly evaluated and compared to the state of the art, using both a subjective and objective quality assessment.
• The robustness performance of the proposed technique is thoroughly assessed. These experiments revealed that our method has a much better performance compared to classical watermark algorithms.
• We also analyze the security of the embedded watermark against a multiple watermark embedding attack.
The remainder of the paper is arranged as follows. Section II discusses the related studies of the proposed technique. A brief overview of the DTCWT and SVD is presented in Section III. Section IV explains the proposed watermarking technique. A detailed analysis of the results is discussed in Section V. Finally, the study is concluded in Section VI.

II. RELATED WORK
There have been multiple digital watermarking techniques developed in the literature which use the SVD domain to embed the watermark [7], [8], [10], [18]- [22]. For example, the authors of [21] altered the SVs of an image with the watermark. They then applied the SVD on the modified watermark again to obtain new SVs. The watermarked image was then found by substituting the original SVs with the new ones, i.e., the method is not blind. The watermark extraction from the distorted version of the watermarked image was achieved by performing the reverse operation at the decoder. The outcomes revealed that this approach was robust to JPEG compression, filtering, rotation, scaling and cropping. The authors in [7] suggested a similar watermarking technique using the SVD where the watermark was inserted directly into the SVs and extracted using the reverse operation.
Lai et al. suggested a mechanism using the SVD and discrete wavelet transform (DWT) [8]. In this mechanism, the SVD was performed on two sub-bands of a 1-level DWT decomposition of the watermark and original image. Then, the SVs of these sub-bands of the original image were modified by the SVs of the same sub-bands of the watermark. In the algorithm suggested by Makbol et al. [23], a redundant DWT (RDWT) was used with the SVD for embedding the watermark. The watermark was embedded into the 4 sub-bands of a 1-level RDWT decomposition and then an inverse RDWT was executed to provide the watermarked image. Then, every sub-band was used to extract the watermark. Similarly, Prasetyo et al. [22] proposed an SVD-based watermarking scheme. More specifically, the LL-band of a DWT-transformed host frame is divided into non-overlapping blocks, and the principal components of the scrambled watermark signal are embedded into the largest singular value of each block. For watermark extraction, the largest singular values of the blocks are extracted and compared to the original singular values. For both the method of Makbol et al. and the method of Prasetyo et al., the original SVs are needed to extract the watermark at the decoder, and hence the host image must be available at the decoder. This means that these techniques are not suitable for the many applications where it is not feasible to have the host image available at the decoder.
Several image and video watermarking techniques have been suggested for various applications [3], [24]-[36]. The techniques that deal with geometric distortions can be classified into feature-, synchronization- and invariant transform-based algorithms. In the feature-based approaches [37]-[39], the watermark is embedded into the geometrically invariant features of a video frame. The main limitation of this type of approach is the false detection of feature points [40]: feature points can be detected wrongly when a geometric attack is applied, which in turn produces a false detection of the watermark. In addition, these techniques are used for image rather than video watermarking applications, as it is hard to obtain the same salient feature points in every frame of a video sequence.
Synchronization-based techniques compensate for geometric distortions before detecting the watermark. These techniques estimate the geometric parameters based on an exhaustive search, template addition or image registration. The original image geometry is then recovered using these parameters, and the watermark is extracted from the rectified image. In exhaustive-search approaches, the watermark decoder synchronizes the watermark by finding its spatial position through a comprehensive search. This requires high computational resources and raises the probability of false detection when searching in a large space [41]. Image registration algorithms address the issue by using a reference registration model before extracting the watermark [42]-[44]. This technique is exploited to restore the transformation parameters in the geometrically distorted version of the watermarked frame. Registration techniques are efficient for non-blind or supervised watermarking mechanisms. For example, Li et al. [25] utilize the scale invariant feature transform (SIFT) to recover from geometric attacks, in combination with a watermark in the contourlet domain. However, this synchronization requires the original video during watermark extraction. Watermark detection in blind or unsupervised watermarking systems is a much more difficult problem to solve. Template addition algorithms have also been used to secure image and video systems against geometric attacks [45], [46]. In these algorithms, a template is added in the watermark embedding procedure. The template does not convey any information but is utilized at the decoder to identify the transformation parameters before extracting the watermark. The major drawback of these algorithms is that an attacker can identify and remove the template by, for example, deleting the peak components in the discrete Fourier transform (DFT) domain [38].
Invariant transform watermarking algorithms exploit the invariance of the embedding domain to geometric distortions. In current state-of-the-art approaches, the majority of the techniques [47]-[49] use the Fourier-Mellin Transform (FMT), which provides rotation- and scaling-invariant characteristics that are robust to attacks involving rotation, scaling and translation (RST). Although FMT-based algorithms are efficient in theory, they are not applicable to real-time systems as they require a large amount of computational processing, and they are also not resilient to cropping [44], [50]. Another interesting approach is using the polar harmonic transform (PHT). For example, Xu [51] proposed a rotation- and scale-invariant image watermarking method based on the PHT, but it is not robust to cropping.
Loo et al. [52] proposed an alternative scheme using the DTCWT, whose good directional selectivity and approximate shift invariance provide inherent robustness to geometric distortions [17]. As a result, other researchers have also adopted this domain for embedding the watermark [53]-[56]. In [54], the level 3 and level 4 components of a 4-level DTCWT decomposition are used to embed the watermark based on a spread spectrum mechanism. However, this mechanism does not support blind detection. In [55], the watermark was inserted into the magnitude of the highest two levels of a 4-level DTCWT decomposition. This scheme was shown to be effective in the presence of upscaling, cropping, rotation and lossy compression. However, as the human visual system (HVS) more easily perceives changes in luminance than in chrominance [57], the performance was limited by the low watermark magnitude required to maintain the imperceptibility of the watermark. Therefore, when watermark imperceptibility had to be maintained, its robustness to attacks was shown to be lower than that of methods where the watermark was embedded in the chrominance channel [58], [59].
In the literature, a combination of the DTCWT and SVD has also been used to design image and video watermarking algorithms [19], [60]- [62]. Abdallah et al. [60] described an algorithm using the SVD and DTCWT domains in which the SVs of the level 2 sub-bands of a 2-level DTCWT decomposition of a video frame are used to embed the watermark. Although the watermark decoder does not require the original video to extract the watermark, the SVs of the unwatermarked video are required, making the method only semi-blind. The outcomes of the assessment of this technique showed that its robustness performance against signal processing attacks was superior to other DWT-SVD based techniques. However, the durability of the watermark in the presence of geometric distortion attacks was not considered. Another scheme proposed in [19] also used the DTCWT-SVD domain for embedding the watermark. However, this algorithm was unable to cope with temporal synchronization attacks. In other words, the watermark detection using this method failed in the presence of frame dropping, frame insertion or frame averaging attacks. In [61], Yadav et al. proposed an image watermarking algorithm using a combination of the DTCWT, principal component analysis (PCA) and SVD domain. They obtained the score matrix of the low-frequency DTCWT coefficients using the PCA. Then the SVD was applied on the score matrix to get their SVs. Finally, the resultant DTCWT-PCA-SVD features are combined with the same features extracted from the watermark to generate a watermarked image. Although this method achieved robustness against signal processing and cropping attacks, the availability of the original SVs was required at the decoder to extract the watermark, which limits the watermarking applications. In [62], a video watermarking technique based on the finite state machine was proposed in the DTCWT-SVD domain. 
In this method, the watermark was generated by the finite state machine and then embedded into the SVs of the low-frequency DTCWT coefficients of the video frames. The watermark at the decoder was extracted according to the predefined relationship of the SVs. Experimental results show that the algorithm was robust against transcoding, noise addition and temporal synchronization attacks; however, it was unable to survive geometric attacks.
More traditional transforms in image and video watermarking are the DFT and discrete cosine transform (DCT). For example, Sun et al. [63] utilize the DFT domain to provide a geometrically robust algorithm. Although the performance is good and the method is blind, it cannot cope with temporal attacks. Additionally, the DCT-based approach suggested in [59] utilizes the low-frequency DCT coefficients of a frame to embed the watermark. As a modification of the DC coefficient creates temporal flickering, this region is avoided for embedding the watermark. However, there is still slight flickering in the watermarked video because of the modification of the large coefficients around the DC component in the chrominance channel. Other techniques, e.g., [64], [65], also embed the watermark using the DCT. In [64], the authors exploited the DCT coefficients to embed the watermark and showed that the watermark was robust to a downscaling attack but could not achieve robustness against rotation, upscaling and cropping. In [65], a video watermarking technique was introduced based on the DCT domain where watermark minimal sequences (WMSs) were used for embedding the watermark. Although the watermark in this scheme is claimed to be robust against geometric and temporal attacks, it has a limited defense against cropping and rotation. It could detect the watermark in a frame-dropping attack if at least one WMS was present at the decoder. However, as a frame-rate change results in no WMS being present, it could not survive this type of attack. Table 1 summarizes the discussed related methods. In summary, the state-of-the-art techniques are unable to fulfill the main requirements of digital video watermarking: blind detection, robustness against signal processing, geometric and temporal synchronization attacks, and imperceptibility of the embedded watermark. The proposed method addresses these challenges, i.e., it detects the imperceptible watermark without any reference or original video content, and is robust against a large variety of commonly used attacks.

III. OVERVIEW OF THE DTCWT AND SVD
This section discusses the background of the DTCWT and SVD which are utilized to develop the proposed technique.

A. DUAL-TREE COMPLEX WAVELET TRANSFORM (DTCWT)
The DTCWT was suggested by Nick Kingsbury to solve the lack of shift invariance and the poor directional selectivity of the traditional wavelet transform [17]. The DTCWT has two trees, where one provides the real part and the other the imaginary part of the wavelet coefficients. This approach is responsible for the high performance of the DTCWT compared with the DWT and CWT, which employ a single filter tree to produce the wavelet coefficients. The decomposition structure of the DTCWT is shown in Fig. 1. It possesses the characteristics of good directional selectivity, perfect reconstruction, approximate shift invariance and efficient order-N computation. Approximate shift invariance is achieved by doubling the sampling rates in Tree A and Tree B, which is done by eliminating the downsampling by 2 after the level 1 filters $H_0$, $H_1$, $G_0$ and $G_1$. In order for the samples at this level to be evenly spaced, the delays of $H_0$ and $H_1$ are one sample offset from those of $G_0$ and $G_1$. This shift-invariance property can be used when developing a video watermarking mechanism that is resistant to rotation and scaling.
The DTCWT technique has a redundancy of 4:1 for 2D signals. Each level of a 2D DWT produces three sub-bands that are oriented at angles of 0°, 45° and 90°, whereas a 2D DTCWT generates six sub-bands at angles of ±15°, ±45° and ±75°. Fig. 2(a) shows each level with six directional sub-bands of a three-level DTCWT, with the magnitudes of the corresponding sub-band coefficients of the Lena image shown in Fig. 2(b). This redundancy in the DTCWT plays a key role in generating durable watermarks. It should also be noted that when a random watermark is added directly to the coefficients in a redundant domain, some of its components may be lost when the inverse DTCWT is performed [14]. Therefore, we consider the DTCWT coefficients of both the video frame and watermark in our proposed embedding process.

B. SINGULAR VALUE DECOMPOSITION (SVD)
Let $f$ denote one frame of a video sequence. If $f$ is a square $N \times N$ matrix, the SVD of $f$ is given by

$$f = U S V^{*},$$

where $U$ and $V$ are orthogonal (or unitary) matrices, $(\cdot)^{*}$ denotes the (conjugate) transpose, and $S$ is a diagonal matrix. The diagonal elements of $S$, $s_1, s_2, \cdots, s_N$, are called the SVs of $f$. The SVs represent intrinsic algebraic characteristics of an image [21]. The SVD is utilized in video watermarking because the good stability of the SVs provides robustness to attacks.
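The stability of the SVs can be illustrated numerically. The following minimal sketch uses NumPy on random stand-in data (not a real video frame; the matrix size and noise level are assumptions chosen only for illustration) and checks Weyl's inequality, which states that each singular value changes by at most the spectral norm of the perturbation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for an N x N frame (hypothetical data, not a real video frame).
f = rng.uniform(0.0, 255.0, size=(64, 64))

# Small spatial-domain perturbation, e.g. mild additive noise.
e = rng.normal(0.0, 1.0, size=f.shape)

# np.linalg.svd returns the singular values in descending order.
s_clean = np.linalg.svd(f, compute_uv=False)
s_noisy = np.linalg.svd(f + e, compute_uv=False)

# Weyl's inequality: every SV moves by at most the spectral norm of e.
max_shift = np.max(np.abs(s_noisy - s_clean))
bound = np.linalg.norm(e, 2)

print(f"largest SV change: {max_shift:.3f} (bound: {bound:.3f})")
```

The bound holds for any perturbation, which is one way to read the statement that small spatial-domain changes do not alter the SVs significantly.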

IV. PROPOSED WATERMARKING TECHNIQUE
The proposed video watermarking method consists of three stages: the generation, embedding and extraction of the watermark, discussed in Sections IV-A, IV-B and IV-C, respectively. This method is robust to a combination of signal processing and geometric attacks. In short, a pseudo-randomly generated watermark is added into the frames of the host video content using the DTCWT and SVD. The combination of the DTCWT's approximate shift-invariance property and the good stability of the SVs is utilized to enhance the robustness to geometric distortions. We extract the watermark at the decoder from each watermarked frame, without reference to the original SVs or video content.

A. CREATION OF THE WATERMARK PATTERN
A watermark is an identifiable pattern embedded in original video content, which could be a logo, signature, image or any other type of content. In our technique, the watermark, $w \in \{-1, +1\}$, is a pattern generated pseudo-randomly using a key $K$, and a unique pattern $w$ is created for every $C$ consecutive frames. We select the length $C$ experimentally, since it is a trade-off between robustness against temporal frame averaging (TFA) and robustness against watermark estimation re-modulation (WER) attacks [66]. Note that, even though the pseudo-random watermark pattern changes every $C$ frames, the proposed method is robust against temporal attacks since the detection does not require the watermark pattern. The SVs of the DTCWT coefficients of $w$ are embedded in the host sequence, as further described in Section IV-B. In order to do this, the level 1 coefficients, $H^{w}_{1,i}$, of a one-level DTCWT decomposition of $w$ are selected, where $i = 1, 2, \cdots, 6$ indexes the directional sub-bands of the complex coefficients at angles of ±15°, ±45° and ±75°. The resolution of these sub-bands is 8 times lower than that of the frame's untransformed U channel. As the SVs are exploited for embedding, we apply the SVD on $H^{w}_{1,i}$, which is expressed as

$$H^{w}_{1,i} = U^{w}_{1,i} \, S^{w}_{1,i} \left(V^{w}_{1,i}\right)^{*},$$

where $(\cdot)^{*}$ denotes the conjugate transpose and the diagonal matrix, $S^{w}_{1,i}$, is defined as

$$S^{w}_{1,i} = \mathrm{diag}\left(s^{w,1}_{1,i}, s^{w,2}_{1,i}, \cdots, s^{w,M}_{1,i}\right). \tag{7}$$

The diagonal elements, $s^{w,1}_{1,i}, s^{w,2}_{1,i}, \cdots, s^{w,M}_{1,i}$, in descending order in $S^{w}_{1,i}$ are the SVs of $H^{w}_{1,i}$. The SVs of a frame are modified by these SVs based on the sign of the information bit, $b$, as discussed in Section IV-B.
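As an illustration of the key-driven pattern generation, the sketch below produces a bipolar pattern that stays fixed within each group of C consecutive frames and changes between groups. The function name, the pattern shape and the way the key and group index seed the generator are assumptions made for this example; the source only states that the pattern is pseudo-random, depends on K and is renewed every C frames.

```python
import numpy as np

def watermark_pattern(key: int, frame_index: int, shape=(8, 8), c: int = 7) -> np.ndarray:
    """Return the {-1, +1} pattern w shared by the group of c frames
    that contains frame_index.

    ASSUMPTION: the generator is seeded with (key, group index); the
    actual derivation from K is not specified in the text.
    """
    group = frame_index // c                    # same group -> same pattern
    rng = np.random.default_rng([key, group])   # hypothetical seeding scheme
    return rng.choice(np.array([-1, 1]), size=shape)

# Frames 0..6 share one pattern (C = 7); frame 7 starts a new one.
w0 = watermark_pattern(key=1234, frame_index=0)
w6 = watermark_pattern(key=1234, frame_index=6)
w7 = watermark_pattern(key=1234, frame_index=7)
print(np.array_equal(w0, w6), np.array_equal(w0, w7))
```

Seeding per group rather than per frame keeps the pattern reproducible at the encoder without storing it, which matches the role of the key in the description above.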

B. WATERMARK EMBEDDING
A block diagram that describes the proposed watermark creation and embedding technique is shown in Fig. 3. In short, the watermark is added in the SVs of the highest level coefficients, $H^{u}_{3,i}$, of a three-level DTCWT of the U frame, $f$. The highest level coefficients, i.e., low-frequency coefficients, are robust to compression and geometric distortions but have a greater influence on the perceptual quality of the video [55]. Therefore, selecting the low-frequency coefficients is a trade-off between visual quality and durability. For this reason, only the level-3 coefficients are selected for adding the watermark.
The SVD of $H^{u}_{3,i}$ is given by

$$H^{u}_{3,i} = U^{u}_{3,i} \, S^{u}_{3,i} \left(V^{u}_{3,i}\right)^{*},$$

where $(\cdot)^{*}$ denotes the conjugate transpose and the diagonal matrix, $S^{u}_{3,i}$, is defined by

$$S^{u}_{3,i} = \mathrm{diag}\left(s^{u,1}_{3,i}, s^{u,2}_{3,i}, \cdots, s^{u,M}_{3,i}\right). \tag{9}$$

If we group the diagonal matrices, $S^{u}_{3,i}$, into three pairs, $(S^{u}_{3,1}, S^{u}_{3,6})$, $(S^{u}_{3,2}, S^{u}_{3,5})$ and $(S^{u}_{3,3}, S^{u}_{3,4})$, Fig. 4(a) shows that, in an unwatermarked frame, the difference between the means of the SVs of each pair is very small. For example, the corresponding means in the figure of both $S^{u}_{3,1}$ and $S^{u}_{3,6}$ are approx. 35, the means of both $S^{u}_{3,2}$ and $S^{u}_{3,5}$ are approx. 20, and the means of both $S^{u}_{3,3}$ and $S^{u}_{3,4}$ are approx. 43. Since the means of these pairs are approximately equal in unwatermarked frames, the main goal of our watermarking method is to create a sufficiently large difference between them.
More specifically, the method modifies two of the three pairs, namely $(S^{u}_{3,1}, S^{u}_{3,6})$ and $(S^{u}_{3,2}, S^{u}_{3,5})$. That is because modifications in those sub-bands affect the resulting watermarked frame less than modifications of the other two sub-bands. This is shown in Fig. 4(b), which shows the average PSNR between the unwatermarked and watermarked frame when modifications are made in each of the sub-bands. Since modifications in the sub-bands with the directions of $i = 3$ and $4$ result in the lowest PSNRs, they are not used for watermarking (although their average PSNR is very close to that of the other directions).
Then, the SVs of $H^{u}_{3,i}$, $s^{u,1}_{3,i}, s^{u,2}_{3,i}, \cdots, s^{u,M}_{3,i}$, which are the diagonal elements of the matrix $S^{u}_{3,i}$, are modified by those of the transformed version of the watermark obtained from Eq. (7). That is, the SVs in $S^{u}_{3,i}$ are modified by the corresponding SVs in $S^{w}_{1,i}$; in this modification, $\beta$ controls the strength of the watermark and $b \in \{-1, +1\}$ is the embedding bit pattern. Note that increasing $\beta$ increases the watermark's robustness and decreases its transparency. Hence the selected value of $\beta$ is a trade-off between the video quality and robustness against attacks. At this stage of the embedding process, the diagonal matrix, $S^{u}_{3,i}$, in Eq. (9) is replaced by the modified diagonal matrix, $\hat{S}^{u}_{3,i}$, in Eq. (11). The watermarked level 3 coefficients, $\hat{H}^{u}_{3,i}$, are given by

$$\hat{H}^{u}_{3,i} = U^{u}_{3,i} \, \hat{S}^{u}_{3,i} \left(V^{u}_{3,i}\right)^{*},$$

where $U^{u}_{3,i}$ and $V^{u}_{3,i}$ are the unitary matrices of the SVD of $H^{u}_{3,i}$ and $(\cdot)^{*}$ denotes the conjugate transpose. Finally, a watermarked video frame, $\hat{f}$, is generated by taking an inverse transform of the modified DTCWT coefficients. The overall watermark embedding process for a video frame is summarized in Algorithm 1. This process is repeated for each frame in a video sequence.
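The effect of the SV modification on one sub-band pair can be sketched as follows. This is not the paper's embedding equation: the additive rule with opposite signs on the two members of a pair, the clipping at zero, the function name `embed_pair` and the toy value of β are all assumptions made for illustration, and random complex matrices stand in for the DTCWT sub-bands.

```python
import numpy as np

def embed_pair(h_a, h_b, s_w, beta, b):
    """Push the SV means of a sub-band pair apart in the direction of bit b.

    ASSUMPTION: SVs of the first member get +beta*b*s_w and those of the
    second member get -beta*b*s_w. The exact rule used in the paper is
    not reproduced here.
    """
    out = []
    for h, sign in ((h_a, +1), (h_b, -1)):
        u, s, vh = np.linalg.svd(h, full_matrices=False)
        # Clip at zero so the modified values remain valid singular values.
        s_hat = np.maximum(s + sign * beta * b * s_w, 0.0)
        out.append((u * s_hat) @ vh)   # U diag(s_hat) V^*
    return out[0], out[1]

rng = np.random.default_rng(0)
shape = (8, 8)
host = rng.normal(size=shape) + 1j * rng.normal(size=shape)   # stand-in for a level-3 sub-band
mark = rng.normal(size=shape) + 1j * rng.normal(size=shape)   # stand-in for a watermark sub-band
s_w = np.linalg.svd(mark, compute_uv=False)

# The partner sub-band reuses the host SVs, so the unwatermarked means match exactly.
h1_hat, h6_hat = embed_pair(host, host.conj().T, s_w, beta=0.5, b=+1)
d1 = (np.linalg.svd(h1_hat, compute_uv=False).mean()
      - np.linalg.svd(h6_hat, compute_uv=False).mean())
print("sign of the mean difference:", np.sign(d1))
```

Because the unitary factors are reused, only the singular values change, so the sign of the mean difference between the two sub-bands carries the bit and can be read without the original SVs.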

C. WATERMARK EXTRACTION
After embedding a watermark in a video sequence, it can be compressed for storage purposes or subjected to geometric and/or signal processing attacks. After subjecting a watermarked frame, $\hat{f}$, to attacks, the attacked frame is denoted as $\tilde{f}$. An overall block diagram of the extraction phase is given in Fig. 5. Firstly, a three-level DTCWT is applied to the U frame of $\tilde{f}$. Although the watermark embedding process modified the SVs of the level 3 coefficients, the watermark is extracted from the SVs of any level (1, 2 or 3) to withstand a resolution-downscaling attack. That is because a watermark can be extracted from a lower DTCWT level if the video was downscaled, as examined in previous work [58].
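The SVD of the level-$l$ complex coefficients of the U frame of the attacked frame, $\tilde{f}$, is given by

$$\tilde{H}^{u}_{l,i} = \tilde{U}^{u}_{l,i} \, \tilde{S}^{u}_{l,i} \left(\tilde{V}^{u}_{l,i}\right)^{*},$$

where $(\cdot)^{*}$ denotes the conjugate transpose and $l = 1$, $2$ or $3$ is the DTCWT decomposition level.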
The SVs on the diagonal of the matrix $\tilde{S}^{u}_{l,i}$ are defined as

$$\tilde{S}^{u}_{l,i} = \mathrm{diag}\left(\tilde{s}^{u,1}_{l,i}, \tilde{s}^{u,2}_{l,i}, \cdots, \tilde{s}^{u,M}_{l,i}\right). \tag{14}$$

Before these values were modified during watermark embedding, the difference between the means of the unwatermarked SVs of a pair was negligible. However, the watermark encoder created a large difference between them, using Eq. (11). The mean difference using sub-bands $i = 1$ and $6$ is expressed as

$$D_1 = \frac{1}{M}\sum_{j=1}^{M} \tilde{s}^{u,j}_{l,1} - \frac{1}{M}\sum_{j=1}^{M} \tilde{s}^{u,j}_{l,6}, \tag{15}$$

and, using sub-bands $i = 2$ and $5$, is given by

$$D_2 = \frac{1}{M}\sum_{j=1}^{M} \tilde{s}^{u,j}_{l,2} - \frac{1}{M}\sum_{j=1}^{M} \tilde{s}^{u,j}_{l,5}. \tag{16}$$

Since we supplied the same $b$ at the encoder to add the watermark in both pairs of sub-bands, the signs of $D_1$ and $D_2$ should be the same, i.e., either both positive or both negative. Hence, the embedded bits can be extracted based on the signs of $D_1$ and $D_2$. More specifically, the bits are respectively estimated from $D_1$ and $D_2$ by

$$b_1 = \begin{cases} +1, & D_1 > 0 \\ -1, & \text{otherwise} \end{cases} \qquad \text{and} \qquad b_2 = \begin{cases} +1, & D_2 > 0 \\ -1, & \text{otherwise.} \end{cases}$$

Using these equations, we obtain $b_1$ and $b_2$ for a single frame. We further utilize the notations $b_1(k)$ and $b_2(k)$, which denote $b_1$ and $b_2$ decoded from the $k$th frame in a video. After decoding the bits from $P$ consecutive frames of a sequence, we compute $b_1(k) \star b_2(k)$, where the symbol $\star$ represents a normalized cross-correlation (NCC) between $b_1(k)$ and $b_2(k)$. As we embed the same $b$ in both pairs, the NCC should provide a large correlation peak. In contrast, when no watermark was embedded, the NCC should be approximately zero. Finally, we compare the peak of the correlation output with the threshold, Th (see Section V-C1), to examine if the watermark is present or not. The above process is summarized in Algorithm 2.

Algorithm 2 Watermark Extraction From a Video Sequence
…
6: Find the SVs of $\tilde{S}^{u}_{l,i}$ using (14)
7: end for
8: Estimate $D_1$ and $D_2$ using (15) and (16), respectively
9: if $D_1 > 0$ then
…
19: end for
20: end for
21: if $b_1(k) \star b_2(k) >$ Th then
22: Watermark present
23: else
24: Watermark absent
25: end if
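The decision logic of the extraction stage can be sketched in a few lines. The sketch below assumes that each mean difference is taken as the mean of the first sub-band minus the mean of the second, that the NCC peak is the maximum over all lags of the normalized cross-correlation, and that the threshold of 0.5 is only a placeholder (the actual value of Th is discussed in Section V-C1). Diagonal toy matrices stand in for the DTCWT sub-bands, and all function names are hypothetical.

```python
import numpy as np

def decode_bits(subbands):
    """Decode (b1, b2) for one frame from sub-band pairs (1, 6) and (2, 5).

    `subbands` maps the direction index i to that frame's coefficient matrix.
    ASSUMPTION: D = mean SV of the first member - mean SV of the second.
    """
    mean_sv = {i: np.linalg.svd(subbands[i], compute_uv=False).mean()
               for i in (1, 2, 5, 6)}
    d1 = mean_sv[1] - mean_sv[6]
    d2 = mean_sv[2] - mean_sv[5]
    return (1 if d1 > 0 else -1), (1 if d2 > 0 else -1)

def ncc_peak(b1, b2):
    """Peak of the normalized cross-correlation of two bit sequences."""
    b1 = np.asarray(b1, dtype=float)
    b2 = np.asarray(b2, dtype=float)
    corr = np.correlate(b1, b2, mode="full")
    return corr.max() / (np.linalg.norm(b1) * np.linalg.norm(b2))

def watermark_present(frames, th):
    """Blind decision over a list of per-frame sub-band dictionaries."""
    b1, b2 = zip(*(decode_bits(f) for f in frames))
    return ncc_peak(b1, b2) > th

def toy_frame(b):
    # Diagonal stand-ins whose SV means are pushed apart in the direction of b.
    return {1: np.eye(4) * (10 + b), 6: np.eye(4) * (10 - b),
            2: np.eye(4) * (5 + b), 5: np.eye(4) * (5 - b)}

rng = np.random.default_rng(1)
pattern = rng.choice(np.array([-1, 1]), size=300)   # embedded bits b(k), P = 300
frames = [toy_frame(b) for b in pattern]
print(watermark_present(frames, th=0.5))            # th = 0.5 is a placeholder
```

Note that the decision uses only the attacked frames themselves: neither the original video, the original SVs nor the watermark pattern enters the computation, which is what makes this kind of detection blind.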
As shown in Algorithm 1 and Algorithm 2, the proposed method can only embed a single information bit in the video. Hence, the embedding capacity is limited to a single bit, regardless of the robustness and imperceptibility performance. Still, this information bit can be used to flag protected media. For example, the watermark extraction filter can be placed in an Internet gateway to scan for the existence of the watermark. Then, a user's request to download a video can be cancelled if a watermark is detected, or served if no watermark is detected. Future work can investigate how to blindly extract a larger payload from the SVs.

V. EXPERIMENTAL RESULTS AND DISCUSSIONS
This section evaluates and discusses the proposed watermarking method in comparison with the state of the art. First, Section V-A describes the experimental setup. Then, Section V-B, V-C, and V-D discuss the imperceptibility, robustness, and security, respectively. Finally, Section V-E evaluates the computational complexity.

A. EXPERIMENTAL SETUP
In this part of our study, comprehensive experiments were carried out to evaluate the performance of the proposed technique. Ten publicly available standard test sequences, BasketBallDrive, BQTerrace, Cactus, IntoTree, OldTownCross, ParkJoy, Life, ControlledBurn, SpeedBag and PedestrianArea [67], [68], with an HD resolution of 1080 × 1920, were adopted for our experiments. The key, K, was used to create w for C = 7 consecutive frames. The DTCWT decomposition level for extracting the watermark was selected using the resolution of the input video at the decoder. It is level 3 (l = 3) for a resolution of 1080 × 1920, level 2 (l = 2) for the resolutions of 540 × 960, 480 × 640 and 270 × 480, and level 1 (l = 1) for a resolution of 240 × 320. The NCC was performed after decoding the bits of P = 300 consecutive frames of watermarked content, and the level of robustness was assessed based on the false negative rate (FNR) of the NCC peak of these decoded bit patterns.
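The resolution-dependent choice of the extraction level can be written as a simple lookup. The table below only restates the resolutions listed above; the function name and the behaviour for unlisted resolutions (raising an error) are assumptions of this sketch.

```python
# Decoder-side lookup of the DTCWT extraction level, keyed by the
# resolution of the input video written as in the text.
LEVEL_BY_RESOLUTION = {
    (1080, 1920): 3,
    (540, 960): 2,
    (480, 640): 2,
    (270, 480): 2,
    (240, 320): 1,
}

def extraction_level(rows: int, cols: int) -> int:
    """Return the DTCWT level l used for extraction at this resolution."""
    try:
        return LEVEL_BY_RESOLUTION[(rows, cols)]
    except KeyError:
        # ASSUMPTION: resolutions not listed in the text are rejected.
        raise ValueError(f"no extraction level listed for {rows} x {cols}") from None

print(extraction_level(540, 960))
```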
The effectiveness of the proposed technique was compared with three schemes: the DWT-SVD method by Prasetyo et al. [22], the DCT method by Lee et al. [64], and the DCT method by Ling et al. [65]. As summarized in Table 1, these methods claim to have robustness against signal processing attacks and (some) geometric attacks. Moreover, the DCT-based methods claim robustness against temporal attacks. Note that the method by Prasetyo et al. was adapted such that the watermark is embedded into every frame (as is done in all other evaluated methods) rather than in key frames only, and no image scrambling was applied to the watermark signal since the signal is already random. Furthermore, a watermark signal size of 480 × 270 and a scaling factor α of 0.1 were used, which are the same as in their originally proposed method. Lee et al. evaluated the performance of their method using bit error rates. To compare the FNR of this method with that of our algorithm, the bit pattern was extracted from each frame using their algorithm, all 0 bits were set to −1, and the NCC between this pattern and the embedded one was computed.
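The conversion of a decoded bit pattern to bipolar form and the resulting FNR can be sketched as follows. The function names, the use of the zero-lag normalized correlation and the example threshold are assumptions for illustration; the peak values in the example are made-up numbers, not measured results.

```python
def to_bipolar(bits):
    """Map {0, 1} bits to {-1, +1} so that they can be correlated."""
    return [1 if bit else -1 for bit in bits]

def ncc(a, b):
    """Zero-lag normalized correlation of two equal-length sequences."""
    num = sum(x * y for x, y in zip(a, b))
    den = (sum(x * x for x in a) * sum(y * y for y in b)) ** 0.5
    return num / den

def false_negative_rate(peaks, th):
    """Fraction of watermarked clips whose correlation peak does not exceed th."""
    return sum(p <= th for p in peaks) / len(peaks)

embedded = to_bipolar([1, 0, 1, 0])
decoded = to_bipolar([1, 0, 1, 0])
print(ncc(embedded, decoded))

# Hypothetical peaks from four watermarked clips, with a placeholder threshold.
print(false_negative_rate([0.9, 0.2, 0.8, 0.1], th=0.5))
```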

B. WATERMARK IMPERCEPTIBILITY
Because of the non-linear property of the HVS, objective quality measures may not correspond to the perceived visual quality of video content. On the other hand, subjective quality assessments, which are based on human judgement, vary from person to person. Therefore, in this paper, both subjective and objective assessments were conducted to reflect the quality of the watermarked sequences.

1) SUBJECTIVE QUALITY ASSESSMENT
In this work, a subjective assessment was carried out based on the double-stimulus continuous quality scale (DSCQS) approach specified in the ITU-R standard [69], using five test sequences: ParkJoy, Life, ControlledBurn, SpeedBag and PedestrianArea. The subjective tests were used to examine the watermark strength, β, in Eq. (11) and to evaluate the quality of a watermarked video. We also performed the subjective tests for Ling's and Lee's methods. To do this, we embedded watermarks in the previously mentioned sequences using five different Q-step sizes (150, 250, 350, 450, and 547) for Lee's method, five embedding strength ratios (0.01, 0.02, 0.03, 0.04, and 0.05) for Ling's method, and five watermark strengths (β = 36, 38, 40, 42, and 44) for the proposed algorithm. The resulting 75 watermarked sequences (i.e., 5 sequences × 15 embedding parameters) were then judged by 15 participants in three small groups. The participants, both male and female, were postgraduate students at the University of New South Wales, Canberra, Australia. Four of them were researching in the area of image processing and the others were from different fields. Following the DSCQS method, we simultaneously displayed a pair of sequences, one original and one watermarked, on a 60-inch television. Their positions were set randomly and the participants were unaware of which was the watermarked sequence. An assessment sheet with a continuous scale (see Fig. 6) was provided to each participant. We displayed each pair twice and asked each participant to record their judgement of the perceptual quality of the original and watermarked sequences on the provided sheet. At the end of the test, 75 scores (i.e., 5 sequences × 15 participants) for each embedding parameter of each method were obtained. Their mean opinion scores (MOSs) are plotted in Fig. 7, where the error bars indicate a 95% confidence interval.
The mean of the scores for the original sequences is also depicted by green horizontal lines. Although Ling et al. [65] and Lee et al. [64] recommended a strength ratio of 0.05 and a Q-step size of 547, respectively, Fig. 7(a) and Fig. 7(b) show that the watermarked video exhibited highly perceptible distortions using the recommended parameters, and the MOSs obtained from these algorithms were well below those of the original sequence. On the other hand, Fig. 7(c) shows that, for the proposed scheme, the MOSs were close to the average score of the unwatermarked video sequences. Although the MOS for each β is very close to the green line, β = 38 (in Eq. (11)) was selected for embedding the watermark using the proposed method.
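As an illustration of how an MOS and its error bar can be obtained from the 75 scores collected per embedding parameter, a minimal sketch is given below. The text does not state how the 95% confidence interval was computed; the normal approximation (z = 1.96) and the sample scores are assumptions of this sketch.

```python
import math
import statistics


def mos_with_ci(scores, z=1.96):
    """Return (MOS, lower bound, upper bound) for a list of opinion scores.

    The interval uses the normal approximation z * s / sqrt(n), which is an
    assumption of this sketch rather than the procedure stated in the text.
    """
    n = len(scores)
    mean = statistics.fmean(scores)
    half_width = z * statistics.stdev(scores) / math.sqrt(n)
    return mean, mean - half_width, mean + half_width


# Hypothetical scores on a continuous quality scale.
mos, low, high = mos_with_ci([62.0, 70.5, 68.0, 74.0, 66.5])
print(f"MOS = {mos:.2f}, 95% CI = [{low:.2f}, {high:.2f}]")
```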

2) OBJECTIVE QUALITY ASSESSMENT
The imperceptibility of the watermarked video was objectively evaluated using the peak signal-to-noise ratio (PSNR), structural similarity (SSIM) [70], pixel-based visual information fidelity (VIFP) [71] and video multimethod assessment fusion (VMAF) [72]. To average the values obtained for the Y, U, and V channels, a weighted average methodology was used [73]. The average quality values of the watermarked video sequences using Prasetyo's, Lee's, Ling's and the proposed methods are summarized in Table 2. As we embedded the watermark into the U channel, this table shows that the watermark embedded using the proposed method is imperceptible, and that the quality of the watermarked video is better in terms of the PSNR, SSIM, VIFP, and VMAF than that of the state-of-the-art techniques.
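For reference, the PSNR of a single channel and a weighted combination of the per-channel values can be sketched as follows. The PSNR definition is standard; the 6:1:1 weighting of Y, U and V is a commonly used convention that is assumed here for illustration and is not confirmed to be the weighting of [73].

```python
import math


def psnr(reference, distorted, peak=255.0):
    """PSNR in dB between two equal-length sequences of 8-bit samples."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    return math.inf if mse == 0 else 10.0 * math.log10(peak * peak / mse)


def weighted_yuv(y_value, u_value, v_value, weights=(6.0, 1.0, 1.0)):
    """Weighted average of per-channel quality values.

    The default 6:1:1 weights are an assumption of this sketch.
    """
    wy, wu, wv = weights
    return (wy * y_value + wu * u_value + wv * v_value) / (wy + wu + wv)


# Hypothetical U-channel samples before and after watermarking.
print(psnr([100, 100, 100, 100], [101, 99, 100, 102]))
```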

C. ROBUSTNESS TO ATTACKS
In this part of the experimental analysis, we analyzed the robustness of our proposed algorithm and state-of-the-art methods to commonly-used attacks. That is, we embedded twenty unique patterns of the watermark in each video sequence. The FNR was then computed by fitting a Gaussian distribution to the NCC peaks which were obtained using these watermark patterns.
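The FNR estimate described above can be sketched under the assumption that the FNR is the probability mass of the fitted Gaussian that falls below the detection threshold Th. The function name and the sample peaks are hypothetical.

```python
from statistics import NormalDist, fmean, stdev


def fnr_from_peaks(ncc_peaks, threshold):
    """Estimate the FNR as P(NCC peak < Th) under a fitted Gaussian.

    Interpreting the FNR as the Gaussian tail below the threshold is an
    assumption of this sketch.
    """
    mu = fmean(ncc_peaks)
    sigma = stdev(ncc_peaks)
    if sigma < 1e-12:
        # Degenerate fit: all peaks are identical.
        return 1.0 if mu < threshold else 0.0
    return NormalDist(mu, sigma).cdf(threshold)


# Hypothetical NCC peaks from several watermark patterns, with Th = 0.27.
peaks = [0.52, 0.61, 0.58, 0.66, 0.49, 0.63]
print(f"FNR = {100 * fnr_from_peaks(peaks, 0.27):.4f}%")
```

Fitting a distribution rather than counting missed detections allows very small FNRs to be estimated from a limited number of watermark patterns.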

1) PROBABILITY OF FALSE DETECTION
The correlation results for the extracted bits were compared with a threshold to determine whether the video was watermarked. To do this, we defined the threshold so as to provide a probability of false detection, P_fd, of 10^−6. This probability is estimated by the model described in [74], and is computed by Eq. (19), where n is the length of the extracted bit pattern and Th is the watermark detection threshold. Prasetyo's, Ling's and Lee's schemes also used this formula to calculate the thresholds, Th. These were computed numerically as 0.0132, 0.4265, 0.6308 and 0.2700 for Prasetyo's, Ling's, Lee's and the proposed algorithms, respectively.

2) DOWNSCALING IN RESOLUTION AND ASPECT-RATIO CHANGE
In this section, we discuss the performance of the proposed technique for downscaling to arbitrary resolutions and aspect-ratio changes. The FNRs of Prasetyo's, Ling's, Lee's and the proposed methods without attack and after downscaling to the resolutions of 540 × 960, 480 × 640, 270 × 480 and 240 × 320 are shown in Table 3. These were zero for each scheme except Prasetyo's algorithm, which reports an FNR of 0.488% when downscaled to a resolution of 240 × 320. The resolution of the original video was 1080 × 1920, which corresponds to an aspect ratio of 16:9. As the aspect ratios of the downscaled sequences were 16:9 and 4:3, and the FNRs were still zero at 4:3, we deduced that the watermark of the proposed scheme was robust to aspect-ratio change.

3) GEOMETRIC ATTACKS
Geometric distortions are very common types of attacks in the area of digital watermarking and consist of upscaling and downscaling in resolution, cropping, rotation and aspect-ratio change. In our experiment for an upscaling attack, we first scaled up each sequence by a certain level (1% to 15%) and then cropped it to the original dimension. Finally, the resultant sequences were additionally downscaled to resolutions of 270 × 480 and 240 × 320. The performances of all approaches are shown in Table 4, where '-' indicates that the watermark detection failed. It is clear that Prasetyo's method extracted the watermark with a very high false-negative error, and Lee's method could only withstand an upscaling-and-cropping attack of 1% but failed for stronger attacks. Although Ling's approach performed well at relatively low levels of upscaling and cropping, it was still inadequate at higher levels. In contrast, our proposed scheme extracted the watermark with zero FNRs for up to 15% of upscaling and cropping, even when additionally downscaled to the dimension of 240 × 320.
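The geometry of the upscaling-and-cropping attack can be illustrated by computing the upscaled frame size and the crop window that restores the original dimension. The text does not state where the crop is taken; the centered window and the function name are assumptions of this sketch.

```python
def upscale_crop_window(height, width, percent):
    """Return the upscaled size and the crop window for the attack.

    The frame is enlarged by `percent` (an integer, e.g. 1 to 15) and then
    cropped back to height x width. Cropping around the frame center is an
    assumption of this sketch.
    """
    # Integer arithmetic with rounding avoids floating-point edge cases.
    up_h = (height * (100 + percent) + 50) // 100
    up_w = (width * (100 + percent) + 50) // 100
    top = (up_h - height) // 2
    left = (up_w - width) // 2
    return (up_h, up_w), (top, left, top + height, left + width)


# 15% upscaling of a 1080 x 1920 frame.
size, window = upscale_crop_window(1080, 1920, 15)
print(size, window)
```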
To evaluate the robustness of our proposed approach to a rotation attack, each sequence was rotated at various angles (between 1 and 15 degrees) and then cropped to remove any newly created zero pixels from the border of the resultant frame. We also extracted the watermark from downscaled versions of the rotated sequences. The results for a combination of upscaling, cropping, rotation and resizing to 270 × 480 and 240 × 320 are shown in Table 5. These results indicate that, although Ling's method provided low FNRs at small angles of rotation, only the proposed approach was robust to these attacks.

4) ADDITION OF WHITE GAUSSIAN NOISE
To assess robustness to the addition of noise, white Gaussian noise with zero mean and several different variances was added to the watermarked video content. It should be noted that the pixel values of the distorted frames were constrained to the range 0 to 255. The resultant FNRs after extracting the watermark from the noisy sequences are shown in Table 6. This table illustrates that the FNRs of Prasetyo's, Lee's, Ling's and the proposed algorithms were zero or close to zero, even when the resolution was scaled down to 240 × 320 pixels and the PSNR of the attacked watermarked U channel decreased to 20.45 dB for the proposed scheme.
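The noise attack can be sketched as follows: zero-mean Gaussian noise of a given variance is added to each sample and the result is constrained to the range 0 to 255. Rounding to the nearest integer and the use of a seed for repeatability are assumptions of this sketch.

```python
import random


def add_gaussian_noise(pixels, variance, seed=None):
    """Add zero-mean white Gaussian noise and clip to the 8-bit range.

    Rounding the noisy value to the nearest integer is an assumption of
    this sketch; the text only states that values are kept within 0 to 255.
    """
    rng = random.Random(seed)
    sigma = variance ** 0.5
    noisy = []
    for p in pixels:
        value = round(p + rng.gauss(0.0, sigma))
        noisy.append(min(255, max(0, value)))
    return noisy


# Noise of variance 65, the value used later in the joint attack.
print(add_gaussian_noise([0, 16, 128, 240, 255], 65, seed=1))
```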

5) JOINT ATTACK
In this test, a joint attack consisting of the addition of white Gaussian noise, upscaling, downscaling in resolution, rotation and cropping was considered. Firstly, we added white Gaussian noise with zero mean and a variance of 65 to each frame of the watermarked video sequences, and then the noisy sequences were rotated at different angles and cropped as explained previously. Finally, the resultant video sequences were downscaled to resolutions of 270 × 480 and 240 × 320 pixels before passing through the decoder. The results in Table 7 indicate that our proposed method performed far better under the joint attack than Prasetyo's, Ling's and Lee's schemes.

6) H.264/AVC COMPRESSION
In this part of our analysis, we evaluate the robustness of our technique to a lossy compression attack. An H.264/AVC encoder compressed the sequences using a quantization parameter (QP) of 28, at 25 frames per second (fps). We applied the attack using H.264/AVC because it is the most common video compression standard. Note that we have no reason to expect different results when using other compression standards such as H.265/HEVC. Although Prasetyo's and Lee's schemes did not achieve zero FNRs for compression attacks, Ling's and the proposed techniques achieved better robustness, as shown in Table 8.
In the previous sections, we analyzed the performance of our method in terms of downscaling in resolution, upscaling, rotation, cropping, the addition of Gaussian noise and combinations of these attacks. In this experiment, we additionally analyzed the robustness to these attacks in combination with H.264/AVC compression. The FNRs of the proposed method and of Prasetyo's, Ling's and Lee's approaches are summarized in Table 9 to Table 12. Table 9 indicates that our proposed scheme extracted the watermark without any error for a combination of upscaling, cropping, compression and downscaling in resolution, even for 15% upscaling and cropping. In contrast, although the FNRs of Ling's scheme were small for up to 5% upscaling and cropping, they were large for higher levels of the upscaling attack. This table also shows that the FNRs of Prasetyo's and Lee's methods were very high even for a 1% upscaling attack.
The FNRs of Prasetyo's, Ling's, Lee's and the proposed algorithms for a combination of H.264/AVC compression, rotation, upscaling, cropping and downscaling to arbitrary resolutions are summarized in Table 10. This table indicates that, for up to 15° of rotation, the proposed scheme achieved better robustness than the other methods. Note that our approach performed better at downscaled video resolutions than at the original resolution for this level of attack. This is because, when downscaling occurs, the high-frequency bands are truncated from a frame and the low-frequency complex coefficients are spread across the DTCWT decomposition levels [58]. As the low-frequency transform coefficients are selected to carry the watermark, its effect is more evident at a lower resolution of a video frame. Therefore, an additional downscaling step could be considered in the watermark extraction algorithm to improve the detection performance.
Table 11 shows that the FNRs of Ling's and Lee's methods were better than those of the proposed method at higher noise variances. However, as shown in Table 12, Prasetyo's, Ling's and Lee's approaches failed when white Gaussian noise with zero mean and a variance of 65 and H.264/AVC compression were combined with rotation, upscaling and cropping.
A receiver operating characteristic (ROC) curve shows the trade-off between the false positive rate (FPR) and the true positive rate (TPR). A smaller FPR and a larger TPR represent better watermark detection performance. The ROC curves for Prasetyo's, Lee's, Ling's and the proposed algorithms after jointly applying H.264/AVC compression, additive Gaussian noise of variance 65, 5° rotation and cropping, and resizing to resolutions of 1080 × 1920, 270 × 480 and 240 × 320 pixels are presented in Fig. 8. These curves indicate that our scheme is more robust to a joint attack than the other approaches, even at a resolution of 240 × 320 pixels.
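The construction of an ROC curve from detector outputs can be sketched as follows, assuming that NCC peaks are available for both watermarked and unwatermarked sequences. The function name and the sample scores are hypothetical.

```python
def roc_points(watermarked_scores, unwatermarked_scores):
    """Return (FPR, TPR) points obtained by sweeping the detection threshold.

    A sequence is declared watermarked when its score is at or above the
    threshold; every observed score is used as a candidate threshold.
    """
    thresholds = sorted(set(watermarked_scores) | set(unwatermarked_scores),
                        reverse=True)
    points = [(0.0, 0.0)]  # threshold above every score: nothing detected
    for th in thresholds:
        tpr = sum(s >= th for s in watermarked_scores) / len(watermarked_scores)
        fpr = sum(s >= th for s in unwatermarked_scores) / len(unwatermarked_scores)
        points.append((fpr, tpr))
    return points


# Hypothetical NCC peaks for attacked watermarked and unwatermarked videos.
for fpr, tpr in roc_points([0.9, 0.8, 0.3], [0.4, 0.1, 0.05]):
    print(f"FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```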

7) CAMCORDING AND TEMPORAL SYNCHRONIZATION ATTACKS
For the camcording experiments, we tested only Lee's and the proposed algorithm using five different watermarked video sequences: ParkJoy, Life, ControlledBurn, SpeedBag and PedestrianArea. Ling's method was not used in this test because camcording causes temporal de-synchronization, for example, frame insertion or frame dropping, as well as geometric and color distortions. If the watermark extraction depends on consecutive frames, the watermark decoder will be unable to extract the watermark when temporal de-synchronization is applied. As Ling's approach requires at least one WMS to extract the watermark but no WMSs appeared due to the frame-rate change, this scheme was not robust to camcording. Furthermore, Prasetyo's method was not considered for the camcording experiment because it is not robust to geometric distortions, and it is a non-blind method which requires temporal synchronization at the decoder. Note that the original method by Prasetyo et al. embeds the watermark in key frames only. By detecting those key frames during watermark extraction and hence retrieving the original SVs for non-blind detection, the temporal synchronization issue could be solved. However, those key frames could be (un)intentionally dropped by an attacker during camcording, making watermark extraction impossible for their method.
In our test, the watermarked sequences were played repeatedly on a 60-inch television at 30 fps. We used a SONY HDR-TD30VE camcorder to capture each sequence 10 times, each time starting from a different frame, for at least 300 frames at three different frame rates (25p, 50i and 50p). The 150 captured sequences in AVCHD (MPEG-4 AVC/H.264) format (i.e., 5 sequences × 10 trials × 3 frame rates) were re-compressed using the x264 encoder and then downscaled to the resolutions of 540 × 960, 480 × 640, 270 × 480 and 240 × 320. The FNRs after extracting the watermark from the resultant sequences for both Lee's and the proposed algorithms are summarized in Table 13. It is clear that our algorithm was more robust to camcording, even when additionally compressing and downscaling the videos.

D. WATERMARK SECURITY
Multiple watermark embedding is a possible attack that could be used to remove a watermark from a watermarked video sequence. To analyze the security of the watermark against this attack, four sequences with different motion characteristics, each containing 300 frames, were used. In a multiple watermark embedding attack, we assume that the attackers know the proposed watermarking mechanism but not the original watermark, w, which was embedded into the video sequence. Therefore, the attackers might add a second watermark into the watermarked video sequence to remove the effect of the first watermark. As the attackers have no knowledge of the embedded watermark pattern, we embedded a random watermark pattern into each watermarked video sequence. Fig. 9 shows the mean of the SVs of the DTCWT coefficients in each direction after a multiple watermark embedding attack, for four test sequences. In each direction, for each sequence shown in Fig. 9, the first (blue) bar is the mean SV of the original unwatermarked video sequence, the second (red) bar is the mean SV of the watermarked video sequence, and the third (orange) bar is the mean SV of the randomly watermarked video sequence, in which the second watermark was embedded into the already-watermarked sequence. For each sequence, it is clear from the blue bars (i.e., for the unwatermarked videos) that the difference between the mean SVs of each pair is very small, especially the mean differences D_1 and D_2. However, embedding the watermark using Eq. (11) creates a large difference between them, as shown by the red bars. In the orange bars, i.e., after embedding the second (random) watermark, although the difference between the directions in a pair is not as large, the signs of both mean differences D_1 and D_2 are the same.
Hence, the same bit patterns from both pairs will be extracted and the presence of the watermark will be detected accurately.
To evaluate the watermark detection accuracy under a multiple watermark embedding attack in terms of the FNR, we experimented with 20 different random watermark patterns for the watermark embedding strengths β = 0, 2, ..., 90. For each strength and video sequence, a random watermark pattern was embedded into the already-watermarked sequence, and this was repeated for the other random patterns. It should be noted that the second (random) watermark has no effect when β = 0, i.e., it does not modify the original watermarked sequence. Fig. 10 shows the FNR for different embedding strengths and the corresponding average PSNR_U of the randomly watermarked U channel. In this figure, the PSNR_U at β = 0 indicates the quality of the watermarked U channel using the proposed technique before applying the multiple watermark embedding attack. It can also be seen that the PSNR_U of the randomly watermarked U channel decreases with increasing embedding strength, while the FNR of the proposed detection remains zero or close to zero. Therefore, it is evident that the proposed watermarking algorithm is secure against a multiple watermark embedding attack. The proposed technique can be employed to preserve the copyright of video producers or owners. The technique can guard against illegal sharing with untrusted applications such as those used on social media networks.
FIGURE 10. False negative rate (FNR) and peak signal-to-noise ratio (PSNR_U) after the multiple watermark embedding attack into the U channel for different embedding strengths.

E. COMPUTATIONAL EFFICIENCY
The computational complexity of the proposed algorithm was compared with that of Prasetyo's, Lee's and Ling's methods. The experiments were conducted on a computer with a 2.40 GHz Intel(R) Core(TM) i5-6300U CPU and 16 GB of RAM running the Windows 7 operating system. The proposed, Lee's and Ling's methods were implemented in MATLAB 9.4, and Prasetyo's approach in Python 2.7.18. We measured the run time of the watermark embedding and extraction algorithms for each frame of a video sequence. The average embedding and extraction times per frame using the proposed, Prasetyo's, Lee's and Ling's methods are summarized in Table 14. This table shows that the proposed embedding and extraction algorithms are faster than the state-of-the-art algorithms.

VI. CONCLUSION
We proposed a novel video watermarking method that inherits the advantages of both SVD and DTCWT. That is, it embeds the watermark in the SVs of the DTCWT coefficients of the chrominance channel. We chose this channel because, while keeping the watermark imperceptible, it supports a higher watermark strength than would have been possible using alternative luminance embedding algorithms.
We examined the imperceptibility of our method by both subjective and objective quality assessments. From these experiments, we found that our method embeds an imperceptible watermark and outperforms the state of the art.
The DTCWT decomposition level for extracting the watermark was selected based on the resolution of the sequence. This approach helps to maintain robustness to aspect-ratio changes and any arbitrary downscaling in resolution. Note that the original video is not required during decoding, i.e., the proposed method is blind. The combined benefits of the SVD and DTCWT enhance the robustness of our scheme to geometric attacks such as upscaling, cropping and rotation. The effectiveness of our algorithm was experimentally validated against the addition of white Gaussian noise, H.264/AVC compression and combinations of these attacks. Finally, our proposed blind watermarking method outperforms existing techniques in robustness against frame-rate change and camcording attacks.