Selective Encryption of the Versatile Video Coding Standard

Versatile video coding (VVC) is the next-generation video coding standard developed by the joint video experts team (JVET) and released in July 2020. VVC introduces several new coding tools providing a significant coding gain over the high efficiency video coding (HEVC) standard. It is well known that increasing the coding efficiency adds more dependencies in the video bitstream, making format-compliant encryption more challenging. In this paper we tackle the problem of format-compliant and constant bitrate selective encryption of the VVC standard. These two constraints ensure that the encrypted bitstream can be decoded by any VVC decoder while the bitrate remains unchanged by the encryption. The selective encryption of all possible VVC syntax elements is investigated. A new algorithm is proposed to encrypt, in a format-compliant and constant bitrate manner, the transform coefficients (TCs) together with other syntax elements at the level of the entropy encoder. The proposed solution was integrated and assessed under the VVC reference software model version 6.0. Experimental results showed that the encryption drastically decreases the video quality while remaining robust against several types of attacks. The encryption space is estimated in the range of 15% to 26% of the bitstream size, resulting in a lightweight encryption process. The web page of this work is publicly available at https://gugautie.github.io/sevvc/.


Introduction
Security and confidentiality of multimedia contents are of prominent importance in many applications to ensure safe storage and transmission of images and videos. The straightforward solution to perform secure transmission of a video is to encrypt the whole video bitstream with a secure encryption algorithm such as the advanced encryption standard (AES) [1]. However, this solution, when applied to video, has several limitations related to its high computational complexity, which increases both the energy footprint and the end-to-end latency. This increase in complexity and latency is mainly caused by the processing cost of the encryption algorithm used to cipher the whole video, especially when the video is encoded at a high bitrate. Moreover, deciphering and ciphering are required to perform post-processing operations such as transcoding for network adaptation. This may harm security since the secret key is shared with untrusted middleboxes in the network to perform splicing, quality monitoring, watermarking and transcoding. Selective encryption has emerged as an effective alternative to perform secure and low complexity encryption of images and videos [2]. The encryption process is performed in the compressed domain, where only a set of the most sensitive information is encrypted. This enables both format-compliant and constant bitrate encryption. The format-compliant property is very important since it allows the video bitstream to be decoded without deciphering; all post-processing operations, including packaging and transcoding, can thus be performed without access to the secret key used for encryption. Moreover, this property enables encrypting only some spatial regions in the image, identified as regions of interest (ROIs), while keeping the rest of the image clear. The constant bitrate property preserves the encoder coding efficiency. Selective encryption has been widely investigated for different still image and video coding standards including JPEG [3], JPEG-2000 [4,2], advanced video coding (AVC) [5], scalable video coding (SVC) [6] and more recently HEVC [7,8,9] and its scalable extension SHVC [10]. Selective encryption of the HEVC standard has been widely investigated in the literature [11,12,13,14], enabling format-compliant, secure and low complexity encryption.
The ISO/moving picture experts group (MPEG) and ITU/video coding experts group (VCEG) developed the next generation video coding standard called VVC. The latter, released in July 2020, introduces new coding tools outperforming HEVC by up to 50% in terms of bitrate reduction for a similar visual quality [15]. To the best of our knowledge, format-compliant and constant bitrate encryption of VVC has not yet been addressed. Moreover, it is well known from information theory [16] that enhancing the coding efficiency adds more dependencies in the bitstream, making format-compliant and constant bitrate encryption more challenging.
This paper investigates a format-compliant and constant bitrate encryption of a video bitstream encoded with the VVC standard. To meet these two constraints, the encryption is performed at the level of the context-adaptive binary arithmetic coding (CABAC) engine. We first investigate all possible syntax elements that can be encrypted in both a format-compliant and constant bitrate manner. A set of VVC syntax elements including TC values and signs, chroma prediction candidate, and motion vector (MV) differences and signs are encrypted. We propose a new algorithm that determines the encryptable bins within the TCs. The proposed selective encryption solution has been extensively assessed under the VVC common test conditions (CTCs) using three image and video quality assessment metrics, namely peak signal to noise ratio (PSNR), structural similarity (SSIM) and video multimethod assessment fusion (VMAF), and security metrics such as encryption quality (EQ) [17], histogram analysis, edge detection and edge differential ratio (EDR) [18].
The proposed solution has also been tested against brute force attack, number of pixels change rate (NPCR) and unified average changing intensity (UACI) [19]. The encryption space, giving the percentage of encrypted bits in the bitstream, varies in the range of 15% to 26% for different targeted bitrates. This results in a very low decryption complexity, which remains lower than 6% of the decoding time.
The rest of this paper is organized as follows. Section 2 gives a brief review of selective encryption solutions proposed for HEVC, and Section 3 describes the entropy coding of syntax elements in VVC. Section 4 presents the proposed solution to encrypt VVC syntax elements in a format-compliant and constant bitrate manner. The performance of the selective encryption solution is assessed in Section 5 in terms of video quality degradation, resilience to different attacks and complexity overhead. Finally, Section 6 concludes this paper.

Related works
In this section, we review the existing solutions for HEVC standard encryption.
The first format-compliant encryption solution for HEVC was proposed by Shahid et al. [7]. In this solution, AES was used in cipher feedback (CFB) mode to perform selective encryption of the selected syntax elements at the CABAC stage. This work considered an earlier version of the HEVC standard, and some encryptable syntax elements of HEVC were not identified in this solution.
Farajallah et al. [8] proposed a selective encryption solution to cipher the ROI in the HEVC standard. This solution relies on the tile concept introduced in HEVC, which enables a frame to be partitioned into independent rectangular regions. The encryption process encrypts only the tiles within the ROI and keeps the background clear. The tiles within the ROI are encrypted in a format-compliant and constant bitrate manner by ciphering only a set of sensitive syntax elements. Moreover, to prevent the encryption from propagating outside the ROI, the MVs of the background tiles are constrained to refer only to the background area (no ROI) in the reference frames.
Boyadjis et al. [9] presented a selective encryption algorithm designed to increase the visual distortion. The presented research moves selective encryption from bypass mode to regular mode, which negatively affects the bitrate. Luma intra prediction modes are selected to be encrypted in addition to the residuals. The presented solution improves the scrambling performance at the cost of compression efficiency, leading to a slight bitrate increase.
Hamidouche et al. [10] investigated selective encryption of the final version of HEVC. The authors proposed a real time selective encryption solution for the scalable extension of HEVC named scalable HEVC (SHVC). The presented solution analyzed all SHVC syntax elements in order to perform format-compliant, constant bitrate and low latency encryption while preserving all SHVC features. The presented results showed the high security level of the selective encryption solution with a low complexity overhead, below 6% of the decoder complexity.
Van Wallendael et al. [11] presented a format-compliant selective encryption solution for HEVC. They selected a set of HEVC syntax elements that preserve the format compliance. Several techniques to selectively encrypt the video were investigated. The obtained results showed that most of the selected syntax elements have a low effect on the rate-distortion performance while covering a broad range of scrambling performance.
Memos et al. [12] presented an algorithm that encrypts only the Intra (I) frames of the HEVC bitstream, based on the idea that predicted (P) and bidirectional predicted (B) frames are useless without the I frame. Moreover, encrypting only I frames decreases the encryption time by 50% and propagates the encryption to other frames. The presented algorithm merged two algorithms proposed in [20,21], while introducing some modifications to the selection and management of the encrypted data to be amenable to HEVC. This work relies on the AES algorithm with a 256-bit key for secure transmission of the HEVC bitstream. It collects the sign bits of each transform coefficient of I frames until 256 bits are collected. However, it is not clear in the proposed algorithm whether the collected bits are used as key value or as state value, since the AES-256 state size is 128 bits and not 256 bits. The proposed algorithm performs conventional AES encryption on the collected bits and swaps the original sign bits with the encrypted ones. Finally, the Shamir's secret sharing (SSS) input parameters are collected from the non-zero alternating current (AC) coefficients of each transform block within the I frame. It is clear from the description that the proposed algorithm performs partial encryption. It is also important to note that the proposed solution is neither format-compliant nor constant bitrate, since it increases the bitstream size by at least 8%.
Long et al. [13] presented a format-compliant encryption in order to secure HEVC streams in multimedia social networks. The presented algorithm is tightly integrated with the encoding and decoding processes. The presented work performs encryption in two steps. First, a stream cipher is used to encrypt the sign of the nonzero TCs and the first sign bit hiding of the TCs. Second, based on a control factor, only one parameter among the merging index, MV prediction index, sign of the MV difference and reference frame index is encrypted. The presented solution is format-compliant but increases the bitrate. Finally, the presented work was assessed regarding security and complexity, which confirmed a good security level and an acceptable complexity overhead.
Ahmed et al. [14] presented a new solution for efficient selective encryption of HEVC based on the chaotic logistic map. The presented solution encrypts the sign bit of the MV differences and of the TCs. The encryption process is performed at the entropy coding stage of the HEVC encoding process. They focused on achieving low complexity ciphering targeting real time applications, with constant bitrate and format-compliant encryption. The presented work was compared with the solution proposed in [11], and the obtained results confirm its suitability for real time applications with an intermediate level of security.
Peng et al. [22] presented a tunable selective encryption scheme for HEVC based on chroma Intra prediction mode (IPM) and TC scrambling. The presented work has two security levels. The first one encrypts HEVC syntax elements including Luma IPMs, Chroma IPMs, the suffix part of the TCs, sign and value of the MV differences, merge index, advanced MV prediction, reference frame index, and sample adaptive offset (SAO) filter parameters. The second security level relies on edge extraction in each transform block. The transform block coefficients are scrambled to increase the security level only when the current transform block contains edges. Finally, AES is used in CounTeR (CTR) mode in order to generate the pseudo-random number sequences. These sequences are used to encrypt all previously mentioned parameters with a simple eXclusive OR (XOR) operation.
Xu [23] proposed to perform data hiding inside the selectively encrypted bitstream of HEVC. The secret message is hidden using a quantized transform coefficient (QTC) modification technique. It only changes bit values based on the data hiding, without changing the data size in bypass coding, which confirms that the obtained solution is constant bitrate and format-compliant. Since the operation used is a XOR, the hidden data can be extracted from both the encrypted and the original videos. The obtained results confirm the resilience of the presented work against replacement attacks. Moreover, the degradation of the video quality introduced by data hiding is negligible. However, the presented algorithm was not evaluated regarding important general video attacks such as UACI, NPCR, EDR, EQ, histogram analysis and key sensitivity attacks.

CABAC engine in VVC
The CABAC engine defined in VVC is similar to that of HEVC, consisting of three main functions: binarization, context modeling and arithmetic coding [25]. The overall CABAC architecture is illustrated in Fig. 1. First, the binarization step converts syntax elements to binary symbols (bins). Second, the context modeling updates the probabilities of bins, and finally the arithmetic coding compresses the bins into bits according to the estimated probabilities.

BINARIZATION METHODS
Six binarization methods are used in VVC, namely unary (U), truncated unary (TU), fixed length (FL), truncated binary (TB), truncated Rice code with context p (TRp) and k-th order Exp-Golomb code (EGk). The U code represents an unsigned integer $B$ with a binstring of length $B + 1$ composed of $B$ 1-bins followed by one 0-bin. The TU code is defined with the largest possible value of the syntax element, $cMax$ ($0 \le B \le cMax$). When the syntax element value $B < cMax$, the TU code is equivalent to the U code; otherwise, $B$ is represented by a binstring of $cMax$ 1-bins. The FL code represents a syntax element $B$ with its binary representation of length $\lceil \log_2(cMax + 1) \rceil$, where $\lceil x \rceil$ is the smallest integer greater than or equal to $x$. The TB code is similar to the FL code, except when $cMax + 1$ is not a power of 2. In this case, let $k = \lfloor \log_2(cMax + 1) \rfloor$, where $\lfloor x \rfloor$ is the largest integer less than or equal to $x$. The first $u = 2^{k+1} - (cMax + 1)$ symbols are coded with an FL code of length $k$. The remaining $cMax + 1 - u$ symbols are offset by $u$ and coded with $k + 1$ bins. The TRp code is a concatenation of a quotient $q = \lfloor B/2^p \rfloor$ and a remainder $r = B - q 2^p$. The quotient $q$ is first represented by the TU code as a prefix, concatenated with a suffix $r$ represented by the FL code of length $p$. The EGk code is also a concatenation of a prefix and a suffix. The prefix part of the EGk code is the U representation of l
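The U, TU, FL, TB and TRp codes described above can be illustrated with a minimal Python sketch. The function names and the q_max parameter (the cMax used for the TU prefix of the TRp code) are hypothetical names chosen for this illustration, and bins are returned as strings of '0' and '1' for readability. The TB code follows the usual truncated binary definition, with $u = 2^{k+1} - (cMax + 1)$.

```python
def bits(value: int, length: int) -> str:
    """Binary representation of value on exactly `length` bins."""
    if value < 0 or value >= (1 << length):
        raise ValueError("value does not fit in the requested length")
    return format(value, "b").zfill(length) if length else ""


def unary(b: int) -> str:
    """U code: b 1-bins followed by one 0-bin (length b + 1)."""
    return "1" * b + "0"


def truncated_unary(b: int, c_max: int) -> str:
    """TU code: U code below c_max, c_max 1-bins without terminator at c_max."""
    if not 0 <= b <= c_max:
        raise ValueError("b must satisfy 0 <= b <= c_max")
    return unary(b) if b < c_max else "1" * c_max


def fixed_length(b: int, c_max: int) -> str:
    """FL code on ceil(log2(c_max + 1)) bins (equal to c_max.bit_length())."""
    return bits(b, c_max.bit_length())


def truncated_binary(b: int, c_max: int) -> str:
    """TB code: first u symbols on k bins, the others offset by u on k + 1 bins."""
    n = c_max + 1
    k = n.bit_length() - 1            # floor(log2(n))
    u = (1 << (k + 1)) - n            # number of short codewords
    return bits(b, k) if b < u else bits(b + u, k + 1)


def truncated_rice(b: int, p: int, q_max: int) -> str:
    """TRp code: TU prefix for the quotient, FL suffix of length p for the remainder."""
    q = b >> p
    r = b - (q << p)
    return truncated_unary(q, q_max) + bits(r, p)
```

For instance, with cMax = 4 the TB code uses three 2-bin codewords and two 3-bin codewords, which keeps the code prefix-free.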

TRANSFORM COEFFICIENTS (TCS) CODING
In this section we describe the CABAC coding of the TCs. Similar to HEVC, VVC coefficients are coded either in regular TC mode or in transform skip (TS) mode. In both modes, the transform block is first divided into sub-blocks.

VVC TC CODING MODE
The coefficients of each sub-block are encoded in three passes, as illustrated in Fig. 2 for a 4×4 sub-block. The coefficients are processed in reverse diagonal scan order, as depicted in Fig. 3a. The first pass processes a group of flags until it reaches a limit of used bins specified by the standard. This maximum number of bins used in the first pass is computed with respect to the block size ($W_b \times H_b$) as $\left(2^{\log_2(W_b) + \log_2(H_b)}\right) \times 7/4$. Once this limit is reached, the second pass starts encoding the remainders abs_remainder, computed from the coefficient value $C$ by (1). Coefficients of value lower than 4 are fully binarized by the flags of the first pass. The second pass relies on the TRp/EGk binarization, until the position of the last coefficient processed by the first pass is reached. Then, the dec_abs_level syntax element, computed by (2) for the remaining coefficients, is bypass coded and also binarized using the TRp/EGk binarization.
In (2), the constant V is derived from a lookup table (LUT) VArr according to the state and the local absolute sum LocAbsSum computed for the current coefficient by (3). The V value remaps the coefficient value 0 so that coefficients have a shorter binarization when large coefficients are mixed with definite 0 values. LocAbsSum is a saturated sum, in the interval [0, 31], of a set of neighboring coefficients $S_1$ illustrated in green in Fig. 4a. In (3), BaseLvl is equal to 4 for abs_remainder (Pass 2-1) and 0 for dec_abs_level (Pass 2-2).
Finally, the third pass encodes the signs of the coefficients. Note that only the first pass relies on CABAC context coding, while the last two passes perform bypass coding. The abs_remainder and dec_abs_level syntax elements are both binarized by a combination of the TRp code and, in a special case, the EGk code. This binarization is presented in Section 3.2.3.
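As a small worked example of the first-pass bin limit $(2^{\log_2(W_b)+\log_2(H_b)}) \times 7/4$ stated above, the sketch below evaluates it for a few block sizes. The function name is hypothetical, and block sides are assumed to be powers of two so that the base-2 logarithms are integers.

```python
def first_pass_bin_budget(width: int, height: int) -> int:
    """Maximum number of bins available to the first pass for a W_b x H_b block."""
    for side in (width, height):
        # Assumption of this sketch: block sides are positive powers of two.
        if side <= 0 or side & (side - 1):
            raise ValueError("block sides are assumed to be powers of two")
    log2_w = width.bit_length() - 1
    log2_h = height.bit_length() - 1
    return ((1 << (log2_w + log2_h)) * 7) // 4


for w, h in [(4, 4), (8, 8), (16, 4), (32, 32)]:
    print(f"{w}x{h}: {first_pass_bin_budget(w, h)} bins")
```

A 4×4 block therefore gets 28 bins for the first pass, i.e. 1.75 bins per coefficient on average, after which the remaining levels go through the bypass-coded passes.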

VVC TS CODING MODE
In transform skip mode, the coefficients of each sub-block are also encoded in three passes that process the coefficients in a simple diagonal scan order, as shown in Fig. 3b. The first pass mainly encodes all coefficients considered as significant (i.e. $C \neq 0$), including their sign and parity. The second pass encodes more flags to check whether the coefficient is greater than a certain threshold. Finally, the third pass encodes the remainders of the coefficients greater than 10 using the TRp/EGk binarization of abs_remainderTS, given by (4). The local absolute sum in TS mode, LocAbsSumTS, is computed by (5). It should be noted that the third pass relies on bypass coding.

BINARIZATION PROCESS
Algorithm 1 gives the binarization process of abs_remainder. The dec_abs_level and abs_remainderTS syntax elements are also binarized by this algorithm. The TC remainders are binarized using either a TRp code, introduced in Section 3.1, or an EGk code limiting the maximal length of a binarization to 32 bits, as presented in Algorithm 2. The selection between the two binarizations depends on a threshold value β defined in the standard by (6), where BinReduc is set to 5 and cRiceParam ∈ {0, 1, 2, 3} is the rice parameter derived from a LUT riceArr according to the saturated local absolute sum LocAbsSum of previously coded coefficients computed by (3). Fig. 4a illustrates in green the set of coefficients $S_1$ used to derive the rice parameter of the current coefficient, highlighted in yellow. Similarly, Fig. 4b presents the coefficients used in TS mode, where the rice parameter depends only on the top and left neighbor coefficients, forming the set $S_2$. When the remainder to encode is strictly below the threshold β, the TRp binarization is performed with p = cRiceParam. Otherwise, the limited EGk coding is applied.
Algorithm 2 shows that the maximum length of the prefix, maxPrefixLen, depends on the range of the transform coefficients, $2^{log2TrRange}$, and on BinReduc. To differentiate between the two binarizations at the decoder side, BinReduc is added to the prefix length when Limited_EGk is used. Then, a classical EGk binarization starts. However, if the computed prefix length prefixLen is equal to the maximal prefix length maxPrefixLen, the suffix length suffixLen is set to log2TrRange. Both codes are composed of a variable-length prefix and, if it exists, a fixed-length suffix. The prefix is coded using a U or TU code representation, which implies that changing any of its bins would either violate the standard at the decoder or change the bitrate. On the other hand, the suffix can be encrypted in a format-compliant and constant bitrate manner only when the resulting change in LocAbsSum does not alter the cRiceParam value of the neighbor coefficients.
Algorithm 1 (surviving fragment): + abs_remainder mod bitMask, where a mod n gives the remainder of the Euclidean division of a by n.
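The selection between the TRp and limited EGk binarizations, and the prefix/suffix split of the TRp branch, can be sketched as follows. This is an illustration under stated assumptions rather than a transcription of Algorithms 1 and 2: the threshold is assumed to take the form β = BinReduc · 2^cRiceParam, the function name is hypothetical, and the limited EGk branch is deliberately not expanded.

```python
BIN_REDUC = 5  # value of BinReduc given in the text


def binarize_remainder(rem: int, c_rice_param: int):
    """Return (mode, prefix, suffix) for a remainder, TRp branch only."""
    if rem < 0 or c_rice_param not in (0, 1, 2, 3):
        raise ValueError("rem must be >= 0 and cRiceParam in {0, 1, 2, 3}")
    # ASSUMPTION: threshold beta = BinReduc << cRiceParam.
    beta = BIN_REDUC << c_rice_param
    if rem >= beta:
        # Limited EGk branch (Algorithm 2), not reproduced in this sketch.
        return ("limited EGk", None, None)
    quotient = rem >> c_rice_param
    bit_mask = (1 << c_rice_param) - 1
    prefix = "1" * quotient + "0"            # variable-length part
    remainder = rem & bit_mask               # rem mod 2^cRiceParam
    suffix = format(remainder, "b").zfill(c_rice_param) if c_rice_param else ""
    return ("TRp", prefix, suffix)


print(binarize_remainder(6, 2))    # short prefix, 2-bin suffix
print(binarize_remainder(25, 2))   # above the assumed threshold
```

The point of the split is that the suffix always has exactly cRiceParam bins, so only that part can be altered without changing the binstring length.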

Proposed VVC selective encryption
This section presents a new selective encryption scheme for the VVC standard. The proposed selective encryption fulfills two important features: standard format-compliant encryption (i.e. the bitstream must be decodable by any VVC decoder) and constant bitrate encryption (i.e. the VVC compression efficiency is preserved).
Figure 4: TC dependencies: coefficients highlighted in green are used to compute the local absolute sum of the current coefficient in yellow for (4a) TC mode and (4b) TS mode. $S_1$ and $S_2$ are the two sets of green coefficients in (4a) and (4b), respectively.
The encryption is performed at the CABAC level of the encoder. Fig. 1 depicts in green the position of the selective encryption in the CABAC engine. The encryption is performed after the binarization process, and only a set of selected syntax elements, listed in Table 1, are ciphered. The encryption involves syntax elements from different coding tools including transform block, intra and inter predictions, and in-loop filters. This ensures the encryption of both intra (I) and inter (P and B) coded slices included in the VVC video sequence. The syntax elements listed in Table 1 have been selected based on the following two criteria:
• The syntax element is bypass coded: this restriction preserves the VVC coding efficiency.
• Changing any bin will not change how the binstring is read by the decoder: this restriction ensures format-compliant encryption by excluding most flags and syntax elements binarized by variable-length codes.
The encryption of most syntax elements listed in Table 1 is straightforward, except for the TCs, which require specific processing to search for the encryptable bins. In the next section, we describe the encryption of the TCs since it is the most challenging syntax element to encrypt. The coding of the TCs introduces dependencies that need to be carefully addressed to perform format-compliant and constant bitrate encryption.
Figure 5: The current coefficient (in yellow) is used to compute the local absolute sum of the coefficients highlighted in green for (5a) TC mode and (5b) TS mode. Coefficients highlighted in red are used along with the current coefficient in the prediction of the parity of the coefficients in green. S1 and S2 are the two sets of green coefficients in (5a) and (5b), respectively.

TRANSFORM COEFFICIENT ENCRYPTION
This section presents how the TCs are encrypted. As explained in Section 3.2.3, the binarization of the TCs, and especially the length of the suffix, depends on the previously encoded TCs. Therefore, encryption that changes the value of the coefficients may introduce a bitrate increase. Indeed, the binarization depends on a rice parameter cRiceParam ∈ {0, 1, 2, 3} derived from previous TCs. This rice parameter defines the fixed length of the suffix and therefore corresponds to the size of the encryptable bins. After an analysis of the binarization algorithm, multiple conditions ensuring constant bitrate have emerged and are presented below.
First, it is important to note that the coefficients are binarized in two different ways depending on whether they are processed by pass 2-1 or 2-2, as presented in Fig. 2.
• The encryption must not change the parity of the coefficient: changing the parity will result in changing the state of the CABAC context. The state is updated using the previous state value and the parity of the current coefficient.
• The encryption must not change the rice parameter: this will affect the bitrate.
• The encryption must not change the V value for coefficients processed by pass 2-2: changing the value of this parameter can result in changing the parity, and thus the CABAC context.
Considering those conditions, Algorithm 3 is proposed to identify the bins within the TCs that can be encrypted in a constant bitrate and format-compliant manner. The rice parameter of each TC is derived from a saturated absolute sum of the local neighborhood of the current TC. Fig. 5a depicts in green the set S1 of TCs affected if the current TC, in yellow, is modified by encryption. Therefore, for each affected coefficient, Algorithm 4 checks whether the changes in the local absolute sum will affect the context, the rice parameter and the V value. To make sure that the parity is not changed, the encryption excludes the least significant bit (LSB) of the suffix and is performed only when the rice parameter is greater than 1 (cRiceParam ∈ {2, 3}).
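The parity-preserving rule above (cipher the suffix without its LSB, and only when cRiceParam ∈ {2, 3}) can be sketched as a masked XOR. The function name and the way the keystream sample is passed in are hypothetical, and the neighbor checks of Algorithms 3 and 4 are assumed to have already succeeded for the coefficient.

```python
def encrypt_suffix(suffix: int, c_rice_param: int, keystream: int) -> int:
    """XOR the cRiceParam-bin suffix with a keystream sample, keeping the LSB."""
    if c_rice_param < 2:
        # With 0 or 1 suffix bins there is nothing to cipher besides the LSB.
        return suffix
    # All suffix bins except the least significant one (parity is preserved).
    mask = ((1 << c_rice_param) - 1) & ~1
    return suffix ^ (keystream & mask)


cipher = encrypt_suffix(0b101, 3, 0b111)
assert cipher & 1 == 0b101 & 1                       # parity unchanged
assert encrypt_suffix(cipher, 3, 0b111) == 0b101     # XOR is its own inverse
```

Because the mask never covers bit 0, at most cRiceParam − 1 bins per eligible coefficient are ciphered, i.e. one or two bins.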
The encryption of the coefficients in TS mode is similar. The main difference lies in how the rice parameter is derived. Fig. 5b shows the set S2 of affected coefficients in green when the current coefficient (in yellow) is modified by encryption. The current coefficient is binarized using a prediction based on the top and left coefficients. Therefore, the ciphered value of the current coefficient must remain lower than or equal to the coefficient depicted in red in Fig. 5b, according to the prediction scheme used. Algorithms 3 and 4 are used to check that the encrypted bins of the current binarized coefficient do not affect how the neighbor coefficients will be encoded. This enables defining the bins that can be encrypted in a format-compliant and constant bitrate manner. The proposed solution is carried out as follows:
• Algorithm 3 checks, during the binarization process, whether the TC can be encrypted or not. The encryption is possible only when the absolute value of the coefficient is different from 0 (absLevel ≠ 0) and the value of the derived rice parameter is above 1 (cRiceParam > 1).
• The algorithm then computes the minimum and maximum values of the encrypted remainder (remMin, remMax) of the current coefficient. The minimum (absCMin) and maximum (absCMax) absolute values of the coefficient are derived from their respective remainders.
• Algorithm 4 checks, for all coefficients in S1 depicted in green in Fig. 5a (when they exist), whether ciphering the current coefficient $(X_c, Y_c)$ will affect its neighbor coefficients $(X_p, Y_p)$, the rice parameter or the V value. This operation is performed in five steps as follows:
1. The algorithm computes a saturated absolute sum of the tested coefficient of coordinates $(X_p, Y_p)$, $AbsSumP1 = \sum_{i \in S_1} \min(4 \ldots)$, which is used for the context computation, with $S_1$ the set of neighbor coefficients of $(X_p, Y_p)$. This operation is performed for the first pass (P1) to check the CABAC context change and sets the NoCtxChange flag to true if the changes in AbsSumP1 do not affect the context.
3. Then, the algorithm checks whether the rice parameter will be affected by the different sums computed in step 2 and sets a flag NoRiceParChange to true if the rice parameter remains unchanged under all tested conditions. $I_R$ is a LUT containing, for each rice value, the interval in which the local absolute sum does not change the rice parameter.
4. The parameter V is computed only in pass 2-2. At this fourth step, the algorithm checks, for the processed coefficients, whether the parameter V remains unchanged and sets the flag NoVChange to true accordingly. Similar to $I_R$, $I_P$ returns, depending on the state and V values, the interval in which the local absolute sum does not change the V value.
5. Finally, Algorithm 4 returns true when NoCtxChange, NoRiceParChange and NoVChange are all equal to true.
The decoder performs the inverse of the encoder operations for deciphering. The decoder first decodes the TCs and then searches for the encryptable coefficients using Algorithms 3 and 4. Finally, the deciphering processes only the identified encrypted bins.
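The interval test behind the NoRiceParChange flag can be illustrated with a short sketch: given the smallest and largest local absolute sums that encryption could produce for a neighbor, it verifies that both fall within the interval of the same rice parameter. The LUT below is a toy table invented for this example and is not claimed to match riceArr in the standard; the function names are hypothetical as well.

```python
# ILLUSTRATIVE 32-entry table mapping a saturated local sum to a rice
# parameter. Toy values, NOT the riceArr of the standard.
TOY_RICE_LUT = [0] * 7 + [1] * 7 + [2] * 14 + [3] * 4


def saturate(value: int, low: int = 0, high: int = 31) -> int:
    """Clip the local absolute sum to the interval [0, 31]."""
    return max(low, min(high, value))


def rice_interval(rice: int, lut) -> tuple:
    """Interval of local sums mapped to `rice` (the role played by I_R)."""
    indices = [i for i, r in enumerate(lut) if r == rice]
    return indices[0], indices[-1]


def no_rice_par_change(sum_min: int, sum_max: int, lut=TOY_RICE_LUT) -> bool:
    """True if every local sum in [sum_min, sum_max] yields the same rice parameter."""
    low, high = saturate(sum_min), saturate(sum_max)
    first, last = rice_interval(lut[low], lut)
    return first <= high <= last


print(no_rice_par_change(15, 20))  # both sums share one rice parameter in the toy table
print(no_rice_par_change(12, 15))  # the sums straddle an interval boundary
```

The same pattern would apply to the V check with $I_P$, with the interval additionally indexed by the state.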

ENCRYPTION METHOD AND SYNCHRONISATION
The syntax elements to cipher are now defined. Since these syntax elements have a variable length, a stream cipher is better suited for this application. As minimum error propagation is one of the most desirable properties in video encryption, we use the AES algorithm in CTR mode as a pseudo-random number generator (PRNG) to encrypt the identified syntax elements. It is important to note that a CTR counter value should not be reused, a rule that is followed in our solution [26]. Other stream ciphers such as Rabbit [27], the light weight chaos-based (LWCB) stream cipher [28] or HC-128 [29], or even block ciphers like AES in CFB mode, can be used as well. A stream cipher produces a cipher text $C$ using a XOR operation between the plain text $P$ and the output stream $X_g$ produced by a PRNG, $C = P \oplus X_g$. To revert the encryption, a XOR between the cipher text and the same PRNG output is performed. Thus, a perfect synchronization between the encoder and the decoder is required. Most of the syntax elements are systematically ciphered and do not depend on the position or the context. However, the syntax elements associated with the TCs are ciphered only if they meet the conditions previously described in Section 4.1. One of these conditions relies on the neighbor coefficients, implying that the last decoded coefficient needs to be deciphered first. To allow this behavior, for each significant coefficient, encryptable or not, the PRNG generates a sample whose size is equal to the rice parameter, i.e. the maximum encryptable size. The unused samples are discarded to keep the encoder and the decoder perfectly synchronized. In CTR mode, a single bit flip caused by transmission errors will only affect one bit during the deciphering process, which minimizes the error propagation.
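The synchronization rule described above (draw a keystream sample for every significant coefficient, and discard it when the coefficient is not encryptable) can be sketched as follows. To stay self-contained with the Python standard library, the keystream here is a SHA-256 counter construction used purely as a stand-in for AES in CTR mode; it is not the cipher used in the paper. The class and function names are hypothetical, and the encryptable flag is passed in explicitly, whereas a decoder would derive it with Algorithms 3 and 4.

```python
import hashlib


class Keystream:
    """Stand-in PRNG: SHA-256 over key, nonce and a counter (NOT AES-CTR)."""

    def __init__(self, key: bytes, nonce: bytes):
        self._key, self._nonce = key, nonce
        self._counter = 0
        self._bits = ""

    def take(self, nbits: int) -> int:
        """Consume and return the next nbits of the stream as an integer."""
        while len(self._bits) < nbits:
            block = hashlib.sha256(
                self._key + self._nonce + self._counter.to_bytes(8, "big")
            ).digest()
            self._counter += 1          # a counter value is never reused
            self._bits += format(int.from_bytes(block, "big"), "0256b")
        chunk, self._bits = self._bits[:nbits], self._bits[nbits:]
        return int(chunk, 2) if chunk else 0


def xor_fields(fields, key: bytes, nonce: bytes):
    """Apply C = P xor X_g to (value, nbits, encryptable) fields, in order."""
    stream = Keystream(key, nonce)
    out = []
    for value, nbits, encryptable in fields:
        sample = stream.take(nbits)     # drawn even if discarded, to stay in sync
        out.append((value ^ sample if encryptable else value, nbits, encryptable))
    return out


plain = [(0b101, 3, True), (0b11, 2, False), (0b0110, 4, True)]
cipher = xor_fields(plain, b"demo-key", b"demo-nonce")
assert xor_fields(cipher, b"demo-key", b"demo-nonce") == plain
```

Drawing the sample unconditionally means the keystream position depends only on the sequence of field sizes, not on the encryptability decisions, which is what keeps both sides aligned.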

Results and Discussions
In this section, we first present the experimental setup, followed by an assessment of the video degradation introduced by the selective encryption, then a security analysis will be presented, and finally a complexity evaluation is provided.

EXPERIMENTAL SETUP
The experiments are carried out under the CTCs of the VVC standard. The CTCs define several test video sequences of different resolutions, and five quantization parameters (QPs) are used, QP ∈ {17, 22, 27, 32, 37}. The proposed encryption solution is implemented in the VVC test model (VTM) [30] version 6.0. VTM is the reference software implementation of both the encoder and the decoder of the VVC standard. The coding configuration without encryption is referred to as the Anchor. The video sequences are encoded with encryption in the Random Access (RA) coding configuration, which is the common coding configuration used in broadcast and over-the-top (OTT) applications, with an Intra period of 32 frames. The complexity measurements are performed on a desktop computer equipped with an Intel i7-7700 processor running at 3.60 GHz under Ubuntu 18.04.

VIDEO QUALITY
The distortion introduced by the proposed solution on the test video sequences is assessed in this section. Three full-reference objective image and video quality metrics are computed on the encrypted video sequences with respect to the original. PSNR is used to evaluate the video quality based on the mean squared error computed over the frame pixels [31]. The PSNR is computed as a weighted sum of the PSNR scores of the three color components. SSIM explores the structural similarity between the original and the decoded frame. It is important to note that an SSIM value close to 1 indicates a decoded frame of a quality similar to that of the original frame [32]. Finally, VMAF is a video quality metric that predicts the perceived quality score of a video sequence [33], where a score of 100 indicates a good perceptual video quality and 0 refers to a very low perceived video quality.
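For reference, the PSNR computation mentioned above can be sketched in a few lines. The per-component PSNR follows the usual definition from the mean squared error; the weights used to combine the three color components are not given in the text, so the 6:1:1 luma/chroma weighting below is only an assumption, and the function names are hypothetical.

```python
import math


def psnr(reference, distorted, bit_depth: int = 8) -> float:
    """PSNR in dB between two equally sized sequences of pixel values."""
    if len(reference) != len(distorted) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((r - d) ** 2 for r, d in zip(reference, distorted)) / len(reference)
    if mse == 0:
        return math.inf                      # identical signals
    peak = (1 << bit_depth) - 1
    return 10.0 * math.log10(peak * peak / mse)


def weighted_psnr(psnr_y: float, psnr_u: float, psnr_v: float,
                  weights=(6, 1, 1)) -> float:
    """Weighted combination of component PSNRs (ASSUMED 6:1:1 weights)."""
    w_y, w_u, w_v = weights
    return (w_y * psnr_y + w_u * psnr_u + w_v * psnr_v) / (w_y + w_u + w_v)


print(weighted_psnr(40.0, 30.0, 30.0))
```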
Table 2 presents the PSNR performance over all video sequences at the five considered QPs. We can notice that at QP 17 the PSNR drops from 44.77 dB on average to around 10.17 dB. Similar PSNR values are reached for the encrypted videos at the other QP values. This indicates that the proposed solution significantly decreases the objective quality of the encrypted video. The Campfire and SlideShow encrypted video sequences have very low PSNR values. For Campfire, this can be explained by its texture and complex shapes associated with high motion, which increase the encryption space and thus improve the quality of the encryption. Concerning SlideShow, shapes are less complex; however, the encryption is able to flip the colors, causing a noticeable quality degradation with lower PSNR scores than the average.
Figure 6: Frame #10 of the RaceHorsesC video sequence decoded without (6a-6e) and with selective encryption (6f-6j) at five QPs.
Table 3 presents the average SSIM scores for seven video classes at different QPs. The proposed encryption solution reduces the SSIM from around 1 to 0.25. The obtained SSIM value confirms that the proposed solution introduces a drastic distortion of the structural information within the encrypted video frame. We can notice that the SSIM scores of class E video sequences are higher than the average scores. These video sequences have low motion and less texture compared to other sequences. This improves the coding efficiency and decreases the performance of the selective encryption since fewer syntax elements are encrypted.
Finally, Table 4 presents the VMAF scores, which also emphasize the large degradation of the perceived video quality resulting from the proposed encryption solution.
Fig. 6 illustrates frame #10 of the RaceHorsesC video sequence decoded at five QP values with and without encryption. The visual quality of the decoded encrypted video is very low, making it difficult to recognize objects and colors in the video frame at all QP values, with PSNR scores around 11 dB.

ENCRYPTION SPACE
The computational time of any ciphering process mainly depends on its encryption space. However, increasing the encryption space enhances the robustness and the security level. Selective encryption therefore targets a robust encryption algorithm with a low computational overhead. Table 5 presents the encryption space of the proposed encryption solution as the percentage of encrypted bits per syntax element over the whole bitstream. The quality degradation of the selective encryption is achieved by ciphering only 26.66% and 15.42% of the bitstream at high and low bitrates, respectively. The largest encryption space is provided by the encryption of the TCs, while the share of the other syntax elements, which are less present in the bitstream, remains negligible (< 2%).

SECURITY ANALYSIS
In the previous section we only assessed the visual degradation achieved by the proposed encryption. In this section we focus on the quality of the proposed encryption and its robustness against different types of attacks.

ENCRYPTION QUALITY (EQ) ANALYSIS
The EQ is the algebraic summation of the differences between the pixel distributions of the original frame H(P) and the encrypted frame H(C). It is computed as follows [17]:

$$EQ = \frac{\sum_{Z=0}^{2^d-1} \left| H_Z(C) - H_Z(P) \right|}{2^d} \tag{8}$$

The higher the EQ value, the more secure the selective encryption solution. Table 6 presents the EQ values for all video classes at the five considered QPs. The presented values are the average EQ over the encrypted frames and video sequences of each class. The EQ does not have a reference point for comparison. A derivation from (8) is therefore proposed to compute the upper bound value of the EQ [10] as follows:

$$EQ_{max} = \frac{2 \, H \, W}{2^d}$$

where H and W are the video height and width, respectively, and d is the bit depth. The upper bound value of the EQ is reached when the histograms of the two frames $H_Z(C)$ and $H_Z(P)$ do not overlap.
The average EQ values lie within the interval [2407, 2680], with a theoretical average upper bound of 5199.
The HEVC selective encryption solution proposed in [10] achieved an EQ value higher than 38.54% of its maximum EQ for the Kimono video sequence, and an EQ value higher than 40.92% of its maximum EQ for the PeopleOnStreet video sequence. Across the different videos and configurations, the proposed solution reaches between 46.29% and 51.56% of the maximum EQ, which confirms that the proposed solution has a high security level with regard to the encryption quality metric.
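The EQ and its upper bound can be sketched as follows, assuming the usual histogram-based definition: the sum of the absolute differences between the two frame histograms, normalized by the number of sample levels $2^d$. The function names and the toy frames are hypothetical and serve only to illustrate the computation.

```python
from collections import Counter


def encryption_quality(plain, cipher, bit_depth=8):
    """EQ between two frames given as flat sequences of sample values."""
    if len(plain) != len(cipher):
        raise ValueError("frames must have the same number of samples")
    levels = 1 << bit_depth
    hist_p = Counter(plain)   # H_Z(P); missing levels count as 0
    hist_c = Counter(cipher)  # H_Z(C)
    total = sum(abs(hist_c[z] - hist_p[z]) for z in range(levels))
    return total / levels


def eq_upper_bound(height, width, bit_depth=8):
    """Upper bound of the EQ, reached when the two histograms do not overlap."""
    return 2 * height * width / (1 << bit_depth)


if __name__ == "__main__":
    # Hypothetical 2x4 frames with completely disjoint histograms.
    plain = [0] * 8
    cipher = [255] * 8
    eq = encryption_quality(plain, cipher)
    bound = eq_upper_bound(2, 4)
    print(f"EQ = {eq}, upper bound = {bound}, ratio = {100 * eq / bound:.1f}%")
```

Reporting the EQ as a percentage of its upper bound, as done in the comparison with [10], makes the values comparable across resolutions and bit depths.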

HISTOGRAM ANALYSIS
The histogram of an encrypted frame should be more uniform than the histogram of the original frame in order to resist statistical analysis based attacks [34,35]. Fig. 7 illustrates the histograms of frame #10 of the RaceHorsesC video sequence before and after selective encryption at the five considered QPs. The histograms of the encrypted frames are completely different from the histogram of the original frame. In fact, the proposed encryption solution changes the distribution of the decoded pixels toward a different pattern, which is close to a uniform distribution, especially at lower bitrate (i.e., high QP). We can also notice that, in contrast to full encryption, it is difficult for constant bitrate and format-compliant selective encryption to reach a uniform histogram distribution for all coding configurations and video contents.

EDGES AND STRUCTURAL INFORMATION PROTECTION
Edge detection enables assessing the ability of an encryption solution to hide the edge information in the encrypted frame. This section evaluates the ability of the proposed encryption solution to hide the edges in the encrypted video.

SENSITIVITY TO SECRET KEYS
Key sensitivity attacks are mainly based on the adversary trying to decipher the encrypted frames using a key close to the secret key used for encryption. A second adversary scenario is to guess the key when the encryption system leaks information related to the secret key, such as the sensitivity of the encryption to a small change in the key. The proposed encryption algorithm should therefore produce a completely different encrypted frame when the secret key is slightly changed (one-bit change) [37]. The robustness of the system against key sensitivity attacks can be assessed using existing tools such as UACI and NPCR [38,39]. To compute these metrics, one random key $K_1$ is generated and a key $K_2$ differing from it by only one bit is created. The two keys are then used to cipher the same frame of width W, height H and bit depth d, producing a ciphered frame $C_1$ using $K_1$ and a ciphered frame $C_2$ using $K_2$. The UACI and the NPCR are defined as follows:

$$UACI = \frac{100\%}{W \, H} \sum_{i=1}^{H} \sum_{j=1}^{W} \frac{\left| C_1(i,j) - C_2(i,j) \right|}{2^d - 1}$$

$$NPCR = \frac{100\%}{W \, H} \sum_{i=1}^{H} \sum_{j=1}^{W} D(i,j)$$

with:

$$D(i,j) = \begin{cases} 0 & \text{if } C_1(i,j) = C_2(i,j) \\ 1 & \text{otherwise} \end{cases}$$

The optimal NPCR and UACI values of a secure image encryption scheme against key sensitivity attacks are 99.58% and 33.46%, respectively [40].
Table 8 presents the NPCR and UACI values obtained under the CTCs for all video classes at three QPs. Here, it is important to note that the NPCR and UACI results of selective encryption should not be analysed as in full image encryption. However, the obtained values give an indication of the ability of the selective encryption to resist key sensitivity and differential attacks. The average NPCR values at the three QPs for all classes are very close to the optimal value of a secure encryption scheme against key sensitivity attacks. Moreover, the average UACI values lie in the interval [19.60, 37.79], with an average value over all classes that converges to the optimal value of 33.46%. This performance in terms of both NPCR and UACI supports the robustness of the proposed selective encryption solution with regard to key sensitivity and differential attacks.
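The two key sensitivity metrics can be sketched as below, using their standard definitions: NPCR is the percentage of sample positions that differ between the two ciphered frames, and UACI is the mean absolute difference normalized by the peak sample value $2^d - 1$. The function name and the toy frames are hypothetical.

```python
def npcr_uaci(c1, c2, bit_depth=8):
    """Return (NPCR, UACI) in percent for two ciphered frames.

    c1 and c2 are flat sequences of sample values obtained by ciphering the
    same frame with two keys that differ by a single bit.
    """
    if len(c1) != len(c2) or not c1:
        raise ValueError("frames must be non-empty and of equal size")
    n = len(c1)
    peak = (1 << bit_depth) - 1
    changed = sum(1 for a, b in zip(c1, c2) if a != b)  # sum of D(i, j)
    abs_diff = sum(abs(a - b) for a, b in zip(c1, c2))
    npcr = 100.0 * changed / n
    uaci = 100.0 * abs_diff / (peak * n)
    return npcr, uaci


if __name__ == "__main__":
    # Hypothetical 2x2 ciphered frames, 8-bit samples.
    frame_k1 = [0, 0, 0, 0]
    frame_k2 = [255, 0, 255, 0]
    print(npcr_uaci(frame_k1, frame_k2))
```

In practice the two inputs would be the decoded frames produced with $K_1$ and $K_2$; only the comparison step is shown here.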

BRUTE FORCE ATTACK
A brute force attack, or exhaustive search attack, consists in testing all possible values of the secret key in order to partially or completely break the cipher [41]. It is well known that any encryption algorithm with a secret key of at least 128 bits is considered resilient to brute force attacks, which is the case for the AES algorithm used here. In selective encryption, the total number of tries required to correctly guess the selected encrypted bits should be at least $2^{128}$ in order to resist brute force attacks [42,43]. Our proposed selective encryption algorithm relies on AES in counter mode as a stream cipher with a secret key size of 128 bits. Moreover, the size of the encryption space is very large.

ERROR CONCEALMENT ATTACK
The error concealment attack consists in guessing the encrypted bits based on some assumptions. However, since the encryption space of a VVC video is large, the only scenario that the adversary can follow is to replace all encrypted bits with the same value (zero or one) and decode the modified encrypted frame [44,45]. In order to evaluate our proposed solution against error concealment attacks, all encrypted bits are replaced by zero and then PSNR, SSIM and VMAF are computed again under the same CTCs. Table 9 gives the average PSNR, SSIM and VMAF scores of the video sequences decoded after the replacement attack at the five QPs. The obtained quality scores are similar to or even worse than those of the video encrypted with the AES generator presented in Section 5.2. This confirms that the proposed selective encryption solution is robust against attacks based on bit replacement.
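The replacement step of this attack can be illustrated on a toy bit sequence. In this minimal sketch the bitstream is modelled as a list of bits and the positions of the encryptable bits as a boolean mask; both representations, as well as the function names, are simplifying assumptions and do not reflect how bins are actually located in a VVC bitstream.

```python
def replacement_attack(bits, encrypted_mask, fill=0):
    """Replace every encrypted bit with a constant value (0 or 1)."""
    if len(bits) != len(encrypted_mask):
        raise ValueError("bits and mask must have the same length")
    return [fill if is_enc else bit for bit, is_enc in zip(bits, encrypted_mask)]


def wrong_guesses(plain_bits, attacked_bits):
    """Number of positions where the attacker's guess differs from the plaintext."""
    return sum(1 for p, a in zip(plain_bits, attacked_bits) if p != a)


if __name__ == "__main__":
    # Hypothetical plaintext bits, keystream and mask of encryptable positions.
    plain = [1, 0, 1, 1, 0, 0, 1, 0]
    keystream = [0, 1, 0, 1, 1, 1, 1, 0]
    mask = [False, True, True, False, False, True, True, False]
    # Selective encryption: XOR with the keystream only at masked positions.
    cipher = [p ^ k if m else p for p, k, m in zip(plain, keystream, mask)]
    attacked = replacement_attack(cipher, mask, fill=0)
    print("attacked bits:", attacked)
    print("wrong guesses:", wrong_guesses(plain, attacked))
```

The attacker recovers a masked bit only when its plaintext value happens to equal the fill value, which is why the decoded quality after the attack stays low.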

COMPLEXITY ANALYSIS
The aim of this section is to assess the complexity of the proposed encryption solution. The complexity overhead is computed only for the decoder, since the encryption overhead is negligible with respect to the encoding time.
Table 10 gives the deciphering time $\Delta_{SE}$ in seconds and the deciphering complexity overhead in percentage for all video classes at three QPs. The deciphering time does not exceed 3 seconds, even for the high bitrate and high resolution 4K videos of classes A and B. This corresponds to less than 6% of the total decoding time. The average deciphering overhead does not exceed 4.23%, a value observed at high bitrate, where more TCs are ciphered.
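The deciphering time and overhead follow directly from the two measured decoding times, as defined in the caption of Table 10. The sketch below applies those two formulas; the timing values are hypothetical and are not taken from the reported results.

```python
def deciphering_overhead(dec_t_ref, dec_t_se):
    """Return (delta_se in seconds, co_se in percent).

    dec_t_ref: average decoding time without deciphering (DecT_Ref).
    dec_t_se: average decoding time with deciphering (DecT_SE).
    """
    if dec_t_ref <= 0:
        raise ValueError("reference decoding time must be positive")
    delta_se = dec_t_se - dec_t_ref
    co_se = 100.0 * delta_se / dec_t_ref
    return delta_se, co_se


if __name__ == "__main__":
    # Hypothetical average decoding times over repeated runs, in seconds.
    delta, overhead = deciphering_overhead(dec_t_ref=50.0, dec_t_se=52.0)
    print(f"deciphering time: {delta:.2f} s, overhead: {overhead:.2f}%")
```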

Conclusion
In this paper a new selective encryption solution for the VVC standard was proposed. This solution encrypts a set of VVC syntax elements at the CABAC level in a format-compliant and constant bitrate manner. The coding of the TCs in VVC introduces several dependencies, making constant bitrate encryption more challenging. We have proposed an original algorithm that analyses the coding dependencies of the TCs to determine the number and positions of the encryptable bins for each coefficient. The proposed encryption solution was integrated in both the encoder and the decoder of the VVC reference software VTM 6.0. The quality of the encrypted video was assessed under the VVC CTCs with three objective quality metrics: PSNR, SSIM and VMAF. The low quality scores obtained clearly show the quality degradation caused by the encryption. A security analysis was also conducted to assess the robustness against several attacks, including statistical, key sensitivity and brute force attacks. Finally, the complexity overhead of the deciphering at the decoder side was estimated and remains lower than 6% of the decoding time, confirming the lightweight advantage of the proposed encryption solution.

Figure 1: Overall architecture of the CABAC engine in VVC. The selective encryption block is illustrated in green.

Fig. 8 illustrates the edges of the decoded frame #10 of the RaceHorsesC sequence without encryption (first row) and with encryption (second row) for five different QPs. This figure clearly shows that the ciphered frames are noisy, due to the high frequency structure introduced by the selective encryption. Therefore, the structural information, including edges, is hidden in the ciphered frames and can hardly be exploited by EDR based attacks.
Algorithm 1: abs_remainder binarization
Input: abs_remainder is the unsigned integer to binarize; cRiceParam is the Rice parameter; log2TrRange is the log2 of the TC range
BinReduc ← 5 is the value used to determine the threshold between TRp and Limited_EGk

Table 1 :
Encrypted syntax elements in the proposed VVC selective encryption solution. All these syntax elements are bypass coded.
Algorithm 3: nbEncryptable = isEncryptable(Xc, Yc, CoeffArr, nbEncryptable, bypass, V)
Input: (Xc, Yc): the coordinates of the current pixel; CoeffArr[][]: the array of coefficient values; nbEncryptable: the number of encryptable bits to test; bypass: true if the current coefficient is bypass; V: of the current pixel, set to 0 if bypass is false.
Output: the number of encryptable bits.
absLevel ← |CoeffArr[Xc][Yc]|
if absLevel = 0 and cRiceParam > 1 then
absCMin, remMin ← computeMin(absLevel, nbEncryptable, bypass, V)
absCMax, remMax ← computeMax(absLevel, nbEncryptable, bypass, V)
encryptable ← not (bypass and V ∈ [remMin, remMax])
for p ∈ S1 do
    encryptable ← encryptable and checkSumChange(Xp, Yp, absLevel, absCMin, absCMax)
end for
if not encryptable and nbEncryptable > 1 then
    return isEncryptable(Xc, Yc, CoeffArr, nbEncryptable − 1, bypass, V)

2. The local absolute sum TrAbsSumP21 is then computed by (3) for the coefficient of coordinates (Xp, Yp). The minimum possible value TrAbsSumP21Min and the maximum value TrAbsSumP21Max are also computed by (3) with BaseLvl equal to 4.

Input: (Xp, Yp): the coordinates of the tested coefficient; absLevel: the absolute value of the coefficient (Xc, Yc) before encryption; absCMin: the minimal possible encrypted value of the coefficient (Xc, Yc); absCMax: the maximal possible encrypted value of the coefficient (Xc, Yc).
Output: true if the ciphered value does not affect the context, the Rice parameter and the C0, false otherwise.

Table 2 :
PSNR performance of the proposed selective encryption for all video sequences at five QPs.Anchor and ciphered configurations correspond to the video decoded without encryption and with selective encryption, respectively.

Table 3 :
Average SSIM performance of the proposed encryption solution for all video classes at five QPs

Table 4 :
Average VMAF performance of the proposed encryption solution for all video classes at five QPs

Table 5 :
Encryption space in percentage (%) per syntax element at five QPs

Table 6 :
Encryption Quality for CTCs video classes at five QP values

Table 7 :
Average EDR for CTCs video classes at five QP values

Table 10 :
Deciphering time $\Delta_{SE}$ in seconds and deciphering overhead $CO_{SE}$ in % on an Intel i7-7700 processor at 3.6 GHz. The average decoding run time is computed based on 100 decodings without deciphering ($DecT_{Ref}$) and with deciphering ($DecT_{SE}$). The deciphering run time $\Delta_{SE}$ is computed as the difference between the decoding times with and without deciphering, $\Delta_{SE} = DecT_{SE} - DecT_{Ref}$, while the percentage of deciphering complexity overhead $CO_{SE}$ is derived as $CO_{SE} = \frac{\Delta_{SE}}{DecT_{Ref}} \times 100\%$.