Start Code-Based Encryption and Decryption Framework for HEVC

In this article, we propose a new selective encryption and decryption framework based on the start code for high efficiency video coding. There is a growing need to encrypt video information to protect video content from privacy invasion and intellectual property infringement caused by information leakage. Although encrypting an entire video is a straightforward approach, the cost of encrypting a large amount of video data is substantial, considering the resulting computational complexity. Consequently, selective encryption algorithms have been actively researched in recent years and have contributed to reducing the computational complexity. However, existing selective encryption algorithms have certain drawbacks. For instance, it is difficult to separate the video encryption algorithm from the video compression algorithm because the encryption framework is based on the syntax elements. Further, a partial reconstruction of the encrypted video bitstream is often unavoidable. To solve these problems, the proposed method encrypts the bitstream based on the start code rather than on the syntax elements. Encrypting the bitstream partially, based on the start code, makes it easy to separate the video encryption algorithm from the video compression algorithm. Furthermore, encrypting the part adjacent to the start code protects the video content, because a video decoder cannot reconstruct the video unless the correct bytes following the start code are restored in the bitstream. The experimental results show that the proposed method reduced the encryption and decryption times by approximately 97% and 98%, respectively, compared to the encryption and decryption of the entire video bitstream.


I. INTRODUCTION
With the increasing usage of video applications such as video on demand, video conferencing, and video surveillance, it has become very important to protect video content from unauthorized access and usage. Hence, there is a dire need for video content encryption, ranging from commercial videos to home security camera recordings, to protect video data from unauthorized access that could lead to leaked or hijacked videos, and consequently, illegal usage and privacy exposure.
Most video encryption algorithms can be classified into two types: naïve encryption algorithms (NEAs) and selective encryption algorithms (SEAs). A NEA encrypts an entire bitstream of video content using standard encryption algorithms, such as the data encryption standard (DES) [1], Rivest, Shamir, and Adleman (RSA) method [2], or advanced encryption standard (AES) [3]. Because it encrypts an entire video bitstream, a NEA provides the best security when confidentiality is the top priority. Further, it is easy to implement when integrated with existing multimedia systems because it does not depend on video compression algorithms. However, it provides its functionality at the cost of a substantially increased computational complexity owing to the encryption of an entire large bitstream.
The associate editor coordinating the review of this manuscript and approving it for publication was Zahid Akhtar.
By contrast, SEAs reduce the computational complexity of encryption by encrypting only the highly sensitive portions of a video, thus overcoming the shortcomings of the NEA. The video bitstreams encrypted by SEAs result in a distorted visual quality when the video is decoded without the appropriate decryption, which degrades the information in the video and renders the video less comprehensible. The SEA might be more suitable for real-time applications owing to its low computational complexity compared to that of the NEA. However, the SEA remains vulnerable to partial decoding attempts, making it less desirable for highly sensitive video content demanding full protection. In addition, the SEA is, to a large extent, closely connected to video compression algorithms. It is often difficult to embed a SEA within a video codec because the algorithm must be implemented inside the internal process of the video encoding and decoding (using either software or hardware). Because most standard video codecs are already implemented in hardware, the SEA is highly unlikely to be a viable solution.
In this article, we propose a novel video encryption algorithm that encrypts a portion of the bitstream adjacent to the start code. The proposed method achieves its protection by preventing an unauthorized decoder from decoding the bitstream unless the proper header information is restored. The computational complexity of the proposed method is comparable to that of the SEA because the bitstream is only partially encrypted. In particular, the proposed algorithm encrypts only the header information of the bitstream, which is typically one byte long. In the worst-case scenario, 256 combinations would be required to restore the header information of a single unit. Because the bitstream contains many start codes, it is virtually impossible to break the encryption in real time. In addition, the proposed method is easily integrated with existing multimedia systems, similar to the NEA, because it can be applied even after an entire bitstream has been generated.
The remainder of this article is organized as follows. In Section II, we describe the works related to NEAs and SEAs. Details of the proposed method are provided in Section III. In Section IV, we present the results of experiments conducted to evaluate the proposed scheme. Finally, we present the conclusions of the paper in Section V.

II. RELATED WORK

A. NAÏVE ENCRYPTION ALGORITHM
Naïve encryption entails fully encrypting the video content following compression using standard encryption algorithms such as the DES, RSA, or AES. The NEA processes a video bitstream as a stream of binary data and encrypts every word in the bitstream, regardless of the type of video codec. The NEA is one of the most secure video encryption algorithms because it is applied to entire bitstreams using standard encryption algorithms. In general, however, NEA-based encryption is not recommended for use in real-time video transmission applications of large video data because it is computationally expensive.
For the NEA, there are many standard encryption algorithms such as the DES, RSA, and AES. In [4] and [5], following evaluation on video encryption, the AES was reportedly the fastest of these standard encryption algorithms. Furthermore, according to some studies, including [6], the AES reduces the computational complexity to an extent. However, the overall improvement is marginal because the overall complexity of the AES is determined by the size of the input data.

B. SELECTIVE ENCRYPTION ALGORITHM
The primary limitation of the NEA is that it encrypts the entire data; hence, the computation time is directly proportional to the amount of data. For example, applying the NEA to a high-capacity video bitstream, such as a typical two-hour movie that is stored and transmitted in gigabits following compression, presents the problem of computational complexity. Therefore, the SEA has been studied as an alternative. A video bitstream consists of minimal data through which the original video is reconstructed by exploiting the redundancy of the original video data. Therefore, most (if not all) parts of a video bitstream are interdependent and the corruption of a small fraction of the bitstream may be sufficient to damage the entire bitstream, which could make it impossible to reconstruct the original video. The SEA exploits this characteristic by encrypting the video bitstream only partially. This approach enables it to protect the video data with much less computational complexity compared to that of the NEA.
Meyer and Gadegast [7] proposed a selective video encryption method called Secure MPEG (SECMPEG). In addition, Maples and Spanos [8] proposed a selective video encryption method called Aegis. Both SECMPEG and Aegis encrypt only the I-frame or keyframe information, which is critical for a decoder to decode normally. Furthermore, it was considered effective to encrypt only the I-frames using the standard encryption algorithm, the DES, because it made it difficult to properly reconstruct even P- and B-frames that were reconstructed with reference to the I-frame. However, the computational complexity was not significantly improved compared to that of the NEA because the I-frames in the video bitstream usually constitute between 30% and 60% of the video size.
Tang [9] proposed the zig-zag permutation algorithm that reordered the transform coefficients in a zig-zag format after a discrete cosine transform (DCT) in the process of generating an I-frame in the video compression process. Because the zig-zag permutation algorithm rearranges the order of data in units of macroblocks constituting I-frames, the computational complexity is low. Experiments have shown that the computational complexity of the zig-zag permutation algorithm was only 1.56% of that of the NEA. Although the zig-zag permutation algorithm has a fast encryption speed, it is problematic in that it increases the size of the bitstream by approximately 50%.
Shi and Bhargava [10], [11] and Shi et al. [12] proposed SEAs, such as the video encryption algorithm (VEA), modified VEA (MVEA), and real-time VEA (RVEA), which encrypt the sign bits of the DCT coefficients of the I-frame and the sign bits of the motion vectors of the P- and B-frames. Because the sign bits of the DCT coefficients and motion vectors occupy only a small portion of the entire bitstream, the computational complexity was evaluated to be only 10% of that of the NEA. However, these methods cannot guarantee full security because useful video information can be recovered by simply changing all the encrypted DC coefficients to 128 and all the encrypted AC coefficients to positive numbers.
Most SEAs designed for the latest video codec, i.e., high efficiency video coding (HEVC), encrypt the bitstream based on the syntax elements [13]-[21]. The syntax elements selected for encryption are encrypted during the compression process; hence, video encryption based on the syntax elements is closely coupled with the video compression algorithm.
Overall, the portion of the entire bitstream encrypted by SEAs is as small as possible and includes the I-frames, DCT coefficients of the I-frame, sign bits of the DCT coefficients of the I-frame, sign bits of the motion vectors of the P- and B-frames, and other syntax elements. This generally improves the encryption speed compared with the NEA. However, the existing SEAs have critical drawbacks that should be resolved. Depending on the algorithm, the computational complexity is not significantly reduced compared to that of the NEA, the encrypted portion causes the size of the video bitstream to increase, or the encrypted part is easily recoverable, which weakens the level of security. Moreover, in most SEAs, it is difficult to separate encryption algorithms from compression algorithms.

III. PROPOSED METHOD
The proposed method can be classified as a SEA-based method. However, it differs from many existing SEAs because it can be operated on top of a video codec. This concept is illustrated in Fig. 1. As shown in Fig. 1(a), conventional SEAs were designed to depend on the video codec, with the video encryption and decryption algorithms implemented inside the video encoder and decoder, respectively. Thus, it is difficult to separate the video security algorithm from the video encoder and video decoder. In this regard, conventional SEAs embedded in video compression algorithms are less viable because integrating the video encryption algorithms requires the standardized video compression algorithms to be modified. Generally, standardized video codecs are implemented according to the standard specification, which makes it difficult, if not impossible, to modify a standard codec to embed additional encryption algorithms. Therefore, the proposed method is designed so that the video codec and the video security system are independent of each other. This was achieved by applying the video encryption algorithm after video encoding and the video decryption algorithm before decoding, as shown in Fig. 1(b).
Unlike traditional SEAs that encrypt certain syntax elements in the entire video bitstream, the proposed method encrypts an important part of the bitstream that determines the decoding process and is adjacent to the start code in the video bitstream. Because the start code can be searched after or during compression, the proposed method can be incorporated into a video codec, as shown in Fig. 1(a).
As shown in Fig. 2(a), each top-level unit of the bitstream of the general standard video codecs (e.g., H.264/AVC, HEVC, and IVC) contains the start code as a prefix, which is a strong separator between the top-level units. For example, as shown in Fig. 2(b), the video bitstream output by the HEVC encoder can be separated into top-level units by the start code. The start code of the HEVC bitstream is three bytes long (i.e., 0x000001), as shown in Fig. 2(b). The start code pattern in the HEVC is designed to exploit the high unlikeliness of the arithmetic encoders to generate the same pattern as the start code. Thus, a parser attempting to locate the start code in a bitstream can quickly split the bitstream into top-level units, without having to parse every syntax element of the bitstream. As shown in Fig. 2, in a bitstream compressed using the standard video codec, the header is usually the sequence following the start code. As shown in Fig. 2(b), the first byte of the network abstraction layer (NAL) unit header consists of forbidden_zero, nal_unit_type, and nuh_layer_id. Because the one-bit values of forbidden_zero and nuh_layer_id are zeros, the first byte of the NAL unit header is determined by the six bits of nal_unit_type.
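The start-code search described above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the function name `split_nal_units` is hypothetical, and the trailing-zero stripping assumes that a NAL unit never ends in 0x00, so that zero padding (such as the extra leading zero of a four-byte start code) can be dropped safely.

```python
START_CODE = b"\x00\x00\x01"  # three-byte start code prefix


def split_nal_units(bitstream):
    """Return the byte strings that follow each start code (hypothetical helper)."""
    units = []
    pos = bitstream.find(START_CODE)
    while pos >= 0:
        begin = pos + len(START_CODE)
        nxt = bitstream.find(START_CODE, begin)
        end = len(bitstream) if nxt < 0 else nxt
        # Assumption: zero bytes at the end of a unit are padding that
        # belongs to the next start code, not to the unit itself.
        units.append(bitstream[begin:end].rstrip(b"\x00"))
        pos = nxt
    return units


# Three toy units; the first one uses a four-byte start code.
stream = (b"\x00\x00\x00\x01\x40\x01\x0c"
          b"\x00\x00\x01\x42\x01\x01"
          b"\x00\x00\x01\x26\x01\xaf")
units = split_nal_units(stream)
```

No syntax element is parsed here; the split relies on the start code alone, which is what allows the framework to run outside the codec.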
The value of the first byte of the NAL unit header for each nal_unit_type, which spans the six-bit range from 0 to 63, is listed in Table 1. In addition, as shown in Fig. 2(b), the NAL unit is composed of the header and raw byte sequence payload (RBSP). The RBSP is defined in Table 1 by the nal_unit_type of the header preceding the RBSP. Based on the first byte of the NAL unit header, the RBSP can either be a non-VCL NAL unit payload, such as VPS, SPS, and PPS, or a VCL NAL unit payload, such as TRAIL_N, TRAIL_R, and TSA_N. Each RBSP in the non-VCL class contains basic video information such as the color sampling format, image width and height, and the initial quantization parameter. Each RBSP in the VCL class contains compressed video information by classifying coded pictures such as I-, P-, and B-frames into different coded slices according to the network layer. More information on the RBSP can be found in the HEVC standard [22]. Therefore, the first byte of the NAL unit header is parsed first during the decoding process. The remaining compressed video information is then extracted from the RBSP based on syntax matching with the NAL unit type. Failure to properly assign the first byte of the NAL unit header makes it very difficult, if not impossible, to decode the video bitstream because it would be unclear how to decode the subsequent RBSP. Furthermore, the first byte of the NAL unit header occupies a very small portion of the entire HEVC bitstream. Therefore, the proposed encryption of the first byte of the NAL unit header can effectively improve the encryption speed.
VOLUME 8, 2020
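To make the role of the first header byte concrete, the following Python sketch unpacks it into the three fields named above and classifies the unit as VCL or non-VCL. The function names and the small name table are illustrative assumptions; the bit layout (one bit, six bits, one bit) and the boundary at type 32 follow the HEVC NAL unit header definition.

```python
# A few type names from the text; the full list is in Table 1 of the paper.
NAL_NAMES = {0: "TRAIL_N", 1: "TRAIL_R", 2: "TSA_N",
             32: "VPS", 33: "SPS", 34: "PPS"}


def parse_first_header_byte(value):
    """Split the first NAL header byte into its three fields (hypothetical helper)."""
    forbidden_zero = (value >> 7) & 0x01   # most significant bit
    nal_unit_type = (value >> 1) & 0x3F    # next six bits
    layer_id_msb = value & 0x01            # top bit of nuh_layer_id
    return forbidden_zero, nal_unit_type, layer_id_msb


def is_vcl(nal_unit_type):
    """Types 0 to 31 carry coded slice data; 32 to 63 are non-VCL."""
    return nal_unit_type < 32


fields = parse_first_header_byte(0x40)
kind = NAL_NAMES.get(fields[1], "other")
```

With both one-bit fields equal to zero, the byte value is simply `nal_unit_type << 1`, so 0x40 corresponds to type 32 (VPS) and 0x02 to type 1 (TRAIL_R). A decoder that reads a scrambled byte therefore selects the wrong RBSP syntax from the outset.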
The proposed method encrypts a video bitstream by scrambling the values of the first bytes of the NAL unit header instead of using standard encryption algorithms such as the AES and DES. Most existing SEAs encrypt only a selected part of the bitstream by applying a standard encryption algorithm. Generally, however, standard encryption algorithms introduce computational complexity in terms of the number of word units because they encrypt data using various processes that use these units. The proposed method increases the encryption speed because it does not use a standard encryption algorithm.
The first byte of the NAL unit header can be one of 25 possibilities, excluding ''Reserved'' and ''Unspecified,'' which are not yet used in the HEVC standard, as evident from Table 1. The bitstream may contain as many as 25 headers, although overlapping is not permitted. Therefore, in the worst-case scenario, the number of cases required to reconstruct the original bitstream would be 25^25. Assuming a linear search is used, the average number of cases would be (25^25)/2. In addition, whether each candidate case recovers the bitstream can be confirmed after running the video decoding process on it, which adds the cost of a decoding attempt to every trial.
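The size of this search space is easy to underestimate, so a short calculation may help. The sketch below evaluates the two counts stated above; the trial rate is a purely hypothetical figure chosen for illustration and does not come from the paper.

```python
HEADER_VALUES = 25  # usable first-byte values, per Table 1

worst_case = HEADER_VALUES ** HEADER_VALUES  # 25^25 candidate assignments
average_case = worst_case // 2               # expected trials for a linear search

# Hypothetical attacker speed: one billion decode attempts per second.
TRIALS_PER_SECOND = 10 ** 9
SECONDS_PER_YEAR = 365 * 24 * 3600
years = average_case / TRIALS_PER_SECOND / SECONDS_PER_YEAR

print(f"worst case:   {worst_case:.3e} trials")
print(f"average case: {average_case:.3e} trials")
print(f"time at the assumed rate: {years:.1e} years")
```

The worst-case count is roughly 8.9 × 10^34, so even an optimistic trial rate leaves the linear search far outside any practical time budget.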
Most importantly, in contrast to conventional SEAs, the proposed encryption of the first byte of the NAL unit header by scrambling ensures that decoding is impossible. Therefore, the proposed method can provide security superior to that of the existing SEAs.
A flow chart of the proposed encryption and decryption processes is presented in Fig. 3. As shown in Fig. 3(a), the first step in the encryption process is to generate an encryption lookup table (ELUT) to encrypt and decrypt the first byte of the NAL unit header. For encryption and decryption, the rows of the first and second columns of the ELUT have different values, as shown in Fig. 3. The next step in the encryption process is to read a word corresponding to the start code size from the bitstream and verify whether the current word is the start code (i.e., 0x000001). If the current word is the start code, the byte succeeding it in the bitstream is searched in the first column of the ELUT and replaced by the value in the same row of the second column. For example, if the first byte of the NAL unit header is equal to 0x10 in the first column of the ELUT, as shown in Fig. 3, it is replaced with 0x4A in the second column. The encryption process is repeated according to the loop flow shown in Fig. 3(a) until the first bytes of all the NAL unit headers in the bitstream are encrypted.
In the decryption process, the ELUT generated during the encryption process is used to reconstruct the original bitstream from the encrypted bitstream, as shown in Fig. 3(b). In the encrypted bitstream, the byte following the start code is searched in the second column of the ELUT for replacement by the value in the first column corresponding to the second column of the ELUT. The decryption process is repeated according to the loop flow shown in Fig. 3(b) until all encrypted bytes following the start code in the encrypted bitstream are decrypted.
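The lookup-table procedure above can be sketched as follows. This is an illustration under stated assumptions rather than the authors' code: `make_elut` and `scramble_headers` are hypothetical names, the ELUT is modelled as a permutation of all 256 byte values, and the seeded `random.Random` shuffle stands in for whatever key-generation step a real system would use (it is not a secure key source).

```python
import random

START_CODE = b"\x00\x00\x01"


def make_elut(seed):
    """Build a byte permutation (first -> second column) and its inverse."""
    enc = list(range(256))
    random.Random(seed).shuffle(enc)  # illustration only, not secure
    dec = [0] * 256
    for plain, cipher in enumerate(enc):
        dec[cipher] = plain
    return enc, dec


def scramble_headers(bitstream, table):
    """Replace the byte after every start code using the given table."""
    out = bytearray(bitstream)
    pos = 0
    while True:
        i = out.find(START_CODE, pos)
        if i < 0 or i + 3 >= len(out):
            break
        out[i + 3] = table[out[i + 3]]
        # Resume after the replaced byte so it never takes part in a
        # later match; encryption and decryption then visit identical
        # positions.
        pos = i + 4
    return bytes(out)


enc, dec = make_elut(1234)
original = (b"\x00\x00\x01\x40\x01\xaa"
            b"\x00\x00\x00\x01\x42\x01\xbb"
            b"\x00\x00\x01\x26\x01\xcc")
encrypted = scramble_headers(original, enc)
restored = scramble_headers(encrypted, dec)
```

Passing the forward table encrypts and passing the inverse table decrypts, which mirrors the symmetric loop flows in Fig. 3. The bitstream length is unchanged because each header byte is replaced in place.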
The computational complexity ($T_{P-E}$) of the proposed encryption algorithm (PEA) 1 is formulated in (1), where $C_A$, $C_B$, $C_C$, and $C_D$ indicate the complexity values of the paths that terminate at Processes A, B, C, and D, respectively, and return to the beginning of the loop in Fig. 3. In addition, $P_A$, $P_B$, $P_C$, and $P_D$ are the probabilities corresponding to $C_A$, $C_B$, $C_C$, and $C_D$, respectively. The sum of $P_A$, $P_B$, $P_C$, and $P_D$ is 1.
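Given these definitions, (1) can be read as the expected cost of one loop iteration over the four exit points. The following is a sketch under that assumption, written only with the symbols defined above; the exact typeset form in the original may differ.

```latex
T_{P-E} = P_A C_A + P_B C_B + P_C C_C + P_D C_D,
\qquad P_A + P_B + P_C + P_D = 1
```

Under this reading, the term with the largest probability dominates the total, which is why the relative sizes of the four probabilities matter in the approximation that follows.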
When the operation applied in each process is expressed as $T_A$, $T_B$, and $T_C$, (1) can be reformulated as (2). Because $T_A$, $T_B$, and $T_C$ are all comparison operations, $T_{comp} = T_A = T_B = T_C$. In addition, because $P_B$, $P_C$, and $P_D$ are much smaller than $P_A$ ($P_B, P_C, P_D \ll P_A$), (2) can be approximated as (3). The proposed decryption performs the same operation as the proposed encryption shown in Fig. 4; thus, the computational complexity ($T_{P-D}$) of the proposed decryption is formulated as (4).
The computational complexity ($T_{AES-E}$) of the AES encryption and that ($T_{AES-D}$) of the AES decryption, which are commonly used in the NEA and SEA, as introduced in Section II, are formulated in [23] as (5) and (6), respectively, where $N_b$ is the block size and $R$ is the number of rounds. $T_a$, $T_o$, and $T_S$ indicate the operations of the bytewise-AND, bytewise-OR, and bytewise shift, respectively. Generally, if $N_b$ is 4 and $R$ is 10 when the key length is the shortest at 128 bits, (5) expresses the computational complexity for encrypting 16-byte data, which is formulated as (7). Similarly, (6) gives the computational complexity of decrypting 16-byte data, which is formulated as (8).
M. K. Lee, E. S. Jang: Start Code-Based Encryption and Decryption Framework for HEVC
The number of operations for PEA 1, calculated using (3) and (4), is compared with that for the NEA, calculated using (7) and (8), in Table 2. Here, $N$ is the size, in bytes, of the bitstream to be encrypted and decrypted. Based on the experimental results, the average value of $P_A$ was 0.99. Because the NEA encrypts and decrypts the entire bitstream in units of 16 bytes using the AES, the number of operations for the NEA encryption can be obtained by multiplying (7) by $N/16$. Thus, because PEA 1 requires fewer operations for encryption and decryption, it is expected to be significantly faster than the NEA, as evident from the results in Table 2.
In addition, we propose PEA 2, which improves on the security of PEA 1. The encryption space is small because PEA 1 scrambles one byte after the start code. Therefore, to improve on the security afforded by PEA 1, PEA 2 increases the encryption space by encrypting the 16 bytes after the start code using the AES.

IV. EXPERIMENTAL RESULTS
The performances of the proposed methods were evaluated in two experiments: verification of the encryption and decryption results and a comparison of the processing speed of the NEA and PEAs. The results of the first experiment are shown [...]. The test bitstreams were conformance bitstreams [24] containing various NAL unit types compressed using various HEVC encoder options. As HEVC decoders, the HEVC reference software HM-16.20 [25] and the VLC media player [26] were used. This media player is a free, open-source cross-platform multimedia player that can decode and play most multimedia files, including HEVC bitstreams.

A. VERIFICATION OF ENCRYPTION AND DECRYPTION RESULTS
The encryption was verified by determining whether a bitstream encrypted by the proposed methods could be decoded with the HEVC decoder, as shown in Fig. 4(a). The decoding of every encrypted bitstream using HM-16.20 was terminated when a decoding error occurred. In this case, the size of the reconstructed video output was zero. In the case of the VLC media player, a pop-up indicating that the video could not be played appeared for every encrypted bitstream, and no frames were played. A bitstream encrypted using the existing SEAs (introduced in Section II) can still be decoded, and the reconstructed video is merely distorted in quality. By contrast, it was impossible to decode the bitstreams encrypted using the proposed methods, and no video information could be found. Therefore, the proposed methods were more secure than the existing SEAs.
The purpose of verifying the decryption was to confirm whether the video reconstructed from the decrypted bitstream corresponded completely with the video reconstructed from the original bitstream. First, all the decrypted bitstreams were decoded using HM-16.20, following which all the original bitstreams were decoded using the same software.
The results indicated that the decoded frames and the size of the decrypted bitstreams were completely consistent with those of the original bitstreams. A binary comparison was used to confirm that the videos reconstructed from the decrypted bitstream and the original bitstream were exactly the same. The videos reconstructed from all the decrypted bitstreams also played normally in the VLC media player, without any distortion in the video quality. The verification results with all the conformance bitstreams clearly demonstrated the efficacy of the proposed methods.

B. COMPARING THE SPEED OF NEA AND PEAS
The encryption speed of the NEA and PEAs was compared by measuring the time taken to encrypt all original bitstreams using each algorithm, as illustrated in Figs. 4(b) and (c). The electronic code book (ECB) mode and the 128-bit key of AES were used to encrypt and decrypt the test bitstreams using the NEA. The ECB mode and the 128-bit key were the fastest options for the AES.
The results of the speed measurement experiment are listed in Tables 3 and 4. The files listed are 11 representatives of all the test bitstreams. As shown in Table 3, the times required for encryption and decryption by PEA 1 are approximately 3% and 2% of those required by the NEA, respectively, a difference of approximately one percentage point. This is to be expected, based on the number of operations in PEA 1 and the NEA, as described in Section III. In the case of PEA 1, the same number of operations is required for encryption and decryption. By contrast, in the case of the NEA, the number of operations required for decryption is greater than that required for encryption. Furthermore, as shown in Table 4, the times required for encryption and decryption by PEA 2 are approximately 6% and 4% of those required by the NEA, respectively. Table 5 presents a comparative analysis of PEAs and the existing SEAs [7], [9], [12] described in Section II. Although the zig-zag permutation algorithm [9] is the fastest, it is impractical because it increases the size of the bitstream by approximately 50% and is not compliant with the standardized video codecs. The existing SEAs for the HEVC [13]-[21] described in Section II are not compliant with the standardized video codecs, and thus they are not included in the comparison in Table 5. Non-compliance with the standardized video codec implies that the encryption algorithms modify the standardized video codec, so that a standard decoder is no longer interoperable. Therefore, it is impractical to embed encryption algorithms by modifying standardized video decoding procedures. In this respect, PEA 1 was the fastest encryption method that was compliant with standard video codecs and did not decrease the compression efficiency. Although the encryption speed of PEA 2 was slightly lower than that of PEA 1, it was more secure.

V. CONCLUSION
In this article, we proposed a novel selective encryption and decryption framework based on the start code for HEVC. The proposed methods had several advantages compared to the existing SEAs. First, they were independent of the video codec because they encrypted the portion adjacent to the start code, which is generally included in the video bitstream. Furthermore, the video codec-independent encryption framework enabled compliance with the standard video codec, which was not supported by the existing SEAs that depend on a video codec. Second, the proposed methods prevented access to any video information because the encrypted bitstream could not be decoded. By contrast, existing SEAs partially enable information acquisition from the decoded video with a quality distortion. Because they disallowed decoding, the proposed methods offered higher security than the existing SEAs. Finally, the proposed methods were highly competitive in terms of the processing speed when compared to selective encryption algorithms. They were also compliant with a standard video codec, with no decrease in the compression efficiency. In this regard, the encryption and decryption framework of the proposed methods based on the start code was practical and valuable.