Preserving Chain-of-Evidence in Surveillance Videos for Authentication and Trust-Enabled Sharing

Surveillance video recording is a powerful method of deterring unlawful activities. A robust data protection-by-design solution can be helpful in terms of making a captured video immutable, as such recordings cannot become a piece of evidence until proven to be unaltered. Similarly, video sharing from closed-circuit television video recording or in social media interaction requires self-authentication for responsible and reliable data sharing. This article presents a computationally inexpensive method of preserving a chain-of-evidence in surveillance videos by means of hashing and steganography. The method conforms to the data protection regulations, which are increasingly adopted by governments, and is applicable to network edge storage. Encryption keys are stored in a hardware wallet independently of the video capture device itself, while evidential information is stored steganographically within video frames themselves, independently of the content. Added protection is provided by hiding information within the two least-valued of pixel bitplanes, using a newly introduced technique that randomizes the pixel storage locations on a per video frame and video-capture device basis. Overall, the proposed method has turned out to not only preserve the integrity of stored video data but also results in minimal degradation of the video data resulting from steganography. Despite the inclusion of hidden information, video frames will still be available for common image-processing tasks such as tracking and classification, as their objective video quality is almost unchanged.


I. INTRODUCTION
It is no longer the case that only special places, such as international airports, are kept under surveillance for security and safety purposes. In fact, surveillance is rapidly becoming a requirement for almost every house, office, and public place. In those environments smart tracking can be applied [1], along with the classification of objects within the video. This development is further fuelled by the availability The associate editor coordinating the review of this manuscript and approving it for publication was Zhenhua Guo . of low-cost and small-sized surveillance cameras. These cameras potentially help law-enforcement agencies to utilize the resulting video recordings as proof of a crime or illicit activity. However, due to equally rapid advances in the field of image processing, surveillance video data can be easily tampered with. Examples of tampering include [2]: regional alteration of intra-frames, through cut-and-paste; and inter-frame forgery, by the insertion of video frames. Furthermore, transmission errors also contribute to the alteration of videos, if they are not suitably protected with forward error correction or other error protection methods. VOLUME 8, 2020 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Consequently, such recordings are not directly admissible in a court of law as an item of evidence until they are proven to be authentic through forensic analysis [3]. Unfortunately, applying forensic analysis is not a cheap operation in terms of time and expense. Thus, an important objective for a Closed Circuit TeleVision (CCTV) systems is that it should provide a chain-of-evidence, according to stated rules. In that way, the videos can prove themselves to be an authentic piece of information, avoiding or reducing the need for forensic analysis. In other words the videos themselves can self-authenticate. A driver towards self-authentication is the need for storage and analysis of huge amounts of multimedia data, which has propelled companies towards cloud storage for efficient access [4]. However, following on from the European Union's ratification of the General Data Protection Regulation (GDPR) in May 2018, cloud storage may be unacceptable, due to the potential for unauthorized access to video data by third parties. Other similar data privacy regulations to GDPR now exist in other countries. GDPR itself focuses on reversible data protection via encryption [5] as a data-protection safeguard [6], which encryption is also employed herein. For example, in [7], a chain-of-evidence is kept for forensics purposes but a blockchain is employed to store hashes of information gathered, which may add to the complexity of the method. On the other hand, this article presents a multifaceted solution: to the problem of self-authentication; to the requirements of data protection regulations; and in respect to non-cloud storage. The proposed solution works as follows: 1) Every frame holds the hash of its previous two frames and, therefore, a chain-of-evidence is created. 2) Each video frame is also protected by calculation and storage of its hash, after the insertion of the hash of step 1. 3) These two hashes are stored inside each video frame as hidden information. 4) Information is hidden using a modified version of the already efficient Least Significant Bit (LSB) steganography algorithm. This is achieved by unpredictable placement of the information, according to a technique introduced in this article. Information is stored in the LSBs of the three color component (RGB) values. 5) The technique introduced is capable of generating random positions for placement of information in each video frame. Furthermore, it will be shown that the generation of randomized locations for information concealment may also be performed in two additional ways. This can be firstly through identification information from the capturing device (e.g. the Medium Access Control (MAC) address of the camera) and secondly through the video frame information, as mentioned above. This ensures that the identity information of the capturing device also plays a role in the generation of the randomized path for information storage.
6) The path to the random positions of those pixels that carry the hidden information is stored in the second lowest bitplane (second least significant bits in each of the RGB values). 7) All video frames are encrypted and stored on the network edge, i.e. close to the video capture device itself, without being kept in cloud storage or transmitted over a wide-area network. 8) To further enhance the security, encryption keys are stored in a hardware wallet, separately from the storage device holding the actual video. The strength of the proposed method is that it hides evidential data inside the video frames themselves. That is achieved with a limited effect upon the image content, because the hidden data is negligible in quantity and exists only in the lowest two bits of pixels. Selecting a modified LSB steganography makes for video rate insertion of hidden information. This rapid insertion rate can be compared with (say) Bit-plane Complexity Segmentation (BPCS) [8], which needs to search for noisy blocks within bit-planes before hidden information can be embedded. Furthermore, the proposed method makes it easier to maintain, synchronize, and prevent loss of evidence. Such a chain-of-evidence is desirable not only in the case of surveillance videos but could also be useful in ascertaining the originality and authenticity of videos uploaded to social-media sites.
The rest of this article is organized as follows. Section II establishes the main components of the chain-of-evidence method. Then Section III describes and analyses that method in respect to use of steganography. However, the usage of encryption in the methodology is treated separately in Section III-E. Following on, Section IV is an evaluation of the proposed chain-of-evidence method, suitable for GDPR usage, while Section V summarizes the paper's contribution.

II. BACKGROUND
As is well known, steganography refers to the hiding of information (text, audio, image, video) in another carrier media (usually referred to as the 'cover'). There are four essential properties of a good steganographic system [9], viz. 1) imperceptibility, 2) security, 3) information hiding capacity, and 4) robustness. There have been several different approaches to achieve the goal of information hiding in the cover media. Thus, an effective approach works by manipulating LSBs of the three color channels (Red, Green, Blue (RGB)), in the light of the fact that the LSBs carry minimal information. This spatial approach ensures that the overall visual aesthetic of the image is not significantly altered. Initial research into LSB steganography concentrated on designing the system to increase the payload capacity by utilizing most of the cover-image pixels [10]- [18]. However, steganalysis techniques soon became strong enough to break such systems using statistical analysis, which was developed to identify the regular patterns by which data were stored inside the cover [19]- [21]. Therefore, there was a need for robust LSB techniques based on cryptography-steganography, which can evade such steganalysis attacks. The subsequent research into the Stego Color Cycle (SCC) approach [22], Magic LSB technique [16], hash-LSB [17] and nearest-centroid clustering (LSB-M) [18] are comparatively recent such additions to the field of steganography. In this article, as described in detail in Section III, after per video frame identification of a starting pixel or seed, random pixel selection for embedding the data within LSBs takes place within the neighborhood of the seed. Relatively few of a cover image's pixels are chosen for data hiding, avoiding another problem with early usage of LSB steganography.
The principal advantage of the LSB technique is the inherent simplicity of embedding and decoding process, making video rate operation possible. However, the decoding of such stego-images is affected by the levels of noise in the communication channel. Nonetheless, in the envisaged applications, surveillance video frames after capture are stored at the network edge, without transmission over a communication channel. Data transfer is by portable Universal Serial Bus (USB) drive (see Section III). Storage at the network edge is anyway advisable because of the risks arising from third-party intervention during cloud storage.
For encryption there exists a number of block-based, symmetric encryption techniques, such as the Blowfish, RC5, and Advanced Encryption Standard (AES) algorithms [23]. All of the latter algorithms avoid the need for a Public Key Infrastructure in the case of asymmetric cryptography. Even so, in terms of computational overhead, their calculations still involve considerable processing. This is especially so if real-time operation on resource-constrained edge devices is required. However, considering the sensitivity of the surveillance data, the AES algorithm was selected. As AES is a symmetric block cipher, so a single key is used for the encryption and decryption processes.
AES (also known as Rijndael) has been widely deployed as an encryption standard since 2000 [24]. Indeed, AES is considered a robust industry standard cipher and, hence, is extensively utilised to provide confidentiality in cyber-physical systems [25]. Overall, AES is widely employed because of its: ease of implementation; defences against threats and attacks; as well as flexibility in the cases of encryption/decryption modes and keying material. In terms of keys, AES is a symmetric-key block cipher, which supports either a 128-bit key for 10 rounds, a 192-bit key for 12 rounds, or a 256-bit key for 14 rounds of operation. Further information about the operational mode of AES utilized herein can be found in Section III.E.
AES acts upon a 4 × 4 byte matrix, which is called the algorithm's state. After initiation of the algorithm, every round comprises of four stages/phases: (1) Byte-substitution; (2) Shift Rows; (3) Mix Columns; and (4) Add Round Key. Stage 1 provides non-linear substitution by substituting each byte in the state with a byte from a lookup table or S-Box. In that way, the substitution part of a classic substitution-permutation cipher takes place. Permutation occurs in stage 2 through cyclic left-shifts of the state's rows.
In stage 3, the four bytes of each column of the state undergo a linear transformation. Finally, stage 4 derives a sub-key from the main-key. The sub-key has the same number of bytes as the state and, hence, can be added to it. Because addition is defined as an Exclusive-OR (XOR) operation in stages 3 and 4, again a non-linearity is introduced into the processing. Likewise, modulo arithmetic in stage four provides multiplication, again introducing further non-linearity. Given that substitution provides confusion and stages 2 and 3 supply diffusion, the whole algorithm meets the need to contain confusion and diffusion in any acceptable cipher.
Turning to the hashing functions used in this article to detect tampering of video files, these functions work by extracting a unique fixed-length bit-string from a given message (text, image, video). However, by 2015 one of the principal such functions, Secure Hash Algorithm 1 (SHA-1), was found by the software industry to be suspect because of likely collisions, i.e. the same hash produced by different messages. Therefore, in this article, the SHA-256 hashing algorithm [26] has been used, as, differing in design from SHA-1, it is currently thought to be collision free. SHA-256, operating with 32-byte words is one of the two members of the SHA-2 family of hash functions, the other being SHA-512 with 512 byte words. There is another later family of hash functions known as SHA-3. However, industry opinion is that for software implementations SHA-3 is slower than the SHA-2 family. To avoid that risk on constrained CCTV devices processing video rate data, this article's implementation retained SHA-256 from the SHA-2 family of hashes. However, given the ongoing development of attacks, it is wise to keep the situation under review, with a view to possible eventual implementation of a SHA-3 hash function. In general, altering even a single bit of the hash input changes the output hash completely. For the purposes of the forensic use of video surveillance, no image processing is permitted, which is why this article retains the use of SHA-256. However, for the video sharing applications mentioned in Section I, such multimedia data may go through various manipulations such as cropping, scaling, enhancement, and compression. In that case, possible incorporation of one of the robust image hashes such as [27] is possible.

III. ANALYSIS OF THE METHOD
The following, after providing a system overview, describes and motivates the various aspects of the method, before giving a detailed step-by-step analysis of the method employed.

A. OVERVIEW
As illustrated in the overview of Fig. 1, a chain-of-evidence is stored along with the originally captured visual data so that the authenticity of the video content can be established afterwards. Firstly, individual frames are extracted from the captured surveillance video. Next, salted hashes are embedded through steganography in each video frame (see Section III-C). A hash of each video frame is also taken and embedded through steganography. These latter hashes are needed as a way to verify that the video frames have not been tampered with. Following encryption with the AES [23], the video data are stored in close proximity to the video capture device at the network edge, without transmission over a network. Symmetric encryption keys are stored in a hardware wallet, which may be accessed by a combination of a smart card and a biometric identifier. A sequence of video frames is first retrieved from storage. After decryption, each frame's image data and its embedded salted hashes are checked by comparison with the additional embedded hash, after it too has been extracted. A detailed description of this procedure is given in Section III-F.
Notice that hardware wallets are thought to be [28] the most secure option for storing the public-private, asymmetric keys needed in access of cryptocurrencies, though, herein, symmetric keys are employed. The process is reversed when a video frame is checked for tampering before transfer in encrypted form, as required by GDPR.

B. STORAGE OF HASHES
The Secure Hashing Algorithm (SHA)-256 has been used in this article, as, differing in design from the deprecated SHA-1 (since 2015), it is currently thought to be collision free (as already mentioned in Section II). In order to create a chain-of-evidence by means of hashes, a salted hash is first created by concatenating half of the bytes of the hashes of each of the previous two video frames, as described by equation (1). Therefore, each frame will store a salted hash of data derived from the previous two frames (256 bits in all). Other optional data such as the camera identity, with the date and time of video capture, and the Global Positioning System (GPS) location, could also be included in the salted hash. However, there is a trade-off according to the amount of optional data included because, currently, relatively few of a cover image's pixels are chosen for data hiding, avoiding the excessive hidden bits of early usage of LSB steganography [10].
Subsequently, the frame's own hash (256 bits), after the insertion of the evidential data, is calculated and also inserted. The essential size of the data to be stored (without optional data) becomes 512 bits per frame. These data provide authentication or guarantee of the integrity of the stored video frames, to guard against the possibility of tampering.
In fact, various ways exist to store such data, including: a metadata stream [29]; using a frame's subtitles (as available in current surveillance systems) [30]; or through steganography. However, a metadata stream is not used in this article's method because storing data in a separate channel may lead to loss of evidence, misalignment in the retrieval module, and increasing management costs. Similarly, saving data in subtitles directly threatens the secrecy of data and, more importantly, disrupts the actual image content, which may be important to data controllers when performing a query-based search. Therefore steganography is selected as a means of storage, as it avoids the delays of side-channels and the insecurity of subtitle storage.

C. MODIFIED LSB STEGANOGRAPHY
LSB steganography has the advantage that its impact on the measured PSNR is reduced compared to some other forms of steganography [31]. However, even though the cover image quality is relatively improved, if an attacker suspects that an image holds steganographic information then extracting data hidden by LSB-based steganographic is relatively easy. This is because methods of statistical analysis have been developed to identify the regular patterns by which data used to be stored inside a cover [19].
To counter such attacks, herein we introduce random pixel selection starting from a seed pixel's neighbourhood, storing the path to the random pixel locations as steganographic data. Further, the seed pixel is also now selected using a procedure dependent on the current video frame and (optionally) the video capture device identity. The selection of the seed pixel, the generation of random locations for information storage, and the actual storage of the information are explained next.
Seed Pixel: The seed pixel refers to a randomly-identified starting pixel. The seed pixel is the start of the random path containing the pixels identified for data storage. The process of seed selection, and other pertinent issues are discussed in Section-III-D. Notice that the red pixel in Fig 2(a) is an example of a randomly-chosen seed pixel.
Path Selection: Any pixel under consideration (except the border pixels) has eight 'neighboring' pixels. For path selection, three bits are needed to generate eight different codes for eight neighbourhoods of a pixel, as shown in Fig. 2(b). These three bits are stored in the second-from-last LSB in each of the three color channels for the chosen pixel, while the last LSB of each color channel represents the data bit, as shown in Fig. 2(c). In this way, each selected pixel will store 3 bits of the data, together with the 3-bit code of the path to the next storage pixel. Notice that the distance between two randomly-selected pixels is set in this article's experiments at five, though this value can also be customized. Thus, the codes shown in Fig. 2(b) only denote the directions of the pixels to be selected (not the gap between them). Only denoting the directions helps to ensure that no two pixels have overlapping neighbourhoods. It also avoids any path reversals. A sample random path generated from a dummy random sequence of codes is also presented in Fig. 2(a).
Storage: For storage, two bits per color channel of a pixel are used for data (1 bit) and path (1 bit) storage. This allows for 6 bits of a given pixel encoded in the standard RGB format. Thus, the LSB technique is modified to possibly change the value of pixels in the two lowest bit-planes rather than just the lowest bit-plane. By comparison, BPCS steganography also selects bits from differing bit-planes, though it is sensible to avoid the higher bit planes [8] because of their greater correlations between neighboring bits. Using this article's method, 171 pixels will be required to store 512 bits of data. These consist of 256 bits of salted hash as evidential data and 256 bits of a frame's own hash. Also stored is the path to be followed for the storage and retrieval of hashes.

D. SEED IDENTIFICATION
The process of seed selection to identify the starting point of the random path is illustrated in Fig. 2(d). A rectangle is selected from each video frame by setting an offset value from the edge of the image, as shown in Fig. 2(d). In Section IV the offset is set to 50 pixels for all video frames for testing purposes. However, this rectangle size can be varied on a per capture-device basis and varied for each video frame, bearing in mind that the MAC address of each device is one way to start with a unique number from which the rectangle's sizes can be generated.
Subsequently, the corner pixels of the image rectangle inside this border region are then identified. The first two bits of the each of the RGB components of each of these corner pixels are concatenated and converted to a denary or ''decimal'' value. Thereafter, the four decimal values obtained are averaged to get a single number (rounded to the nearest integer), which is used to point to the seed pixel in the image. Because the value of the corner pixels will vary from frame to frame, this procedure will normally result in different seed locations within the rectangle on a per video frame basis. As shown in Fig. 2(d), this average decimal value is referred to as v 1 pixels, and the seed pixel is identified in this article as v 1 pixels horizontally to the right of the center of the image (the reference point), and then v 1 pixels vertically after that. Notice that, because the value of the corner pixels contains the location of the storage starting point or seed pixel, these pixels should not be altered in the steganographic process.
As an example, consider the corner pixel values from Fig. 2(d). The concatenated binary numbers obtained from the first 2 bits of the RGB components of each pixel are: VOLUME 8, 2020 Binary: 100000, 000000, 010101, 000011 Equivalent decimal: 32, 0, 21, 3 Average (rounded to an integer): 14, which could be used for v1 or v2 (if average is large) in Fig. 2(d).
This method of seed location, however, is prone to a pitfall, in that the average value calculated could turn out to be too large for a given rectangle, causing the seed pixel to lie outside the rectangle. To circumvent this issue, the reference point may be shifted by (I h /4, I w /4), as also shown in Fig. 2(d). Then the seed pixel is located relative to this new reference, when using larger calculated value, v2. Here, (I h , I w ) are the image height and width values respectively in units of pixels. Should this procedure still result in a seed location outside the rectangle then the reference point can be further adjusted in a similar fashion. The procedure is illustrated by the blue text in Fig. 2(d). Notice that the maximum value of the calculated value for v1 (or v2) is 64 pixels, which implies that this hazard will only occur for particularly small rectangles. In fact, setting horizontal and vertical minima to the rectangle's size is another way to avoid the need for recalculation of the reference point.
A random path is started from a seed pixel's location. That path is constructed in such a way that if the path reaches the edge of the image region under consideration, the path is continued from the opposite edge of the image (as in the classic PacMan video game). If the random path touches any corner pixel then it should be modified to avoid changing that pixel's value. This is because corner pixels are used to retrieve the seed pixel's location. Fig. 3 shows the pixels in which data are stored, marked in blue. The two video clips illustrated are (upper) VISOR1, available from [32], resolution 640 × 480 pixels/frame, surveillance video captured at 7 fps, and Duval_street_cam, available from [33], street scene, resolution 946 × 360, captured at 25 fps. It can be seen that every frame has a different distribution of pixels to store this data. Furthermore, the limited data are stored uniformly in RGB color channels and use only two bits per channel for each pixel selected. Notice also that the proposed approach is specially suitable for an application to surveillance video. This is because no image processing is allowed on the data captured and all modifications of the video data need to be identifiable. Both requirements serve to ascertain the authenticity of the video.

E. ENCRYPTION PROCEDURE
AES has multiple operational modes available [34]. The mode used herein is Output Feedback (OFB) mode. One reason for utilizing OFB mode is because the same program code serves for both encryption and decryption, thus saving on the coding space within an embedded device. Another reason for choosing OFB mode is that, consequently, AES then operates as a stream cipher (rather than a block cipher), in which a continuous stream of bytes are encrypted. Any modifications to a plaintext block P t are reflected in the corresponding ciphered block C t , where t = 1, 2, 3, . . . n with n the number of plaintext blocks. However, other ciphered blocks remain unaffected. Thus, a significant reason for choosing OFB mode is that it provides a degree of transmission error resilience. Moreover, if any modification/error occurs during the transmission, that error is not propagated. (Also see the remark on OFB's avoidance of padding at the start of Section IV.) However, OFB lacks self-synchronization, being independent of the previous ciphertext. Therefore, if synchronization is lost during transmission then a new IV needs to be established to enforce explicit re-synchronization.
In OFB, suppose X t−1 is an input block from the t −1 stage, which has been AES encrypted, using key K e . Then X t−1 is again AES encrypted using key K e to produce X t . After that X t and the next plaintext block P t are XORed together to output encrypted block C t . For encryption of the following plaintext block, AES encryption with K e is again performed on the X t of the previous stage to produce X t+1 , after which XORing is again performed with the plaintext P t+1 to output C t+1 and so on. Moreover, OFB normally generates different output C t for the identical input P t because the process is initiated with a random Initialization Vector (IV) [35]. The following equations (2) and (3) represent the encryption and decryption processes in OFB mode, respectively.
where t = 1, 2, 3, . . . n, for n stages of block encryption, and X t = Encrypt(K e (X t−1 )) Symmetric encryption keys are generated at run-time for each protected video, by using a pseudo-random function (PRF). To keep the procedure simple, encryption keys are generated on a per video sequence basis. They are then stored separately in a Hardware Wallet as shown in Fig. 1. Key security can be enhanced by using any standardized key-management scheme [36]. This would be the case for real-time key distribution during transmission of CCTV videos over a network. However, in this article, we only assume the secure physical transmission of data via a Universal Serial Bus (USB) connection directly to and solely to an authorized party. Additionally, an 128-bit key is secure enough because, for current computing powers, a key space greater than 2 100 is considered resilient to key guessing or brute force attacks upon keys [37]. It should be mentioned that the person responsible for key management in any well-managed organization should not also be the person with the authority to alter data in any way. This is an application of the well-established security principle of the separation of roles.

F. DETAILED DESCRIPTION
A detailed description of the steps in the method is provided next, with video frames indexed by i: 1) Generate the current storage rectangle's 'offset' value (see Fig 2(d)) by means of a unique identifier of the video capture device. 2) Create a 256-bit salted hash (S # = F i−1 # + F i−2 #) from the hashes of the previous two frames. This is a protection against video-frame insertion forgery.  . Random distribution of pixels selected to store a chain-of-evidence using LSB steganography in Raspberry Pi board captured videos [38], [39]. The blue markings of the location paths are only for illustrative purposes.
3) Generate the seed pixel's position using the four corner pixels of the current frame's rectangle. 4) Generate a random sequence of 170 directions starting from the seed pixel. The path must avoid the four corner pixels used in the calculation of the seed position and may wraparound if the rectangle's edge is reached. 5) Embed the evidential data S # in 85 pixels according to the random sequence of directions in the current video frame, F i . The directions to the next pixel in the path are also stored in each pixel of the current video frame. 6) Embed zeros in a further 86 pixels' data bits to act as a container for a hash of the current frame. The directions to each pixel in the path are also stored in these further pixels, with a zero direction to terminate the path. 7) Calculate the hash of the current frame, C # .

8) Insert C # in F i 's reserved bits (from Step-6) by
replacing the zeros previously set. 9) Check whether a video session has finished and if not return to step one. 10) Encrypt a session of video frames by means of AES and storage of the symmetric key (K p ) in a hardware wallet, along with the corresponding session id. As mentioned above in step five, the evidential data, i.e. the salted hash created from the previous two frames, are stegnographically embedded in each frame. It is convenient to use the two previous frames after the additional hash of each frame, C # , has been inserted. Otherwise, it would be necessary to remove the additional hashes from the two previous frames before creation of the salted hash for the current frame. It is also important to mention that, for the first two frames, the salted hash is created by generating two AES keys using a key management module. However, these keys do not need to be stored or sent to the receiver because they are only employed to generate a salted hash. That salted hash is stored in the initial two video frames for their own hash calculation, i.e. creation of each of their C # .
The retrieval procedure works as follows: 1) Retrieve the symmetric key (K p ) using its corresponding session id. from the hardware wallet for the video session. Then decrypt the video frames. 2) For each video frame, re-generate the current storage rectangle's offset value (see Fig. 2(d)) by means of a unique identifier of the video capture device. 3) Re-generate the seed pixel's position using the four corner pixels of the current frame's rectangle. 4) Retrieve the current frame's hash C # , changing its storage bits back to zero. 5) Recalculate the hash of the current frame, now containing the evidential data (in the form of hash S # ) and the zeroized bits reset in the previous step. Call this new hash C # .
6) Attempt a match between the C # with the corresponding hash, C # , retrieved from steganographic storage in the current frame. If there is a match, the video frame is proven, with high certitude, according to the strength of the hash, to be authentic. In the case of a mismatch, the video data have been compromised and cannot be validated. If a frame cannot be validated leave the procedure, as a chain-of-evidence cannot be established. 7) Check whether the video session's frames have all been validated and if not return to step two.

IV. RESULTS
The proposed security system has been developed for resource-constrained devices and, therefore, has been tested on a Raspberry Pi-4 board with 4 GB RAM and a 64-GB Secure Digital (SD) memory-card for data storage. The Pi was loaded with Python and OpenCV modules to process the videos, along with the Crypto cipher module for AES encryption. The AES block size was 64-bits, with AES operating as a stream cipher in OFB mode. The OFB mode avoids the overhead of padding and, if USB storage errors occur, these are not necessarily propagated, allowing partial stream recovery. (See also earlier comments on OFB mode in Section III-E.) The key size was 128 bits rather than 256 bits, in order to reduce the computational overhead. Pi camera videos were downloaded from a project's source pages [39], which were captured with object detection in mind. The videos were recorded in an indoor multi-space environment, using six Raspberry Pi 3 model B1s with the camera module v2.12 [38]. For performance evaluation, processing time and video quality metrics were determined for six videos mentioned above and compared with related methods as shown in Table 1. Previously proposed techniques demonstrated their steganography results on cover images and therefore, cannot be directly compared with videos. However, we have averaged the quality measures over 100 frames to be compared with these methods. Magic LSB uses a more complex method of converting an RGB image into an equivalent Hue, Saturation, Intensity (HSI) color space and then rotating the I component at four different angles. Each rotated I component of the image stores one block of data (divided into four blocks) and then an RGB image is reconstructed from the HSI image. Although complex and difficult to break, the scheme is most likely to be too costly for constrained devices. In fact, that computational overhead motivated the development of the proposed technique to modify LSB steganography. Similarly, LSB-M [18] first identifies the intensity clusters in the image and then hides data inside those clusters, again a time-complex process, which is not suitable for video steganography. Other techniques including a DWT-Haar wavelet-based method [13], hash-LSB [17] and secret key LSB [14] suffer the same problems of not being able to run fast enough to match the video speed.

A. TIMINGS
The timings for the processing of a video stream across the four main security steps has been measured. The chain-of-evidence and steganographic processing consumes relatively less time compared to the encryption/decryption processes, i.e. 0.01 seconds and 0.02 seconds respectively (for 100 frames) using the Raspberry Pi-4 board as shown in Fig. 5. Overall, video processing on the Raspberry Pi-4, including steganography and encryption, with the previously given pixel resolution, supported a frame rate of 25 fps, which is acceptable for surveillance videos.

B. STEGANOGRAPHIC CONDITIONS USING VIDEO QUALITY METRICS
The effect on the quality of the image of storing of the path and data information in the original image is typically represented by the Peak Signal to Noise Ratio (PSNR) [40]. Toward that end, first the Mean Square Error (MSE) is calculated, which, for a monochrome image, is given by (4). For RGB images, such as those under consideration in this work, the expression for MSE changes to (5).
Using the MSE RGB from (5), the PSNR (in dB) is defined as: which can be simplified to: PSNR = 20 · log 10 (Max i ) − 10 · log 10 (MSE RGB ) (6) where, Max i is the maximum possible pixel value of the image.
Similarly, the Structural Similarity (SSIM) index [41] is a metric that estimates the structural similarity between original and reconstructed video frames, having a range generally from 0 to 1 (eq. 7). Values of SSIM nearer to 0 means less structural similarity between the plaintext and the reconstructed encrypted video frame, which means higher distortion has occurred. Values nearer to 1 means more structural similarity. The SSIM formula is based on three comparison measurements between the samples of i and j (two compared image windows n*n): luminance (l), contrast (c) and structure (s). SSIM is then computed as weighted combination of these three comparative measures. Where α, β, γ are set to 1. These video quality metrics have been used to evaluate the performance of proposed algorithm for standard steganographic conditions given below. A reliable technique for steganography needs to fulfil the following conditions [42]: • Invisibility: The data concealed in randomly-selected pixels of each frame is not visible to the human eye because it is stored in the two lower bit planes of only 171 pixels and a human eye cannot normally identify such abnormalities at Pi-camera resolution. Thus, the Structural Similarity (SSIM) index (output range 0 to 1) is a well-established metric, which measures the response of the human visual system to changes between an original and modified image. For a surveillance video captured by us, the SSIM value is very high for the proposed method, as can be seen in Fig. 6b, implying that the original video and the video with data inserted are very similar. The average SSIM index was 1.00 using the proposed method, confirming that the hidden evidential data are visually unidentifiable.
• Payload capacity: The payload of the proposed method is 2 bits per color component per pixel and the data stored only uses 171 pixels of the whole image, which is particularly small. Although the payload of Magic LSB and LSB-M are also small, with the use of only one bit per pixel. However, due to their complexity and processing requirements, they are unsuitable for resource-constrained devices.
• Robustness against statistical attacks: Common methods used to detect steganography are through taking the SSIM index, the PSNR, and the Mean Squared Error (MSE). For these metrics, with the proposed method, FIGURE 6. Video quality metrics for the Pi videos. These videos belongs to publicly-available datasets for object detection at [38]. The box and whisker plots show average measures for 200 frames per video.
the modified video is always more similar to the original video in the comparisons.
• Non-suspicious files: The file formats and file sizes remain the same during storage, whether there is embedded information or not.
• Steganographic quality: As previously mentioned, the presence of steganography is commonly detected by finding changes in video quality, which is demonstrated in Fig. 6c and 6a. The average PSNR after steganography is around 88.0 dB, indicating high similarity (maximum PSNR = 96.33 dB for 16 bit depth image ). This high value is attributed to the small amount of data embedded in the image (1026 bits), also resulting in an average MSE of 0.002, implying that it will be very difficult to detect the presence of steganography by such video metrics.

C. COMPARATIVE ANALYSIS
The proposed method has been compared with existing schemes published in the last decade or so for the above-mentioned performance metrics and other prominent features as shown in Table 1. Almost all methods have been developed and tested for images and, therefore, cannot be compared directly with our method. However, we took average PSNR, SSIM and MSE of 200 video frames to compare with these methods. The results indicate that our method has significantly better performance as compared to others, with the highest PSNR and the lowest MSE value. Furthermore, the SSIM index of near to 1.00 also shows that the hidden data made no visual change in the image that can be identified by a naked eye. Lastly, the proposed solution, including features from [5], has also been compared with some well-known products in the marketplace, when it can be seen that the solution is competitive with these products, as is evident from Table 2.

V. CONCLUSION
Provision of a chain-of-evidence in surveillance videos is not only a computer security application but a legal requirement for current surveillance systems, due to the ratification of the EU's GDPR and privacy laws in other countries. The proposed method provides a comprehensive solution by storing such evidence steganographically embedded alongside the video content, with overall encryption. Though relatively simple, the method is not only a GDPR protection-by-design aimed at surveillance video but is also capable of being implemented on resource-constrained devices such as the Raspberry-Pi processor and associated boards. Moreover, the video data can be marked with unique identification at source, using salted hashes that can later serve to verify the originality of the shared video content, when they are exchanged within social networks. It needs to be pointed out that the integrity of the physical recording device (camera) is assumed to be guaranteed in the current paper. In fact, making a recording device physically tamper-proof is beyond the scope of this article, though we pinpoint this as an issue for future research. Future work will also consider the best mechanism to generate each rectangle's location from the video capture device's unique identifier.