Multi-Level Reversible Data Anonymization via Compressive Sensing and Data Hiding

Recent advances in intelligent surveillance systems have enabled a new era of smart monitoring in a wide range of applications, from health monitoring to homeland security. However, this boom in data gathering, analysis, and sharing also brings significant privacy concerns. We propose a Compressive Sensing (CS) based data encryption scheme that obfuscates selected sensitive parts of documents and compressively samples, hence encrypts, both the sensitive and non-sensitive parts of the document. The scheme uses a data hiding technique on the CS-encrypted signal to preserve the one-time-use obfuscation matrix. The proposed privacy-preserving approach offers a low-cost, multi-tier encryption system that provides different levels of reconstruction quality for different classes of users, e.g., semi-authorized and fully authorized. As a case study, we develop a secure video surveillance system and analyze its performance.


I. INTRODUCTION
MANY emergent smart surveillance applications (e.g., buildings, infrastructure, stores, ambient-assisted living, public areas) necessitate time-continuous data gathering and processing. Upcoming 5G and IoT technologies will enable continuous data collection and processing for persistent monitoring [1]. For example, an intelligent building system equipped with monitoring sensors such as CO2 meters, thermometers, cameras, or other types of IoT devices will be instrumental in effectively automating the tasks of heating, ventilation, and air conditioning (HVAC) systems, or in improving fault and hazard detection performance [2]. Another case in point is an intelligent network of cameras for continuous site surveillance, or a health monitoring system [3] that gathers users' bio-signals along with video/speech data to be processed remotely. These applications and their variants that collect data via sensors or edge devices raise concerns about the privacy of people and possibly of sites. In fact, the European General Data Protection Regulation (GDPR) legislation [4] has specifically addressed these privacy concerns in data collection and processing.
Currently, there exists a plethora of privacy-preserving technologies that vary in data type and application scenario. Even the definition of privacy changes across application areas and use cases, depending on whether one deals with signal processing, a database system, secure communication, etc. [5]. Under privacy concerns, documents are considered to consist of private (i.e., sensitive) parts, those that could potentially expose compromising information to unauthorized users, and of public (i.e., non-sensitive) parts. Privacy-preserving data processing then aims to encrypt the private parts of a document without deteriorating its public parts. Recent comprehensive surveys provide useful guidelines on privacy-preserving data mining [6], [7], signal processing [8], [9], and privacy metrics [5].
In principle, a naive application of strong cryptographic methods such as AES [10] or RSA [11] would provide a high degree of security, in addition to privacy. However, first, these encryption methods are relatively costly; more importantly, it is neither useful nor necessary to encrypt the whole signal in real-time multimedia applications such as video [12], image, or health [13] monitoring systems, or other types of IoT applications. Only the selected parts of a multimedia document deemed to carry private information need to be protected; this gives rise to a two-tier approach. More generally, in a multi-tiered approach, different parts of the document can be privacy-protected at different levels, with the most strongly protected parts accessible only at the highest authorization level, and so forth. We can also state three desiderata for privacy-protection algorithms: a) the technique should secure the privacy of the selected sensitive portions of the data; e.g., for face hiding, it should be stronger than any automatic face recognition algorithm; b) the method should not degrade the non-sensitive parts of the document; c) it should be able to reverse the sensitive-part encryption (for authorized users) with good quality. A concomitant desideratum is that the computational cost of the data encryption should be reasonably low.
Although Compressive Sensing (CS) [14] is primarily an alternative data acquisition strategy to conventional Nyquist/Shannon sampling, it also provides encryption with a reasonable security level via its randomized sensing mechanism. Consequently, using the CS setup alone, or with another lightweight encryption shell applied on top of it, has recently become a popular approach for multimedia applications [15].
In this work, we pursue the approach of compressive sensing to accomplish both compression and cryptographic security on the whole data, and of data hiding technology [16], [17] to hide and then recover the masked-out private parts of the document. The novel method achieves privacy protection by obfuscating the sensitive parts of the document, while the CS-encryption is applied to the whole document, i.e., the combined public and private parts. We assume that the document has been pre-processed and segmented into its sensitive and non-sensitive parts. We use the terms de-identification and anonymization interchangeably, in the sense of rendering unintelligible the privacy-bearing segments of a document. Although our method is applicable to any document type (images, video, audio, etc.) with appropriate modifications, in the sequel we consider images as an application case.
Our scheme provides two-tiered privacy, in which the semi-authorized user, i.e., the entity with the lower authorization level, can decode and view only the non-sensitive parts of the image, while the fully authorized user decodes and sees the entire image. The semi-authorized user with only key A (the CS-encryption matrix) can recover images whose sensitive parts remain obfuscated after decoding, whereas the fully authorized user with keys A and B (the latter being the watermark embedding matrix) can recover the whole image. In both cases, the image quality is stipulated to remain close to the original quality. The significant merits of our proposed method are, first, to enable a low-cost, two-level encryption and, second, to provide reversible anonymization for the selected authorized users. Although the experiments are run only on image data, our method is general enough to be applied to any data involving privacy concerns, such as videos, as detailed in Section VI, or bio-signals. In this work, we select the face de-identification problem [18], [19] as a case study, within the context of a privacy-preserving image/video monitoring system.
The privacy protection concern in image/video has been addressed in a plethora of papers over the last decades. In summary, the technical solutions fall into three groups: a) automatic blurring of faces, including context-dependent blurring, e.g., of bystanders only; b) blacking out of faces with random patterns; and, recently, c) anonymous face substitutions or iterative regeneration schemes. Our method is in line with the noise-pattern overlay methods in the literature. However, we differ from these methods in two respects: i) while we are able to fully remove the obfuscating noise pattern, we provide multi-tier differential protection; ii) we use compressive sensing for data reduction and cryptographic security, and watermark the compressed signal with the data hiding pattern.
A privacy-preserving method to which ours bears some resemblance was recently described in [20]. In the method of [20], the images are first processed through a parallel group of trained auto-encoders, each generating its own sufficiently diversified sparse code. The sparse code is obfuscated by adding random noise, with statistics similar to those of the sparse code, to a group of coefficients outside the sparse code support set. The support set is predefined or shared with the trusted user via a secret channel. Only the trusted user possesses the key to recover the support set of the sparse code coefficients, and thus is able to decode the sensitive image (the face). Codes from multiple auto-encoders are used to successively refine the results, i.e., to incrementally improve the reconstructed image quality. In contrast, our method is not face-specific, does not need to find sparse codes in the encoding part, and hence does not require a separate secret channel to share the obfuscation key. In addition, data reduction via CS-compression is a byproduct of our scheme.
A preliminary version of this work was presented in [21]. That early version briefly introduced the methodology and presented some test results on a token dataset (6 faces in a controlled laboratory environment). In this article, we provide a theoretical worst-case analysis of the watermark guarantee conditions (Lemma 1, Theorem 4). We have extended the paper by incorporating a discussion on the design of alternative obfuscating matrices [21] as well as on alternative designs of the watermark embedding matrix (see Section IV-E). Simulation experiments are run on a realistic public dataset of much bigger size (a subset of the YouTube Faces Database [22]) containing 100 classes (videos of 100 identities). We also briefly describe two extensions of the proposed method: 1) its adaptation to video signals, beyond simple frame-by-frame privacy processing; 2) a three-tiered privacy protection in images. In the detailed performance evaluation, we illustrate the reconstruction accuracy of masked regions as a function of the watermark embedding power and the choice of obfuscating masks, both being user-defined parameters. Recognition accuracies with original faces, with de-identified faces, and with faces reverse de-identified via the recovered watermark are given. The result of a test against an adversary with strong computational capability and access to the full labeled training set is also reported.
The rest of the paper is organized as follows. The notation is provided in Section II. We give a brief overview of compressive sensing and its usage in encryption systems in Section III. We emphasize the compressive sensing properties that we have exploited in our proposed scheme. In Section IV, the proposed two-tier privacy-preserving system is presented in detail. Section V introduces a case study of the proposed method in video monitoring and gives the results of the extensive simulation studies. Finally, conclusions are drawn in Section VII.

II. NOTATIONS
In this work, the $\ell_p$ norm of a vector $x \in \mathbb{R}^N$ is given as $\|x\|_p = \left(\sum_{i=1}^{N} |x_i|^p\right)^{1/p}$ for $p \geq 1$. We also define the $\ell_0$ norm of the vector $x \in \mathbb{R}^N$ as $\|x\|_0 = \lim_{p \to 0} \sum_{i=1}^{N} |x_i|^p = \#\{l : x_l \neq 0\}$. An exactly (or strictly) $k$-sparse signal in some appropriate domain is a signal $x \in \mathbb{R}^N$ with $\|x\|_0 \leq k$. On the other hand, an approximately $k$-sparse (or compressible) signal is a signal $x$ with $\|x - \tilde{x}\|_2 \leq \kappa$, where $\kappa$ is a small constant and $\tilde{x}$ is obtained by zeroing out all elements of $x$ except the ones with the $k$ largest magnitudes. For convenience, we show in Table I the list of frequently used symbols, the terminology used in the paper, and their synonymous definitions in the cryptography literature.
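As a concrete illustration of these definitions, the following minimal NumPy sketch (our own, not part of the paper) computes the $\ell_p$ and $\ell_0$ norms and the best $k$-term approximation; all function names are ours.

```python
import numpy as np

def lp_norm(x, p):
    """l_p norm of a vector: (sum_i |x_i|^p)^(1/p), for p >= 1."""
    return np.sum(np.abs(x) ** p) ** (1.0 / p)

def l0_norm(x):
    """l_0 "norm": the number of nonzero entries, #{l : x_l != 0}."""
    return np.count_nonzero(x)

def best_k_term(x, k):
    """Best k-term approximation: keep only the k largest-magnitude entries."""
    xk = np.zeros_like(x)
    idx = np.argsort(np.abs(x))[-k:]
    xk[idx] = x[idx]
    return xk

x = np.array([3.0, 0.0, -1.0, 0.5, 0.0])
print(l0_norm(x))          # 3, so x is exactly 3-sparse
print(lp_norm(x, 2))       # the Euclidean norm of x
print(best_k_term(x, 2))   # keeps the entries 3.0 and -1.0
```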

III. PRELIMINARIES AND PRIOR ART
Our interest in compressive sensing is twofold: first, to compress the signal if it is already sampled, or to sample analog signals directly at rates below the Nyquist-Shannon bound; and second, to exploit the inherent cryptographic capability of compressive sensing.

A. Compressive Sensing
Compressive sensing (CS) theory has significantly impacted the field of signal processing since its inception in 2005 [14]. According to the CS theory, a signal can be sampled using far fewer measurements than the traditional Nyquist-Shannon acquisition rate requires, provided it is sparse or compressible in some proper domain. CS-based MRI imaging [23], radar monitoring systems [24], [25], and ECG measurements in a health monitoring system [13] are some of its success stories. It is also seen as a potential solution for hardware/software design in applications requiring very high sampling frequencies, such as wideband spectrum sensing [26] and ultra-wideband communication schemes [27]. In fact, CS is expected to play an important role in next-generation communication systems such as 5G [28].
Let us consider the linear mapping of a discrete signal $s \in \mathbb{R}^N$ as
$$y = As, \quad (1)$$
where $A \in \mathbb{R}^{m \times N}$ is known as the measurement matrix with $m < N$. The minimum-energy solution for the underdetermined linear system of equations (1) is given by
$$\hat{s} = \arg\min_{s} \|s\|_2 \ \text{ subject to } \ y = As. \quad (2)$$
The solution of (2) is unique and has a closed form, $\hat{s} = A^T (AA^T)^{-1} y$, provided that $\mathrm{rank}(A) = m \leq N$, which makes $AA^T$ invertible. The minimum achievable reconstruction error is $\|s - \hat{s}\|_2^2 = s^T \left(I - A^T (AA^T)^{-1} A\right) s$, which shows that exact recovery is not possible, since $I \neq A^T (AA^T)^{-1} A$ when $m < N$. The CS theory addresses signals that are sparse in a proper domain $\Phi \in \mathbb{R}^{N \times N}$, i.e., $s = \Phi x$ with $\|x\|_0 \leq k$. Therefore, (1) can be re-formulated as
$$y = Hx, \quad (3)$$
where $H = A\Phi$, and even if (3) has infinitely many solutions, we can look for the sparsest one,
$$\hat{x} = \arg\min_{x} \|x\|_0 \ \text{ subject to } \ y = Hx. \quad (4)$$
Eq. (4) is also known as the sparse representation of $y$ in $H$, and it is unique provided that the spark of $H$, i.e., the minimum number of linearly dependent columns of $H$ as defined in [29], is greater than $2k$. Thus, for $\mathrm{spark}(H) > 2k$, any two distinct $k$-sparse signals $x', x''$ can be uniquely recovered from their undersampled measurements $y', y''$ if $m \geq 2k$. Put differently, one has the surprising result that, while it is not possible to recover $s$ exactly using the minimum-norm decoder as in (2), exact recovery of the signal is possible in the sparsifying domain.
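The limitation of the minimum-energy decoder can be checked numerically. The sketch below (our illustration, not the paper's code) draws a Gaussian measurement matrix, forms the least-norm estimate $\hat{s} = A^T(AA^T)^{-1}y$, and verifies that it is consistent with the measurements yet far from a generic (non-sparse) signal.

```python
import numpy as np

rng = np.random.default_rng(0)
N, m = 64, 24

# Gaussian measurement matrix with i.i.d. N(0, 1/m) entries
A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, N))

s = rng.normal(size=N)   # a generic (non-sparse) signal
y = A @ s                # undersampled measurements, m < N

# Minimum-energy (least-norm) solution: s_hat = A^T (A A^T)^{-1} y
s_hat = A.T @ np.linalg.solve(A @ A.T, y)

print(np.allclose(A @ s_hat, y))        # consistent with the measurements...
print(np.linalg.norm(s - s_hat) > 1.0)  # ...but far from the true signal
```

The residual $s - \hat{s}$ is the projection of $s$ onto the $(N - m)$-dimensional null space of $A$, which is large for a generic dense signal; sparsity is what rescues recovery.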
The nonconvex problem (4) with the $\ell_0$ quasi-norm can be relaxed to its closest convex form, $\ell_1$, as
$$\hat{x} = \arg\min_{x \in \Upsilon(y)} \|x\|_1, \quad (5)$$
where $\Upsilon(y) = \{x : Hx = y\}$, an optimization problem also known as Basis Pursuit [30]. The equivalence of the $\ell_0$-$\ell_1$ minimization problems is well investigated in the literature in terms of the properties of $H$. For instance, the Null Space Property (NSP) [31] not only guarantees the $\ell_0$-$\ell_1$ equivalence but also comes in very handy for the recovery performance analysis when $x$ is not exactly $k$-sparse but only compressible. In case we deal with approximately sparse signals and/or with measurements contaminated by additive noise, the problem can be relaxed with $\Upsilon(y) = \{x : \|Hx - y\|_2 \leq \epsilon\}$, where $\epsilon$ is a small positive constant. Problem (5), with this new constraint, is known as Basis Pursuit Denoising (BPDN) [32]. The stability conditions of CS signal recovery techniques are also well understood: a stable solution $\hat{x}$ is expected to obey $\|x - \hat{x}\|_2 \leq \kappa \|z\|_2$ with a small constant $\kappa$, for an additive noise perturbation $z$ in the measurements, $y = Hx + z$.
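Basis Pursuit can be cast as a linear program by splitting $x = u - v$ with $u, v \geq 0$. The following sketch solves it with SciPy's general-purpose `linprog`; this solver choice and all dimensions are our own assumptions, not the paper's setup.

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
N, m, k = 50, 20, 3

H = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, N))
x_true = np.zeros(N)
x_true[rng.choice(N, size=k, replace=False)] = rng.normal(size=k)
y = H @ x_true

# Basis Pursuit: min ||x||_1 s.t. Hx = y, as an LP with x = u - v, u, v >= 0
c = np.ones(2 * N)                 # objective sum(u) + sum(v) = ||x||_1
A_eq = np.hstack([H, -H])          # H (u - v) = y
res = linprog(c, A_eq=A_eq, b_eq=y, bounds=(0, None))
x_hat = res.x[:N] - res.x[N:]

print(np.linalg.norm(x_true - x_hat))  # close to 0: exact recovery
```

For these dimensions ($m = 20 \gg k \log(N/k) \approx 8$) a random Gaussian $H$ recovers the $k$-sparse signal exactly with overwhelming probability.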
When approximately sparse signals are measured under noise, a property stronger than the NSP gives a stable recovery guarantee. This property is called the Restricted Isometry Property (RIP), which is defined as follows: Definition 1. (Restricted Isometry Property) A matrix $H \in \mathbb{R}^{m \times N}$ has the RIP of order $k$ if there exists a smallest $\delta_k(H)$ that satisfies
$$(1 - \delta_k(H)) \|x\|_2^2 \leq \|Hx\|_2^2 \leq (1 + \delta_k(H)) \|x\|_2^2$$
for all $k$-sparse signals $x \in \mathbb{R}^N$. The constant $\delta_k(H)$ is called the Restricted Isometry Constant (RIC) of order $k$ for the matrix $H$.
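Computing $\delta_k(H)$ exactly is combinatorial, but one can probe an empirical lower bound by sampling random unit-norm $k$-sparse vectors, as in this illustrative sketch (ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(7)
m, N, k, trials = 64, 128, 4, 2000

H = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, N))

# Exact RIC computation requires all C(N, k) supports; instead, sample random
# unit-norm k-sparse vectors to get an empirical lower bound on delta_k(H).
worst = 0.0
for _ in range(trials):
    x = np.zeros(N)
    idx = rng.choice(N, size=k, replace=False)
    x[idx] = rng.normal(size=k)
    x /= np.linalg.norm(x)
    worst = max(worst, abs(np.linalg.norm(H @ x) ** 2 - 1.0))

print(worst)  # an empirical lower bound on delta_k(H); typically well below 1 here
```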
The stability and $\ell_0$-$\ell_1$ equivalence conditions w.r.t. the RIC of a measurement matrix are thoroughly studied in the literature.
The authors in [33] show that the $\ell_0$-$\ell_1$ equivalence is achieved when $\delta_{2k}(H) < \sqrt{2} - 1$. Likewise, the stability of the $\ell_1$ minimization problem is investigated for Basis Pursuit Denoising [34] and the Dantzig Selector [35]. In [34], it is shown that for $\Upsilon(y) = \{x : \|Hx - y\|_2 \leq \epsilon\}$ and $\|z\|_2 \leq \epsilon$, the solution $\hat{x}$ of (5) satisfies
$$\|x - \hat{x}\|_2 \leq C_0\, \epsilon,$$
where $C_0$ depends on $\delta_{2k}(H) < \sqrt{2} - 1$ [33]. Notice that the recovery guarantee conditions for an arbitrary $k$-sparse signal enforce the RIC of order $2k$, $\delta_{2k}(H)$, instead of $\delta_k(H)$. The intuition behind this is simply that, for noise-free measurements, the null space analysis indicates that $\mathrm{spark}(H) > 2k$ in order for $H$ not to map any two arbitrary but distinct $k$-sparse signals $x'$ and $x''$ to the same point, so that one always has $Hx' \neq Hx''$. In this sense, the RIP gives us the stronger guarantee that, after the mapping with $H$, the distance between the points $x', x''$ is preserved at least as follows:
$$(1 - \delta_{2k}(H)) \|x' - x''\|_2^2 \leq \|H(x' - x'')\|_2^2 \leq (1 + \delta_{2k}(H)) \|x' - x''\|_2^2.$$
The good measurement matrices $A$ that preserve the information in the sparse domain $\Phi$, or alternatively for which $H = A\Phi$, are the ones that satisfy the RIP. Certain random measurement matrices are known to satisfy this property; one popular case is the matrix whose elements $A_{i,j}$ are i.i.d. (independent identically distributed) and drawn from a Gaussian distribution, i.e.,
$$A_{i,j} \sim \mathcal{N}\!\left(0, \tfrac{1}{m}\right), \quad (8)$$
for $m \geq O(k \log(N/k))$; $H$ inherits this property as well. We recall the following lemma, which gives the stability condition of BPDN for measurements under additive white Gaussian noise (AWGN) contamination, since it will be handy in the sequel for the stability analysis of our encryption scheme.

B. Compressive Sensing Based Encryption
Since in the CS setup a signal is linearly sampled using random or pseudo-random measurement matrices, there exists an inherent capability to provide privacy and cryptographic protection [36], [37]. One advantage of CS-based encryption is that the linearity and the dimensionality reduction of the CS scheme result in low-cost operations. This could be a crucial advantage for data encryption carried out on edge devices before data transmission to a cloud or a fusion center. In fact, it has been reported in several works [38], [39] that CS-based encryption has a much lower cost compared to well-established encryption standards such as AES [10] or RSA [11].
The idea of formally using CS theory in an encryption system was first introduced in [40]. These authors considered a sparse signal $x$ as the plain-text input signal and encrypted it into the cipher-text $y$. A Gaussian measurement matrix, as in (8), was used in the role of the CS-encryption matrix, i.e., $y = Hx$. They consider the Shannon perfect secrecy [41] definition as the metric of security. CS-based encryption can be viewed as a particular case of a multiplicative randomization technique, which is also a well-known privacy-preserving method. Using the definition of Shannon [41], the CS-based encryption literature generally defines perfect secrecy in the information-theoretic sense as follows: Definition 2. (Perfect Encryption System) A perfect encryption system satisfies
$$\Pr(x \mid y) = \Pr(x)$$
for any plain-text $x$ and cipher-text $y$ pair.
The authors of [40] conclude that, even though Shannon perfect secrecy is not satisfied by the CS-based encryption scheme, since the CS measurements preserve the energy of the plain-text (a consequence of the RIP condition that $H$ must satisfy), CS-based encryption still guarantees computational secrecy, i.e., secrecy against an attacker with bounded time. In a later work, it was shown that CS-based encryption with a Gaussian compression matrix used only once and re-drawn for each coding instance reveals only the energy of $x$ [42]. Therefore, Gaussian CS-encryption can be said to satisfy perfect secrecy if the cipher-text $y$ is normalized to some constant energy [36, Theorem 4]. Efforts on establishing privacy guarantee conditions for both normalized and unnormalized-energy cipher-texts under different measurement matrix schemes continue [43], [44] (using different security metrics). Similarly, instead of Shannon perfect secrecy, Wyner-sense perfect secrecy, or its extended versions, have also been used in the security analysis of CS-based encryption schemes [45]. In the meantime, the robustness of CS-based encryption against attacks is investigated in [46], [47]. In [46], the authors consider a brute-force, structural attack in which an adversary runs a grid search to estimate the CS-encryption matrix A. This attack type can be considered a known cipher-text attack under one-time usage (or one-time secret, OTS). They conclude that the computational complexity of such an attack makes this type of brute-force attack infeasible. The known-plain-text attack (KPA) under one-time usage is addressed in [47], where the adversary captures a plaintext and ciphertext pair, (x, y). Furthermore, systems that use the same CS-encryption matrix many times are well known to be insecure against this type of attack [42], [40].
Due to the interest in application scenarios of CS-based encryption, hybrid models that use both CS and conventional cipher systems have recently become popular. For instance, [48] applies a homomorphic cryptographic function on top of the CS-encryption in a wireless sensor network system. In that setting, even with multiple uses of A, the system can be made resilient against KPA. In another vein, the authors in [49] have proposed a multi-class encryption system where the CS-encryption matrix is partially corrupted differently for each user, i.e., A' = A + ∆A, with ∆A being the partial perturbation matrix. Their scheme suggests a framework to partially corrupt the CS-encryption matrix in order to obfuscate the sensitive region of the signal. However, it is not obvious how one transmits ∆A to the receiving party for reversible de-identification. One intuitive approach would be sending ∆A over a secure channel, which could be problematic, especially when the obfuscation pattern changes from usage to usage. Another solution is to use steganographic methods [16], [17], [50] to embed ∆A directly in the CS measurements, that is, to encode the obfuscation matrix directly in the cipher-text y. This is the path we follow, and its details are introduced in the following section.
It is worth mentioning some recent work in the vein of the compression (via sparsification) and encryption strategy. These methods extract a sparse code x of the private signal and then obfuscate it. In [51], [52], [53], a ternary representation of the signal is extracted from its sparse code. This code is then ambiguated for privacy-protected data-sharing applications, e.g., outsourced media search or person identification. In [54], the authors study the reconstruction capability of sparse ternary codes given the information loss during the encoding to a ternary code. A more recent work [20] ambiguates the sparse code directly by noise addition while enabling high-quality recovery for the user via successive refinement.

IV. PROPOSED TWO-TIERED ENCRYPTION
The proposed method exploits techniques of compressive sampling, compressive encryption, and data hiding [36], [37], [14], [55], [56], [57], [16], [17]. The advantage of the CS-based technique is, on the one hand, that exact recovery (in the strictly sparse case) or stable recovery (in the approximately sparse case) of the undersampled signal is possible, and on the other hand, that cryptographic security can be provided.
As shown in Fig. 1, one tier of the security consists of the generation of a random corruption mask (one-time usage) to obfuscate the sensitive parts of the image.This information is then embedded directly onto the CS-encrypted signal with a ternary watermark.This data hiding scheme provides reversibility and one-time usage of the random corruption mask, which is essential for secure de-identification.In the two-tiered protection scheme, the semi-authorized user will be able to recover only the non-sensitive part while a fully authorized user is allowed to recover the whole signal.

A. Problem Definition
In this section, we first give a formal definition of the two-tiered protection scheme in the spirit of Shannon secrecy. We define the desiderata that the ideal triple, consisting of one encoder and two decoders (type A and type B), must satisfy. The problem then formally becomes the design of the three mappings that guarantee the recovery and secrecy properties. Following these definitions, we give our compressive sensing based solution to the problem, with a discussion of the advantages of the proposed system.
The signal of interest, $s \in \mathbb{R}^N$, is composed of a sensitive part and a non-sensitive part, denoted as the orthogonal sum
$$s = s_s + s_n,$$
where $s_s$ is the sensitive part of the signal, obtained by zeroing out the coefficients of $s$ that are not indexed by the corresponding index set $\Lambda_p$, and $s_n$ is the remaining non-sensitive part of the signal, whose non-zero coefficients are indexed by $\Lambda_p^c$. In what follows, we state the information-theoretic desiderata of the encoder and of the two decoders.

Definition 3. (Fully Secure and Stable Encoder-Decoder Triple: $E^*(\cdot)$, $D_1^*(\cdot)$, $D_2^*(\cdot)$)

1) We define the data coding operator (CS-encryption) $E^*(\cdot)$, which encrypts both the sensitive and non-sensitive parts, to be perfectly secure in that the coded signal $y$ does not reveal any information about $s$, i.e., $\Pr(s \mid y) = \Pr(s)$.

2) The first-tier decoder $D_1^*(\cdot)$, which stably recovers the non-sensitive part while not disclosing any information about the sensitive part, is characterized as
$$\|D_1^*(y) - s_n\|_2 \leq \kappa \|e\|_2 \quad \text{and} \quad \Pr(s_s \mid y) = \Pr(s_s),$$
where $e$ is a possible additive perturbation on $y$, i.e., $y = E^*(s) + e$.

3) Finally, the second-tier decoder, which stably recovers both the sensitive and non-sensitive parts, is defined by
$$\|D_2^*(y) - s\|_2 \leq \kappa \|e\|_2.$$

The goal now is to find a practical coding operator $E(\cdot)$ that jointly encrypts the sensitive and non-sensitive parts and is as close as possible to the ideal operator $E^*(\cdot)$.

1) Obfuscation of the Sensitive Part within CS-Encryption:
The proposed embedding operator obfuscates the sensitive part $s_{\Lambda_p}$ of the signal with the masking pattern $\Delta_{\Lambda_p}$, and then compressively samples the whole, consisting of the combination of the non-sensitive part $s_n$ and the masked sensitive part. The resulting intermediate code $y_d$ is given by
$$y_d = A_{\Lambda_p} \Delta_{\Lambda_p} s_{\Lambda_p} + A_{\Lambda_p^c}\, s_{\Lambda_p^c}, \quad (16)$$
where $s_{\Lambda_p}$ and $s_{\Lambda_p^c}$ are the extracted sensitive and non-sensitive parts of $s$, respectively. Here $\Delta_{\Lambda_p} \in \mathbb{R}^{|\Lambda_p| \times |\Lambda_p|}$ is the multiplicative obfuscation operator, i.e., a diagonal matrix consisting of random numbers, which operates only on the (vectorized) sensitive part of the signal $s_s$. In turn, $A_{\Lambda_p}$ and $A_{\Lambda_p^c}$ are the matrices consisting of the subsets of the columns of $A$ indexed by $\Lambda_p$ and $\Lambda_p^c$, respectively. The encoding in $y_d$ can also be formulated with an additive mask:
$$y_d = (A + M)\, s, \quad (17)$$
where $M \in \mathbb{R}^{m \times N}$ is the masking matrix with all zeros except the columns $M_{\Lambda_p} \in \mathbb{R}^{m \times |\Lambda_p|}$. The non-zero columns of the masking matrix can be easily calculated from Eq. (16), i.e., $M_{\Lambda_p} = A_{\Lambda_p} \Delta_{\Lambda_p} - A_{\Lambda_p}$.
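The equivalence between the multiplicative form (16) and the additive-mask form (17) is easy to verify numerically. In this sketch (our illustration; the index set and dimensions are arbitrary), `Lp` plays the role of $\Lambda_p$:

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 32, 16
Lp = np.array([3, 7, 8, 12])            # indices of the sensitive part, Lambda_p
Lc = np.setdiff1d(np.arange(N), Lp)     # non-sensitive indices, Lambda_p^c

A = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, N))
s = rng.normal(size=N)

# One-time diagonal obfuscation operator acting on the sensitive coefficients
Delta = np.diag(rng.normal(size=Lp.size))

# Multiplicative form (16): y_d = A_{Lp} Delta s_{Lp} + A_{Lc} s_{Lc}
y_d = A[:, Lp] @ Delta @ s[Lp] + A[:, Lc] @ s[Lc]

# Equivalent additive-mask form (17): y_d = (A + M) s, M zero outside Lambda_p
M = np.zeros((m, N))
M[:, Lp] = A[:, Lp] @ Delta - A[:, Lp]
print(np.allclose(y_d, (A + M) @ s))    # True: the two forms coincide
```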
2) Data Hiding with Reversibility: The obfuscation matrix $\Delta_{\Lambda_p}$ and its location information (if necessary) are converted to a binary code to be secretly embedded on top of the compressively sensed (encrypted) signal $y_d$. The conversion of this information to a binary code is necessary to achieve reversibility. Indeed, the exact recovery of the watermark sequence is possible [16], even in the noisy case (in our scheme, the noise corresponds to the masking of the sensitive part), provided the signal $s$ is sparse. In a practical application, errors in a few bits of the recovered watermark are tolerable. We can define a procedure that outputs a watermark $w$ corresponding to the binary representation of $\beta_{\Delta_{\Lambda_p}}$, where $\beta_{\Delta_{\Lambda_p}}$ is sufficient information to reproduce $\Delta_{\Lambda_p}$.
An example of such an operator is given in Eqs. (36)-(37c). We also need an inverse operator of (18) in order to reproduce $\Delta_{\Lambda_p}$ from the watermark signal, i.e., $\hat{w} \xrightarrow{\beta^{-1}} \hat{\Delta}_{\Lambda_p}$. This operator is defined in Eqs. (25)-(28). Note that the length of the watermark, $T'$, can change for each use case. To accommodate varying-length watermarks, one can fix a maximum watermark length $T$ and extend the binary code $w$ to a ternary one by stuffing the remaining $T - T'$ bits with zeros, i.e., $w \in \{-a, +a, 0\}^T$. Data hiding limits [16], [17] determine the maximum steganographic capacity $T$ one can expect to realize. Finally, a watermark embedding matrix (based on the second authorization key) $B \in \mathbb{R}^{m \times T}$, $T < m$, is generated to linearly spread the watermark $w$ directly onto the CS-encrypted signal, i.e., the cipher-text
$$y_w = y_d + Bw. \quad (20)$$
An embedding power constraint $\|Bw\|_2 \leq P_E$ must be imposed in order to limit the degradation of the recovered (non-sensitive) part of the image for semi-authorized users.
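A minimal sketch of the watermark construction and embedding follows (our illustration; the QR-based choice of an orthonormal-column B and all sizes are assumptions, not the paper's prescription). With orthonormal columns, the embedding power is simply $\|Bw\|_2 = \|w\|_2 = a\sqrt{T'}$, which makes the power constraint easy to enforce:

```python
import numpy as np

rng = np.random.default_rng(3)
m, T_max, T_act, a = 64, 16, 10, 0.05   # T_max = T, T_act = T' active bits

# Binary payload of length T' <= T, mapped to {-a, +a} and zero-stuffed to length T
bits = rng.integers(0, 2, size=T_act)
w = np.zeros(T_max)
w[:T_act] = a * (2 * bits - 1)          # ternary watermark in {-a, 0, +a}^T

# Embedding matrix B with orthonormal columns (QR of a Gaussian matrix)
B, _ = np.linalg.qr(rng.normal(size=(m, T_max)))

y_d = rng.normal(size=m)                # stand-in for the CS-encrypted signal
y_w = y_d + B @ w                       # watermarked cipher-text

# Embedding power: with orthonormal columns, ||Bw||_2 = ||w||_2 = a * sqrt(T')
print(np.isclose(np.linalg.norm(B @ w), a * np.sqrt(T_act)))
```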
The proposed embedding scheme, E (.) is given in Algorithm 1.
Algorithm 1 Proposed Embedding, $E(\cdot)$
Input: $s$, $A$, $B$;
1. Determine the mask and the obfuscation matrix $\Delta_{\Lambda_p}$;
2. Generate the watermark: $\beta_{\Delta_{\Lambda_p}} \to w \in \{-a, +a, 0\}^T$;
3. Joint CS-encryption and sensitive-part obfuscation: $y_d = (A + M)\, s$;
4. Watermark embedding: $y_w = y_d + Bw$.

Users (type A or B) receive the watermarked and encrypted signal $y_w$, which can be re-cast as
$$y_w = Hx + Bw + n, \quad (21)$$
where $Hx = A\Phi x = As$, $x \in \mathbb{R}^N$ is the sparse representation of $s$ in $\Phi$, and the masked part is expressed as the noise term $n = Ms$.

For the receiver of type A (the semi-authorized user A), only the key $A$ is available. Since this user does not have the watermark-encrypting key $B$, (s)he will perceive the cipher-text as
$$y_w = Hx + z,$$
where $z$ behaves like additive structural noise, i.e., $z = Bw + n$. In light of the discussion in Section III, this user can stably recover the signal via $\ell_1$-minimization, with the sensitive part remaining obfuscated.

The receiver of type B (the fully authorized user B) possesses both the CS-encryption key $A$ and the watermark-encryption key $B$. The type-B decoder must recover the whole signal $s_n + s_s$ with as low a reconstruction error as possible. A three-stage recovery scheme is proposed, adapted from the recovery method of [16]: first, a raw estimate of the sparse signal is obtained by disregarding the watermark part $Bw$ and using the $\ell_1$-minimization (5); second, after this preliminary estimate of $x$ is available, the watermark is recovered from an over-determined system of linear equations by subtracting the estimated $x$ component from $y_w$; in the final stage, the masking matrix $M$ is reproduced via the recovered watermark, and an improved estimate is obtained using $A + M$ as the CS-encryption matrix together with $\ell_1$-minimization. The details of the proposed scheme are as follows. First, we produce a left annihilator matrix $F \in \mathbb{R}^{p \times m}$ of $B \in \mathbb{R}^{m \times T}$ so that $FB = 0$, where $p = m - T$. Left-multiplying $y_w$ by $F$, we obtain
$$F y_w = FHx + n', \quad (23)$$
where $n' = Fn$. Eq. (23) is also an underdetermined linear system of equations and can be solved via $\ell_1$-minimization as discussed in Section III:
$$\tilde{x} = \arg\min_{x} \|x\|_1 \ \text{ subject to } \ \|FHx - Fy_w\|_2 \leq \epsilon. \quad (24)$$
After inserting the pre-estimate $\tilde{x}$ into $Hx$ and subtracting it from $y_w$, we get an over-determined system of linear equations, $y_w - H\tilde{x} \approx Bw$. Therefore, a raw estimate of the watermark can be obtained via
$$\tilde{w} = B^{\dagger} (y_w - H\tilde{x}), \quad (25)$$
where $B^{\dagger} = (B^T B)^{-1} B^T$ is the pseudo-inverse of $B$. The 0's in the ternary watermark $w$ can be extracted using simple thresholding if the length of the active bits, $T'$, is unknown to user B:
$$\hat{w}_i = \begin{cases} \tilde{w}_i & \text{if } |\tilde{w}_i| > \eta \\ 0 & \text{else,} \end{cases} \quad (27)$$
where $\eta$ is the threshold value. In some practical applications, such as person de-identification on video streams (details will be given in Section V), this step simplifies to $\hat{w} = \tilde{w} \odot \mathbb{1}_{T'}$, where $\odot$ denotes the element-wise multiplication operator between two vectors and $\mathbb{1}_{T'}$ is a $T$-length vector whose first $T'$ elements are 1's and the rest are all zeros. The locations of the non-zero elements of $\mathbb{1}_{T'}$ can be found using the information of $\Lambda_p$, inherent in the pre-estimated signal $\Phi \tilde{x}$. Alternatively, a pre-allocated subset of the watermark $w$ can be dedicated to secretly carrying information about $T'$. Thereafter, the finer estimate of $w$ is easily found via $\hat{w}_i = a \times \mathrm{sgn}(\tilde{w}_i)$.
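The annihilator-based recovery of the watermark can be sketched as follows. For brevity this noise-free sketch skips the $\ell_1$ stage and assumes the preliminary sparse estimate is exact; the SVD construction of F and all dimensions are our own choices, not prescribed by the paper.

```python
import numpy as np

rng = np.random.default_rng(4)
m, N, T, a = 48, 96, 8, 0.1

H = rng.normal(0.0, 1.0 / np.sqrt(m), size=(m, N))
B, _ = np.linalg.qr(rng.normal(size=(m, T)))    # orthonormal columns

x = np.zeros(N)
x[[5, 40, 77]] = rng.normal(size=3)             # sparse representation
w = a * rng.choice([-1.0, 1.0], size=T)         # active watermark bits
y_w = H @ x + B @ w                             # noise-free marked cipher-text

# Stage 1: left annihilator F of B (rows span the orthogonal complement of B)
U, _, _ = np.linalg.svd(B, full_matrices=True)
F = U[:, T:].T                                  # F B = 0, F is (m - T) x m
# F @ y_w = F @ H @ x removes the watermark; in the full scheme this system
# is solved by l1-minimization. Here we assume the estimate is exact:
x_tilde = x

# Stage 2: least-squares watermark estimate from the over-determined system
w_tilde = B.T @ (y_w - H @ x_tilde)             # B^+ = B^T for orthonormal B
w_hat = a * np.sign(w_tilde)                    # ternary re-quantization
print(np.allclose(w_hat, w))                    # True in this noise-free sketch
```

In the actual scheme the masking noise makes $\tilde{w}$ only approximately equal to $w$, which is why the sign re-quantization and the threshold $\eta$ are needed.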

D. Impact of Random Matrices on CS Encryption Performance
Generation of the CS-encryption matrix $A$ and of the watermark embedding matrix $B$ plays an important role in the security and recovery robustness of the encryption scheme $E(\cdot), D_1(\cdot), D_2(\cdot)$. The choice of random Gaussian matrices as in (8) for $A$ is convenient because they are known to be universally optimal, in the sense that they satisfy both the robustness and the security conditions regardless of the sparsifying basis $\Phi$. These matrices have been well investigated in the literature, both in terms of recovery performance, as in Corollary 1, and in terms of security metrics, as discussed in Section III-B. In the sequel, we will consider $A$ as in (8) and $B$ consisting of orthonormal columns. For this scenario, we state an RIP-based theoretical guarantee condition on watermark recovery for $D_2(\cdot)$. The following lemma will be useful for the stability analysis of the type-B decoder:

Lemma 1. Consider that the embedding $E(\cdot)$, given by Algorithm 1, produces an encrypted signal $y_w$ from $s$ with keys $A$ and $B$, i.e., $E(s) = y_w = Hx + Bw + n$. Let $s_p \in \mathbb{R}^{|\Lambda_p| \times 1}$ denote the perturbation on the sensitive part of the signal, such that $s_p = \Delta_{\Lambda_p} s_{\Lambda_p} - s_{\Lambda_p}$. Let also $A$ be an $m \times N$ CS-encryption matrix with elements $A_{i,j}$ drawn i.i.d. according to $\mathcal{N}(0, \frac{1}{m})$. Then the noise pattern $n$ in (21) is also a Gaussian random vector with i.i.d. elements $n_i \sim \mathcal{N}\!\left(0, \frac{\|s_p\|_2^2}{m}\right)$.

Proof. Let $A_{i,\Lambda_p}$ be the $i$-th row of $A_{\Lambda_p}$. Then the elements of the vector $A_{\Lambda_p} s_p \in \mathbb{R}^{m \times 1}$, namely $n_i = \langle A_{i,\Lambda_p}, s_p \rangle$, are independent Gaussian random variables with zero mean, where $\langle v_1, v_2 \rangle$ refers to the inner product of the vectors $v_1, v_2$. Therefore, it remains to prove that $\mathbb{E}(n_i^2) = \frac{\|s_p\|_2^2}{m}$, which can be straightforwardly obtained using the i.i.d. property of the elements of $A$.

Having Lemma 1, and using Corollary 1, we are ready to state the following theorem on the watermark recovery probability of $D_2(\cdot)$: Theorem 4. Consider the Gaussian CS-encryption matrix defined in Eq.
(8). Let the watermark-encoding matrix B have orthonormal columns, and assume δ_2k(H) < √2 − 1 and δ_2k(FH) < √2 − 1. Let also the annihilator matrix F have orthogonal rows such that ||F_i,:||_2 = √(m/p), where F_i,: denotes the i-th row of F. For a marked ciphertext y_w, and for the particular setting ε = (1 + γ)√m σ_n, Eq. (28) used in Algorithm 3 recovers the watermark bits w_i correctly, Pr(w_i = ŵ_i), with probability at least 1 − e^(−c γ^2 m) for an absolute constant c, provided that Cε < ā, where C = 4√(1 + δ_2k(FH)) / (1 − (1 + √2)δ_2k(FH)) and ā = a − η, with a and η the hyper-parameters used in Algorithm 3.
The proof of the theorem is given in Appendix A. Theorem 4 establishes a bound on the watermark recovery probability as a function of the energy of the perturbation on the sensitive part, the RIC of the matrix FH, and the watermark embedding strength a. This type of RIP-based analysis for CS reconstruction algorithms, as in Corollary 1, is known to give theoretical guarantee conditions for the worst-case scenario [58]. In general, for most practical applications, the algorithms perform much better than the performance bounds given by this kind of RIP-based analysis. Nevertheless, it gives us an indication of how to design the related matrices for the encoder (such as A, B, H) and how to choose hyperparameters for the decoders. For example, choosing both F and H as Gaussian matrices may not be the right decision, since the product of two random Gaussian matrices is a random matrix with coefficients drawn from a heavy-tailed distribution [59], which yields a δ_2k(FH) bigger than in the Gaussian case.

Fig. 1: Proposed Reversible Privacy-Preserving Video Monitoring.
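Lemma 1's variance claim can be checked numerically. The following NumPy sketch (all dimensions and the support Λ_p are our own toy choices) draws many independent rows A_i,Λp and compares the empirical variance of n_i = ⟨A_i,Λp, s_p⟩ against the predicted ||s_p||_2^2 / m:

```python
import numpy as np

rng = np.random.default_rng(1)
m = 200
Lp = np.arange(40)                      # indices of the sensitive part (toy choice)
s_p = rng.normal(size=Lp.size)          # perturbation Delta*s_Lp - s_Lp (arbitrary)

# Many independent draws of a row A_{i,Lp} with i.i.d. N(0, 1/m) entries,
# to estimate the variance of n_i = <A_{i,Lp}, s_p>.
trials = 4000
A_rows = rng.normal(0.0, 1 / np.sqrt(m), (trials, Lp.size))
n = A_rows @ s_p

emp_var = n.var()
pred_var = np.dot(s_p, s_p) / m         # Lemma 1: n_i ~ N(0, ||s_p||^2 / m)
assert abs(emp_var - pred_var) / pred_var < 0.1
```

The empirical variance agrees with the prediction to within Monte-Carlo error, illustrating why the obfuscation perturbation can be treated as i.i.d. Gaussian noise in the recovery analysis.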

E. Choice of the encryption matrix
Although random measurement matrices are optimal in the universal sense, they become computationally unwieldy for realistic signal and measurement dimensions, N and m, respectively. Recall that the iterative signal reconstruction algorithms require transposition and multiplication of the measurement matrix several times. To ease this computational burden, one can choose the rows of the measurement (CS-encryption) matrix randomly as a subset of an orthonormal and fast implementable transform basis such as Fourier, DCT, or Hadamard. In other words, one can choose m rows randomly out of the N rows of an orthonormal transform, Θ. These rows are indexed by Ω ⊂ {1, 2, 3, ..., N} with cardinality |Ω| = m. Thanks to these types of structured CS matrices, the computational cost of As can be reduced significantly, i.e., down to O(N log N) flops from O(mN) flops for general random CS matrices. For a good choice of the measurement matrix A = Θ_Ω with respect to a sparsifying basis Φ, the rows of H must be as flat (dense with nonzero elements) as possible. This can be satisfied when the rows of the measurement matrix A are not sparse in the sparsifying basis Φ. This requirement can be quantified via the "mutual coherence" functional, i.e., µ(H) = max_{i,j} |H_i,j|. The performance limits of ℓ1-decoding schemes such as BPDN are given in terms of the functional µ(H). If one randomly chooses m rows of an orthonormal basis Θ, indexed by Ω ⊂ {1, 2, 3, ..., N}, to build a measurement matrix A, then a k-sparse signal can be exactly reconstructed as a solution of the ℓ1-decoding (BP) in (5), satisfying m ≥ O(µ^2(Θ) × k × log N), with overwhelming probability [60].
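The subsampled fast-transform construction above can be sketched with an orthonormal fast Walsh–Hadamard transform, one of the bases mentioned (the code below is illustrative, not the paper's noiselet implementation; sizes and the index set are toy choices):

```python
import numpy as np

rng = np.random.default_rng(2)
N, m = 256, 100                         # signal length (power of 2), measurements

def fwht(v):
    """Orthonormal fast Walsh-Hadamard transform in O(N log N) flops."""
    v = v.astype(float)                 # astype copies, so the input is untouched
    h = 1
    while h < len(v):
        w = v.reshape(-1, 2 * h)        # view into v: in-place butterfly stage
        a, b = w[:, :h].copy(), w[:, h:].copy()
        w[:, :h], w[:, h:] = a + b, a - b
        h *= 2
    return v / np.sqrt(len(v))

# A = Theta_Omega: m rows of the orthonormal transform, chosen at random.
Omega = np.sort(rng.choice(N, m, replace=False))

def A_apply(s):
    """y = Theta_Omega @ s: one full fast transform, then subsampling."""
    return fwht(s)[Omega]

# Check against the explicit Sylvester-Hadamard matrix (the slow O(mN) route).
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
Theta = np.array([[1.0]])
for _ in range(8):                      # 2^8 = 256
    Theta = np.kron(H2, Theta)
Theta /= np.sqrt(N)

s = rng.normal(size=N)
assert np.allclose(A_apply(s), Theta[Omega] @ s)
```

The fast operator never materializes the m × N matrix, which is exactly what iterative decoders need when they apply A and its transpose repeatedly.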
We have chosen the Noiselet basis and the 2-D Wavelet basis to create the CS-encryption matrix and the sparsifying matrix, respectively: first, because these two transforms are known to be maximally incoherent with each other, and second, because they both have fast implementations. The indices of the chosen rows are randomly drawn and then permuted to increase the security level.

F. Design of the annihilator matrix F and its corresponding watermark embedding matrix B
The watermark embedding matrix B, which must span the right null space of F, can also be chosen from a fast transform. For example, one can constitute the columns of B by randomly choosing a subset of the rows of the DCT basis matrix; the rows of F can then be made up of the remaining rows of this DCT matrix.
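This complementary-rows construction can be sketched as follows (a small explicit orthonormal DCT-II matrix; sizes, the split point, and the permutation are toy assumptions). Because the DCT rows are orthonormal, the two disjoint subsets automatically satisfy FB = 0:

```python
import numpy as np

rng = np.random.default_rng(3)
m, p = 64, 16                      # ciphertext length m; p rows for the annihilator F

# Orthonormal DCT-II basis of size m, built explicitly.
n = np.arange(m)
C = np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / m) * np.sqrt(2.0 / m)
C[0] /= np.sqrt(2.0)
assert np.allclose(C @ C.T, np.eye(m))

idx = rng.permutation(m)           # random shuffle, as in the paper's setup
F = C[idx[:p]]                     # annihilator: p randomly chosen DCT rows
B = C[idx[p:]].T                   # embedding matrix: remaining rows, as columns

assert np.allclose(F @ B, 0)               # F annihilates the watermark subspace
assert np.allclose(B.T @ B, np.eye(m - p)) # B has orthonormal columns
```

Both F and B inherit fast implementations from the underlying DCT, so neither matrix needs to be stored explicitly in practice.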
Theorem 4 implies that the choice of the matrices F and H influences the performance of Algorithm 3. To investigate the impact of the choice of F on FH, we compare the performance of ℓ1-minimization in the recovery of a sparse signal x from y = FHx for three different settings. (i) First, with the random Gaussian measurement matrix FH as in Lemma 1 and Theorem 4. (ii) Second, for the case where F is made up of a subset of the rows of the DCT, A is similarly made of a subset of the Noiselet basis, and the sparsifying matrix Φ is chosen as the Haar basis. Figure 2a shows the average mutual coherence values of FH under the different setups. Figure 2b shows the exact recovery probabilities at different measurement rates for the three different choices of F. These results show that even if the random measurement matrix is universally optimal, in the sense that it guarantees exact recovery for any sparsifying basis in the worst-case scenario, in practice structured matrices obtained from orthonormal transforms can perform even better. Here we make use of the mutual coherence functional in a form slightly different from, though related to, that given in the previous subsection: µ(H) = max_{i≠j} |⟨h_i, h_j⟩| / (||h_i||_2 ||h_j||_2), where h_i is the i-th column of the matrix H.
(iii) Alternatively, based on the arguments in [61], a randomization matrix can be applied to F, i.e., F̃ = FR, where R is an m × m matrix of all zeros except for the diagonal terms, which are drawn from the symmetric Bernoulli (±1) distribution. In [61], it is proven that the matrix FRH, for any orthonormal basis pair F, H and randomization matrix R with diagonal Bernoulli elements, approaches a Gaussian matrix. This is illustrated in Figure 3 as quantile-quantile plots. Although this does not yield any performance increase in terms of mutual coherence or recovery performance, as shown in Figure 2, the scheme enhances the security level with only negligible additional computation in the recovery part. In Figure 3, the vertical axis denotes the level at which the empirical distribution falls below a given quantile level Q (e.g., 50%), while the horizontal axis indicates the quantiles of the standard Gaussian distribution. In all cases, the similarity between the distribution of the FH sensing matrices and that of a Gaussian sensing matrix is evident. A sensing-matrix distribution approaching the Gaussian is a desirable characteristic for both data hiding and CS-encryption purposes.
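The F̃ = FR construction costs only m sign flips. A minimal sketch (a DCT-based F and a Hadamard H, both toy choices of ours) shows that, since R and H are orthogonal, the rows of FRH keep unit norm, so its entries have mean ≈ 0 and mean-square exactly 1/m, matching the first two moments of an N(0, 1/m) ensemble (the full distributional claim is the QQ-plot evidence of Figure 3 and the proof in [61]):

```python
import numpy as np

rng = np.random.default_rng(4)
m, p = 128, 64

# F: p rows of an orthonormal DCT-II basis of size m.
n = np.arange(m)
C = np.cos(np.pi * (n[None, :] + 0.5) * n[:, None] / m) * np.sqrt(2.0 / m)
C[0] /= np.sqrt(2.0)
F = C[:p]

# H: the orthonormal Sylvester-Hadamard basis of size m.
H2 = np.array([[1.0, 1.0], [1.0, -1.0]])
Had = np.array([[1.0]])
for _ in range(7):                      # 2^7 = 128
    Had = np.kron(H2, Had)
Had /= np.sqrt(m)

# Randomization: R = diag(r) with i.i.d. +/-1 (symmetric Bernoulli) entries.
r = rng.choice([-1.0, 1.0], size=m)
G = (F * r) @ Had                       # G = F R H, applying R as a column scaling

assert np.allclose((G ** 2).sum(axis=1), 1.0)   # rows of FRH keep unit norm
assert abs(G.mean()) < 0.01                     # entries centered, as for N(0, 1/m)
```

Applying R in the decoder amounts to the same diagonal sign flip, which is the "negligible additional computation" noted above.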

G. Design of the obfuscation matrix
The region of interest (e.g., a face) to be obfuscated is delineated by Λ_p. The obfuscation matrix Δ_Λp has all zero entries except for the diagonal elements indexed by Λ_p, which are drawn from a Bernoulli distribution with probability p_1 (the diagonal entries outside Λ_p are 1). The corresponding masking matrix is then M = A(Δ_Λp − I), so that y_d = (A + M)s = A Δ_Λp s. In the watermark generating procedure, T bits of w are allocated for the location information of the sensitive part, i.e., the starting and ending points of the rectangular region of interest containing the faces in the image. Alternatively, given the intermediate estimate of the image, s̃ = Φx̃, the obfuscated region can easily be deduced and extracted, without the need to hide the location information in w.
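Our reading of this masking construction, M = A(Δ_Λp − I), is consistent with the noise term n = A_Λp s_p of Lemma 1, and can be checked numerically as follows (sizes, the support Λ_p, and p_1 are toy choices):

```python
import numpy as np

rng = np.random.default_rng(5)
m, N = 64, 128
p1 = 0.5
Lp = np.arange(30, 60)                  # sensitive support Lambda_p (toy: a face box)

A = rng.normal(0, 1 / np.sqrt(m), (m, N))
s = rng.normal(size=N)

# Obfuscation matrix: identity everywhere, Bernoulli(p1) draws on Lambda_p.
d = np.ones(N)
d[Lp] = rng.random(Lp.size) < p1        # one-time random 0/1 pattern
Delta = np.diag(d)

M = A @ (Delta - np.eye(N))             # masking matrix: (A + M)s = A Delta s
y_d = (A + M) @ s
assert np.allclose(y_d, A @ (d * s))

# The sensitive perturbation that the Lemma-1 analysis treats as noise:
s_p = (d * s - s)[Lp]                   # Delta s - s, restricted to Lambda_p
assert np.allclose(y_d, A @ s + A[:, Lp] @ s_p)
```

The second identity shows why the obfuscated ciphertext equals the clean measurement plus the Gaussian-like perturbation A_Λp s_p used throughout the recovery analysis.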
H. More secure obfuscation with a key for a Gaussian vector

A semi-authorized user with only key-A may attempt a brute-force attack by trying out all possible binary combinations of the Δ_Λp,i,i's to un-hide the obfuscated region. Even though the computational complexity of this attack is impractically high, i.e., 2^|Λp|, to make the privacy protection stronger one can make use of a third key, g. This can be realized using a predefined vector g ∈ R^N, known only to the fully-authorized user (type B), which is used to generate another obfuscation matrix whose diagonal entries are modulated by g, where g_j ∼ N(µ_g, σ_g^2).

V. A CASE STUDY: REVERSIBLE PRIVACY-PRESERVING VIDEO MONITORING
As a use case of the proposed two-tier image encryption algorithm, we investigate a video surveillance application where sensitive segments are to be concealed from semi-authorized users and revealed only to fully-authorized users. The sensitive parts of the image are the faces of the people in the scene.
For the face de-identification performance, we use two criteria: i) the Structural SIMilarity (SSIM) index [62] to measure the quality of the decoded and reconstructed image parts [63]; ii) face recognition accuracy via a machine learning algorithm as an indicator of privacy protection [64], [65]. For the semi-authorized user (with only Key A), we aim for both minimum classification accuracy on the concealed parts and minimum degradation in the reconstructed non-sensitive parts. For the fully-authorized user, we want to achieve the highest classification accuracy and the highest reconstruction accuracy when both the A and B keys are used for decoding.
We also test an attack scenario where a malicious user (e.g., a semi-authorized one, or an attacker who has stolen the CS-encryption key A) has access to the labels of the face images in the training set, so that (s)he can train a classifier to make inferences from de-identified images. The experimental results (Section V-C and Table VI) show that our one-time usage of the random obfuscation matrix prevents an adversary from making an inference (identifying the faces) even if the labels of the training set are captured.

Fig. 4: Sample recovered frames for the semi-authorized (User A) and authorized (User B) users (measurement rates 0.6, 0.7).

A. Experimental Setup
The experimental evaluation is conducted on the YouTube Faces Database [22] to demonstrate the viability of the proposed method in applications such as video surveillance, intelligent access control, and, in general, analytics for intelligent buildings. Accordingly, we have randomly chosen 5000 frames from the YouTube Faces Database corresponding to 100 identities (50 frames per identity). Recovery performances are reported using 3000 frames, while a non-overlapping set of 2000 frames is collected to build the training set for the privacy-preservation performance evaluations. The Matlab implementation of the experiments and additional demos can be downloaded from https://github.com/mehmetyamac/CS-Privacy-Protection. We use a randomly chosen subset of the rows of the noiselet basis as the measurement matrix. The implementation of the real-valued "dragon" noiselet is borrowed from [66]. As the sparsifying matrix, we choose the "Coiflet 2" wavelet and use the WaveLab850 [67] wavelet toolbox. The columns of the encoding matrix B were chosen as a random subset of the columns of the m × m DCT basis and then shuffled; the rows of the annihilator matrix F were built from the remaining columns and likewise shuffled (i.e., H = Noiselet × Wavelet and F = DCT). Moreover, Gradient Projection for Sparse Reconstruction (GPSR) [68] was used for the ℓ1-minimization.
The various parameters used in the experiments are listed in Table II. The performance of the decoders for different watermark embedding power-to-signal ratios, ||Bw||/||y_d||, and compression (measurement) rates is reported in Section V-B.

B. Recovery Performance of D*_1(.) and D*_2(.)

The choice of the watermark amplitude a, or alternatively the watermark embedding power, is the determining factor in the watermark recovery performance (recall Theorem 4). In other words, the embedding power-to-signal ratio, ||Bw||/||y_d||, forms the trade-off between the type A non-sensitive image recovery quality and the type B sensitive image recovery quality. On the one hand, a should not be too small, since erroneous estimation of the watermark bits affects the recovery of w and Δ_Λp, and hence the quality of the reconstructed sensitive part. On the other hand, increasing a can impede the decompression performance, compromising the overall signal recovery, s = s_s + s_ns, because the embedded watermark Bw acts as additive noise in the decoder (Eq. (22)). This trade-off between the recovery quality of the sensitive regions (type B) and of the non-sensitive region (type A) is observed in Figure 5. We have found empirically that good values of a lie in the [0.085, 0.15] range, based on peak signal-to-noise ratios (PSNRs) and the quality of the recovered images. Based on the visual assessment of the sample frames in Figure 4 and the reported PSNR values in Table III, we can say that User A's reconstructed faces are unrecognizable, whereas the regions outside the faces have adequate quality, albeit around 5 dB lower in PSNR compared to those of User B, especially at low MRs. For User B, the reconstruction quality of both the concealed regions and the whole frame is satisfactory; there are only small detail losses in the privacy-sensitive parts.
In Table IV, SSIM values for the concealed region of the reconstructed images are reported. It can be seen that the faces in images recovered using only Key A result in very low SSIM scores, making them unrecognizable, whereas their SSIM scores are very high for user type B, especially at MRs above 0.5.

C. Performance in Privacy Preservation
The privacy-preserving performance of the proposed method is evaluated by demonstrating its robustness against state-of-the-art face recognition attacks. To this end, we employed a pre-trained Convolutional Neural Network (CNN) provided by the dlib library [69] to extract the facial features. Face recognition is then performed as follows: we extract the 128-dimensional embedded (CNN) face recognition features and build a database of labeled faces for the query; then we perform a nearest-neighbor search and select the nearest identity as the classification output. The experimental results are evaluated for two types of attacks.
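The nearest-neighbor classifier used by the attacks can be sketched as follows. The embeddings below are random stand-ins for the 128-dimensional dlib descriptors, and the gallery sizes are toy choices, not the experimental setup:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical stand-ins for the 128-d face embeddings of a labeled gallery.
n_ids, per_id, d = 10, 20, 128
gallery = rng.normal(size=(n_ids * per_id, d))
labels = np.repeat(np.arange(n_ids), per_id)

def recognize(query_vec):
    """1-NN search in embedding space; returns the predicted identity label."""
    dists = np.linalg.norm(gallery - query_vec, axis=1)
    return labels[np.argmin(dists)]

# A probe close to a gallery point of identity 3 is classified as identity 3.
probe = gallery[labels == 3][0] + 0.01 * rng.normal(size=d)
assert recognize(probe) == 3
```

In the attacks, the same search is run with anonymized face regions as queries (and, for the parrot attack, with anonymized samples added to the gallery), and the accuracy of this classifier is the privacy-leakage measure.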
1) Attack Type I: known plain-text (original faces), known labels: In this scenario, a malicious user with the stolen Key A (or a malevolent type A user) may capture the training set with its labels to train a classifier to decipher the anonymized faces. The experiment designed to test the de-identification robustness against this type of attack is as follows: we construct a query database consisting of 2000 original clear frames (20 frames per identity). Then, we perform face recognition on the face regions that have been reconstructed with Key A (User A) and with the two keys of User B. The recognition accuracies are reported in Table V. The performance for User A is about 1%, which amounts to a random guessing score, while the accuracies for User B are very satisfactory, i.e., around 75% for high MRs. This is comparable to the recognition rate achieved when the same face recognition software is tested on the original images.

2) Attack Type II: known plain-text (original faces), known anonymized images and their labels: The ability of the proposed method to withstand a more challenging case, the parrot attack [70], where the user with Key A has captured both labeled clear images and their anonymized counterparts in the training set, is tested in the following experiment: the aforementioned query with NN-search is constructed such that each identity has 20 clean and 10 anonymized images with true labels. The face recognition algorithm is run over the face regions in the recovered images of type A. The results in Table VI reveal that the reconstructed faces for User A do not leak any useful information that can be exploited in a parrot attack, since a different randomized corruption matrix was employed for each frame, i.e., for each occurrence of a face with the same identity.

VI. DISCUSSION
Privacy protection in video: We have so far tacitly assumed that privacy protection in video was to be realized in a frame-by-frame processing mode. Thus, the sensitive part in each frame, e.g., the face region, was to be separately obfuscated and each such frame CS-encrypted via the y_w = (A + M)s + Bw formulation. A simple extension to the multi-frame video case would be to vectorize groups of frames and straightforwardly adapt the above methodology, where now s_Λp and Δ_Λp denote the sensitive parts and masking patterns striding over the frames in the group. A more principled way to extend the scheme to multi-frame video is to leverage a tensor-based CS-encryption scheme [71]. The video is considered as a 3-D signal, S ∈ R^(n1×n2×n3), i.e., a sequence of n3 consecutive n1 × n2 images. Then, the CS-encryption matrices A_1 ∈ R^(m1×n1), A_2 ∈ R^(m2×n2), A_3 ∈ R^(m3×n3) can be applied to S to obtain an encrypted and compressed tensor, Y = S ×_1 A_1 ×_2 A_2 ×_3 A_3, where S ×_i A_i is the i-mode product of the tensor S and the matrix A_i. Let S_s be the sensitive part of the video, obtained by zeroing out coefficients of S, and let S_n be its non-sensitive part. Similar to our matrix-vector notation, the jointly CS-encrypted and anonymized tensor can be obtained via Y_d = (S_n + P ∘ S_s) ×_1 A_1 ×_2 A_2 ×_3 A_3, where P is the degradation tensor and ∘ is the element-wise (Hadamard) product of two tensors. Then the marked vector y_w can easily be obtained as y_w = vec(Y_d) + Bw. On the decoder side, recovery algorithms for D_1(.) and D_2(.) similar to Algorithm 2 and Algorithm 3 can be used, replacing the ℓ1-based sparse vector recovery with a sparse tensor estimation method.
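The chained mode products above can be implemented with np.tensordot; the sketch below (toy sizes of our own choosing) also verifies the standard equivalence with the Kronecker-structured matrix acting on the vectorized tensor, vec(Y) = (A_3 ⊗ A_2 ⊗ A_1) vec(S) under the column-major vec convention:

```python
import numpy as np

rng = np.random.default_rng(7)
n1, n2, n3 = 16, 16, 8                  # video: n3 frames of n1 x n2 (toy sizes)
m1, m2, m3 = 8, 8, 4

S = rng.normal(size=(n1, n2, n3))
A1 = rng.normal(size=(m1, n1))
A2 = rng.normal(size=(m2, n2))
A3 = rng.normal(size=(m3, n3))

def mode_product(T, Mtx, mode):
    """i-mode product T x_i M: multiply M along the given tensor axis."""
    return np.moveaxis(np.tensordot(Mtx, T, axes=(1, mode)), 0, mode)

# Y = S x_1 A1 x_2 A2 x_3 A3: the encrypted and compressed tensor.
Y = mode_product(mode_product(mode_product(S, A1, 0), A2, 1), A3, 2)
assert Y.shape == (m1, m2, m3)

# Sanity check: chained mode products equal the Kronecker-structured matrix
# acting on the column-major vectorization of S.
vecS = S.reshape(-1, order='F')
K = np.kron(A3, np.kron(A2, A1))
assert np.allclose(Y.reshape(-1, order='F'), K @ vecS)
```

The mode-product form is what a practical encoder would use: it applies three small matrices instead of one (m1 m2 m3) × (n1 n2 n3) matrix, and each A_i can itself be a fast structured transform.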
Multi-tier privacy protection: It is possible to extend the proposed scheme to more than two tiers by replicating the scheme outlined in Subsection IV-C and Figure 1. Recall that the obfuscation mask encoded as w and embedded via an appropriate watermarking matrix B resulted in the expression y_w = (A + M)s + Bw. Consider, for example, a three-tier scenario, where s_s1 and s_s2 are identified as sensitive parts, the higher-indexed component having, say, the higher privacy concern. The respective obfuscation matrices, M_1 and M_2, are encoded by their corresponding watermarks w_1 and w_2. These watermark signals can be spread over y_d, for example, as y_w = (A + M_1 + M_2)s + B_1 w_1 + B_2 w_2, or y_w = (A + M_1 + M_2)s + [B_1 B_2][w_1; w_2]. If desired, the resulting signal y_w can finally be subjected to another layer of light-weight encryption. The decoding of the three-tier scheme follows steps similar to Section IV-C and Algorithm 3.
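The equivalence of the two embedding forms above can be checked numerically. In the sketch below, the masks follow our M_i = A(Δ_i − I) reading of the masking construction, and all sizes, supports, and watermark amplitudes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(8)
m, N, t = 96, 128, 8

A = rng.normal(0, 1 / np.sqrt(m), (m, N))
s = rng.normal(size=N)

# Two obfuscation masks on disjoint sensitive supports (toy M_i = A(Delta_i - I)).
def mask(Lp, p1=0.5):
    d = np.ones(N)
    d[Lp] = rng.random(len(Lp)) < p1
    return A @ (np.diag(d) - np.eye(N))

M1, M2 = mask(np.arange(10, 30)), mask(np.arange(60, 80))

# Two watermark subspaces with mutually orthonormal columns, via one QR.
Q, _ = np.linalg.qr(rng.normal(size=(m, 2 * t)))
B1, B2 = Q[:, :t], Q[:, t:]
w1 = 0.1 * rng.choice([-1.0, 1.0], t)
w2 = 0.1 * rng.choice([-1.0, 1.0], t)

# B1 w1 + B2 w2 equals the stacked form [B1 B2][w1; w2].
y_w = (A + M1 + M2) @ s + B1 @ w1 + B2 @ w2
y_w_stacked = (A + M1 + M2) @ s + np.hstack([B1, B2]) @ np.concatenate([w1, w2])
assert np.allclose(y_w, y_w_stacked)
```

Keeping the columns of B_1 and B_2 mutually orthonormal lets each tier's watermark be projected out independently during decoding, mirroring the single-tier extraction step.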
In this work, we have considered privacy protection in images and video as an application case. However, the proposed signal acquisition, privacy-protection, and encryption scheme can be applied to any multimedia data that can be differentiated into sensitive (private) and non-sensitive (public) parts. A case in point could be the monitoring data of a wireless sensor network [72]. In such a distributed sensing mechanism, one may want to hide data in the sensor readings that could otherwise be exposed to traffic analysis and flow tracing. Another example would be a CS-based telehealth system [13], where health personnel with different authorizations would have differential access to parts of the medical data and biosignals.
Furthermore, using CS-encryption together with other light-weight encryption techniques is common practice in the literature. For instance, in [72], the authors applied the Paillier cryptosystem over y = As to strengthen the security. Similar approaches can be applied over y_w, provided that the applied encryption method is invertible.

VII. CONCLUSION
We have presented a two-tiered (potentially multi-tiered) privacy-preserving scheme based on compressive sensing theory. The scheme accommodates two levels of users: a public user A (with only Key A), who can recover only the non-sensitive portions of the document, and a private user B, i.e., the fully-authorized user (with Keys A and B), who can recover the whole document. This prioritization is enabled via a data hiding technique such that the full user, in possession of Key B, can undo the obfuscation from within the CS-enciphered signal.
The watermark capacity of the system allows one-time usage of the obfuscation matrix, which in turn provides a higher level of security against any attacker, e.g., a curious semi-authorized user. In conclusion, the proposed approach satisfies all the criteria of privacy-protecting encoding itemized in the introduction. Security can be further corroborated by extra randomization, as in Eqs. (38a)-(38b). Extensive tests on a face anonymization use case revealed that the system is robust against cipher-breaking attacks (i.e., face recognition) and that the image recovery quality is adequate for measurement rates m/N above 0.5. The experiments also yielded guidelines for the selection of system parameters such as the compression rate and the watermark embedding strength.
The proposed scheme, with its experimentally demonstrated merits of reversible anonymization, provides a promising alternative for privacy-protecting encryption. An example application scenario is a video surveillance system where the collected real-time data must be transmitted and uploaded to a security monitoring center.

Fig. 2: Average mutual coherence of the matrix FH = FAΦ for different realizations of A, and the probability of exact recovery calculated over 250 trials. An exactly sparse signal is synthetically produced with N = 256 and k = 30.

Fig. 5: Peak signal-to-noise ratios (PSNRs, dB) over the recovered non-sensitive part (red curve) and sensitive part (blue curve), with the keys, respectively, of User A and User B. The measurement rate is fixed at 0.6.

In Table III, we show the recovery performance of the type A and type B decoders for the concealed region, for the non-concealed region, and for the whole frame. Recovery qualities are reported for different compression rates (CS measurement rates: MR = m/N) and for two chosen values of ||Bw||/||y_d||, namely 0.15 and 0.085.

TABLE I :
From left to right: a) Symbols of the frequently used variables in the article. b) Denotations of these symbols. c) The corresponding cryptographic terminology, if applicable. d) The conditions the variables must satisfy for the encryption scheme to work properly.
y_d = (A + M)s — the obfuscated and CS-encrypted signal.
w — Ternary watermark; its length, T, and watermark magnitude, a, are predefined: w_i ∈ {a, 0, −a}.

The ℓ1-minimization scheme in (5) can be used to recover x with Υ(y) = {x : ||Hx − y||_2 ≤ ε}. Afterwards, using the outcome x̃ of the ℓ1-minimization technique, one can obtain an estimate of the signal s with mask, ŝ, straightforwardly via ŝ = Φx̃. The decoding algorithm for semi-authorized users, D_1(.), is given in Algorithm 2.

TABLE II :
List of the user defined parameters.

TABLE III :
PSNR values over the sensitive and non-sensitive regions of the frames for different measurement rates (MR), using a binary mask and a binary-masked Gaussian for masking, and for embedding strength ||Bw||/||y_d||.

TABLE IV :
Structural Similarity Index (SSIM) over the anonymized regions for different measurement rates (MR), using a binary mask and a binary-masked Gaussian for masking, for embedding strength ||Bw||/||y_d|| = 0.085 in (a) and 0.15 in (b).

TABLE V :
Face recognition rates of User A and User B, the semi-authorized and authorized users, respectively, for different measurement rates (MR), using a binary mask and a binary-masked Gaussian for masking, at ||Bw||/||y_d|| = 0.085 (Tables a and b) and at ||Bw||/||y_d|| = 0.15 (Tables c and d), respectively. The recognition accuracy on the original frames is 77.37%.

TABLE VI :
Face recognition rates of the semi-authorized user when the corrupted images from User A are added into the search space for the nearest-neighbor query. The accuracies are reported for different measurement rates (MR) with ||Bw||/||y_d||.