Covert Communications Over Non-Orthogonal Multiple Overt Channels

This paper exploits the overt information of non-orthogonal multiple access (NOMA) systems as camouflage for covert information. The optimum strategy for hiding information at the transmitter is determined, accounting for the probability of the warden's detection error and the resulting spectral efficiency of the covert information under the covertness requirement. The results show that the optimum strategy is to superimpose the covert information onto the overt NOMA information that is hardest to decode, which drives the probability of detection error at the warden to 1 as the number of NOMA users grows. This indicates that the multiplicity of NOMA users can be leveraged to hide covert information in NOMA systems. In addition, the results show that the covert spectral efficiency increases with the number of NOMA users while the probability of the warden's detection error is kept close to 1.


I. INTRODUCTION
As a promising candidate for utilizing and sharing spectrum efficiently among users in future networks, non-orthogonal multiple access (NOMA) has demonstrated gains over single-antenna [1], [2] and multi-antenna [3] orthogonal multiple access (OMA). Under security requirements, and from a physical layer security (PLS) perspective, several papers have shown that NOMA-based systems outperform their OMA counterparts in terms of secrecy sum rate [4], [5], [6], [7].
Recently, covert communication has received much attention due to its crucial role in improving user privacy by hiding transmissions from a warden. It has found important applications ranging from military to commercial wireless networks. The authors in [8] pioneered the study of the theoretical limit of covert communications through the square root law (SRL): the transmitter can send $O(\sqrt{U})$ bits covertly and reliably over $U$ channel uses. This means that $O(\sqrt{U})/U$ tends to 0 as $U$ goes to infinity. Then, [9], [10], and [11] extended the SRL to discrete memoryless channels (DMCs). In [9], [10], and [11], the constant hidden in the Big-O notation was completely characterized. Additionally, [12] studied covert communications in DMCs from the perspective of the second-order asymptote subject to distinct covertness constraints. Owing to the asymmetry of the Kullback-Leibler divergence, [13] proved Gaussian signalling to be optimal for one covertness constraint but not for another. Several efforts have been made to surpass the SRL, in the sense that a positive covert rate is achievable for an infinite number of channel uses, or that $O(U)$ instead of $O(\sqrt{U})$ bits can be sent reliably and covertly in $U$ channel uses. To guarantee a positive covert rate, the noise uncertainty at the warden was exploited in [14], [15], and [16]. More specifically, the signal transmission was proved to be undetectable if the detector's signal-to-noise ratio (SNR) does not exceed a threshold, a.k.a. the SNR wall in [16]. This is because the estimated noise power does not agree with the real one, which is referred to as noise uncertainty.
The associate editor coordinating the review of this manuscript and approving it for publication was Yan Huo.
VOLUME 10, 2022. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
The positive covert rate probability was analyzed theoretically in [15] as the adversary suffers noise uncertainty. Then, [14] analyzed the covert throughput under two practical noise uncertainty paradigms in additive white Gaussian noise (AWGN) channels. The core idea is to exploit the warden's noise uncertainty to supply covertness and achieve a non-zero covert rate. Jamming [17], [18], interference [19], [20] and artificial noise [21], [22] are applicable sources of noise uncertainty. Other uncertainties at the warden were also exploited, such as the uncertainty of transmission time [23], that of transmit power [24], and that of channel [25], [26] as an additional factor, along with the noise uncertainty, to improve covertness and increase the covert rate.
Most recently, superimposing (embedding) covert information into an overt transmission has been studied from an information-theoretic viewpoint in the broadcast channel [27] and in multiple access channels [28], inspired by [29], wherein information transmission is hidden by exploiting hardware imperfection (dirty constellation). Then, [30] investigated superimposing the covert information onto the overt information at the relay in cooperative networks with an amplify-and-forward relay. Covertness has also been studied in two-user NOMA networks by exploiting the warden's uncertainty about random transmit powers [31] or by employing intelligent reflecting surfaces [32]. The idea of hiding information in overt transmissions is to exploit existing (overt) transmissions, combined with the warden's uncertainties, to enlarge the dynamic range for covert transmissions.
In the literature, covert communication falls into two main streams: hiding information in noise [8], [9], [10], [11], [12], [13], [14], [15], [16], [17], [18], [19], [21], [22], [23], [24], [25], [26] and hiding information in existing transmissions [27], [28], [29], [30], [31], [32]. The former focuses on sending only the covert information and mainly exploits the warden's uncertainties, such as noise and channel, which depend on the warden's hardware performance and can be neither predicted by nor known to the transmitter. Thus, the transmitter cannot design and control practical covert transmissions, since the warden's uncertainties are unknown to it. On the other hand, hiding information in existing transmissions can significantly increase the covert throughput and allows the covertness requirement to be controlled, since the warden's uncertainty about the existing transmissions, i.e. whether the covert information is hidden in them, can be controlled by the transmitter. Although covertness has been considered in NOMA systems [30], [31], [32], the analysis relies on the assumption of energy detection at the warden, which is not optimal, especially when the warden succeeds in decoding the overt message and knows its codeword [33]. As a result, covertness in NOMA systems has not been quantified accurately.
Therefore, this paper considers hiding information in multiple overt transmissions of NOMA systems with multiple users, assuming that the CSI of the existing transmissions is publicly known. The optimum strategy of the warden to maximize its detection probability is determined, together with the optimum strategy of the transmitter to induce the maximum detection error at the warden and the spectral efficiency of the covert information under the covertness requirement. The results prove that the covert information can be decoded only if all overt information onto which it is superimposed can be restored. This amounts to concealing the covert information by exploiting the decoding uncertainty of the overt information at the warden, and suggests superimposing it onto the overt information that is hardest to decode. The results also show that the maximum probability of the warden's detection error can be guaranteed to be close to 1 for a sufficiently large number of NOMA messages. Therefore, the multiplicity of NOMA overt information can be leveraged to hide information. Moreover, the covert spectral efficiency increases as the number of transmit antennas or NOMA users increases, at the cost of a high loss of overt spectral efficiency owing to the covert communications. This paper contributes the following:
• A paradigm to conceal the covert information by utilizing the multiplicity of NOMA overt users is proposed.
• The optimum transmission strategy at the transmitter to minimize the detection probability of the warden is determined. By leveraging the multiplicity of NOMA overt users without requiring any uncertainty at the warden, such as noise or channel gain, this strategy achieves a detection error probability at the warden that increases to 1. This finding is of practical interest, since the transmitter can rely on the NOMA network parameters to control and design its covert transmission.
• The warden's detection error probability, the covert spectral efficiency, and the loss of overt spectral efficiency due to the covert transmission are analyzed and characterized for a general number of NOMA users.
• A non-zero covert rate can be obtained whilst guaranteeing the decoding error probability at the warden close to 1. This can be efficiently achieved by exploiting the existing NOMA network with the multiplicity of overt users.
The next part of this paper presents the system model. Then, the decoding strategy of the warden, along with the optimum decoding threshold to maximize its detection probability, is described in Part III. Next, Part IV determines the best cover set chosen by the transmitter to minimize the warden's detection probability, while Part V presents the derivation of the decoding error probability at the warden. Subsequently, Part VI derives the decoding outage probability of the covert information. The spectral efficiencies of the overt and covert information, and the spectral efficiency loss of the overt information caused by the covert information, are derived in Part VII. Part VIII extends the analysis to the multi-antenna transmitter case, whilst Part IX presents numerical results along with discussions. Finally, Part X concludes the paper and discusses future work.

II. COVERT COMMUNICATIONS IN NOMA SYSTEMS
Fig. 1 illustrates a downlink NOMA system in which $L$ NOMA messages $s_1, \ldots, s_L$, where $s_l = (s_{1,l}, \ldots, s_{n,l})$, $l \in [1, L]$, are sent to $L$ overt users, and the transmitter (Alice) opportunistically transmits covert information $u = (u_1, \ldots, u_n)$ to a covert user (Bob) in the presence of a warden (Willie). Willie's aim is not to decode the covert information but to detect whether any information other than the NOMA information, i.e. the covert information, is being transmitted. The random coding argument is employed to generate codewords [27]. That is, $L$ overt codewords $s_l$ for $1 \le l \le L$ and one covert codeword $u$ are produced by drawing symbols independently from a complex Gaussian distribution with zero mean and variance $P_l$ and $\sum_{l \in S} P_l$, respectively, where $\sum_{l=1}^{L} P_l = P$. The design specifications of the system, i.e. $P_l$, $\sum_{l \in S} P_l$ and $\phi$, are assumed to be available at all receivers, including Willie. It is also assumed that the codebooks for the overt information are publicly known, while that for the covert information is secretly shared between Bob and Alice and unknown to Willie.
Consider superimposing $u$ onto a set $S$ of overt NOMA messages, where $|S| \le L$, and denote its complement by $S^c$. For example, $S = \{1, 5, 9\}$ means that $u$ is superimposed onto $s_1$, $s_5$, and $s_9$. The optimum cover set $S$, i.e. the set of overt messages onto which the covert message is superimposed so as to maximize the warden's detection error, will be discussed in Section IV. Since Willie does not know whether Alice transmits the covert message, the transmitted signal is viewed under two hypotheses, as expressed in (1): no transmission of $u$, denoted as $H_0$, and the alternative, denoted as $H_1$, where $\phi \in (0, 1)$ is the power ratio allotted to the overt information. It should be noted that, under both hypotheses $H_0$ and $H_1$, all NOMA messages are still transmitted to avoid detection by the warden, and that the total transmit power is identical. All nodes are assumed to have a single antenna. The fading channels from Alice to the $j$-th overt user, Bob and Willie are denoted as $f_j$, $h$ and $g$, respectively, which are independent and identically distributed (i.i.d.) complex Gaussian random variables with zero mean and variance $\sigma_f^2$, $\sigma_h^2$, and $\sigma_g^2$, respectively. The assumption of i.i.d. channel gains has practical appeal in scattered networks [30], [31], [33]. The extensions to a multi-antenna warden and a multi-antenna transmitter are given in Sections V and VIII, respectively. Assuming forward channel training (Alice sends a pilot, based on which the overt users estimate their channel gains and feed them back to Alice), Alice knows $f_j$, $j = 1, \ldots, L$, perfectly. However, Bob does not feed $h$ back to Alice, to prevent Willie from detecting his presence; hence, $h$ is unknown to Alice. The channel gains at the overt users are assumed, without loss of generality, to be in descending order, i.e. $|f_1|^2 \ge |f_2|^2 \ge \ldots \ge |f_L|^2$. We denote by $n_x = \{n_{1,x}, \ldots, n_{n,x}\} \sim \mathcal{CN}(0, \sigma_n^2 I)$ the noise vector at receiver $x$ for $x \in \{j, w, b\}$, where $j$, $w$ and $b$ indicate the $j$-th overt user, Willie and Bob, respectively. For compactness, Table 1 lists frequently used notations.
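The power bookkeeping of the two hypotheses can be sketched numerically. The following is a minimal sketch, not the paper's expression (1): it assumes that under $H_1$ each overt codeword in the cover set is scaled by $\sqrt{\phi}$ and the covert codeword (variance $\sum_{l \in S} P_l$) by $\sqrt{1-\phi}$, which is one form consistent with the statement that the total transmit power is identical under both hypotheses. The function names and the numerical values are hypothetical.

```python
import math
import random

def complex_gaussian(variance, n, rng):
    """Draw n i.i.d. CN(0, variance) samples."""
    s = math.sqrt(variance / 2.0)
    return [complex(rng.gauss(0.0, s), rng.gauss(0.0, s)) for _ in range(n)]

def transmit_signal(powers, cover_set, phi, covert_on, n, rng):
    """Superimposed signal of length n under H0 (covert_on=False) or H1 (True).

    ASSUMPTION: under H1 the overt codewords in the cover set are scaled by
    sqrt(phi) and the covert codeword u by sqrt(1 - phi).
    """
    x = [0j] * n
    for l, p in enumerate(powers):
        scale = math.sqrt(phi) if (covert_on and l in cover_set) else 1.0
        s_l = complex_gaussian(p, n, rng)
        x = [xi + scale * si for xi, si in zip(x, s_l)]
    if covert_on:
        p_u = sum(powers[l] for l in cover_set)  # variance of u, as stated in the text
        u = complex_gaussian(p_u, n, rng)
        x = [xi + math.sqrt(1.0 - phi) * ui for xi, ui in zip(x, u)]
    return x

def average_power(x):
    """Empirical average power of a complex sequence."""
    return sum(abs(v) ** 2 for v in x) / len(x)

rng = random.Random(0)
powers = [0.5, 0.3, 0.2]   # illustrative P_1..P_L with total P = 1
cover = {0}                # zero-based index: u is superimposed onto s_1
p_h0 = average_power(transmit_signal(powers, cover, 0.8, False, 20000, rng))
p_h1 = average_power(transmit_signal(powers, cover, 0.8, True, 20000, rng))
```

Under this assumed scaling, both empirical powers should be close to the total power $P$, so a warden that only measures received energy sees no difference between the hypotheses.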

III. WARDEN'S DETECTION
This part describes the detection strategy at the warden and derives the optimum detection threshold that maximizes the warden's detection probability, as well as the resulting minimum detection error probability. It follows from (1) that the signal received by Willie, $y_w$, is given by (2). Due to the forward channel training at Alice, Willie is assumed to know his channel gain $g$ perfectly. Since Willie has to decide between $H_0$ and $H_1$ regarding the transmission of $u$ from his observation $y_w$ of length $n$, he applies the likelihood ratio test (LRT), which is the optimum method for maximizing the detection probability.
If Willie is assumed to know the overt information $\{s_l\}$, the LRT, i.e. the optimum detection at Willie, is given by (3) [17], where $\Pr(y_w | g, \{s_l\}, H_0)$ and $\Pr(y_w | g, \{s_l\}, H_1)$, obtained by treating $u$ in (2) as noise because the codebook of $u$ is unknown to Willie, are given by (4) and (5) [33], respectively, and $\delta$ is a detection threshold. It should be noted that the optimum detection in (3) depends on Willie knowing all overt information. When this is not the case, a test statistic other than (3) must be considered. In the next subsection, we describe the warden's detection strategy and its optimum detection in detail. Then, we prove that, to detect the covert information, the warden must decode the overt information onto which the covert information is superimposed.

A. DETECTION STRATEGY
For convenience, we denote by $D$ the set of indices of the overt NOMA messages that Willie has decoded once the successive interference cancellation (SIC) is finished, and by $D^c$ its complement. Since Willie does not know $s_{l \in D^c}$ but knows their codebooks (their distribution), he can generate a series of likely codewords for $s_{l \in D^c}$ and perform the marginalized LRT in (6) as the optimal detection [33], [34], where $E_H[\cdot]$ denotes the expectation operator with respect to the random variable $H$, and (7) and (8) are provided in Appendix A.
Unlike previous studies, where the overt information is assumed to be available at the detector [30], [31], [32] and the detection error mainly relies on the LRT in (3), our work considers unknown overt information. Thus, Willie should perform the optimum marginalized LRT in (6), i.e. averaging over the unknown overt information $s_{l \in D^c}$.
To determine the probability of detection error at the warden, the following considers two scenarios: Willie fails to decode every $s_{l \in S}$ ($S \cap D = \emptyset$), and Willie succeeds in decoding at least one $s_{l \in S}$ ($S \cap D \neq \emptyset$).

1) IF $S \cap D = \emptyset$, I.E. WILLIE FAILS TO DECODE ALL $s_{l \in S}$
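Since $S \cap D = \emptyset$, we obtain $\sum_{l \in S \cap D^c} P_l = \sum_{l \in S} P_l$ and $\sum_{l \in S^c \cap D^c} P_l = \sum_{l=1}^{L} P_l - \sum_{l \in S} P_l - \sum_{l \in D} P_l$, which yields $\sigma_1^2 = \sigma_0^2$. It then follows from (7) and (8) that the two likelihoods coincide, so that the likelihood ratio in (6) equals 1. Then, the missed detection probability, $P_M$, i.e. the probability that the likelihood ratio does not exceed $\delta$ under $H_1$, and the false alarm probability, $P_F$, i.e. the probability that the likelihood ratio exceeds $\delta$ under $H_0$, are $(P_F, P_M) = (0, 1)$ or $(P_F, P_M) = (1, 0)$. Hence, the sum of the missed detection and false alarm probabilities reduces to $P_M + P_F = 1$ for all $\delta$. This means that Willie is unable to detect the transmission of $u$ if he fails to decode every $s_{l \in S}$.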
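The equality $\sigma_1^2 = \sigma_0^2$ can be checked with a short numerical sketch. The expressions (7) and (8) are not reproduced here, so the variance formulas below are an assumption: every undecoded codeword, and the covert codeword under $H_1$, is treated as Gaussian noise, with the cover messages scaled by $\phi$ and the covert codeword carrying $(1-\phi)\sum_{l \in S} P_l$. The helper name `residual_variances` and the numerical values are hypothetical.

```python
import math

def residual_variances(powers, cover_set, decoded_set, phi, g2, noise_var):
    """Residual variance at Willie under H0 and H1 after cancelling decoded messages.

    ASSUMPTION: undecoded overt codewords and the covert codeword act as
    Gaussian noise; under H1 the cover messages carry a fraction phi of their
    power and the covert codeword carries (1 - phi) * sum_{l in S} P_l.
    """
    undecoded = [l for l in range(len(powers)) if l not in decoded_set]
    # H0: only the undecoded overt messages remain.
    var0 = g2 * sum(powers[l] for l in undecoded) + noise_var
    # H1: undecoded cover messages are scaled, and u is always unknown to Willie.
    cover_undecoded = sum(powers[l] for l in undecoded if l in cover_set)
    other_undecoded = sum(powers[l] for l in undecoded if l not in cover_set)
    covert = (1.0 - phi) * sum(powers[l] for l in cover_set)
    var1 = g2 * (phi * cover_undecoded + covert + other_undecoded) + noise_var
    return var0, var1

powers = [0.5, 0.3, 0.2]
# Case 1: cover message s_1 (index 0) is NOT decoded, the others are.
v0_fail, v1_fail = residual_variances(powers, {0}, {1, 2}, 0.8, 1.5, 1.0)
# Case 2: Willie decodes everything, including the cover message.
v0_ok, v1_ok = residual_variances(powers, {0}, {0, 1, 2}, 0.8, 1.5, 1.0)
```

In the first case the two variances coincide, which matches the conclusion that the likelihood ratio carries no information; in the second case the covert power appears only under $H_1$, so the variances differ.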

2) IF $S \cap D \neq \emptyset$, I.E. WILLIE SUCCEEDS IN DECODING AT LEAST ONE $s_{l \in S}$
Taking the natural logarithm of the likelihood ratio in (6), dividing it by $n$, and applying (7) and (8), we obtain the normalized log-likelihood ratio. As $n \to \infty$, it converges to a deterministic limit under each hypothesis, involving the terms $\tau_0$ and $\tau_1$. Then, the entire detection error probability reduces to $P_M + P_F = 0$ by adopting the detection threshold $\delta \in (\ln(\sigma_1^2/\sigma_0^2) + \tau_0, \ln(\sigma_1^2/\sigma_0^2) + \tau_1)$. Remark 1: If any $s_{l \in S}$ is decoded successfully by Willie, the covert information $u$ can be detected perfectly for sufficiently large $n$, i.e. when Willie is allowed to inspect the received signal over a long duration. Conversely, as shown in Section III-A1, if Willie fails to decode every $s_{l \in S}$, he is unable to detect the transmission of $u$.

B. OPTIMUM DETECTION THRESHOLD FOR WILLIE
Willie aims to minimize the entire detection error probability by properly choosing the detection thresholds for the two scenarios. The entire detection error probability, averaged over the scenario in which Willie decodes at least one $s_{l \in S}$ successfully and the scenario in which Willie fails to decode every $s_{l \in S}$, can be expressed as in (15). For $S \cap D \neq \emptyset$, Willie can choose $\delta \in (\ln(\sigma_1^2/\sigma_0^2) + \tau_0, \ln(\sigma_1^2/\sigma_0^2) + \tau_1)$ to force the second and fourth terms in (15) to zero, which reduces (15) to (16), or equivalently (17), where (16) follows because the likelihood ratio equals 1 for $S \cap D = \emptyset$, as shown in Section III-A1. Since, from (2), the signal-to-interference-plus-noise ratio (SINR) for $s_{l \in S}$ under $H_1$ is reduced by a factor of $\phi$ compared to that under $H_0$, we have $\Pr(S \cap D = \emptyset | H_0) < \Pr(S \cap D = \emptyset | H_1)$. Hence, Willie should choose $\delta < 1$ to minimize $P_M + P_F$, and the resulting minimum entire detection error probability, $\xi_{\min}$, is obtained as in (18), where $R_j$ is the transmission rate of the $j$-th overt message, $y_{w,j}$ is the received signal after successively cancelling $s_l$, $l \in D_j$, and $D_j$ is the decoding set before decoding $s_j$. One should note that the minimum detection error probability in (18) represents the best detection capability of the warden and depends on the cover set $S$, i.e. the transmitter's choice of the overt messages onto which the covert information is superimposed. Based on this, the next section considers the optimum transmission strategy at the transmitter, which minimizes the warden's detection probability by properly choosing the cover set $S$.

IV. OPTIMUM COVER SET S FOR ALICE
In this section, we determine the optimum cover set chosen by Alice to maximize the minimum detection error probability $\xi_{\min}$ and then derive the detection error probability, denoted as $\xi_{\max}$, when this optimum cover set is chosen. Alice aims to maximize $\xi_{\min}$ by finding the optimum cover set $S$. One can obtain from (18) that the optimum $|S|$ to maximize $\xi_{\min}$ is 1, meaning that the covert information should be superimposed onto a single overt message. Then, the resulting maximum entire detection error probability is expressed as in (20), where $|S| = 1$, (21) is evaluated under $H_0$, and $\gamma_j = P_j/\sigma_n^2$. Since $|S| = 1$ and $j \in S$, we have $S = \{j\}$. Next, we determine the overt transmission rate $R_j$ in NOMA systems and then the overt NOMA message onto which the covert information should be superimposed in order to maximize the entire detection error probability. Before deriving the expression of $\xi_{\max}$ in (20), we describe the design of the $L$ NOMA overt messages that guarantees that all overt users can decode their own information regardless of whether the covert information is transmitted.
Overt Transmission: SIC is assumed to be applied by each user, i.e. the $k$-th user first recovers $s_m$, $m > k$, and then removes them from $y_k$ in a successive manner [2]. The remaining signals $s_m$, $m < k$, are treated as noise. It follows from (1) that, when $S = \{j\}$, the signal $y_j$ received by the $j$-th overt user differs between the two hypotheses $H_0$ and $H_1$. One can see that $R_j$ under $H_0$ is higher than that under $H_1$ due to the additional interference of the covert information $u$ under $H_1$. Then, $R_j$ should be determined under $H_1$ to guarantee that all overt users can decode their own information regardless of whether the covert information is transmitted. Hence, the $j$-th user, after suppressing $s_l$, $l > j$, from $y_j$, obtains the residual signal $\hat{y}_j$. As a result, the maximum rate of $s_j$ under $H_1$ is given by (24). Then, $\xi_{\max}$ in (20) is obtained from (21) and (24) as in (25), where $(x)^+ = \max\{0, x\}$. When $g$ and $D_j^c$ are unavailable at Alice, the maximum of (25) is obtained by minimizing $\sum_{l=1}^{j} \gamma_l$ and maximizing $|f_j|^2$. Because $\sum_{l=1}^{j} \gamma_l$ decreases and $|f_j|^2$ increases as $j$ decreases, the maximum of (25) is attained at $j = 1$, viz. superimposing $u$ onto $s_1$, which yields (26). Remark 2: The best hiding strategy for Alice to maximize the entire detection error probability is to superimpose the covert information $u$ onto the overt message $s_1$, which experiences the highest channel gain and hence can be transmitted at the highest data rate.
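The effect of the covert transmission on the overt rate can be illustrated with a small sketch. For a message that does not carry the covert information, the rate below follows the SIC expression quoted later in the paper, $\log_2(1 + |f_j|^2\gamma_j/(1 + |f_j|^2\sum_{l<j}\gamma_l))$. For the cover message, expression (24) is not reproduced here, so the form used is an assumption: the useful power is scaled by $\phi$ and the covert power $(1-\phi)\gamma_j$ is added to the interference. The function name `overt_rate` and the numerical values are hypothetical.

```python
import math

def overt_rate(j, f2, gammas, phi=1.0, is_cover=False):
    """Achievable rate (bits/s/Hz) of overt message j (zero-based index).

    f2[j]     : channel gain |f_j|^2, users sorted in descending order of gain
    gammas[j] : transmit SNR P_j / sigma_n^2 of message j
    Messages with a smaller index than j are treated as noise, following the
    SIC order described in the text.
    ASSUMPTION: for the cover message under H1, the signal power is scaled by
    phi and the covert power (1 - phi) * gamma_j acts as extra interference.
    """
    interference = sum(gammas[:j])
    signal = gammas[j]
    if is_cover:
        signal = phi * gammas[j]
        interference += (1.0 - phi) * gammas[j]
    return math.log2(1.0 + f2[j] * signal / (1.0 + f2[j] * interference))

f2 = [2.0, 1.0, 0.5]      # illustrative |f_1|^2 >= |f_2|^2 >= |f_3|^2
gammas = [1.0, 2.0, 4.0]  # illustrative per-message SNRs
rate_h0 = overt_rate(0, f2, gammas)                           # no covert signal
rate_h1 = overt_rate(0, f2, gammas, phi=0.8, is_cover=True)   # covert signal on s_1
```

Designing $R_1$ from `rate_h1` rather than `rate_h0` reflects the requirement that the overt user can still decode its message when the covert information is present.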

V. MAXIMUM ENTIRE DETECTION ERROR PROBABILITY
This part evaluates the maximum entire detection error probability, $\xi_{\max}$, and derives it in closed form. The challenge is that $D_1^c$ depends on $|g|^2$. Based on the law of total probability, $\xi_{\max}$ is derived as in (27), where $|f_0|^2 = \infty$ and $|f_{L+1}|^2 = 0$. To compute $\xi_{\max}$, the probability in (27) is computed for the cases $k = 0$ and $1 \le k \le L$ as follows.
i) $k = 0$: For $|f_1|^2 \le |g|^2$, Willie can apply SIC to decode $s_2, \ldots, s_L$, since $R_{j,1} = \log_2(1 + |f_j|^2 \gamma_j/(1 + |f_j|^2 \sum_{l<j} \gamma_l))$ and $I(s_j, y_{w,j}) = \log_2(1 + |g|^2 \gamma_j/(1 + |g|^2 \sum_{l<j} \gamma_l))$ for $j > 1$, after cancelling $s_l$, $l > j$, from $y_w$, so that $I(s_j, y_{w,j}) \ge R_{j,1}$. Hence $D_1^c = \{1\}$, which leads to (28), where $B(p, q)$ denotes the Beta function [35].
ii) $1 \le k \le L$: Since $\Pr(\nu_k(|g|^2) \ge |f_1|^2, |f_k|^2 > |g|^2 \ge |f_{k+1}|^2) = 0$ if $\nu_k(|g|^2) \le |g|^2$, or equivalently $\phi_k \le |g|^2$, we obtain (31). From (27), (28) and (31), the maximum entire detection error probability is expressed as in (32), where (33) is the probability density function (PDF) of the random variable $|g|^2$. Note that the PDF in (33) changes if Willie has $N$ antennas rather than a single one. Remark 3: $\xi_{\max} \to 1$ as $L \to \infty$. That is, if $L$ is sufficiently large, the transmission of $u$ is undetectable. Appendix B provides the proof. This result indicates that the multiplicity of overt users in the scattered NOMA network can be leveraged to hide the covert information. This is of practical interest, since no uncertainty at the warden, such as noise or channel gain, is required, and the covertness can be controlled by the transmitter when the warden is a node in the network.
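The SIC argument used for the case $k = 0$ can be sketched in a few lines. The code compares Willie's mutual information $I(s_j, y_{w,j})$ with the rate $R_{j,1}$ for $j > 1$, using the two expressions quoted above, and stops at the first message Willie cannot decode. It is an illustration only: it covers the non-cover messages and does not decide whether $s_1$ itself is decodable. The function names and the numerical values are hypothetical.

```python
import math

def sic_rate(x, j, gammas):
    """log2(1 + x*gamma_j / (1 + x*sum_{l<j} gamma_l)) for channel gain x (zero-based j)."""
    return math.log2(1.0 + x * gammas[j] / (1.0 + x * sum(gammas[:j])))

def willie_decoded_set(g2, f2, gammas):
    """Zero-based indices j >= 1 that Willie decodes by SIC, from the last message down.

    Message j is decodable when Willie's mutual information (computed with
    |g|^2) is at least the rate designed for user j (computed with |f_j|^2).
    SIC stops at the first failure, since later stages need that message removed.
    """
    decoded = []
    for j in range(len(gammas) - 1, 0, -1):
        if sic_rate(g2, j, gammas) >= sic_rate(f2[j], j, gammas):
            decoded.append(j)
        else:
            break
    return decoded

f2 = [2.0, 1.0, 0.5]      # illustrative |f_1|^2 >= |f_2|^2 >= |f_3|^2
gammas = [1.0, 2.0, 4.0]  # illustrative per-message SNRs
strong_willie = willie_decoded_set(2.5, f2, gammas)  # |g|^2 >= |f_1|^2
middle_willie = willie_decoded_set(0.7, f2, gammas)  # |f_2|^2 > |g|^2 >= |f_3|^2
weak_willie = willie_decoded_set(0.1, f2, gammas)    # |g|^2 < |f_3|^2
```

Because the rate expression is increasing in the channel gain, the comparison reduces to $|g|^2 \ge |f_j|^2$, which is why the position of $|g|^2$ among the ordered gains determines the decoding set.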
Remark 4: If $D_1^c = \{1\}$, namely Willie can decode all other information prior to decoding $s_1$, as adopted in the secrecy analysis of NOMA [6], [7], we obtain (34) from (27). Fig. 2 compares $\xi_{\max}$ evaluated using (32) and (34). One can see that the difference is small at low SNR ($P/\sigma_n^2$) but becomes larger at higher SNR, indicating that the simplifying assumption $D_1^c = \{1\}$ is valid only at low SNR. Covertness requirement: if $\xi_{\max} \ge 1 - \epsilon$ for any $\epsilon > 0$, where $\epsilon$ represents the covertness requirement [14], [17], [33], then a covert transmission is achievable. This means that $\xi_{\max} \to 1$ as $\epsilon \to 0$, i.e. Willie's detection becomes ineffective.

VI. DECODING OUTAGE PROBABILITY OF THE COVERT INFORMATION
This part determines the decoding outage probability of the covert information at Bob (the covert user), assuming that the covert information is transmitted at a constant rate $R_u$. When $u$ is superimposed onto $s_1$, Bob receives the signal $y_b$, with noise $n_b \sim \mathcal{CN}(0, \sigma_n^2)$. Thence, the decoding outage probability of the covert information, $P_{o,u}$, is the probability that the rate achievable at Bob falls below $R_u$. Based on the law of total probability, $P_{o,u}$ is expanded over the position of $|h|^2$ among the ordered gains. If $|f_k|^2 > |h|^2 \ge |f_{k+1}|^2$, Bob can restore $s_{k+1}, \ldots, s_L$ and cancel them from $y_b$. Then, Bob achieves a rate that depends on $k$, $1 \le k \le L$. Therefore, we obtain (40). Then, one obtains (42), where the incomplete Beta function [35] is denoted as $B(x; p, q) = \int_0^x y^{p-1}(1-y)^{q-1} dy$, with $B(0; p, q) = 0$ and $B(1; p, q) = B(p, q)$ [35]. Fig. 3 shows the decoding outage probability, $P_{o,u}$, against the transmit SNR $P/\sigma_n^2$ for distinct values of $L$. This figure shows that $P_{o,u}$ decreases as the SNR, $P/\sigma_n^2$, increases, and that the decrease is more significant for smaller $L$. This is because of the increasing interference of the overt information. Thus, although the multiplicity of overt information in NOMA systems helps increase $\xi_{\max}$ (shown in Fig. 4), it degrades the decoding performance at the covert user. Moreover, this figure illustrates the agreement between the simulation and analytical results.
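The incomplete Beta function appearing in the closed-form outage expression can be evaluated numerically. The following minimal sketch uses the standard definition $B(x; p, q) = \int_0^x y^{p-1}(1-y)^{q-1} dy$ with a midpoint rule and checks the two boundary values stated in the text. The function names are hypothetical, and the sketch assumes $p, q \ge 1$ so that the integrand is bounded; it is not taken from the paper.

```python
import math

def beta(p, q):
    """Complete Beta function B(p, q) via the Gamma function."""
    return math.gamma(p) * math.gamma(q) / math.gamma(p + q)

def incomplete_beta(x, p, q, steps=20000):
    """Midpoint-rule approximation of B(x; p, q) for 0 <= x <= 1.

    ASSUMPTION: p >= 1 and q >= 1, so the integrand stays bounded on [0, 1].
    """
    if x <= 0.0:
        return 0.0
    h = x / steps
    total = 0.0
    for i in range(steps):
        y = (i + 0.5) * h  # midpoint of the i-th sub-interval
        total += y ** (p - 1) * (1.0 - y) ** (q - 1)
    return h * total

b_zero = incomplete_beta(0.0, 2, 3)
b_one = incomplete_beta(1.0, 2, 3)
b_full = beta(2, 3)
```

For production use, a library routine for the regularized incomplete Beta function would normally be preferred over a hand-written quadrature.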

VII. SPECTRAL EFFICIENCY
This part analyzes the covert spectral efficiency, interpreted as the maximum reliable transmission rate (bits/s) over a given bandwidth between Bob and Alice, constrained by the covert condition $\xi_{\max} \ge 1 - \epsilon$ for some $\epsilon$ that stands for the covertness requirement. We also analyze the overt spectral efficiency and its loss due to sharing the transmission power with the covert information.

A. COVERT INFORMATION
Because the covert information is successfully decoded in a fraction $1 - P_{o,u}$ of the transmissions, the average rate received over several transmission bursts is $R_u(1 - P_{o,u})$ [37]. Then, the covert spectral efficiency (bits/s) is defined as in (43) [33] for a positive value $\epsilon$, wherein $B$ is the bandwidth of a single message when orthogonal multiple access (OMA) is applied. Note that, when NOMA is applied, the bandwidth of the $L$ channels of the $L$ overt NOMA messages is $L \times B$ [2], which is used to compute the spectral efficiency in the objective function of (43). Since $\xi_{\max}$ is a decreasing function of $P$, the maximum allowed power, denoted as $P^*$, can be found by setting $\xi_{\max} = 1 - \epsilon$. Also, since $P_{o,u}$ is a decreasing function of $P$ (shown in Fig. 3), the covert throughput is obtained by numerical search.
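The search for the maximum allowed power $P^*$ can be sketched as a bisection, relying only on the stated fact that $\xi_{\max}$ decreases with $P$. The function `toy_xi_max` below is a hypothetical stand-in chosen so that the answer is known in closed form; it is not the paper's expression (32), and the search bounds are likewise assumptions.

```python
import math

def max_allowed_power(xi_max, epsilon, p_low=1e-6, p_high=1e6, iters=200):
    """Largest P in [p_low, p_high] with xi_max(P) >= 1 - epsilon.

    Requires xi_max to be non-increasing in P. Returns None if even p_low
    violates the covertness requirement.
    """
    target = 1.0 - epsilon
    if xi_max(p_high) >= target:
        return p_high
    if xi_max(p_low) < target:
        return None
    for _ in range(iters):
        mid = math.sqrt(p_low * p_high)  # bisection on a logarithmic power scale
        if xi_max(mid) >= target:
            p_low = mid
        else:
            p_high = mid
    return p_low

def toy_xi_max(p):
    """HYPOTHETICAL decreasing curve used only to exercise the search."""
    return 0.5 + 0.5 / (1.0 + p)

def average_covert_rate(r_u, outage):
    """Average rate R_u * (1 - P_o,u) over many bursts, as stated in the text."""
    return r_u * (1.0 - outage)

p_star = max_allowed_power(toy_xi_max, epsilon=0.1)
```

With the actual $\xi_{\max}$ and $P_{o,u}$ substituted for the stand-ins, the covert rate would then be evaluated at $P^*$, since a larger power lowers the outage but is not allowed by the covertness constraint.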

B. OVERT INFORMATION
The average spectral efficiency (bits/s) of the overt message $s_1$, onto which the covert information is superimposed, can be obtained from (24) as in (45), where (46) is the probability density function (PDF) of $|f_1|^2$ [38]. The loss of overt spectral efficiency, computed as the difference between the overt spectral efficiency when the covert information is not sent ($\phi = 1$) and when it is sent ($\phi < 1$), is given by (47). Note that the loss of overt spectral efficiency indicates the performance degradation of NOMA systems when the covert transmission is embedded. At low SNR ($\gamma_1 \ll 1$), the loss of overt spectral efficiency can be approximated, using $\ln(1 + x) \approx x$ for $x \ll 1$, as in (48), which is derived from [38]. One can see that the loss of overt spectral efficiency in (48) increases with a coefficient of $L \times \ln(L)$ as $L$ increases.
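The role of the strongest user's gain $|f_1|^2$ can be illustrated by simulation. Under the channel model of Section II, each $|f_j|^2$ is exponentially distributed with mean $\sigma_f^2$, and it is a standard result that the mean of the largest of $L$ such variables is $\sigma_f^2 \sum_{k=1}^{L} 1/k$, which grows like $\ln L$. The sketch below checks this by Monte Carlo; the function names and parameters are hypothetical, and linking this logarithmic growth to the $\ln(L)$ factor in (48) is an interpretation rather than a statement from the paper.

```python
import random

def harmonic(L):
    """Harmonic number H_L = 1 + 1/2 + ... + 1/L."""
    return sum(1.0 / k for k in range(1, L + 1))

def mean_strongest_gain(L, sigma_f2, trials, rng):
    """Monte Carlo estimate of E[max_j |f_j|^2] for L i.i.d. exponential gains."""
    total = 0.0
    for _ in range(trials):
        # expovariate takes the rate, i.e. the inverse of the mean sigma_f2.
        total += max(rng.expovariate(1.0 / sigma_f2) for _ in range(L))
    return total / trials

rng = random.Random(1)
estimate = mean_strongest_gain(L=8, sigma_f2=1.0, trials=50000, rng=rng)
exact = 1.0 * harmonic(8)
```

At low SNR the rate is roughly linear in the received SNR, so the average rate of $s_1$ scales with this mean gain.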

VIII. MULTIPLE ANTENNA AT ALICE
This part considers the case in which Alice has $M$ antennas and the NOMA signal is transmitted through one of them. Since only one RF chain is used, this case reduces energy consumption and hardware complexity without losing the diversity gain [2], [39]. Given that the $m$-th antenna is selected, let $|f_{m,1}| \ge |f_{m,2}| \ge \ldots \ge |f_{m,L}|$ denote the ordered channel gains between Alice and the $L$ overt receivers. Then, based on (26), the maximum entire detection error probability given that the $m$-th antenna is selected is given by (49), where $g_m$ is the channel gain between the $m$-th antenna of the transmitter and Willie, $D_1(m)$ is the decoding set of Willie before restoring $s_1$, and $D_1^c(m)$ is the complement of $D_1(m)$. Next, we derive the optimum antenna selection at Alice to minimize the warden's detection probability. Finally, the spectral efficiency and the maximum entire detection error probability are derived for the case in which the optimum antenna selection is applied.

A. OPTIMUM ANTENNA SELECTION FOR ALICE
Selecting the antenna to maximize $\xi_{\max}(m)$ yields (50). When $|g_m|^2$ and $D_1^c(m)$ are unavailable at the transmitter, the maximum of (50) is obtained by choosing the antenna that maximizes $|f_{m,1}|^2$. If we let $|f_{m^*,1}|^2 = \max_{l \in [1,L], m \in [1,M]} |f_{m,l}|^2$, where $m^* = \arg\max_{m,l} |f_{m,l}|^2$, then the resulting maximum entire detection error probability is expressed as in (51), where $|f_{m^*,0}|^2 = \infty$, $|f_{m^*,L+1}|^2 = 0$, and the gains of the selected antenna are taken in the descending order defined above. Next, the maximum entire detection error probability at Willie, as well as the decoding outage probability at Bob, are derived for the case in which the optimum antenna selection is applied at Alice.
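The selection rule above, choosing the antenna whose strongest overt-user gain is the largest, can be sketched as follows. The gain matrix is illustrative and the function name `select_antenna` is hypothetical; the sketch only implements the $\arg\max$ stated in the text and does not evaluate (50) or (51).

```python
def select_antenna(gains):
    """Pick m* = argmax_m max_l |f_{m,l}|^2.

    gains[m][l] holds |f_{m,l}|^2 for antenna m and overt user l (any order).
    Returns the zero-based antenna index, the strongest gain on that antenna,
    and that antenna's gains sorted in descending order.
    """
    m_star = max(range(len(gains)), key=lambda m: max(gains[m]))
    ordered = sorted(gains[m_star], reverse=True)
    return m_star, ordered[0], ordered

# Illustrative gains for M = 3 antennas and L = 3 overt users.
gains = [
    [0.2, 1.1, 0.4],
    [0.9, 0.3, 2.5],
    [1.7, 0.1, 0.6],
]
m_star, strongest, ordered = select_antenna(gains)
```

Only the gains to the overt users are needed for this rule, which is consistent with the assumption that $|g_m|^2$ is unavailable at the transmitter.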

B. MAXIMUM ENTIRE DETECTION ERROR PROBABILITY
Conditioned on $|f_{m^*,1}|^2 = t$, whose PDF is given by (52) [36], the common unordered CDF of the $(L - 1)$ ordered variables $|f_{m^*,2}|^2 \ge \ldots \ge |f_{m^*,L}|^2$ is given by (53). Then, for given $|f_{m^*,1}|^2 = t$, we have (54). It follows from (52), (53), and (54), together with the binomial expansion in (55), that, since $|g_{m^*}|^2$ is exponentially distributed with mean $\sigma_g^2$, we obtain (56) for $1 \le k \le L$, and (57) for $k = 0$. Thence, based on (51), (56), and (57), the maximum entire detection error probability when the optimum antenna selection is applied at Alice is given by (58). When $M = 1$, (58) reduces to (32).

C. SPECTRAL EFFICIENCY
Substituting $|h|^2$ with $|h_{m^*}|^2$ and $|f_k|^2$ with $|f_{m^*,k}|^2$ for $0 \le k \le L$, one obtains (59) from (40), where the second term of (59) is derived similarly to (56). When $M = 1$, (59) reduces to (42). Therefore, the spectral efficiency of the covert information can be derived from (50), (59) and the optimization problem (43). The spectral efficiency of the overt information and its loss due to the covert transmissions can also be obtained from (45) and (47), respectively, where the PDF of $|f_1|^2$ is replaced by that of $|f_{m^*,1}|^2$.

IX. NUMERICAL RESULTS
This part presents the numerical results. For comparing the spectral efficiency of covert and overt information for NOMA with L users, we consider the standard bandwidth of B = 200 KHz in GSM. Fig. 4 shows the MEDEP, ξ max , when the total SNR, P/σ 2 n , increases with L, versus L for different values of M . One can see that, as L increases, ξ max increases and converges to 1 even when the total SNR, P/σ 2 n , increases with L. This demonstrates that the multiplicity of overt information (users) in NOMA can be leveraged to conceal the transmission of covert information. One can also see that the larger M , the faster the convergence speed. This is because the degree of freedom of the highest channel gain of the overt message where the covert message is superimposed onto, increases with increasing M , making faster the convergence speed of the MEDEP (as proved in Appendix B). Moreover, this figure illustrates the agreement between the simulation and analytical results. Fig. 5 plots the MEDEP, ξ max , versus L for different number of the warden's antenna. One can see that, as N increases, ξ max decreases significantly. This is because the multi-antenna warden has higher chance to decode the overt messages and, consequently, to detect the covert information. Fig. 6 shows the MEDEP, ξ max , against the total transmit SNR, P/σ 2 n , for distinct values of L. This figure exposes that ξ max decreases as the SNR increases and that the decrease is negligible for large L. One can also see that for L > 1, ξ max converges to a non-zero constant even if the transmission power is very large. This exposes that the interference of many overt information is the helpful source to hide the presence of the covert information. Fig. 7 shows the MEDEP, ξ max , against α, for distinct values of L. One can see that ξ max increases as α increases.   This is because of less power located to the covert information, harder detection at Willie. 
One can also see that ξ max increases significantly in all ranges of α as L increases.    8 shows the MEDEP, ξ max , against σ 2 g , for distinct values of L. One can see that ξ max decreases dramatically as σ 2 g increases (Willie is located closer to Alice). This is because Willie with better channel has more chance to successfully decode the overt information, hence detect the covert information. Fig. 9 shows the covert spectral efficiency, η u (Kbits/s), against the MEDEP, ξ max , when the SNR is changed. This figure demonstrates that the covert spectral efficiency can be traded with ξ max , i.e. η u decreases as ξ max increases. One can also see that when ξ max is close to 1, which is the region of interest, the covert spectral efficiency can be increased if L is increased. This indicates that the multiplicity of overt information in NOMA can help hiding the covert information. However, for less strict requirements of covertness (small  ξ max ), an increase of L will significantly increase decoding outage of u (as shown in Fig. 3), resulting in a decrease of the covert spectral efficiency, as comparing η u when L = 4 and L = 8 in the Fig. 9. Fig. 10 shows the overt spectral efficiency, η s (Kbits/s), versus the MEDEP, ξ max , as the SNR is varied. One can see that the overt spectral efficiency decreases as ξ max increases, which is because the transmit SNR decreases with increasing ξ max . One can also see that the overt spectral efficiency accretes with accreting L. Fig. 11 shows the loss of overt spectral efficiency, η s,loss (Kbits/s), against the MEDEP, ξ max , when the SNR is changed. This figure illustrates that, to provide a certain covert spectral efficiency, the loss of overt spectral efficiency is significantly high. For example, to provide about 7 (Kbits/s) of the covert spectral efficiency (in Fig. 9 at ξ max = 0.9 and L = 2), the 100 (Kbits/s) loss of overt spectral efficiency is required (in Fig. 11 at ξ max = 0.9 and L = 2).  
One can also see that the loss is more significant for larger $L$ or a less strict $\xi_{\max}$. Fig. 12 shows the covert spectral efficiency (Kbits/s) versus $M$ as the SNR is varied. The covert spectral efficiency increases with the number of transmit antennas $M$, and the larger $L$ is, the more considerable the increase. When $M$ is below a threshold (2 and 4 for $L = 4$ and $L = 8$, respectively), i.e., when the number of transmit antennas is not sufficiently high, the covert spectral efficiency goes to zero. Fig. 13 shows the covert spectral efficiency (Kbits/s) versus $\sigma_g^2$ as the SNR is varied. The figure demonstrates that the covert spectral efficiency drops sharply to zero, meaning that covert communication may not be achievable, when $\sigma_g^2$ increases (Willie is located closer to Alice), regardless of the number of overt messages $L$. This is because Willie easily decodes all overt information, and hence detects the covert information, when located close to Alice (as seen in Fig. 8).
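The trade-off quoted from Figs. 9 and 11 can be put into per-hertz terms with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, the two rates are the approximate values read from the figures at $\xi_{\max} = 0.9$ and $L = 2$, and it assumes that the Kbits/s figures are per-hertz spectral efficiencies scaled by the bandwidth $B = 200$ kHz.

```python
# Back-of-the-envelope arithmetic for the Fig. 9 / Fig. 11 example.
# Assumption: a rate in Kbits/s equals B (in kHz) times an efficiency in bits/s/Hz.

B_KHZ = 200.0             # GSM bandwidth used in this section
COVERT_KBPS = 7.0         # approximate covert rate quoted from Fig. 9
OVERT_LOSS_KBPS = 100.0   # overt loss quoted from Fig. 11


def per_hertz(rate_kbps, bandwidth_khz=B_KHZ):
    """Convert a rate in Kbits/s to bits/s/Hz under the assumption above."""
    return rate_kbps / bandwidth_khz


def overt_cost_per_covert_bit(overt_loss_kbps, covert_kbps):
    """Overt rate given up for each unit of covert rate gained."""
    return overt_loss_kbps / covert_kbps


print("covert efficiency (bits/s/Hz):", per_hertz(COVERT_KBPS))
print("overt loss (bits/s/Hz):", per_hertz(OVERT_LOSS_KBPS))
print("overt loss per covert bit:",
      overt_cost_per_covert_bit(OVERT_LOSS_KBPS, COVERT_KBPS))
```

Under these assumptions, each covert bit costs roughly fourteen overt bits at this operating point, which is the sense in which the loss is described as significantly high.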

X. CONCLUSION AND FUTURE WORK
This paper investigated hiding covert information under other information in NOMA systems. The optimum detection method at the warden, which minimizes the entire detection error probability, and the optimum superimposition method at the transmitter, which maximizes the minimum total detection error probability, were proposed. The results showed that the entire detection error probability increases and converges to 1 as the number of users (information) increases. We also found that the covert spectral efficiency can be increased by increasing the number of overt users while keeping the entire detection error probability close to 1. The covert performance is much better if the transmitter has multiple antennas. These results show the practical applicability of the proposed scheme, in which the transmitter can control and design the covert transmission by adjusting the NOMA network parameters when the warden is a node within the network. For future work, a more complicated NOMA network, such as one with clustering or with beamforming and multiple antennas at the transmitter and the overt users, should be considered to characterize the improvement over the current work. While the proposed scheme is promising for application in real networks, it has not yet considered the multi-antenna or arbitrary location of the warden, which should be addressed in future work. Finally, designing a hidden network or covert transmissions under existing networks is a difficult task, and other existing networks, such as cognitive radio or massive MIMO, are also interesting settings for the study of hiding covert information.

APPENDIX A
This appendix provides the proof of (7) and (8). Based on (4) and (6) Let $x = e^{j\theta}\sum_{l\in D^c} s_l$, $g = |g|e^{j\theta}$, and $y = y_w - g\sum_{l\in D} s_l$. Because $s_l$, $l \in D^c$, is an i.i.d. circularly symmetric complex Gaussian vector (cscGv), $x$ is a cscGv with mean 0 and common variance $\sum_{l\in D^c} P_l$ for any $\theta$. Then, one obtains To complete the derivation of (62), one needs to solve the last integral. Towards this end, one needs to solve the following integral as where $y$ and $x$ are real and $\sigma_0^2 = \sigma_n^2 + |g|^2\sum_{l\in D^c} P_l$. Hence, we obtain where $\mathrm{Im}\{x\}$ and $\mathrm{Re}\{x\}$ denote the imaginary and real parts of $x$, respectively. Then, we obtain from (62) Similarly, since $S = (S \cap D) \cup (S \cap D^c)$ and $S^c = (S^c \cap D) \cup (S^c \cap D^c)$, we obtain from (5) and (6) where $\sigma_1^2 = \sigma_n^2 + |g|^2\big(\phi\sum_{l\in S\cap D^c} P_l + (1-\phi)\sum_{l\in S} P_l + \sum_{l\in S^c\cap D^c} P_l\big)$.
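As a numerical sanity check on the variance $\sigma_0^2$ defined above, the following sketch simulates the residual observation and compares its empirical power with $\sigma_n^2 + |g|^2\sum_{l\in D^c} P_l$. It is not the authors' derivation. It assumes that, once the messages in $D$ are removed, the residual has the form $y = g\sum_{l\in D^c} s_l + n$ with $n$ a zero-mean circularly symmetric complex Gaussian noise of variance $\sigma_n^2$, and that each $s_l$ is zero-mean circularly symmetric complex Gaussian with variance $P_l$. The power values, channel magnitude, phase, and function names are hypothetical.

```python
import math
import random


def complex_gaussian(rng, variance):
    """Draw one circularly symmetric complex Gaussian sample with E|z|^2 = variance."""
    std = math.sqrt(variance / 2.0)  # each real dimension carries half the power
    return complex(rng.gauss(0.0, std), rng.gauss(0.0, std))


def residual_variance(powers, g_abs, theta, noise_var, trials=100_000, seed=0):
    """Empirical E|y|^2 for y = g * sum_l s_l + n, with g = g_abs * exp(j*theta)."""
    rng = random.Random(seed)
    g = g_abs * complex(math.cos(theta), math.sin(theta))
    total = 0.0
    for _ in range(trials):
        s_sum = sum(complex_gaussian(rng, p) for p in powers)
        y = g * s_sum + complex_gaussian(rng, noise_var)
        total += abs(y) ** 2
    return total / trials


powers = [0.5, 0.3, 0.2]   # hypothetical P_l for l in D^c
g_abs = 1.5                # hypothetical |g|
noise_var = 1.0            # hypothetical sigma_n^2

# Theoretical value from the sigma_0^2 expression in the text.
sigma0_sq = noise_var + g_abs ** 2 * sum(powers)
empirical = residual_variance(powers, g_abs, theta=0.7, noise_var=noise_var)
print("theory:", sigma0_sq, "simulation:", empirical)
```

Changing `theta` should leave the estimate essentially unchanged, which matches the statement that the distribution of $x$ does not depend on $\theta$.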