Joint Pilot and Payload Power Allocation for Massive-MIMO-Enabled URLLC IIoT Networks

The Fourth Industrial Revolution (Industrial 4.0) is coming, and this revolution will fundamentally enhance the way factories manufacture products. The conventional wired lines connecting central controller to robots or actuators will be replaced by wireless communication networks due to its low cost of maintenance and high deployment flexibility. However, some critical industrial applications require ultra-high reliability and low latency communication (URLLC). In this paper, we advocate the adoption of massive multiple-input multiple output (MIMO) to support the wireless transmission for industrial applications as it can provide deterministic communications similar as wired lines thanks to its channel hardening effects. To reduce the latency, the channel blocklength for packet transmission is finite, which incurs transmission rate degradation and decoding error probability. Thus, conventional resource allocation for massive MIMO transmission based on Shannon capacity assuming the infinite channel blocklength is no longer optimal. We first derive the closed-form expression of lower bound (LB) of achievable uplink data rate for massive MIMO system with imperfect channel state information (CSI) for both maximum-ratio combining (MRC) and zero-forcing (ZF) receivers. Then, we propose novel low complexity algorithms to solve the achievable data rate maximization problems by jointly optimizing the pilot and payload transmission power for both MRC and ZF. Simulation results confirm the rapid convergence speed and performance advantage over the existing benchmark algorithms.


I. INTRODUCTION
Industry 4.0 has been envisioned as the future paradigm for the next generation of industrial systems, which integrates advanced manufacturing functions with the industrial internet-of-things (IIoT) to create a more intelligent and automatic digital manufacturing system [1].Traditionally, industrial control systems mainly rely on wired connections such as cables or optical fiber, since the current wireless networks cannot meet their stringent latency and reliability requirements.
However, there are some drawbacks to deploying wired lines.First, significant cost will be incurred by the installation and maintenance.Second, wired lines are vulnerable to wear and tear in motion control applications, and suffer from aging.Finally, they cannot be deployed in some harsh environments, such as those with high temperatures and rotating part.Hence, to make Industry 4.0 a reality, it is imperative to design wireless networks tailored for industrial applications to replace the traditional wired lines.Typical industrial applications require deterministic communications with ultra reliability (1 − 10 −9 ) and low latency (1 ms), such as factory automation (FA) [2], power system protection (PSP), and power electronics control (PEC) [3].Significant research efforts have been devoted to the design of wireless communications in industrial applications.However, most of the existing works mainly focused on the adaption of the upper layers of conventional wireless networks to achieve deterministic communications, while keeping the physical layer untouched.Some related standards are WirelessHART, wireless interface for sensors and actuators (WISA), and Wireless Networks for Industrial Automation/Process Automation (WIA-PA) [4].Although keeping the wireless standards of physical layers can allow faster design and better compatibility, it leads to a fundamental bottleneck for the system performance.None of the above standards can meet the stringent demand requested by the most critical FA, PSP and PEC applications.As a result, more efforts should be devoted to the design from the physical layer perspective of view.
From the physical layer perspective, the dominate feature of the industrial applications is that the packet transmission should be completed within short blocklength due to low latency requirement [5].Hence, the transmission is not error-free with any finite/short blocklength channel codes.In this case, Shannon capacity formula is not applicable since it is based on the principles of the law of large numbers, and we need to design the resource allocation by considering the decoding error probability requirement.In [6], Peter et al. have derived the approximation formula of the maximum achievable data rate with finite blocklength transmission, which characterises the complicated relationships among decoding error probability, channel blocklength, and signal-to-noise ratio (SNR).Unlike Shannon capacity formula, the data rate expression under short packet transmission is neither convex nor concave with respect to the SNR or blocklength [7].As a result, the optimal resource allocation under this formula is difficult to obtain.
Recently, there are increasing research studying the transmission design based on the short packet transmission capacity formula [8]- [14].In specific, the effective throughput maximization was studied in [8] for a two-device downlink non-orthogonal multiple access (NOMA) system.
The overall error probability is minimized in [9] for a simultaneous wireless information and power transfer (SWIPT)-enabled decode-and-forward (DF) relaying network.The decoding error probability minimization was investigated in [10] for a unmanned aerial vehicle (UAV)-enabled DF relay system.We recently proposed a low-complexity power and blocklength optimization algorithm for both orthogonal multiple access (OMA) and NOMA in a two-hop relay system in [11].However, all these contributions are limited to a simple scenario with two devices, where one device acts as a relay and the other as the destination node.In industrial applications, the central controller needs to support a large number of devices [4].On one hand, Chen et al. in [12] investigated the effective capacity maximization problem for wireless-powered IoT network with multiple devices operating in a time division multiple access (TDMA) mode.However, the time budget is already tight, and the portion allocated to each device will be marginal.Hence, TDMA strategy is not suitable for the applications in industrial applications with extremely stringent latency target.On the other hand, the authors in [13] [14] studied the resource allocation for multiple devices operating under the orthogonal frequency division multiple access (OFDMA) mode.Unfortunately, this requires huge amount of system bandwidth, which is not feasible as some industrial applications operate over unlicensed spectrum [15].By equipping a large number of antennas at the base station (BS), massive multiple-input multiple-output (MIMO) has been widely regarded as the key enabler for the creation of the fifth generation (5G) wireless networks [16].By exploiting excessive number of spatial degrees of freedom, massive MIMO is capable of supporting multiple devices simultaneously without additional time or frequency resources.In addition, due to the channel hardening effect, massive MIMO is more immune to the fast fading and can provide deterministic communications required by the industrial applications.Due to these attractive advantages, massive MIMO is ideal for supporting industrial applications with stringent quality of services (QoS) requirements.However, most of the existing literature adopted Shannon capacity as the performance metric to optimize the resource allocation [17]- [19], which implicitly assumes the infinite channel blocklength.
Therefore, conventional resource allocation solution based on Shannon capacity is not optimal for industrial applications with short channel blocklength.To the best of our knowledge, we are the first to study the resource allocation for massive MIMO providing ultra-reliability and low-latency communications (URLLC) for any number of devices.Specifically, our contributions are summarized as follows: 1) We derive the closed-form lower bounds (LBs) on the achievable rates for a uplink massive MIMO system by considering the imperfect channel state information (CSI) with finite channel blocklength, and assuming both maximum-ratio combining (MRC) and zeroforcing (ZF) receivers.They can be regarded as the conventional Shannon capacity minus a penalty term due to short packet transmission.Simulation results confirm the tightness of the LBs.Given fixed delay budget, we formulate an optimization problem with the objective to maximize the weighted sum rate by jointly optimizing the pilot and payload transmission power subject to the decoding error probability, the minimum data rates, and the energy constraints for all URLLC devices.
2) For the case with MRC receiver, the formulated optimization problem is non-convex due to the complicated expression of data rate LBs, and it is difficult to find the globally optimal solution.To deal with this issue, we first approximate the penalty term in the data LB as a log-function, which can facilitate the transformation from the original optimization problem to a series of geometric programs (GPs).Each GP problem can be efficiently solved with polynomial time.Besides, we provide a novel method to check the feasibility of the original problem, and provide both complexity and convergence analysis.
3) For the case with ZF receiver, the data rate LB is more complicated and the algorithm proposed for MRC cannot be directly applied since the nominator of the signal-to-interferenceplus-noise (SINR) is a posynomial function.To handle this issue, we approximate the polynomial functions with their best local monomial approximations and then transform the optimization into a series of GPs with low-complexity.Convergence analysis tailored for the ZF receiver is further provided.4) Simulation results show that our proposed algorithms converge rapidly, which verifies the low-complexity of the proposed algorithm.In addition, it is also shown that the proposed algorithm outperforms the benchmark schemes, especially the ones adopting Shannon capacity as the optimization performance metric, which emphasizes the importance of using short packet transmission theory.
The remainder of this paper is organized as follows.In Section II, system model and problem formulation are provided.In Section III, we provide a low-complexity algorithm for joint design of pilot and payload power allocation for the MRC case.The ZF case is studied in Section IV.
Then, simulation results and analysis are presented in Section V.The final conclusion is drawn in Section VI.

A. Factory System Model
Consider a uplink multi-device Massive MIMO communication in one factory as shown in Fig. 1, where the central controller (CC) serves K devices, e.g., actuator, robot, etc.The devices need to send their emergency information of URLLC requirements such as measured data or their current operation states to the CC.Thus, the CC can process these data information immediately and provide prompt response/feedback.For simplicity, we focus on the uplink transmission and the solutions for downlink can be similarly derived.The set of the devices is denoted as K = {1, 2, • • • , K}.The CC is equipped with M antennas and each device is equipped with single antenna due to their low signal processing capability, where M K. Let us denote as the channel vector from the CC to the k-th device and can be decomposed as where α k denotes the large-scale channel gain that includes the pathloss and shadowing, and hk denotes the small-scale fading following the distribution of CN (0, I).Let us denote H ∈ C M ×K as the channel matrix from the K devices to the CC, with The K devices need to transmit K packets to the controller.Then, the M × 1 received signal vector at the CC is given by y where p d k is the payload power of the kth device, s k is the zero mean and unit variance Gaussian information message from the kth device, and n ∼ CN (0, I M ) is the additive noise during the data transmission, where the variance of each element is normalized to unit.[20] brought by massive MIMO, the system is more immune to the fast fading, which can provide high reliable services for the devices.However, to reap the benefits brought by massive MIMO, the CSI should be available at the CC.Furthermore, TDD mode is always taken as an enabler for massive MIMO systems since downlink instantaneous CSI is obtained by estimating uplink CSI based on channel reciprocity [21].In addition, for machine type communications, the devices may not be able to perform complicated signal processing tasks required in frequency division duplexing (FDD) systems, such as channel estimation calculation, quantization, etc.More time slots are needed for CSI feedback.Hence, the TDD protocol is adopted in this paper.All the devices should be allocated with orthogonal pilot resources so that the CC is able to distinguish the channels from different devices, thus, the number of symbols for channel estimation should be no smaller than the number of devices [20].
In a massive MIMO URLLC scenario, each block mainly consists of two parts: 1) l p symbols for channel estimation (the pilot sequence is of length l p ); 2) l d symbols for the K devices' data transmission, thus the total number of symbols of the frame is denoted as L = l p + l d .As the symbols used for data transmission is already limited, we assume that K devices sharing the same symbol duration and the frame structure is illustrated in Fig. 2. Accordingly, the time durations for channel estimation and data transmission in one frame are given by t p = l p /B and t d = l d /B, respectively, where B is the bandwidth of the system.In the training phase, all devices simultaneously and synchronously transmit orthogonal pilot sequences q 1 , • • • , q lp ∈ C lp×1 to the CC, with q H k q k = 1 and q H i q j = 0, i = j.Hence, the minimum length of the pilot sequences to guarantee the orthogonality is equal to l p = K.Based on the received signal, the CC estimates the channel conditions of all devices, and the received pilot signal at the CC is where p p k is the pilot transmit power at the kth device, and N ∈ C M ×K is the additive Gaussian noise matrix received during the training phase, whose elements are independently generated and follow the distribution of CN (0, 1).To obtain channel h k , the CC first multiplies Y p by 1 √ Kp p k q k , which yields where Since q k is a unit-norm vector, it is easy to show that n p k is still Gaussian distribution whose elements are independently and identically distributed as CN (0, 1 which follows the distribution of CN (0, σ k I) with σ k given by According to the property of MMSE estimation, channel estimation error hk = h k − ĥk is independent of ĥk , and follows the distribution of CN (0, δ k I M ), where δ k is given by

C. Achievable Data Rate for Massive MIMO URRLC
By taking into account the number of symbols for pilot transmission, to achieve the decoding error probability of ε k for the kth device, the instantaneous achievable data rate R k can be accurately approximated by [22] R where β is equal to β = K/L, γ k is the signal to interference plus noise ratio (SINR) of the As seen from ( 7), when the blocklength L approaches infinity, the data rate R k will approach (1 − β) log 2 (1 + γ k ), which is the classic Shannon capacity.The second term in (7) can be interpreted as a penalty on the rate in order to guarantee the decoding error probability ε k .
In the following, we derive the expression of γ k for two different low-complexity detection schemes: 1) maximum-ratio combining (MRC); 2) zero-forcing (ZF).
Define the estimated channels as Let A be an M × K linear detection matrix that is based on the estimated channel H.By using the linear detection A, the received signal can be processed as Two conventional low-complexity linear detectors are considered: Then, the processed signal after using the detector is given by where the last equality is obtained by using h k = ĥk + hk .The detection signal for the kth device is given by where a k is the kth column of matrix A. Since H and H are independent, a k is also independent of H.The CC will treat the estimated channel as the true channel, and the last three terms of (11) are regarded as interference and noise.Then, the SINR for the kth device γ k is given by Remark: Massive MIMO can offer the channel hardening effect, where the channel variations between different channel fading blocks can be averaged out and the achievable data rate for these fading blocks mainly depend on the large-scale fading, which changes very slowly.As a result, we first derive the lower bound (LB) for the achievable data rate as a function of large-scale channel fading parameters, and then optimize the resource allocation based on the large-scale fading information rather than the small-scale fading, which can significantly reduce the computational delay that is beneficial for URLLC applications.In other words, when the large-scale fading parameters of all devices are given, we can use our developed algorithm to find the optimal power allocation, which can be used for consecutive channel fading blocks.
The algorithms are needed to be rerun only when the large-scale channel fading parameters have changed, which vary much slowly compared with the small-scale channel fading.To make it more clear, a block diagram scheme is given in Fig. 3, where the large-scale channel gains of any devices vary at t = t Ls 0 and t = t Ls 1 .In general, the channel coherence time is much longer than the packet transmission time as we consider the URLLC services, and the time difference for large-scale fading (t Ls 1 − t Ls 0 ) is much larger than the channel coherence time.The power allocation obtained at time t = t Ls 0 can be employed for the subsequent transmissions until t = t Ls 1 , which significantly reduce the computational time.Please note in our scheme, the channel estimation should be performed at the beginning of each channel coherence since the decoding requires the channel state information as shown in (9), while the power allocation needs to be updated once the large-scale fading gains change.
Due to the channel hardening effect, in this paper we focus on the ergodic achievable data rate that is defined as Rk = E {R k }, where the expectation is taken over the randomness of ĥk , hk , ∀k .Unfortunately, the exact average achievable data rate Rk with channel uncertainty is not available.In the following, we aim to derive the closed-form expression of the LB of the rate expression, which is more tractable to analyse and optimize.To this end, we first define function f (x) as where a is a fixed positive value.In the following, we derive the feasible region of function The first-order derivative of g(x) with respect to x is given by g where the first inequality follows by using the relation ln 1 + 1 x < 1 x .Hence, g(x) is a mono-tonically decreasing function of x.In addition, lim x→0 g(x) = ∞ and lim x→∞ g(x) = 0, where the latter equation is obtained by using the L'Hospital's rule.Hence, from ( 14), we know that the feasible region of f (x) is given by Then, we have the following lemma.
Lemma 1: Function f (x) defined in ( 13) is decreasing and convex for x ∈ Ω.
Proof : Please refer to Appendix B in our recent work [23].
Based on Lemma 1, we are able to derive the LB of Rk in the following.The instantaneous data rate R k in ( 7) can be written as follows: where function f k (•) is in the same format as f (•) in (13), where the parameter a is In addition, we need to guarantee that R k ≥ 0, and thus γ k ≥ 1/g −1 (a k ).By using lemma 1 and Jensen's inequality, we obtain the following LB on the ergodic data rate: In the following theorem, we derive the expression of Rk for each beamforming solution.
Theorem 1: The ergodic achievable rate for the kth device for MRC in finite blocklenghth regime can be lower bounded by: where γk is given by Proof : Please refer to Appendix A.
For the ZF detection, the LB of the ergodic data rate is given by the following theorem.
Theorem 2: The ergodic achievable rate for the kth device for ZF in finite blocklength regime is lower bounded by: where γk is given by Proof : Please refer to Appendix B.
In the proof of Theorem 1 and Theorem 2, we notice that the above derived LB of the ergodic data rates are valid for any number of antennas (M > K for the ZF).However, the gap between the LB and the actual ergodic data rate is reduced when the number of antennas is large, which is the case for massive MIMO system.These LBs are commonly used in the literature concerning massive MIMO systems.Hence, we use these lower bounds throughout this paper.
In the following, we aim to optimize the pilot power p p k and payload power p d k to maximize the weighted sum rate.The optimization is performed only when any large-scale fading parameter changes.The simulation results in Section V also verify the tightness of the derived LBs.Hence, the optimization solutions are applicable for a large time scale, which is appealing for URLLC applications.

D. Problem Formulation
In this paper, we jointly optimize the power allocation for pilot and data transmission of each device for maximizing the weighted sum rate of all devices.By using Theorem 1, the weighted sum rate maximization problem can be formulated as where Rk and γk are given in ( 20) and ( 21) for MRC and ( 22) and ( 23) for ZF, respectively, w k is the weight of device k used to guarantee the fairness among the devices, constraint (24b) denotes the minimum data rate requirement for the k-th device, constraint (24c) means the energy constraint for each device.
Power control for weighted sum rate problem with interference is well known to be an NP-hard problem even under perfect CSI [24].It becomes more complicated for the more general case with imperfect CSI and finite blocklength.In this paper, we aim for designing efficient algorithms with polynomial-time complexity to solve the weighted sum rate problem with imperfect CSI.
To this end, we first simplify the problem formulation in (24).The first-order derivative of Rk w.r.t.γk is given by R ≥ 0 1 .Hence, constraint (24b) can be transformed 1 Since Rk ≥ R req k > 0, the feasible γk must lie in the range of Ω = γk | 0 < 1/γ k ≤ g −1 (a k ) .Hence, Lemma 1 holds and we have f k To additionally simplify the problem formulation in (24), we introduce auxiliary variables χ k , ∀k, and then Problem ( 24) can be equivalently transformed as follows where wk = (1−β)w k ln 2 and The equivalence between Problem (26) and Problem ( 24) can be readily proved by using the contradiction method.Problem (26) and Problem (24) are equivalent in the sense that they have the same power allocation solutions and the same objective function (OF) value.Hence, in the following, we focus on solving Problem (26).
Due to different SINR expressions, in the following two sections, we optimize the power allocation for MRC and ZF, respectively.

III. WEIGHTED SUM DATA RATE FOR MRC
In this section, we aim to deal with weighted sum rate maximization problem for MRC.

A. Algorithm Design
The complicated function G(χ k ) in (26a) makes the optimization problem difficult to solve.
To resolve this issue, we first study several properties of this function.
Lemma 2: G(x) is a concave function of x.
Proof : Please refer to Appendix C.
For the URLLC applications, the decoding error probability ε k for each device is much smaller than 0.5 so that a k is a strictly positive value, and the OF of Problem ( 26) is to maximize the difference of two concave functions.However, due to the non-convex constraints in Problem (26).This problem does not belong to the class of difference of convex (DC) problem.In addition, due to the additional term of a k G(χ k ) in the OF of Problem (26), this problem cannot be solved by dealing with a sequence of geometric programs (GPs) as in [25], which considered Shannon capacity formula under the infinite blocklength regime.The intuitive method to solve Problem (26) is to approximate function G(χ k ) as its first-order Taylor series expansion and solve the approximated problem until convergence.However, the approximation is in linear form while the first term in the OF is in log-function form.The successive GP method in [25] is still not applicable.To deal with this issue, we approximate function G(x) as a log-function as shown in the following lemma., the following inequality holds: where ρ and η are given by and In addition, we have: which means that the approximation F (x) is tight at x = x.
Proof : Please refer to Appendix D.
According to (16), the optimization variable γk should be no smaller than 1/g −1 (a k ).Note that ) is a decreasing function of the decoding error probability and blocklength, while an increasing function of the number of device.Hence, for typical FA cases [26], where the typical required error probability is lower than 10 −8 , the available channel blocklength is smaller than 200, and the number of devices that should be supported is larger than 5, 1/g −1 (a k ) is larger than , which implies that Lemma 3 is applicable for our considered optimization problem in (26).
In Fig. 2, we compare the approximation accuracy of function F (x) and the linear approximation function S(x) defined as follows: where the inequality holds since G(x) is a concave function proved in Lemma 2. It can be observed from Fig. 4 that the approximation function F (x) is more accurate than the linear function S(x) over the whole region of x at different points of x.We can also find that the curve of the log-function F (x) is always above the curve of the linear function S(x), which verifies the correctness of the theoretical conclusion in Lemma 3. The most important advantage of approximating G(x) as the log-function F (x) is that we can transform the original problem into a GP problems that enables us to find the optimal solution.
In the following lemma, we also provide the LB of ln(1 + χ k ), which enables us to develop low-complexity algorithms.
Lemma 4: For any given x ≥ 0, function ln(1 + x) is lower bounded by where ρ and η are given by In addition, the bound is tight at x = x.
Proof : The proof is similar to those in Lemma 3, and thus omitted for simplicity.
Based on Lemma 3 and Lemma 4, we are now ready to solve Problem (26).The main idea is to approximate the OF of Problem (26) as the approximated functions provided in Lemma 3 and Lemma 4, and then solve the approximate problem in an iterative manner.In the following, we provide the details of the iterative algorithm.
First, we denote k , ∀k} as the power allocation in the i-th iteration, and the corresponding χ k is given by χ k are obtained from ( 28) and ( 29) respectively with x = χ (i) k .In addition, we can approximate ln(1 + χ k ) around χ k .By substituting these approximations into (26a) and recalling that a k is a positive value, we can obtain the LB of the OF by using Lemma 3 and Lemma 4 as follows where the equality holds only when Next, we optimize the power allocation to maximize the LB of the OF instead of maximizing (26a) directly.In specific, the LB maximization problem to be solved in the i + 1-th iteration is formulated as 25), (24c), where k and the constant term in the OF is omitted.Then, we can transform the above optimization into a GP problem as follows: Although GP is not a convex optimization problem, it can be equivalently transformed into a convex optimization problem by applying a logarithmic change of variables.Therefore, the globally optimal solution of Problem ( 36) can be obtained by using the interior-point method [27].Some software packages that can solve GP problem are MOSEK package and CVX [28].
Based on the above discussion, the iterative algorithm to solve Problem ( 26) is given in Algorithm 1.

B. Algorithm Analysis 1) Initialization of Algorithm 1: As shown in
Step 1 of Algorithm 1, one has to find a feasible initial power allocation in order to make the algorithm work.Note that randomly selecting a set of power allocation solutions that satisfy the per-device energy constraints may not satisfy their minimum SINR requirements.Hence, one has to carefully choose the initial power allocation.In the following, we provide one alternative method to find the initial power allocation solutions.
Inspired by the user selection problem formulation in [29], [30], we construct the following alternative optimization problem by introducing an auxiliary variable ϕ: Obviously, Problem (37) is always feasible since at least {ϕ = 0, p p k = 0, p d k = 0, ∀k} is a feasible solution.It can be readily verified that the original Problem ( 24) is feasible if the optimal ϕ ≥ 1, and the output power allocation can be adopted as the initial input for Algorithm 1.In this paper, we assume that Problem ( 24) is always feasible and the optimal ϕ in Problem ( 37) is always no smaller than one.Problem (37) can also be transformed into a GP problem, where the globally optimal solution can be obtained.The details are omitted here due to the limited space.

2) Convergence Analysis:
In this part, we analyze the convergence of Algorithm 1.In the following, we show that Obj (i) ≤ Obj (i+1) .

Since {χ (i+1) k
, ∀k} is the optimal solution of Problem (35) in the i + 1-th iteration, we have By using inequality (34) with , we have By combining (38) and (39), we have In addition, since each device has its own energy constraint, the OF value of Problem (24) has upper bound.As a result, Algorithm 1 is guaranteed to converge.
3) Solution Analysis: Since the original problem ( 24) is non-convex, the globally optimal solution cannot be obtained in general.However, by using the similar proof as in Appendix B in [30], we can prove that Algorithm 1 can converge to the Karush-Kuhn-Tucker (KKT) point of Problem (24).The converged solution only depends on the initial input of Algorithm 1.
Simulation results show that the algorithm almost converges to the same solution with different initial input solutions.

4) Complexity Analysis:
The main complexity mainly lies in solving a GP problem in each iteration.In [31], the authors claimed that the GP problem can be efficiently solved by using the standard interior point methods with a worst-case polynomial-time complexity.The upper bound of the total number of Newton steps in the interior point method does not depend on the number of variables, or the number of constraints.The derived upper bound shows that the barrier method converges linearly.By carefully choosing the parameters, the bound on the number of Newton steps can grow as √ m instead of m, where m is the number of constraints [32].In addition, simulation results show that Algorithm 1 converges rapidly, which means that Algorithm 1 can converge to a local optimal solution with a polynomial time complexity.

IV. WEIGHTED SUM DATA RATE FOR ZF
In this section, we aim to deal with weighted sum rate maximization problem for ZF.Due to different expressions of the SINR of MRC and ZF, some derivations in MRC cannot be directly applied in the ZF case.In the following, we develop an efficient algorithm to solve Problem (26) for the ZF case.

A. Algorithm Design
By substituting ( 5) and ( 6) into (23), the SINR in the ZF case can be reformulated as Unfortunately, since the nominator of γk in (41) is a posynomial function, the SINR constraint in (26b) cannot be transformed into the format as that in (36b), in which the right hand side is a monomial function.Hence, the problem cannot be directly transformed into a GP problem.To solve this issue, we introduce Theorem 3 as follows.
Theorem 3: For any given vector where λ and τ i , ∀i are given by and x is given by In addition, we have: where ∇W (x) and ∇Y (x) denote the gradient of function W (•) and Y (•) w.r.t.x, respectively.
Proof : Please refer to Appendix E.
Based on Theorem 3, we replace the polynomial functions in the nominator of γk in (41) with their best local monomial approximations by employing Theorem 3 2 .In specific, we denote , ∀k} as the power allocation in the n-th iteration, and the corresponding χ k is given by χ Then, in the n + 1-th iteration, we approximate the term i∈K (1 + α i Kp p i ) in the nominator of (41) by its best local monomial approximations, which is given by function where λ (n) and τ , ∀i.Then, we focus on the following constraint instead of the original SINR constraint in (26b): Note that the left hand side (LHS) of ( 46) is a posynomial function, and the right hand side (RHS) becomes a monomial function.In addition, the LHS is no larger than RHS.Hence, constraint (46) satisfies the conditions for a problem to be a GP problem [27].
Then, by using the same method as in the MRC case to deal with the OF, in the n + 1-th iteration, we aim to solve the following GP problem: where the parameters ŵ(n) k 's are the same as those in the MRC case.This problem can be efficiently solved by using CVX [28].
Based on the above discussion, the iterative algorithm to solve Problem (26) for the case of ZF is provided in Algorithm 2.
Algorithm 2 Iterative algorithm for solving Problem (26) for ZF 1: Initialize iteration number n = 1, error tolerance ξ.Initialize a feasible power allocation {p k , ∀k}, and calculate the OF of Problem (26), denoted as Obj (0) .2: With given {χ ∀k}, solve Problem (47) by using the CVX package to obtain {p

B. Algorithm Analysis
Algorithm 2 can be analyzed similar to Algorithm 1 except the convergence analysis since the problems to be solved in each iteration of Algorithm 2 do not have the same set of constraints.
In the following, we prove that the solution obtained in the n-th iteration is also feasible for the problem to be solved in the n + 1-th iteration.We only need to check constraint (46) since the other two constraints are the same in each iteration.
Let us denote {χ , ∀k} as the optimal solution in the n-th iteration.Then, it is also a feasible solution, and we have By using (45), we have In addition, by using (44) in Theorem 3 and (49), we have Finally, by combining (46) and (50), we have , ∀k} is also a feasible solution in the n + 1-th iteration.Then, by using the similar proof as in the case of MRC, we can also prove that Algorithm 2 is guaranteed to converge.By using a similar method, we also prove that this algorithm will converge to a feasible solution of Problem (26).

V. SIMULATION RESULTS
In this section, we provide simulation results to demonstrate the effectiveness of our proposed algorithms for industrial automation systems.The channel path loss is modeled as P L = 35.3+ 37.6log 10 d (dB) [34], and the small-scale fading is modeled as Rayleigh fading with zero mean and unit variance.Unless otherwise specified, the simulation parameters are set as follows:

A. Tightness of the Date Rate LB
In Fig. 5 and Fig. 6, we investigates the tightness of the LB derived for the cases of MRC and ZF, respectively.The simulation results are obtained through the Monte-Carlo simulation by averaging over 5000 random channel generations.It is observed that the data rate LB is tight for both cases of MRC and ZF for any number of transmit antennas.Interestingly, the curves for the ZF case are almost overlapped with each other.This verifies that the data rate LBs derived in Theorem 1 and Theorem 2 are suitable for optimization instead of directly optimizing the complicated expectation expression.

B. Convergence Behaviour of the Proposed Algorithms
Fig. 7 and Fig. 8 investigate the impact of the energy limit at each device on the performance of the proposed Algorithm 1 for the MRC case and Algorithm 2 for the ZF case, respectively.
From these two figures, we can observe that both algorithms converge rapidly for various energy limits and only 2 or 3 iterations are sufficient for the algorithms to converge.This demonstrates the low complexity of the proposed algorithms.

C. Performance Comparison
In this subsection, we compare the proposed algorithms with the following algorithms: • Upper bound: In this method, Shannon capacity is adopted for optimization in Problem (26).In other words, the penalty term a k G(χ k ) is set to zero in both the OF and rate constraint (24b), i.e., a k G(χ k ) = 0, ∀k.This method provides the upper bound of the average weighted sum rate performance in the IIoT networks.
• Conventional alg.: As in [35], the solution obtained from the upper bound is applied in (20) and (22) for the cases of MRC and ZF respectively by considering the penalty term a k G(χ k ).That means that the upper bound is used for obtaining solutions, but the achievable data rate under finite blocklength is used for performance evaluation.
• Fixed pilot power alg.: In this scheme, we only optimize the payload power p d k while fixing the pilot power as p p k = E/L.This algorithm is provided to show the benefits of jointly optimizing the pilot power and payload power.
The following results are obtained by averaging over 100 Monte-Carlo simulations where in each snapshot the devices are randomly generated in the cell.For each snapshot, if the device's achievable data rate cannot achieve its rate targets, we set the corresponding data rate to zero.The proposed algorithm is denoted as 'Proposed alg.' in the following figures.
In Fig. 9 and Fig. 10, we show the average weighted sum rate versus the energy limit at each device, E. The rate targets for MRC and ZF are set as R req = 1 bit/s/Hz and R req = 4 bit/s/Hz, respectively.As seen from these figures, the system performance increases with the available energy at each device since the SINR at each device is increased.Note that the weighted sum rate achieved by some algorithms may approach zero, which means the power allocation solution is infeasible.As expected, the upper bound has the best performance since the penalty terms are not considered.Furthermore, the 'conventional alg.' based on Shannon capacity formula has higher probability to violate the data rate requirement, especially for samll E. Therefore, Shannon capacity cannot be employed for the transmission design of URLLC for industrial applications, in particular when the energy limit is small as the QoS target cannot be guaranteed.However, for the large value of E, the SINR value for each device is very high and then the penalty term a k G(χ k ) can be ignored when comparing with the first term of ln(1 + χ k ).Hence, the 'conventional alg.' will achieve similar performance as that of the proposed algorithm.As shown in Fig. 9 and Fig. 10, by jointly optimizing the payload power and pilot power, the proposed is superior over the 'Fixed pilot power alg.', which only optimizes the payload power.
The performance gain is obvious when the energy limit is low, especially for the case of ZF.This means the need of jointly optimizing pilot and payload power at low energy limit.This can be explained as follows.When the energy limit is low, the system performance is limited by channel estimation procedure.Through the joint power allocation, some power can be borrowed from that for data transmission to enhance the channel estimation accuracy, and thus increases the weighted sum rate.However, in the high energy E, the accuracy of channel estimation is already enough, and additional joint power control bring marginal performance gain.Another interesting observation is that the performance gain of the proposed algorithm over 'Fixed pilot power alg.' is much smaller for the MRC case than the ZF case.This may be due to the fact    that the CSI is more important when using it for removing multiple device interference in the ZF case.
Fig. 11 and Fig. 12 show the average weighted sum rate versus the number of devices for various schemes.The rate targets for MRC and ZF are set as R req = 1 bit/s/Hz and R req = 2 bit/s/Hz, respectively.The energy limit for MRC and ZF are set as E = 2 and E = 1, respectively.
It is observed from Fig. 11 and Fig. 12 that the average weighted sum rate achieved by all the algorithms except the 'Conventional alg.' increases with the number of devices since these schemes fully exploit the multi-device diversity.For the MRC case, the performance of the 'Conventional alg.' decreases with the number of devices.The main reason is that the design is based on Shannon capacity formula, which does not take into consideration the effect of short blocklength on the achievable data rate.As a result, the probability that the achieved data rate violates the rate requirement will increase with the number of devices.It is interesting to observe that 'Fixed pilot power alg.' has similar performance as the proposed algorithm for the case of MF.However, for the case of ZF, the proposed algorithm significantly outperforms 'Fixed pilot power alg.', and the performance gain is increasing with the number of devices.
This again reveals the importance of joint power optimization in the case of ZF.For the ZF case, the weighted sum rate achieved by 'Conventional alg.' first increases with the number of devices and then decreases with it.
Finally, we study the effect of blocklength on the weighted sum rate performance in Fig. 13 and Fig. 14 for the cases of MRC and ZF, respectively.The rate targets for MRC and ZF are set as R req = 2 bit/s/Hz and R req = 4 bit/s/Hz, respectively.As expected, the system performance increases with the channel blocklength since more time/frequency resource can be exploited for transmission.Some interesting observations can be found in these figures.First, when the blocklength is small, there is a significant performance gap between the proposed algorithm and the upper bound.However, this gap starts to reduce with the increase of the blocklength.This demonstrates that the blocklength has significant impact for IIoT devices with URLLC requirements.In addition, the proposed algorithm outperforms the 'Fixed pilot power alg.' algorithm for both cases of MRC and ZF.This may be due to the fact that higher rate target is imposed in this example than that in Fig. 11.

VI. CONCLUSIONS
In this paper, we studied the resource allocation for uplink massive MIMO systems to support critical IIoT operating under finite channel blocklength, where multiple robots and/or actuators transmit URLLC signals to the central controller simultaneously.We first derived the closedform data rate LB with imperfect CSI for both MRC and ZF receivers under the short packet transmission.We then formulate the weighted sum rate maximization to jointly optimize the pilot and payload power allocation while considering their energy, minimum data rate and decoding error probability requirements.This optimization problem is non-convex, and we proposed novel low-complexity iterative algorithms to solve it.Simulation results demonstrate that the algorithm converges rapidly, and outperforms the existing benchmark algorithms, especially the algorithm We first consider the MRC beamforming.The proof follows the similar steps as those in Appendix A of [36] for perfect CSI.Denote γ k as the instantaneous SINR value when using MRC.By substituting a k = ĥk into (12), we have Then, E 1 γ k can be expressed as By using ZF, we have A H Ĥ = I K .Thus, a k ĥi is equal to one if k = i; otherwise, it is equal to zero.Then, the instantaneous SINR for ZF can be rewritten as where (62) is obtained by using (55).
The first-order derivative of U (x) w.r.t.x is given by It can be readily proved that U (x) ≤ 0 when 0 < x ≤ By substituting x = x, λ and τ i , ∀i into into the above two functions, we can show that ∂W (x) Hence, the second equality in (44) is proved.Now, we begin to prove (42).Before proceeding, we first introduce the following lemma.
Lemma 5: For any given x ≥ 0, we have the following inequality: where τ is given by τ = x 1+x , and equality holds only when x = x.Proof : The proof is similar to those in Lemma 3, and thus omitted for simplicity.
By applying inequality (73) for each x i , i = 1, • • • , K, we have Then, by multiplying the above K inequalities, we have

Fig. 1 :
Fig. 1: Factory scenario where a massive MIMO central controller serves multiple devices
number of transmit antennas of M = 100, number of devices of K = 10, channel bandwidth of B = 0.2 MHz, noise power spectral density of -174 dBm/Hz, decoding error probability of ε k = 10 −9 , ∀k, number of transmit antennas of M = 100, and maximum transmission duration of 0.5 ms.The maximum blocklength is calculated as L = BT = 100.The other parameters are specified in each figure.The energy constraint for each device is assumed to be equal, i.e., E k = E, ∀k, and each device has the same data rate targets, R req k = R req , ∀k.The weights for each device are uniformly generated within [0, 1].

Fig. 5 :Fig. 6 :
Fig. 5: Tightness of the derived data rate LB for the MRC case.

Fig. 7 :
Fig. 7: Convergence behaviour of the proposed algorithm for the MRC case.

Fig. 8 :
Fig. 8: Convergence behaviour of the proposed algorithm for the ZF case.

Fig. 9 :Fig. 10 :
Fig.9: Average weighted sum rate vs. maximum energy limit 10log 10 E for various schemes for the case of MRC.

Fig. 11 :
Fig.11: Average weighted sum rate vs. the number of devices for various schemes for the case of MRC.

Fig. 12 :
Fig.12: Average weighted sum rate vs. the number of devices for various schemes for the case of ZF.

Fig. 13 :Fig. 14 :
Fig.13: Average weighted sum rate vs. blocklength for various schemes for the case of MRC.

. 1 γ k = i∈K\k p d i σ i + i∈K p d i δ i + 1
Conditioned on ĥk , u k,i and v k,i are Gaussian random variables with zero mean and variance equal to σ i and δ i , respectively.In addition, u k,i and v k,i are independent of ĥk .Then, we have E where W ∼ W m (n, I n ) is an m × m central complex Wishart matrix with n (n − m) degrees of freedom.Then, we have 56) into (54), we can obtain γk in (21).APPENDIX B PROOF OF THEOREM 2 Then, we can express Ĥ as Ĥ = HΛ.The columns of H are independent of each other, and each column follows the distribution of CN (0, I).Then, E 1

√ 17−3 4 , 4 ≤ 17 −3 4 ≤√ 17 −3 4 ,
and U (x) > 0 when x > x ≤ x, we have U (x) < U (x), and thus J(x) < 0. From (67), we know that H (x) < 0, which means H(x) is a monotonically decreasing function of x.Hence, H(x) ≥ H(0) = 0 holds for √ x ≤ x, which equivalently means F (x) ≥ G(x).On the other hand, when x > x, we have U (x) > U (x), and thus J(x) > 0, which means H (x) > 0 and H(x) is a monotonically increasing function of x.Hence, H(x) > H(0) = 0 holds for x ≥ x, and thus F (x) > G(x).Based on the above analysis, when x ≥ F (x) is always no smaller than G(x), which completes the proof.APPENDIX E PROOF OF THEOREM 3The first equation in (44) can be readily verified.Then, we focus on the second equality in (44).The partial derivative of W (x) and Y (x) w.r.t.x i are given by ∂W (x)∂x i = j =i (1 + x j ), ∂Y(x) ∂x i = λτ i x −1 i j∈K x τ j j , i = 1, • • • , K.