Autonomous Power Allocation Based on Distributed Deep Learning for Device-to-Device Communication Underlaying Cellular Network

For device-to-device (D2D) communication in Internet-of-Things (IoT)-enabled 5G systems, centralized resource allocation is limited by the complicated interference between different links. If D2D links are controlled by an evolved Node B (eNB), they impose a burden on the eNB and increase latency. This paper proposes a fully autonomous power allocation method for IoT-D2D communication underlaying cellular networks using deep learning. In the proposed scheme, an IoT-D2D transmitter decides its transmit power independently of the eNB and other IoT-D2D devices. In addition, deep learning can nearly optimize the power set in a distributed manner to achieve higher cell throughput. We present a distributed deep learning architecture in which the devices are trained as a group but operate independently. The deep learning model attains near-optimal cell throughput while suppressing interference to the eNB.


I. INTRODUCTION
Device-to-device (D2D) communication is an emerging technique for coping with increasing mobile traffic demands [1]. In particular, the Internet of Things (IoT)-enabled 5G system is one of the most important systems to use D2D communication. Major scenarios of IoT-enabled 5G include remote control and the broadcasting of alert messages by distributed wireless sensors [2]-[4]. Conventionally, interference management between the cellular and D2D links has been studied mainly for D2D communications underlaying cellular systems [5]-[8]. However, several challenges remain in IoT-D2D-enabled 5G systems. First, in 5G the data and control planes are expected to be separated, with additional small base stations that support only the data plane [9]. This means that the base station has to control devices covered by multiple small cells, so the control burden on the base station accumulates. In addition, many IoT devices will be deployed with cellular support. If D2D communication supports offloading only in the data plane, the offloading gain is significantly reduced by the management overhead of controlling D2D connectivity in 5G. The second challenge is latency. Ultra-low latency is one of the primary requirements of 5G [7], and the time required for resource allocation is one of the major causes of increased latency. In conventional D2D communications, the time to request and receive scheduling information from a central node is unavoidable. Conventional D2D communication also requires channel information for all D2D links for efficient resource allocation. If all IoT-D2D devices report their channel information periodically, this may place a significant burden on the control channel and the central node, and the computational overhead at the central node cannot be ignored.
(The associate editor coordinating the review of this manuscript and approving it for publication was Min Jia.)
Therefore, we propose an autonomous power allocation scheme for IoT-D2D devices that requires no involvement of a central node. The proposed scheme operates similarly to a static transmit power decision, but it avoids interference between a cellular link and an IoT-D2D link. In addition, the proposed scheme does not require exchanging channel information between D2D devices, because it operates independently on each D2D device. It maximizes the total throughput of D2D links while keeping the interference to cellular networks below a predetermined level. To achieve these goals, the proposed scheme uses deep learning. Deep learning and deep reinforcement learning have been exploited in various fields of wireless communications and networks [10], and applying deep learning to IoT-based communication is also an active research area with various approaches [11]-[14]. These studies suggest that the IoT network is a good candidate for applying deep learning to optimize performance. Note that inference requires less computation than training and that IoT hardware is evolving very quickly. In addition, on-chip execution with a pre-trained model has proven fully feasible [15], and the authors of [16] suggest an extremely efficient deep learning model for mobile devices. Both technologies allow deep learning to be used in IoT devices.
VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
In the proposed scheme, all devices have a pre-trained deep learning model that maximizes the total throughput of D2D links through distributed power allocation. The deep learning model performs a sophisticated mapping between local information and a global objective function. The proposed scheme has three main features: a distributed decision model that can maximize the total cell throughput, a reduced procedure that uses only location information and thus eliminates the step of exchanging channel information, and an objective function customized for deep learning that maintains the interference constraints. The proposed scheme also provides a methodology for customizing an objective function for deep learning, so it can easily be extended to other constraints such as energy efficiency.
The main contributions of this paper are as follows:
1) We propose a power allocation scheme for IoT-D2D communication that uses only location information. Because the channel can be statistically expressed as a function of distance, we propose a learning architecture whose hidden layers implicitly capture the entire derivation process, including the channel model.
2) We suggest an autonomous power decision scheme that uses only local information to achieve near-optimal performance, and we design a distributed learning architecture for it. A single deep learning model is trained with a large amount of data generated by simulation, and after training every IoT-D2D device has the same trained model, which enhances the feasibility of implementing the proposed scheme.
3) We design a customized cost function that optimizes an objective function with several constraints using the Lagrange multiplier method. For deep learning, it is important that the objective function and the constraints have similar scales, so we design the constraints in a form similar to the objective function. We show that this design works well even with Lagrange multipliers that are only roughly tuned.
4) Consequently, we reduce the time for power decisions in D2D communication in two ways: by shortening the process, since each device decides autonomously, and by reducing the computational complexity, since training, which requires much more computation and time than inference, is done in advance.
The rest of this paper is organized as follows. Section II introduces related studies on IoT-D2D communication.
Section III describes the proposed method in three aspects: a distributed architecture, cost design for learning, and the deep learning process. Section IV presents the results of implementing the proposed scheme in several forms, including the power distribution of the cells. Finally, the conclusion section summarizes the significance of the proposed method.

II. RELATED WORKS
A. D2D COMMUNICATIONS
Many D2D communication studies consider IoT-enabled 5G systems, and several consider the coexistence of D2D communication and small base stations [17]-[19]. The authors of [17] proposed a graphical solution to obtain the optimal transmit power of reusing nodes and a potential game to solve the radio resource allocation problem in a distributed manner. The authors of [18] used a Stackelberg game to solve the power allocation problem of D2D nodes: the cellular user equipment is the leader of the game, and the D2D transmitter and small-cell user equipment are the followers. They then analyze the strategies of the leader and followers to obtain optimal performance. The authors of [20] applied game theory with an incomplete-information mechanism, proposing a static game for resource allocation in a multi-cell scenario and a repeated game extended from the static game. Beyond comparing and improving SINR performance, some studies improve other metrics, such as energy efficiency or fairness, in D2D-enabled environments [21]-[23]. The authors of [21] proposed two heuristic algorithms to allocate resources to cellular and D2D links, with particular emphasis on fairness among all nodes. A cognitive radio setting is considered in [22], where traditional cellular communication is the primary link and D2D communication is a secondary link. To maximize the energy efficiency of the secondary link while protecting the minimum rate constraint of the primary link, the authors proposed an algorithm that considers two transmit covariance matrices of the secondary link. An energy-harvesting-enabled D2D network is considered in [23], where the optimization problem maximizes the rate of the D2D links subject to a constraint related to energy harvesting.
For distributed resource allocation in D2D networks, the authors of [24] formulated the problem as a stochastic non-cooperative game with multi-agent Q-learning. However, it requires several iterations to converge for each resource allocation. In [25], a distributed resource allocation scheme was also proposed using game theory, but it requires additional information exchange.

B. D2D COMMUNICATIONS FOR LOW LATENCY
Ultra-low latency is a key requirement of proximity communications [7], [9]. The PC5 interface is considered to support proximity communications in the 3GPP standards [26]. Mode 4 of the PC5 interface is considered a mechanism for allocating the radio resources of D2D nodes in a distributed manner: when a node wants to establish a D2D link without cellular network control, it uses PC5 mode 4 with configuration parameters. However, the scheme is not currently specified in the standard, so it needs further study [27]. Therefore, many papers seek optimal radio resource allocation mechanisms in various D2D communication scenarios [28]-[32]. In [28], a binary search algorithm is used to guarantee the latency and reliability of proximity communications; the authors also proposed converting the latency constraint into an equivalent rate constraint so that the optimization problem can be solved easily. The coexistence of the IEEE 802.11p protocol and the LTE proximity protocol is considered in [29], where a greedy admission control algorithm for LTE proximity services is proposed to maximize the reduction of latency caused by the two proximity protocols. In [30], a computation offloading scheme was proposed for mobile edge computing with vehicular devices. The authors of [31] proposed a fast discovery and radio resource allocation algorithm to minimize the latency of proximity communications. In [32], deep reinforcement learning is used to allocate the radio resources and transmit power of D2D nodes. Note that the reported latency can vary depending on where in the communication procedure it is measured.

C. RESOURCE ALLOCATION WITH DEEP LEARNING
Since deep learning produced innovative results in computer vision [33], many researchers have studied its application to wireless communications, and results using deep learning are now being reported in every field of wireless communication, including several impressive results in resource allocation. In the first generation of this work, resource allocation and power control based on deep learning were studied on simple problems, with labels produced by a known algorithm. The authors of [34] proposed a power control scheme using a DNN: they run the WMMSE algorithm [35] to obtain labels and then train the DNN to predict those labels from full channel information, which reduces computation time. Subsequent studies addressed more complex problems. In [36], the authors use convolutional neural networks (CNNs) to infer the labels from incomplete channel information. The authors of [37] use recurrent neural networks (RNNs) to solve a non-orthogonal multiple access (NOMA) problem.
Among D2D-related studies, the authors of [38] used deep learning for intelligent link adaptation to determine the transmission rate. A V2V resource allocation scheme based on deep Q-networks (DQN) is proposed in [32]. Because DQN is a discrete-decision algorithm, it chooses one of several given options, whereas transmit power is a continuous variable; thus, there is room for further performance improvement. In this paper, we suggest a transmit power allocation scheme that works with continuous action spaces. Meanwhile, the authors of [39] proposed a D2D resource allocation scheme with deep learning that does not use labels but optimizes the objective function directly. The difference from our scheme is that it operates in a centralized manner; for IoT-D2D environments, a distributed scheme must be seriously considered.

III. PROPOSED SCHEME
In this section, we describe the proposed Distributed Power Allocation method using DNN with Interference to eNB Constraint (DPADIC).

A. SYSTEM MODEL
It is assumed that orthogonal frequency division multiple access (OFDMA) is used in the considered cellular network, with N non-overlapping orthogonal subcarriers. The bandwidth spanned by each subcarrier is smaller than the channel coherence bandwidth, so each subcarrier experiences flat fading. We consider a set $\mathcal{N} = \{1, \dots, N\}$ of shared OFDMA channels and a set of D2D device pairs, $\mathcal{K} = \{1, \dots, K\}$. Each D2D pair consists of a transmitter and a receiver, which are assumed to be perfectly synchronized. We also consider a multi-cell environment with B cells, whose set of eNBs is $\mathcal{B} = \{1, \dots, B\}$. As shown in [25], the received signal $Y_{n,k,k}$ on channel n can be expressed as

$$Y_{n,k,k} = H_{n,k,k} S_{n,k,k} + \sum_{i \in \mathcal{K},\, i \neq k} H_{n,i,k} S_{n,i,i} + W_{n,k,k}, \tag{1}$$

where $H_{n,k,k}$ is the complex channel gain between the transmitter and receiver of D2D pair k, $H_{n,i,k}$ is the complex channel gain from the transmitter of D2D pair i to the receiver of D2D pair k, $S_{n,k,k}$ is the transmitted symbol, and $W_{n,k,k}$ is additive zero-mean Gaussian noise with variance $(\sigma_{n,k})^2$. Therefore, the spectral efficiency $T_k$ at the receiver of D2D pair k is

$$T_k = \sum_{n \in \mathcal{N}} \log_2\left(1 + \frac{p_{n,k} |H_{n,k,k}|^2}{(\sigma_{n,k})^2 + \sum_{i \in \mathcal{K},\, i \neq k} p_{n,i} |H_{n,i,k}|^2}\right), \tag{2}$$

where $p_{n,k}$ is the transmit power of D2D pair k on channel n (the average power of $S_{n,k,k}$), and $\mathbf{p}_k = \{p_{1,k}, p_{2,k}, \dots, p_{N,k}\}$ is its power set. The proposed scheme aims to maximize the sum of D2D throughput subject to two constraints: a power constraint and an interference-to-eNB constraint. Therefore, the objective function and constraints can be written as

$$\max_{\mathbf{p}_1, \dots, \mathbf{p}_K} \sum_{k \in \mathcal{K}} T_k \quad \text{s.t.} \quad \sum_{n \in \mathcal{N}} p_{n,k} \le P_{\max}\ \ \forall k \in \mathcal{K}, \qquad \sum_{k \in \mathcal{K}} p_{n,k} |H_{n,k,b}|^2 \le Q_{\max}\ \ \forall n \in \mathcal{N},\ b \in \mathcal{B}, \tag{3}$$

where $H_{n,k,b}$ is the channel gain from the transmitter of D2D pair k to eNB b on channel n, $P_{\max}$ is the power limit of each D2D transmitter, and $Q_{\max}$ is the per-channel interference-to-eNB threshold. The maximum power constraint means that the total transmit power per user cannot exceed $P_{\max}$, and the interference constraint means that the interference experienced at each eNB on each channel cannot exceed $Q_{\max}$.
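As a minimal NumPy sketch (not the authors' implementation), the spectral efficiency of Eq. 2 and the two constraints of Eq. 3 can be evaluated directly from channel power gains. The array layouts and function names are illustrative assumptions: `gain[n, i, k]` holds $|H_{n,i,k}|^2$, and `gain_to_enb[n, k, b]` holds the power gain from D2D transmitter k to eNB b.

```python
import numpy as np

def spectral_efficiency(p, gain, noise_var):
    """Per-pair spectral efficiency T_k (Eq. 2), in bit/s/Hz.

    p:         (N, K) transmit powers p[n, k]
    gain:      (N, K, K) power gains; gain[n, i, k] = |H_{n,i,k}|^2 (tx i -> rx k)
    noise_var: (N, K) noise variances (sigma_{n,k})^2
    """
    rx = p[:, :, None] * gain                  # received power, indexed [n, tx, rx]
    desired = np.einsum("nkk->nk", rx)         # tx == rx: the pair's own signal
    interference = rx.sum(axis=1) - desired    # all other D2D transmitters
    sinr = desired / (noise_var + interference)
    return np.log2(1.0 + sinr).sum(axis=0)     # sum over channels n

def constraints_satisfied(p, gain_to_enb, p_max, q_max):
    """Check both constraints of Eq. 3 (hypothetical helper).

    gain_to_enb: (N, K, B) power gains from D2D transmitter k to eNB b (assumed layout)
    """
    power_ok = np.all(p.sum(axis=0) <= p_max)                  # total power per transmitter
    interference = np.einsum("nk,nkb->nb", p, gain_to_enb)     # interference per channel and eNB
    return bool(power_ok and np.all(interference <= q_max))
```

Vectorizing over the transmitter axis keeps the sum over $i \neq k$ exact: the diagonal term is subtracted from the full sum rather than masked.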

B. DEEP LEARNING MODEL FOR DISTRIBUTED POWER ALLOCATION WITH INTERFERENCE CONSTRAINTS
After the training phase on a central machine, all D2D devices have the same deep learning model. The model can autonomously infer transmit power using only the locations of a transmitter and a receiver. The proposed distributed decision scheme can maximize the total D2D rate in the multi-cell environment while maintaining the interference constraints. The model uses the locations of a D2D pair to derive that pair's transmit power. In the training phase, the transmit powers inferred for every device are collected to calculate the sum throughput, which is then used to update the model. As a result, the model is trained to account for both the power inferred for its own data link and the interference it causes to other data links. Fig. 1 shows the distributed deep learning architecture, which has two phases: the training phase in (a) and the inference phase in (b). In the training phase, the model is trained with the location information of all D2D devices in the cells. For example, D2D pair DUE 1 is described by four numbers, the (x, y) coordinates of its transmitter and receiver, which form one unit of data. The K units of data are fed to the model as independent inputs, so the model infers a different transmit power for each pair. The inferred transmit powers are then evaluated jointly using the sum throughput and the constraints: the throughput of each pair is not maximized independently, and the model is trained to maximize the sum throughput.
After the training phase on a single machine, all D2D devices have the same deep learning model. Consequently, each device autonomously determines its transmit power using only local location information while maximizing a global objective function: the D2D sum rate in the multi-cell network. A similar concept was introduced in [40], but the proposed scheme has advanced features. The biggest difference is that we use a single model, which simplifies the overall training process and enhances the feasibility of the proposed scheme. If multiple models were adopted for different devices, each model would be trained on a different data set, and it would be unclear which model should be given to which device. If online learning were adopted instead of a pre-trained model, another problem could arise: the model could be unduly influenced by the initial data and could overfit to it. If multiple models were trained on the same data set, they would eventually become the same model. Thus, one large model achieves the same result more efficiently than cooperative multiple models.

1) DISTRIBUTED DEEP LEARNING ARCHITECTURE
The distributed architecture is described as follows. Typically, θ is defined as a policy parameter, and the policy for D2D pair k is $\theta_k$. Then, the optimal set $\{\hat{\theta}^*_k\}$ can be defined as

$$\{\hat{\theta}^*_k\}_{k \in \mathcal{K}} = \arg\max_{\{\theta_k\}_{k \in \mathcal{K}}} \sum_{k \in \mathcal{K}} T_k\big(\mathbf{p}_k(\theta_k)\big) \quad \text{s.t. the constraints in Eq. 3}, \tag{4}$$

where $\mathbf{p}_k(\theta_k)$ is the transmit power derived from the policy $\theta_k$ for D2D pair k. The elements of $\{\hat{\theta}^*_k\}$ differ from each other in order to optimize Eq. 3. However, in the proposed model every D2D device uses the same model to determine its transmit power while achieving near-optimal spectral efficiency. This means that every device in the set $\mathcal{K}$ shares the same parameter $\theta_{\mathcal{K}}$:

$$\theta_k = \theta_{\mathcal{K}}\ \ \forall k \in \mathcal{K}, \qquad \theta^*_{\mathcal{K}} = \arg\max_{\theta_{\mathcal{K}}} \sum_{k \in \mathcal{K}} T_k\big(\mathbf{p}_k(\theta_{\mathcal{K}})\big), \tag{5}$$

where $\theta^*_{\mathcal{K}}$ is the optimal $\theta_{\mathcal{K}}$. Note that all pairs k in $\mathcal{K}$ share the same $\theta^*_{\mathcal{K}}$. In addition, $\theta^*_{\mathcal{K}}$ should approximate the result of the optimal set $\{\hat{\theta}^*_k\}$:

$$\sum_{k \in \mathcal{K}} T_k\big(\mathbf{p}_k(\theta^*_{\mathcal{K}})\big) \approx \sum_{k \in \mathcal{K}} T_k\big(\mathbf{p}_k(\hat{\theta}^*_k)\big). \tag{6}$$

More generally, a collection of such sets can be defined as $\bar{\mathcal{K}} = \{\mathcal{K}_1, \mathcal{K}_2, \dots, \mathcal{K}_B\}$, where B is the number of sets. From that, θ can be redefined for $\bar{\mathcal{K}}$ as

$$\theta_k = \theta_{\bar{\mathcal{K}}}\ \ \forall k \in \mathcal{K}_b,\ \forall \mathcal{K}_b \in \bar{\mathcal{K}}. \tag{7}$$

Finally, we define the target $\theta^*_{\bar{\mathcal{K}}}$, which is independent of the distribution of other devices while satisfying

$$\sum_{k \in \mathcal{K}_b} T_k\big(\mathbf{p}_k(\theta^*_{\bar{\mathcal{K}}})\big) \approx \sum_{k \in \mathcal{K}_b} T_k\big(\mathbf{p}_k(\hat{\theta}^*_k)\big)\ \ \forall \mathcal{K}_b \in \bar{\mathcal{K}}. \tag{8}$$

Finding a shared $\theta^*_{\mathcal{K}}$ whose result approximates that of the optimal per-pair set $\{\hat{\theta}^*_k\}$ in Eq. 4 is difficult, which is why deep learning is adopted. The policy θ is therefore redefined as the set of weights and biases of a DNN, {W, b}. Because the neural network determines the transmit power p according to θ, θ is still the policy parameter, and p can be rewritten as

$$\mathbf{p}_k = \mathrm{DNN}\big(k;\ \theta_{\bar{\mathcal{K}}}\big), \qquad \theta_{\bar{\mathcal{K}}} = \{W, b\}, \tag{9}$$

where DNN is a neural network that determines the transmit power $\mathbf{p}_k$ from the information of D2D pair k and the weight and bias set for $\bar{\mathcal{K}}$. Deep learning is the process of finding the optimal θ. Intuitively, if the network defined by $\theta^*_{\mathcal{K}}$ is sufficiently large, it can capture the behavior of all elements of $\{\hat{\theta}^*_k\}$.
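The key property of this architecture, one shared parameter set {W, b} applied to every pair, can be illustrated with a minimal NumPy sketch. The single linear layer with a sigmoid output, the normalized coordinates, and the names `init_shared_params` and `shared_policy` are illustrative assumptions, not the paper's 7-layer network; the point is only that pair k's output depends solely on pair k's four location features, so evaluating all pairs jointly during training and evaluating one pair alone on a device give the same result.

```python
import numpy as np

N_CHANNELS = 8  # one output per subchannel (assumption)

def init_shared_params(seed=0):
    """Hypothetical shared parameters {W, b} given to every D2D device."""
    rng = np.random.default_rng(seed)
    W = rng.normal(scale=0.01, size=(4, N_CHANNELS))
    b = np.zeros(N_CHANNELS)
    return W, b

def shared_policy(pair_locations, W, b):
    """Map each row (x_tx, y_tx, x_rx, y_rx) to N normalized powers in (0, 1).

    The same (W, b) is applied row by row, so row k of the output depends
    only on row k of the input: each device can run this locally.
    """
    return 1.0 / (1.0 + np.exp(-(pair_locations @ W + b)))

W, b = init_shared_params()
drop = np.array([[0.1, 0.2, 0.15, 0.25],
                 [0.5, -0.3, 0.45, -0.2],
                 [-0.4, 0.7, -0.35, 0.6]])   # 3 pairs, normalized coordinates
joint = shared_policy(drop, W, b)            # all pairs at once, as during training
local = shared_policy(drop[1:2], W, b)       # pair 2 alone, as on the device
```

Moving any other pair leaves `joint[1]` unchanged; coordination between pairs is learned only through the shared training objective, not through runtime messages.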

2) COST FUNCTION
In DPADIC, two constraints must be reflected in the cost function: i) the transmit power constraint and ii) the interference-to-eNB constraint. We adopt the Lagrangian to express the two constraints in the cost function. In deep learning, the cost function defines how a benefit or penalty is given when updating the DNN; in other words, the cost function can be customized as long as it provides a benefit or penalty. Therefore, the proposed scheme uses the throughput of Eq. 2 directly in the cost function, so labels are not required. Although the throughput and constraints are non-convex, the problem can be solved approximately with deep learning. The power constraint term $\eta_p$ (Eq. 10) applies the rectified linear unit, $\mathrm{ReLU}(x) = \max(0, x)$, to the excess transmit power $\sum_{n \in \mathcal{N}} p_{n,k} - P_{\max}$. If the total transmit power of a D2D transmitter is below the threshold $P_{\max}$, $\eta_p$ is 0, so a penalty is applied only when the transmitter exceeds the constraint. In addition, $\eta_p$ is designed in the form of the Shannon capacity so that its scale is similar to that of the objective; note that $\mathrm{ReLU}(\sum_{n \in \mathcal{N}} p_{n,k} - P_{\max})$ plays the role of a ratio, similar to the SINR. If the scales of independent terms in a cost function differ too much, deep learning cannot keep the terms balanced during training. Traditionally, additional constants, e.g., Lagrange multipliers, are used to balance the terms. We also adopt them, but finding appropriate multipliers for deep learning is another challenge. Instead, we give the constraints scales similar to the Shannon capacity. Two design choices, the use of ReLU and a form similar to the Shannon capacity, make it easy to find appropriate Lagrange multipliers. The interference-to-eNB constraint is designed in the same way as the power constraint.
Before defining the constraint, the interference to eNB $b \in \mathcal{B}$ on channel n must be defined:

$$I_{n,b} = \sum_{k \in \mathcal{K}} p_{n,k} |H_{n,k,b}|^2. \tag{11}$$

According to Eq. 3, the interference-to-eNB constraints are set for each channel. Note that noise is not included in this formula: its purpose is to estimate the impact of the D2D transmitters on the eNB, so the random noise term is ignored. The interference-to-eNB constraint term $\eta_{if}$ (Eq. 12) is then formulated in the same way as $\eta_p$, by applying ReLU to the excess interference $I_{n,b} - Q_{\max}$. Finally, the cost function C of the proposed method is

$$C = -\sum_{k \in \mathcal{K}} T_k + \lambda_{if}\,\eta_{if} + \lambda_p\,\eta_p, \tag{13}$$

where $\lambda_{if}$ and $\lambda_p$ are Lagrange multipliers. Finding appropriate values of $\lambda_{if}$ and $\lambda_p$ is easy because $\eta_{if}$ and $\eta_p$ have a form similar to the objective and use ReLU.
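A minimal NumPy sketch of a per-drop cost in the spirit of Eq. 13 follows. The penalty forms are an assumption modeled on the description above: each penalty is log2(1 + ReLU(excess) / threshold), which is zero while its constraint holds and grows like a Shannon-capacity term otherwise; dividing by the threshold is an illustrative choice for keeping the term ratio-like. The function names are hypothetical, and the sum throughput is passed in as a number so the cost does not depend on a particular throughput implementation.

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def power_penalty(p, p_max):
    """Assumed eta_p: Shannon-like penalty on each transmitter's excess total power."""
    excess = relu(p.sum(axis=0) - p_max) / p_max          # p has shape (N, K)
    return np.log2(1.0 + excess).sum()

def interference_penalty(p, gain_to_enb, q_max):
    """Assumed eta_if: penalty on I_{n,b} above Q_max; no noise term, as in the text."""
    interference = np.einsum("nk,nkb->nb", p, gain_to_enb)  # I_{n,b}
    excess = relu(interference - q_max) / q_max
    return np.log2(1.0 + excess).sum()

def cost(sum_throughput, p, gain_to_enb, p_max, q_max, lam_p, lam_if):
    """Cost to minimize: negative sum throughput plus weighted constraint penalties."""
    return (-sum_throughput
            + lam_if * interference_penalty(p, gain_to_enb, q_max)
            + lam_p * power_penalty(p, p_max))
```

For a batch, the text averages Eq. 13 over the data sets; with this sketch that would be the mean of `cost(...)` over the drops. Because of the ReLU, a feasible allocation contributes exactly zero penalty, so the multipliers only matter when a constraint is violated.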

C. DEEP LEARNING PROCESS
We adopt a multilayer perceptron (MLP) to predict transmit powers. Each input sample has only four features, the locations of the transmitter and receiver, so more elaborate architectures such as convolutional neural networks (CNNs) are unnecessary.
As the activation function, we use the sigmoid, $\sigma(x) = 1/(1 + e^{-x})$, instead of ReLU. The defined problem is a regression problem, so ReLU, which is designed to pick out the required partial features, is not appropriate. The sigmoid suits the proposed method because it passes gradients to the previous layers during back-propagation while preventing the neural network from diverging. If vanishing gradients occur, ResNet [41] can be used, but such a complicated network is not required because the input data consist of only four features. In particular, the proposed method is sensitive to divergence because of the maximum power constraint.
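The activation choice can be sketched in NumPy as below. Two points are assumptions rather than statements from the paper: the sigmoid is also used at the output layer, and its (0, 1) output is mapped linearly onto the −150 to 20 dBm power range reported in Section IV. The names `init_mlp` and `mlp_power_dbm` and the small layer sizes in the usage lines are hypothetical; Xavier (Glorot) uniform initialization is used because the paper adopts Xavier initialization.

```python
import numpy as np

P_LOW_DBM, P_HIGH_DBM = -150.0, 20.0   # power range reported in Section IV

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def xavier_uniform(rng, fan_in, fan_out):
    """Glorot/Xavier uniform initialization."""
    limit = np.sqrt(6.0 / (fan_in + fan_out))
    return rng.uniform(-limit, limit, size=(fan_in, fan_out))

def init_mlp(rng, sizes):
    """List of (W, b) pairs for layer sizes such as [4, hidden, ..., N]; biases start at zero."""
    return [(xavier_uniform(rng, m, n), np.zeros(n)) for m, n in zip(sizes[:-1], sizes[1:])]

def mlp_power_dbm(params, x):
    """Forward pass with a sigmoid in every layer; the output is mapped to dBm (assumed mapping)."""
    h = x
    for W, b in params:
        h = sigmoid(h @ W + b)
    return P_LOW_DBM + (P_HIGH_DBM - P_LOW_DBM) * h

def dbm_to_watt(p_dbm):
    return 10.0 ** ((p_dbm - 30.0) / 10.0)

rng = np.random.default_rng(0)
params = init_mlp(rng, [4, 16, 16, 8])                       # small sizes for illustration only
powers_dbm = mlp_power_dbm(params, rng.uniform(-1.0, 1.0, size=(5, 4)))
```

Because the sigmoid is bounded, the predicted power can never leave the configured range, and with zero biases the outputs tend to start near the middle of the range, which matches the initialization described in Section IV.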
The learning process of the proposed scheme is similar to typical deep learning, except that simulation is included in the training phase: the deep learning process is merged with a simulation that generates the location information of D2D nodes as input data. This is a distinguishing feature of the proposed scheme compared with the typical deep learning process.
Input data and labels are important components of successful deep learning: in supervised learning, the network is trained to produce outputs similar to the labels of the input data, so successful learning may not be guaranteed for input data without labels. The problem solved in this paper falls into this category, because the system cannot know a provably optimal solution before allocating resources and power.
Instead of labels from a provably optimal solution, we use the objective function of Eq. 13 as the cost function of deep learning. As a result, the simulation generates new input data for every training batch, as much as is required at each iteration, so the network never sees the same sample twice and overfitting to a fixed training set is avoided. The detailed learning process is described in Algorithm 1.
We adopt Xavier initialization [42]. n_epoch is the number of iterations. The simulation is designed to deliver a batch, which is a set of input data, and the size of a batch is given by batch_size. The Train() function is the actual training part of Algorithm 1. Get_Throughput(X, P) returns the throughput defined in Eq. 2, and the throughput results are collected, in order of iteration, in the set Throughput. Train() infers the power set P from the input data X and θ. Then, the cost c is computed from the input data X and the predicted power P. The cost function, implemented using Eq. 13, is the main part of the Train() function.
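The data flow of Algorithm 1 can be sketched as a plain Python loop in which the simulator produces a fresh batch at every iteration. This is a structural sketch, not Algorithm 1 itself: `simulate_batch`, `train_step`, and `get_throughput` are hypothetical callables standing in for the simulation, Train(), and Get_Throughput(X, P), and any framework could supply them.

```python
from typing import Any, Callable, List, Tuple

def run_training(
    simulate_batch: Callable[[int], Any],
    train_step: Callable[[Any, Any], Tuple[Any, Any]],
    get_throughput: Callable[[Any, Any], float],
    theta: Any,
    n_epoch: int,
    batch_size: int,
) -> Tuple[Any, List[float]]:
    """Training loop in which every iteration draws a brand-new simulated batch.

    simulate_batch(batch_size) -> X       new D2D drops, e.g. shape [batch_size, K, 4]
    train_step(theta, X) -> (theta, P)    infer powers P, evaluate the cost, update theta once
    get_throughput(X, P) -> float         throughput of Eq. 2 for this batch
    """
    throughput: List[float] = []               # the set "Throughput", in iteration order
    for _ in range(n_epoch):
        X = simulate_batch(batch_size)         # fresh data: no sample is reused
        theta, P = train_step(theta, X)
        throughput.append(get_throughput(X, P))
    return theta, throughput
```

Keeping the simulator inside the loop, rather than precomputing a data set, is what lets the training run on 5M distinct drops without storing them.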
X and P may contain several data sets, because several input data sets are trained simultaneously. In the cost function, Eq. 13 is evaluated for each input data set and the results are averaged. We use the Adam optimizer [44] to adjust θ. Adam works on the gradient of the cost function with respect to θ, obtained by back-propagation, rather than on the cost value alone, and it gradually changes θ to minimize the cost function. In Inference(), the reshape function is used to change the shape of the input data.
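For reference, the standard Adam update (Kingma and Ba), which adjusts θ from the gradient of the cost, fits in a few lines of NumPy. This is a generic sketch, not the paper's TensorFlow configuration: only the default learning rate of 0.0001 comes from Section IV, while β1 = 0.9, β2 = 0.999, and ε = 1e-8 are the defaults proposed in the Adam paper. The toy quadratic and its larger learning rate are purely illustrative.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=1e-4, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step count. Returns (theta, m, v)."""
    m = beta1 * m + (1.0 - beta1) * grad          # running mean of gradients
    v = beta2 * v + (1.0 - beta2) * grad ** 2     # running mean of squared gradients
    m_hat = m / (1.0 - beta1 ** t)                # bias-corrected estimates
    v_hat = v / (1.0 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy usage: minimize (theta - 3)^2 using its analytic gradient.
theta, m, v = np.array(0.0), np.array(0.0), np.array(0.0)
for t in range(1, 2001):
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t, lr=0.1)
```

Because each step is normalized by the running gradient magnitude, the step size stays close to the learning rate regardless of the scale of the cost, which is one reason a small learning rate such as 0.0001 converges slowly but steadily.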
The input data initially have the shape [batch_size, K, 4]: there are batch_size data sets, each containing K D2D pairs, and each D2D pair has four features, the x and y coordinates of the transmitter and the receiver. The data must be reshaped to [batch_size × K, 4] so that each D2D pair is processed independently, as required for distributed learning. Thus, the network processes batch_size × K independent samples per iteration.
The proposed scheme can reflect large-scale fading, including path loss and shadowing. Path loss can be modeled statistically as a function of distance, and because distance is easily derived from the locations, the path-loss computation is implicitly learned inside the neural network.
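The reshape in Inference() and the distance that underlies the path-loss term can be sketched in NumPy as follows. The feature order (x_tx, y_tx, x_rx, y_rx), the helper names, and the minimum distance `d_min` that avoids a singular gain at d = 0 are assumptions for illustration; the network itself never computes these quantities explicitly, which is the point of the paragraph above.

```python
import numpy as np

def flatten_pairs(X):
    """[batch_size, K, 4] -> [batch_size * K, 4]: every pair becomes an independent sample."""
    batch_size, K, n_feat = X.shape
    return X.reshape(batch_size * K, n_feat)

def unflatten_powers(P_flat, batch_size, K):
    """[batch_size * K, N] -> [batch_size, K, N], to evaluate the sum throughput per drop."""
    return P_flat.reshape(batch_size, K, -1)

def pair_distance(X):
    """Tx-Rx distance of each pair from its features (x_tx, y_tx, x_rx, y_rx)."""
    return np.hypot(X[..., 2] - X[..., 0], X[..., 3] - X[..., 1])

def path_loss_gain(d, alpha=4.0, d_min=1.0):
    """Distance-based path-loss gain d^-alpha (alpha = 4 in Section IV); d_min is an assumed floor."""
    return np.maximum(d, d_min) ** (-alpha)
```

In this layout, row `b * K + k` of the flattened array is pair k of drop b, so reshaping the network output back recovers the per-drop grouping needed for the sum throughput.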
The shadowing effect depends on the location of a device because it is closely related to physical obstacles such as buildings or trees. In simulations, however, a neural network can hardly reproduce the effect of a random variable that is not among its inputs. This problem can be mitigated by using sufficient real-world data in the learning process or by adopting a more detailed channel model.
Small-scale fading is usually modeled with a (complex) normal distribution, so its effect cannot be estimated from location information alone. However, small-scale fading can be neglected given the purpose of the proposed scheme, which is to drastically shorten the resource allocation latency rather than to find a near-optimal solution of the non-convex problem. Therefore, small-scale fading is out of the scope of this paper, and we leave it for future work. The problem of incorporating small-scale fading into D2D communications could be addressed by applying recent work on channel model estimation [45].

IV. RESULTS
We adopt the same experimental assumptions as [25]; the simulation parameters are summarized in Table 1. We assume hexagonal cells with radius R = 500 m. The maximum distance between the transmitter and receiver of a D2D pair is $D_{\max}$ = 100 m, and the distance is uniformly distributed in [0, $D_{\max}$]. We consider two multi-cell cases, B = 3 and B = 7, where B is the number of cells. There are 8 D2D pairs per cell, so the number of D2D pairs is K = 8·B. The number of OFDMA subchannels N is set to 8, and the spectral efficiency η is derived as $\eta = \frac{1}{K \times N} \sum_{k \in \mathcal{K}} T_k$. The maximum transmit power $P_{\max}$ is set to 0.25 W. The channel attenuation is expressed by distance-dependent path loss, including shadowing and fading: the path loss exponent α is 4, and shadowing follows a log-normal distribution with standard deviation σ = 8 dB. The power of the additive zero-mean Gaussian noise is set to −130 dBW, following [46]. The simulation is implemented using TensorFlow [47].
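One drop of the simulation described above might be generated as in the following NumPy sketch. It is an assumption-laden approximation, not the setup of [25]: the hexagonal cell is approximated by a disk of radius R, the receiver is placed at a uniformly random angle around its transmitter, small-scale fading is omitted, and all function names are hypothetical. Only R, $D_{\max}$, α, the 8 dB shadowing deviation, the −130 dBW noise, and the definition of η come from the text.

```python
import numpy as np

R_CELL = 500.0                 # cell radius R (m)
D_MAX = 100.0                  # maximum Tx-Rx distance of a D2D pair (m)
ALPHA = 4.0                    # path-loss exponent
SHADOW_STD_DB = 8.0            # log-normal shadowing standard deviation (dB)
NOISE_W = 10 ** (-130 / 10)    # -130 dBW in watts

def drop_pairs(rng, n_pairs, center=(0.0, 0.0)):
    """Drop D2D pairs in one cell (disk approximation of the hexagon, an assumption)."""
    r = R_CELL * np.sqrt(rng.uniform(size=n_pairs))          # uniform over the disk area
    phi = rng.uniform(0.0, 2 * np.pi, size=n_pairs)
    tx = np.stack([center[0] + r * np.cos(phi), center[1] + r * np.sin(phi)], axis=1)
    d = rng.uniform(0.0, D_MAX, size=n_pairs)                 # Tx-Rx distance ~ U[0, D_max]
    psi = rng.uniform(0.0, 2 * np.pi, size=n_pairs)
    rx = tx + np.stack([d * np.cos(psi), d * np.sin(psi)], axis=1)
    return np.concatenate([tx, rx], axis=1)                   # (n_pairs, 4) input features

def channel_gain(rng, distance, d_min=1.0):
    """Path loss d^-alpha with log-normal shadowing; small-scale fading omitted."""
    shadow_db = rng.normal(0.0, SHADOW_STD_DB, size=np.shape(distance))
    return np.maximum(distance, d_min) ** (-ALPHA) * 10 ** (shadow_db / 10)

def average_spectral_efficiency(T, n_channels):
    """eta = sum_k T_k / (K * N)."""
    return T.sum() / (T.size * n_channels)
```

For B = 3 cells, three calls to `drop_pairs(rng, 8, center)` with different cell centers would give the K = 24 pairs of one drop.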
Each batch contains 50 data sets (drops), and training runs for 100K iterations. Thus, 5M drops are used for training, with no duplicates because new data sets are generated at every iteration. The learning rate of the optimizer is 0.0001. A larger learning rate lets the DNN reach a converged D2D rate in fewer iterations, but the final converged D2D rate may be lower. The network has 7 layers with 1500 perceptrons per layer. This network may seem large, but it poses no problem for an entry-level GPU: with these parameters, training takes about 3 to 4 hours on an i7-6700K processor and a GTX 1080 Ti. Achieving the same result with a smaller neural network is a separate area of deep learning research. In addition, inference can be performed on a CPU, which means that it requires less computing power. The deep learning parameters are summarized in Table 2. Figs. 2 and 3 show the performance of the proposed scheme for B = 3 and B = 7, respectively; the results converge to a constant value after about 30K iterations. Cases with smaller $Q_{\max}$ tend to converge earlier because the initial transmit power is already close to zero, as shown in Figs. 2-(b) and 3-(b). We set the power range between −150 and 20 dBm, and the initial powers are set near the middle of the range. The optimizer then increases the power to find a better throughput. Figs. 2-(c) and 3-(c) show that the DNN reaches the converged throughput while maintaining the interference-to-eNB constraint.
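To put the network size and training volume in perspective, the weights and biases can be counted directly. The interpretation of "7 layers and 1500 perceptrons per layer" is an assumption: 7 hidden layers of width 1500, 4 inputs, and 8 outputs (one power per subchannel).

```python
def mlp_param_count(n_in: int, width: int, n_hidden: int, n_out: int) -> int:
    """Weights plus biases of a fully connected MLP with n_hidden equal-width hidden layers."""
    count = n_in * width + width                          # input -> hidden layer 1
    count += (n_hidden - 1) * (width * width + width)     # hidden -> hidden
    count += width * n_out + n_out                        # last hidden -> output
    return count

n_params = mlp_param_count(n_in=4, width=1500, n_hidden=7, n_out=8)
total_drops = 50 * 100_000            # 50 drops per batch x 100K iterations
megabytes_fp32 = n_params * 4 / 1e6   # 4 bytes per float32 parameter
```

Under these assumptions the model has 13,528,508 parameters, about 54 MB in 32-bit floats, and the training sees 5,000,000 drops. Almost all parameters sit in the six 1500 × 1500 hidden-to-hidden matrices, so the width dominates the memory cost.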
In the proposed scheme, two parameters are significant for enforcing the constraints: λ_if and λ_p. They must be set manually, but this is not difficult because their valid range is wide. Fig. 4 shows the effect of the eNB interference constraint factor λ_if. If λ_if is too small, the DNN can effectively ignore the interference constraint, because accepting the penalty λ_if η_if lowers the cost more than satisfying the constraint would. The spectral efficiency T is then high, but the result is invalid because the interference to the eNB exceeds the limit Q_max. If λ_if is large enough, the DNN cannot ignore the constraint and must satisfy it by reducing the transmit power. A much larger λ_if can reduce T further, but the reduction is marginal. Note that η_if includes a ReLU function, which turns the penalty off when the interference does not exceed the threshold; hence the effect of a large λ_if is limited. However, D2D transmitters are dropped randomly and may be very close to the eNB, so in a few cases Q_max is exceeded even with a very small transmit power. These cases affect the results, so T decreases slightly as λ_if increases. Fig. 5 shows the effect of the transmit power constraint factor λ_p, which is less sensitive than λ_if because P_max is 0.25 W. As with λ_if, the DNN may ignore the power constraint if λ_p is not large enough: with a very small λ_p, η can increase, but the constraint is not maintained. The DNN satisfies the transmit power constraint when λ_p exceeds 10. Unlike λ_if, a larger λ_p causes no problem; even with λ_p = 200, the spectral efficiency does not change, because after sufficient training no D2D transmitter exceeds P_max.
Fig. 6 compares the proposed scheme with the Iterative Approximated Distributed Rate Maximization Problem with Interference Constraint (IADRMPIC) scheme in [25] for various P_max and Q_max. In all four cases of Q_max, the proposed scheme achieves throughput similar to that of IADRMPIC. Note that the purpose of the proposed scheme is to achieve similar throughput without any involvement of other nodes. This shows that DPADIC can achieve a meaningful throughput through prediction with deep learning. Because DPADIC uses a pre-trained deep learning model, scalability with respect to the number of devices is important. We compare two deep learning models trained with 8 and 12 pairs, respectively; both models include the eNB interference constraint factor λ_if. The models are tested with 2 to 24 pairs. Throughput decreases as the number of devices increases because of the increased interference.
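The trade-off described for the constraint factors λ_if and λ_p can be illustrated with a penalized cost. The sketch assumes a cost of the form −T + λ_if η_if + λ_p η_p, with η_if = ReLU(I/Q_max − 1) and η_p = Σ ReLU(p − P_max). The normalization of the interference violation by Q_max and the exact form of η_p are our assumptions, not necessarily the paper's definitions, and the numbers are made up for illustration. The sketch compares an aggressive allocation that exceeds Q_max with a compliant one.

```python
def relu(x: float) -> float:
    """ReLU: zero when the constraint is satisfied, linear in the violation otherwise."""
    return max(0.0, x)

def penalized_cost(T, interference, q_max, powers, p_max, lam_if, lam_p):
    """Cost to minimize: negative spectral efficiency plus constraint penalties.
    ASSUMED form: eta_if = ReLU(I / Q_max - 1), eta_p = sum of ReLU(p - P_max)."""
    eta_if = relu(interference / q_max - 1.0)
    eta_p = sum(relu(p - p_max) for p in powers)
    return -T + lam_if * eta_if + lam_p * eta_p

# Illustrative (made-up) outcomes; interference is normalized so that Q_max = 1.
P_MAX = 0.25  # W
aggressive = dict(T=10.0, interference=1.5, powers=[0.2, 0.2])  # violates Q_max
compliant = dict(T=8.0, interference=0.9, powers=[0.1, 0.1])    # meets Q_max

for lam_if in (1.0, 10.0):
    c_agg = penalized_cost(**aggressive, q_max=1.0, p_max=P_MAX, lam_if=lam_if, lam_p=10.0)
    c_cmp = penalized_cost(**compliant, q_max=1.0, p_max=P_MAX, lam_if=lam_if, lam_p=10.0)
    winner = "aggressive" if c_agg < c_cmp else "compliant"
    print(f"lambda_if={lam_if}: aggressive={c_agg:.2f}, compliant={c_cmp:.2f} -> {winner}")
```

With λ_if = 1 the violating allocation has the lower cost (−9.5 versus −8.0), so a cost minimizer would accept the penalty; with λ_if = 10 the compliant allocation wins (−8.0 versus −5.0). Because the ReLU is zero once the constraint holds, raising λ_if further leaves the compliant allocation's cost unchanged, which is consistent with the limited effect of a large λ_if.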
Note that there is no meaningful difference between the two pre-trained models. The deep learning model is trained to satisfy the eNB interference constraint for any device distribution, so the learned policy is conservative. This leaves room for additional devices while still meeting the eNB interference constraint. According to this experiment, the pre-trained model performs well across a sufficiently wide range of device counts.
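A model trained with 8 or 12 pairs can be run with 2 to 24 pairs because each device evaluates the shared policy on its own local input only. The sketch below illustrates this with a hypothetical `toy_policy` standing in for the trained DNN (location in, transmit power out) and an assumed power-law path gain for the interference at the eNB; neither is the paper's model, and the interference values are in arbitrary units. A device's decision is the same whatever the number of devices N is, while the aggregate interference at the eNB can only grow as devices are added, which is why a conservative per-device policy leaves headroom.

```python
import math
import random

P_MAX = 0.25  # W, per-device transmit power limit from the text

def toy_policy(x: float, y: float) -> float:
    """HYPOTHETICAL stand-in for the pre-trained DNN: maps a device's own
    location (km, eNB at the origin) to a transmit power in watts.
    Devices close to the eNB transmit with less power."""
    d = math.hypot(x, y)
    return P_MAX * min(1.0, (d / 0.3) ** 2)

def path_gain(d_km: float, exponent: float = 3.5) -> float:
    """ASSUMED power-law path gain (not the paper's channel model)."""
    return max(d_km, 0.01) ** (-exponent)

def decide_all(locations):
    """Each device calls the policy on its own location only; no message is
    exchanged with the eNB or other devices, so N can be anything."""
    return [toy_policy(x, y) for x, y in locations]

def interference_at_enb(locations, powers):
    """Aggregate received power at the eNB from all D2D transmitters (arbitrary units)."""
    return sum(p * path_gain(math.hypot(x, y)) for (x, y), p in zip(locations, powers))

rng = random.Random(0)
drop = [(rng.uniform(-0.5, 0.5), rng.uniform(-0.5, 0.5)) for _ in range(24)]

for n in (2, 8, 12, 24):
    locs = drop[:n]
    powers = decide_all(locs)
    print(f"N={n:2d}: device 0 power={powers[0]:.4f} W, "
          f"eNB interference={interference_at_enb(locs, powers):.3e}")
```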
Figs. 8 and 9 show T for various hyperparameter settings when training with 16 and 24 devices, respectively, with B = 3. Depth denotes the number of layers, and width denotes the number of perceptrons per layer. These experiments show that both depth and width are important for achieving sufficient performance. Note that the 24-device case requires larger hyperparameter values than the 16-device case, which means that a larger number of devices makes the problem more complex. For scalability, it is therefore advantageous to choose larger hyperparameter values. Hyperparameter optimization is another challenging issue for most deep learning schemes [48], [49]. However, the proposed scheme does not focus on optimizing hyperparameters, and the experimental results show that the range of valid hyperparameters is wide enough. Therefore, no additional hyperparameter optimization algorithm is required. The weak dependence of spectral efficiency on the hyperparameters arises because the large amount of data prevents overfitting. Overfitting occurs when a neural network is too large for the amount of training data, so that it fits the training data but performs poorly on new data. In this system, fresh data are generated by simulation in every iteration, so overfitting is unlikely. Figs. 10 and 11 visualize the training results for the respective cell environments. Because of the interference constraint at the eNB, D2D transmit power is distributed more toward the cell-edge area. After 100K iterations, the results have almost converged. These results indicate that the DNN partitions the cell into regions for power allocation to maximize throughput, subdividing them finely so that transmit power is allocated with fine spatial granularity. Notably, the transmit power increases in the cell-edge area, which implies that D2D links using the proposed method can help improve the throughput of cell-edge users.
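To give a sense of how the depth and width in Figs. 8 and 9 translate into network size, the sketch counts the weights and biases of a fully connected network. It assumes a 2-dimensional input (the device's x, y location), a single transmit-power output, and that "7 layers" means 7 hidden layers; these shapes are our assumptions for illustration, not specifications from the paper.

```python
def mlp_param_count(d_in: int, depth: int, width: int, d_out: int) -> int:
    """Number of weights and biases in a fully connected network with
    `depth` hidden layers of `width` units each."""
    if depth < 1:
        return d_in * d_out + d_out  # no hidden layer: a single linear map
    count = d_in * width + width                    # input -> first hidden layer
    count += (depth - 1) * (width * width + width)  # hidden -> hidden layers
    count += width * d_out + d_out                  # last hidden -> output
    return count

# ASSUMED shapes: 2-D location in, one transmit power out.
for depth in (3, 5, 7):
    for width in (500, 1000, 1500):
        print(f"depth={depth}, width={width}: {mlp_param_count(2, depth, width, 1):,} parameters")
```

Under these assumptions the 7 × 1500 network has 13,515,001 parameters, about 54 MB in 32-bit floats, which fits easily in the 11 GB of a GTX 1080 Ti. Because the hidden-to-hidden term grows with the square of the width, doubling the width roughly quadruples the parameter count, whereas doubling the depth roughly doubles it.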
The signals of cell-edge users can be combined or relayed over multiple hops by D2D communication. Furthermore, DPADIC operates in a distributed manner, which means that the performance enhancement for cell-edge users can be achieved without eNB involvement.
Fig. 12 depicts the power distributions for distorted, non-hexagonal cell architectures with Q_max = −150 dBW and B = 7. To show that the proposed scheme works independently of the cell architecture, distorted cell architectures are simulated by shifting the two right cells to the left. The deep learning model is a mapping from location to the transmit power that maximizes cell throughput. Even if the cell layout changes, the mapping ability of the deep learning model does not decrease.
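The distorted geometry can be generated by starting from a regular 7-cell hexagonal cluster and moving the two right-most cells toward the left. The sketch below is one possible reading of "shifting two right cells to left"; the inter-site distance `isd`, the shift amount, and the 30°-offset orientation of the cluster are hypothetical choices, not values taken from the paper.

```python
import math

def hex_cluster_centers(isd: float) -> list[tuple[float, float]]:
    """Centers of a 7-cell hexagonal cluster: a central cell plus six
    neighbors at inter-site distance `isd`, at 30, 90, ..., 330 degrees."""
    centers = [(0.0, 0.0)]
    for k in range(6):
        angle = math.radians(30 + 60 * k)
        centers.append((isd * math.cos(angle), isd * math.sin(angle)))
    return centers

def shift_right_cells(centers, shift: float, count: int = 2):
    """Move the `count` cells with the largest x coordinate left by `shift`."""
    order = sorted(range(len(centers)), key=lambda i: centers[i][0], reverse=True)
    moved = set(order[:count])
    return [(x - shift, y) if i in moved else (x, y) for i, (x, y) in enumerate(centers)]

regular = hex_cluster_centers(isd=0.5)             # hypothetical ISD in km
distorted = shift_right_cells(regular, shift=0.2)  # hypothetical shift in km
for i, ((x0, y0), (x1, y1)) in enumerate(zip(regular, distorted)):
    print(f"cell {i}: ({x0:+.3f}, {y0:+.3f}) -> ({x1:+.3f}, {y1:+.3f})")
```

In this reading, only the cell geometry, and therefore the drop positions and interference pattern, changes; the policy input remains the device location.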

V. CONCLUSION
We propose a distributed power allocation scheme for D2D links underlaying a cellular system. We describe a model in which the D2D devices operate autonomously, and the combined result of the decisions made at each device achieves near-optimal spectral efficiency. In effect, each D2D device memorizes the appropriate transmit power for its location so that the near-optimal result is reached. The proposed method also has two technical features that can be generalized beyond wireless communication to other optimization problems. The first feature is that it uses deep learning to solve general maximization problems subject to specific constraints; we show that it can optimize an objective while satisfying several constraints. The second feature is the distributed deep learning architecture. We solve the distributed power allocation problem for D2D links using this architecture, which can be applied to convert a centralized system into a distributed one. In future work, we will extend the proposed scheme to more complex systems that are difficult to handle with conventional schemes.