A Fast Q-Learning Based Data Storage Optimization for Low Latency in Data Center Networks



I. INTRODUCTION
With the growing importance of data analysis in cloud data center networks, more and more service providers, such as Amazon, Google and Microsoft, rely on data services as part of their core business, and the performance of these services affects the whole system [1]. However, they have to battle daily with data latency: slow data access reduces their ability to deliver new digital products and services, and thus harms profitability, customer relationships, and operational efficiency.
Selecting the right data storage configuration is critical for both performance and cost [2]. Most existing research reduces latency by building models and designing optimization algorithms [3]. However, the factors that cause delay are diverse and dynamic, including network latency, disk latency and other types of latency (RAM, CPU, etc.) [4]. Static models can neither describe the multiple causes of delay nor adapt to these dynamics. How to take full account of the dynamic factors of data centers when optimizing data storage is still an open challenge.
The associate editor coordinating the review of this manuscript and approving it for publication was Joanna Kołodziej.
Since the data storage problem can be formulated as a Markov decision process (MDP) [5], and an optimal action-selection policy for an MDP can be found by model-free Q-learning [6], we choose Q-learning as the basic scheme to decide the best data locations for lower latency. However, two challenges keep Q-learning from being widely used in data problems: (1) massive input data and (2) blindness in parameter settings, both of which severely hamper the convergence of the learning process. To divide and conquer these two problems, a Low latency and Fast convergence Data Storage scheme, named LFDS, is designed. LFDS first sparsifies the input matrix of Q-learning to reduce the dimensionality of the input while retaining as much of its information as possible. Then, in the training phase, a specialized neural network is adopted for Q-learning to achieve a quick approximation. To overcome the blindness in training parameter settings, the relationship between the two key parameters, learning rate and discount rate, is carefully studied and tested with real data input and network architecture. Preferred ranges of the learning rate and discount rate are then analyzed and recommended for the data center scenario, which bring high training rewards and fast convergence.
The proposed LFDS acts as an agent that interacts with the data center environment and continuously takes actions by choosing the storage location for each data item. By collecting feedback from the environment, such as the current state of request patterns, network conditions, and the resulting end-to-end performance metrics (e.g., the read/write latency) due to these actions, LFDS improves the next data access location. Through this process, the agent learns how to make better choices for data access.
The main contributions of this paper are summarized as follows:
1) The data storage optimization problem is analyzed in a data center environment, aiming at reducing data access latency. A Q-Learning (QL) based scheme, named LFDS, is proposed in combination with neural network techniques.
2) LFDS is designed to shrink the dimensionality of the input matrix of QL under the premise that the integrity of the input information is maintained.
3) The preferred settings of the two key parameters in QL, learning rate and discount rate, which play a decisive role in convergence, are advised for the first time on a big data benchmark.
4) Based on a real data set, extensive simulation results show that LFDS can reduce the average write and read latency by 23.4% while the convergence time is improved by 15%.
The remainder of this paper is organized as follows. Related work is reviewed in Section II. Section III presents the system architecture and problem formulation. The LFDS scheme is proposed in Section IV. We evaluate the scheme in Section V. Section VI gives the conclusion.

II. RELATED WORK
Since the accessibility of Big Data is a top priority of the knowledge discovery process, many efforts have been made to improve the efficiency of data centers. Two key concerns are low latency and energy consumption. Research has pointed out that well-designed data placement in data centers can greatly improve both by reducing data migrations and improving memory accesses, which relieves network bandwidth and disk latency.
Fan et al. [7] tackled the problem of green data placement in data centers to strike a tradeoff among access latency, the energy consumption of data centers and network transport. The problem is proved to be NP-complete and a 3-approximation algorithm is proposed. To meet the latency requirements of applications and clients, Xiang et al. [8] provided an insightful upper bound on the average service delay of erasure-coded storage with arbitrary service time distribution and multiple heterogeneous files. Oh et al. [9] presented a lightweight system called Trips that models and solves the data placement problem using mixed-integer linear programming. In addition, to adapt quickly to dynamics, they introduced the notion of a Target Locale List, a proactive approach to avoid expensive re-evaluation of the optimal placement. Ren et al. [10] modeled the joint problem of data purchasing and data placement within a cloud data market as a facility location problem, which is NP-hard, and gave a divide-and-conquer design to obtain near-optimal results. Li et al. [11] analyzed the complexity of, and compared algorithms for, the superposed data uploading problem in networks with smart devices. Based on this, Li et al. [12] designed a multi-model framework for indoor localization via mobile edge computing technology. Chen et al. [13] proposed a visual object tracking algorithm based on an adaptive combination kernel. To guarantee QoS, Chen et al. [14] proposed a single-image super-resolution algorithm based on structural self-similarity and deformation data block features.
However, static models can neither describe the multiple causes of delay nor adapt to the dynamics of data center networks. Because machine learning methods can approximate the optimal solution by iteratively learning from the feedback of historical decisions, they are regarded as one of the best tools for solving optimization problems in environments with multiple dynamic factors. Wu et al. [15] gave a reinforcement learning-based data storage scheme for vehicular ad hoc networks, which dynamically considers throughput, vehicle mobility, and bandwidth efficiency by employing a fuzzy logic algorithm. Xu et al. [16] proposed a reinforcement learning-based job scheduling algorithm combined with neural networks to reduce the cost of data centers. To enhance the training speed, random pool sampling is proposed to retrain the neural networks via accumulated training data, and a unidirectional bridge network architecture is designed to further speed up training by using historical knowledge. Liu et al. [5] presented DataBot, a reinforcement learning-based adaptive model that learns the optimal data placement policies under dynamic network conditions and time-varying request patterns. Liao et al. [17] considered practical data center networks with a Fat-Tree topology, and utilized the machine learning technique k-means to store the most closely related data blocks together, where k is the number of cores in the Fat-Tree. Klimovic et al. [2] presented a tool, Selecta, to recommend near-optimal configurations of cloud compute and storage resources for data analytic workloads. An online incremental and decremental learning algorithm based on a variable support vector machine was proposed in [18].
VOLUME 8, 2020
In most of the above learning applications, convergence is rarely considered and users rely on the default parameter settings provided by the training system. A different parameter setting, however, might yield a much higher-quality convergence. He et al. [19] studied parameter compression in deep learning. DeBlasio and Kececioglu [20] considered, for the first time on biological benchmarks, the problem of learning the optimal set of parameter choices for a parameter advisor, and proved that learning an optimal set for an advisor is NP-complete. They implemented an approximation algorithm to find sets for advisors that are close to optimal. Considering the real-time requirements of data analysis services, convergence deserves more attention and improvement. In this work, aiming at improving the learning convergence, we design a Low latency and Fast convergence Data Storage scheme by (1) shrinking the dimensionality of the input matrix of QL under the premise that the integrity of the input information is maintained, and (2) advising the two key parameters of QL, learning rate and discount rate, via analysis and tests.

III. SYSTEM ARCHITECTURE AND PROBLEM STATEMENT

A. THE DATA CENTER NETWORKS
Data center networks (DCNs) comprise three types of entities: master nodes, data nodes, and clients. Taking one storage system as an example, it usually includes one master node, which manages the data about the data (metadata) and oversees the key operations of the system. Data nodes are responsible for storing data and running parallel computations on that data. Clients are the applications that load data into the cluster, submit jobs describing how that data should be processed, and then retrieve or view the results of the job when processing is finished.
The DCN has a distributed storage system that consists of one master node and a set of N data nodes. Similar to the Hadoop Distributed File System (HDFS) [21], when a file arrives, it is split into a set of data blocks D, and these blocks are stored on a set of data nodes. The master node executes file system namespace operations such as opening, closing, and renaming files and directories. It also determines the mapping of blocks to data nodes. The data nodes are responsible for serving read and write requests from the file system's clients. The data nodes also perform block creation, deletion, and replication upon instruction from the master node.
To WRITE a file, the client, the master node and the data nodes interact with each other in the following steps:
1) The client requests to upload a file by communicating with the master node, and the master node checks whether the target file already exists and whether the parent directory of the target file exists.
2) The master node responds to the request of the client and indicates whether the file can be uploaded.
3) After receiving the response from the master node, the client splits the file into blocks and requests the storage location for the first block.
4) The master node recommends a list of data nodes to the client. Meanwhile, the starting time $T^{W}_{i,\text{start}}$ is recorded.
5) The client selects one data node from the list and requests to write the first block. After the first data node receives the request, it calls the second data node on the list, and the second data node calls the third, which completes the entire data node pipeline; the acknowledgment is then returned to the client step by step.
6) The client starts uploading the first block to the first data node. The first data node receives the data and passes it to the second, and the second passes it to the third. Usually, each block has three replicas. The third data node sends feedback to the master node to record the completion time of the data block, $T^{W}_{i,\text{finish}}$.
7) When a block transfer is completed, the client repeats Step 1 until the target file is completely written.
Denoting the latency of writing the $i$th data block as $L^{W}_{i}$, then
$$L^{W}_{i} = T^{W}_{i,\text{finish}} - T^{W}_{i,\text{start}} \tag{1}$$
To READ a file from the system, a client needs to interact with the master node (which stores all the metadata, i.e., the data about the data). The master node checks for the required privileges; if the client has sufficient privileges, the master node provides the address of the data node where the file is stored. The client then interacts directly with the respective servers to read the data blocks. The master node records the time from receiving a request to completing the read as $L^{R}_{i}$. For the data analytical function, distributed applications run on multiple data nodes and may require the transmission of data blocks among them.
Because the data analytical latency is mainly related to the computation workload of data nodes and influenced by the request task priority, the optimization of analytical latency is beyond the scope of this paper.
The storage system is built on fat-tree [22], a typical network topology found in data centers. A fat-tree topology is usually described in terms of its number of pods, which are numbered left to right from Pod-0 to Pod-(k − 1). The topology consists of k pods with three layers of switches: edge switches, aggregation switches, and core switches. Fig. 1 illustrates a distributed storage system and the Fat-Tree topology of the DCN.

B. PROBLEM STATEMENT
In the big data storage system described above, different data streams accessed by analytic workloads have distinct characteristics. Selecting the right compute and storage data nodes for data analytic applications is difficult because the space of available options is large and the interactions between options are complex [2]. Deciding the optimal data access location among all available data nodes to reduce latency is critical for both performance and cost.
Since the DCN environment is complex and dynamic, the traditional static model is no longer suitable for the optimization problem of the DCN. Data access can be formulated as a finite Markov decision process (FMDP), as described in one of our previous works [5], because (1) the number of candidate data access locations (data nodes) is finite, and (2) each data access decision depends only on the present state of the DCN, not on the sequence of events that preceded it, which is called the Markov property. For any given FMDP, given infinite exploration time and a partly random policy, Q-learning can identify an optimal action-selection policy [23], and it is a model-free reinforcement learning algorithm. Q-learning is therefore chosen to learn dynamically from historical DCN data accesses and to make improved data access decisions.
Deep Q-learning (DQL) can provide the optimal solution through the finite Markov decision process; that is, when a request arrives, it can determine the optimal storage location. However, in the era of big data, with exponentially growing data, dynamic allocation of massive data and demand for low latency, using all of this data as the input of neural network learning results in a huge input and training space and slow convergence of the training process. In other words, massive input data undermines the advantage of Q-learning. How to reduce the training input while retaining its information becomes the key issue.

IV. DESIGN OF THE LOW LATENCY AND FAST CONVERGENCE DATA ACCESS SCHEME (LFDS)
LFDS is composed of two parts: (1) the basic Deep Q-Learning scheme (DQL) for dealing with the dynamic environment and data access patterns, and (2) the sparse input matrix method to further reduce the input state scale of DQL. Fig. 3 gives an overview of the LFDS scheme, followed by the design details of each part.

A. THE BASIC DEEP Q-LEARNING SCHEME
Q-learning is used on the master node, which acts as an agent interacting with the data storage system. This agent continuously takes actions by choosing the write/read location for each data block and collects feedback from the environment, including the current state of request patterns and network conditions, and the resulting end-to-end performance metrics (e.g., the read/write latency) due to these actions.

1) DESIGN OF Q-LEARNING
Similar to [5], the fundamental design of Q-learning consists of three sets: states S, actions A and the reward Q-function. The State of the storage system changes according to each Action and yields a different Reward, which can be expressed as S × A → R.
States: According to the characteristics of the big data center, we divide the state into three categories, in which the state information comes from the data read/write request log.
• Network conditions include the average latency of read requests from source node i to destination node j, denoted as $L^{R}_{ij}$, and the average latency of write requests, denoted as $L^{W}_{ij}$, where i, j ∈ {1, ..., N}. The average delay is measured from a real-time network state information log.
• Request frequency includes [24]: 1) the read rate (frequency) of data block m from source server i, denoted by $F^{[R]}_{i,m}$; 2) the write rate of data block m to source server i, denoted by $F^{[W]}_{i,m}$; 3) the read rate of all data blocks from source server i, denoted by $F^{[R]}_{i}$; 4) the write rate of all data blocks to source server i, denoted by $F^{[W]}_{i}$.
Action: We use a to represent the destination node (storage node) to which a data item is written. N is the number of storage servers, and a ∈ {1, ..., N}. We use an array to represent the action set. When an action is taken (i.e., after the data item is successfully stored on the target node), the array entry whose index is the number of the chosen storage node is set to 1, and all other entries are set to 0. In the dynamic data center network environment, the actions taken by the master node agent at every moment may affect the load balance of the data center.
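The action array described above is a one-hot encoding of the chosen storage node. A minimal sketch follows; the function names `encode_action` and `decode_action` and the zero-based node indexing are assumptions made for illustration, not part of the original scheme.

```python
def encode_action(target_node, num_nodes):
    """Return a one-hot action array: 1 at the chosen storage node, 0 elsewhere.

    Nodes are indexed from 0 here (an assumption; the text numbers them 1..N).
    """
    if not 0 <= target_node < num_nodes:
        raise ValueError("target_node must be in [0, num_nodes)")
    action = [0] * num_nodes
    action[target_node] = 1
    return action


def decode_action(action):
    """Recover the index of the chosen storage node from a one-hot array."""
    return action.index(1)
```

For example, with five storage nodes, choosing the third node (index 2) gives the array `[0, 0, 1, 0, 0]`.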
Reward: The big data center system needs to evaluate every data location update (action); this evaluation is the reward in reinforcement learning. The goal of data center network optimization is to obtain a low-latency data deployment by maximizing rewards. The reward is defined as the reciprocal of the weighted sum of the read/write delays for data item movement in the period $[t, t')$, where $l^{W}$ refers to the write latency and $l^{R}_{k}$ is the read latency of the $k$th round of training [24]. Here $t$ is the time when data item m is written, and $t'$ is the time of the next write operation on m. In this period, although the data is written only once, there may be multiple read operations, so the delay of all read operations during this period must be taken into account. The resulting average delay then determines the reward for this period. The calculation of the reward is shown in Formula 2.

$$\text{Reward} = \frac{1}{\text{weighted sum of } l^{W} \text{ and } l^{R}_{k} \text{ over } [t, t')} \tag{2}$$
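The reward is described only as the reciprocal of a weighted sum of the write latency and the read latencies observed between two consecutive writes. The sketch below shows one way such a reward could be computed. The weights `w_write` and `w_read`, their default values, and the choice to average the read latencies are assumptions for illustration; the text does not specify them.

```python
from typing import Sequence


def reward(write_latency: float, read_latencies: Sequence[float],
           w_write: float = 0.5, w_read: float = 0.5) -> float:
    """Reciprocal of a weighted sum of write latency and mean read latency.

    write_latency: latency of the single write at time t (ms).
    read_latencies: latencies of all reads in [t, t') (ms); may be empty.
    w_write, w_read: hypothetical weights, not taken from the paper.
    """
    mean_read = sum(read_latencies) / len(read_latencies) if read_latencies else 0.0
    weighted = w_write * write_latency + w_read * mean_read
    if weighted <= 0:
        raise ValueError("weighted latency must be positive")
    return 1.0 / weighted
```

With these assumed weights, a 40 ms write followed by reads of 60, 80 and 100 ms gives a weighted latency of 60 ms, so lower latency always yields a higher reward.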
The Markov decision process (MDP) assumes that the next state of the system depends only on the current state and is independent of the historical states:
$$P(S_{t+1} \mid S_t) = P(S_{t+1} \mid S_1, S_2, \ldots, S_t) \tag{3}$$
We use (S, A, P) to represent the MDP, where S is the state set, A is the action set and P is the state transition probability, i.e., the probability that state $S_t$ transfers to $S_{t+1}$ under action $A_t$ at time t. We define the state set $S = \{s_1, s_2, \cdots, s_n\}$ and the action set $A = \{node_1, node_2, \cdots, node_n\}$. Our goal is to obtain a low-latency data deployment strategy by using a reinforcement learning algorithm. The strategy here represents the mapping from states to actions, which is given by the conditional probability distribution π, that is, the distribution over the action set in a known state s:
$$\pi(a \mid s) = P(A_t = a \mid S_t = s) \tag{4}$$
In Formula 4, strategy π specifies an action probability in each state s. Once strategy π has been solved, we can use it to decide which action to take in any state s. When we adopt strategy π for the current state s and action a, the system interacts with the environment according to the current strategy and receives rewards. Reinforcement learning ultimately seeks the optimal strategy, which is measured by the cumulative return from environmental feedback. The greater the cumulative return, the closer the strategy is to the optimal solution. When the master node agent adopts the policy π, we can calculate the cumulative return, which is defined as:
$$G_t = \sum_{k=0}^{\infty} \gamma^{k} R_{t+k+1} \tag{5}$$
where R is the reward, k is the round of training, and γ is the discount factor, which is generally less than 1, so that the present reward is weighted more heavily than future rewards.
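To make the role of the discount factor concrete, the following sketch computes a discounted cumulative return over a finite list of rewards. The function name `discounted_return` and the finite horizon are assumptions for illustration.

```python
def discounted_return(rewards, gamma):
    """Sum of rewards where the k-th future reward is scaled by gamma**k.

    Evaluated backwards so each step is one multiply and one add.
    """
    g = 0.0
    for r in reversed(rewards):
        g = r + gamma * g
    return g
```

For three rewards of 1 and γ = 0.5, the return is 1 + 0.5 + 0.25 = 1.75, which shows how later rewards contribute less than the present one.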
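When the master node agent uses the policy π, the expected value of the cumulative return $G_t$ in state s is defined as the state-value function:
$$v_{\pi}(s) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s \right] \tag{6}$$
Accordingly, the state-behavior (state-action) value function is:
$$q_{\pi}(s, a) = \mathbb{E}_{\pi}\left[ G_t \mid S_t = s, A_t = a \right] \tag{7}$$
Finding the optimal strategy is equivalent to solving for the optimal value function:
$$q_{*}(s, a) = \max_{\pi} q_{\pi}(s, a) \tag{8}$$
The updating formula of the value function, with learning rate α and discount factor γ, is as follows:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha \left[ R_{t+1} + \gamma \max_{a} Q(s_{t+1}, a) - Q(s_t, a_t) \right] \tag{9}$$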
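As an illustration of how the value function is updated, the sketch below applies the standard tabular Q-learning update rule. The dictionary-based Q-table, the function name `q_update`, and the state labels are assumptions; the default `alpha` and `gamma` are the preset values of 0.075 and 0.7 quoted in Section V-B.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions,
             alpha=0.075, gamma=0.7):
    """Apply one Q-learning update in place and return the new Q(state, action).

    q: mapping from (state, action) to value; missing entries count as 0.
    actions: iterable of all candidate actions (here, storage nodes).
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


q_table = defaultdict(float)
new_value = q_update(q_table, "s0", 1, 10.0, "s1", [0, 1, 2], alpha=0.5, gamma=0.9)
```

A larger learning rate moves the estimate further toward the target at each step, while the discount factor controls how much the best value of the next state contributes.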

3) POLICY DESIGN
The master node agent explores the dynamic network environment through the ε-greedy strategy. In other words, a random action is selected with probability ε, and the optimal action according to the current Q value is selected with probability 1 − ε, where ε is a number greater than 0 and less than 1. The mathematical expression of the ε-greedy strategy is:
$$a = \begin{cases} \text{a random action from } A, & \text{with probability } \varepsilon \\ \arg\max_{a' \in A} Q(s, a'), & \text{with probability } 1 - \varepsilon \end{cases} \tag{10}$$
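The ε-greedy selection can be sketched in a few lines. The function name `epsilon_greedy`, the list-of-Q-values input and the injectable random source `rng` are assumptions made for illustration.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Pick an action index: random with probability epsilon, else the best one.

    q_values: current Q value of each candidate storage node.
    rng: any object with random() and randrange(); defaults to the random module.
    """
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))  # explore
    return max(range(len(q_values)), key=lambda a: q_values[a])  # exploit
```

Passing a seeded `random.Random` instance as `rng` makes the exploration sequence reproducible across runs.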

4) DEEP Q-NETWORK
Q-learning can solve the Markov decision problem in a low-dimensional state space, but in a real data center network a huge amount of data has to be handled every microsecond, so it is not practical to use a Q-table to store state-action pairs. In this paper, a deep neural network is used to fit the state-value function of Formula 6 [5]. Each time a certain number of samples has been collected, the Q-function is updated; that is, the parameters of the neural network are constantly updated to learn the optimal action. Fig. 4 shows the deep neural network structure. The network delay matrix $x_1$ holds the average delay of read requests sent from source i to destination j. If N denotes the number of servers in the data center, the size of the network delay matrix is N × N. The read/write request matrix $x_2$ holds the frequency with which each data item is requested from each source. If M denotes the number of data items, the size of the read/write request matrix is M × N. Because the current data block request has only one source and there are N servers in total, the size of the node deployment matrix is 1 × N.
$(w_{i,j}, b)$ is used to represent the parameters of the neural network. $w^{(l)}_{i,j}$ represents the weight of the connection between unit i of layer l and unit j of layer l + 1. $b^{(l)}_{i}$ is the bias parameter. $z^{(l)}_{i}$ is the weighted sum of the inputs of unit i in layer l, which can be calculated by Formula 11.
When training neural network, there is a functional relationship between the output of the L layer node and the input of the L + 1 layer node, such as Formula 12, which is called activation function.
In this paper, we use the Rectified Linear Unit (ReLU) as the activation function, which is a piecewise linear function. When the input parameter p is greater than zero, the output is equal to the input, and when p is less than or equal to zero, the output is zero. Compared with sigmoid, ReLU tends to converge more easily when training multi-layer deep neural networks.
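The weighted sum and the ReLU activation can be illustrated with a single fully connected layer. This is a minimal sketch in plain Python; the function names `relu` and `dense_forward` and the list-based weight layout are assumptions, with `weights[i][j]` following the convention above of connecting unit i of one layer to unit j of the next.

```python
def relu(p):
    """Rectified Linear Unit: returns p when p > 0, otherwise 0."""
    return p if p > 0 else 0.0


def dense_forward(inputs, weights, biases):
    """Forward pass of one fully connected layer followed by ReLU.

    inputs: activations of the current layer (length n_in).
    weights: n_in x n_out nested list; weights[i][j] links input i to output j.
    biases: one bias per output unit (length n_out).
    """
    outputs = []
    for j, b in enumerate(biases):
        z = sum(inputs[i] * weights[i][j] for i in range(len(inputs))) + b
        outputs.append(relu(z))
    return outputs
```

In a real deployment this computation would be carried out by a numerical library on the flattened state matrices; the plain loops here only expose the arithmetic.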
During training, the weights in the neural network are updated by back propagation (BP). Gradient descent is a very common method for finding a local minimum. A squared loss function is constructed for a single sample $(x^{(i)}, y^{(i)})$ in a training set $T = \{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(n)}, y^{(n)})\}$. According to the loss function, the gradient descent method is used to update $W^{(l)}$. Because every iteration of plain gradient descent computes over all samples, which slows convergence, in this paper we use stochastic gradient descent (SGD) to randomly select a group of samples in each iteration. With large sample data, we can reach an acceptable loss value without training on all the samples.
When the system is running, clients continuously send read/write requests to the master node agent. Using the state matrix s of the current time t as the input of the neural network, the agent obtains the action a for time t through the neural network or the ε-greedy search strategy. The agent then takes action a in the data center network environment and obtains the data center network state and the reward at time t + 1. The whole data center environment is therefore dynamic. After continuous attempts, the master node selects 128 samples from the sample pool as training data, and constantly updates the parameters of the neural network. In the end, the master node can choose the optimal node deployment location according to the current load status of the data center network.
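The difference between full gradient descent and SGD is that each step uses only a randomly selected group of samples. The sketch below shows this on a deliberately small problem: a one-dimensional linear model stands in for the neural network, and the names `sgd_step` and `train`, the learning rate, the batch size and the step count are all assumptions chosen for illustration.

```python
import random


def sgd_step(w, b, batch, lr):
    """One SGD step for the model w*x + b with squared loss 0.5*(w*x + b - y)**2."""
    grad_w = grad_b = 0.0
    for x, y in batch:
        err = w * x + b - y
        grad_w += err * x
        grad_b += err
    n = len(batch)
    return w - lr * grad_w / n, b - lr * grad_b / n


def train(samples, lr=0.05, batch_size=4, steps=2000, seed=0):
    """Fit w and b by repeatedly sampling a random mini-batch of the data."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    for _ in range(steps):
        batch = rng.sample(samples, batch_size)
        w, b = sgd_step(w, b, batch, lr)
    return w, b


# Noise-free samples of y = 2x + 1; SGD should recover w close to 2 and b close to 1.
data = [(float(x), 2.0 * x + 1.0) for x in range(-5, 6)]
w_fit, b_fit = train(data)
```

Each step touches only four of the eleven samples, yet the parameters still approach the values that minimize the loss over the whole set.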
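The sample pool from which the master node draws 128 training samples behaves like an experience replay buffer. A minimal sketch follows; the class name `SamplePool`, the fixed capacity and the tuple layout of a transition are assumptions, while the batch size of 128 comes from the text.

```python
import random
from collections import deque


class SamplePool:
    """Fixed-size pool of (state, action, reward, next_state) transitions."""

    def __init__(self, capacity=10000, seed=None):
        # Oldest transitions are discarded automatically once capacity is reached.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def add(self, state, action, reward, next_state):
        self.buffer.append((state, action, reward, next_state))

    def sample(self, batch_size=128):
        """Return up to batch_size transitions drawn without replacement."""
        k = min(batch_size, len(self.buffer))
        return self.rng.sample(list(self.buffer), k)
```

Sampling at random from the pool, rather than training on the most recent transitions only, reduces the correlation between consecutive training samples.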

B. SPARSE INPUT MATRIX METHOD
By analyzing the features of real data, we observed that only a few data items appear frequently and that Zipf's law [25] is satisfied, as shown in Fig. 5. In other words, we do not need to consider all read/write requests, because a small part of them captures the read/write request pattern. Sparsity refers to retaining the main feature information in the input as much as possible while eliminating the secondary feature parameters. In the training process, as many feature parameters as possible are expected to be zero, so that the practical and effective information can be concentrated in a low-dimensional space.
In our system, we assume that there are N data nodes and M data blocks. Let $f_{m,n}$ denote the request frequency of data block m by node n. The data-node matrix is recorded as:
$$A = \begin{pmatrix} f_{1,1} & \cdots & f_{1,N} \\ \vdots & \ddots & \vdots \\ f_{M,1} & \cdots & f_{M,N} \end{pmatrix}$$
The maximum frequency value of row i of A is denoted as $f_{\max}$. If the difference between $f_{i,j}$ and $f_{\max}$ is less than a certain value d, we call data block i Active Data in node j. Otherwise, we call it Inactive Data and set $f_{i,j} = 0$. We use the data blocks with high request frequency as the input of our neural network, so the neural network does not need to be trained on all the data. By preprocessing the request frequencies in advance and reducing the state input of deep Q-learning, a matrix B is obtained. Matrix B is formed by sparsifying matrix A, where $k \ll m$.
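The thresholding rule for Active and Inactive Data can be sketched as follows. The function name `sparsify` and the nested-list matrix layout are assumptions, and the sketch stops at zeroing inactive entries; how the zeroed matrix is then compacted into B is not specified in the text and is left out.

```python
def sparsify(freq, d):
    """Zero out entries that are not within d of their row maximum.

    freq: M x N nested list, freq[i][j] = request frequency of block i by node j.
    d: threshold on the gap to the row maximum; entries with a gap >= d
       are treated as Inactive Data and set to 0.
    """
    sparse = []
    for row in freq:
        f_max = max(row)
        sparse.append([f if f_max - f < d else 0 for f in row])
    return sparse
```

For a row `[10, 9, 1]` and d = 2, the entries 10 and 9 are kept as Active Data while 1 is set to 0, so a smaller d produces a sparser input.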

V. PERFORMANCE EVALUATION
Based on real trace data, the Microsoft Research Cambridge Trace [26], a series of simulations is carried out in this section. The experimental environment is based on the Ubuntu 16.04 operating system. The hardware is equipped with an Intel Xeon E5-2697 v2 CPU, 8 GB of memory and a 512 GB hard disk. The required software includes Mininet v2.2.0, OpenFlow 1.3, Floodlight v1.2 and memcached v1.59. Through simulations, the performance of the proposed LFDS in improving read/write latency is compared with related benchmarks, and the convergence of each algorithm is compared. Finally, the impact of the learning rate and discount rate settings on the performance of the algorithm is tested. Benchmark works used in this paper include:
1) DataBot [24]: the basic Q-learning scheme for data storage using the original state space matrix.
2) CommonIP [27]: the metaserver selects the node closest to the current copy to place data.
3) DSBK [17]: a data deployment strategy based on k-means, which is used to cluster the data.
Mininet [28] is used to simulate the data center network in this paper. Mininet offers fast start-up, high scalability, support for multiple bandwidths, and easy installation and use, which makes it well suited to building our simulation environment. We have built a three-layer switch network topology of Fat-Tree, and we set the direct link bandwidth of each switch to 1 Gbps [7]. The Fat-Tree, as shown in Fig. 6, is set to 4 pods, so there are 32 hosts in the network. Each host is equipped with memcached. Each storage host has a client as the request source and a memcached process as the request target. Memcached is a memory-based cache system that supports the storage of key-value pairs. It has excellent data read/write performance and distributed expansion capability. Similar to HDFS, we adopt a 3-copy data backup strategy.

A. READ/WRITE LATENCY PERFORMANCE
We explored the impact of the sparse input matrix method on DQL by carrying out two groups of experiments comparing LFDS and DataBot. From Fig. 7(a), we can see that during the first 1500 s of system operation, when the requests sent by the client are reads, the average latency of LFDS is lower than that of DataBot. After 1500 s, the two algorithms converge to the same level. Overall, the sparse input matrix method reduces the average latency of read operations by 39.3 ms. Likewise, from Fig. 7(b), the sparse input matrix method reduces the average latency of write operations by 45.67 ms. A possible explanation is that, for the data center, the sparse DQL state-space matrix helps to improve network utilization and reduce the load on the internal backbone network, which in turn helps the master node to adopt the optimal server node deployment and achieve lower read/write latency.
Finally, we compared the read/write latency performance of all deployment strategies. It can be seen from Fig. 8 that during the 3000 s run of our system, the average read and write delays of DQL are 81.7 ms and 43.98 ms respectively, which are 2.4% and 28.4% lower than those of DSBK (k = 4), and 63.9% and 82.4% lower than those of CommonIP. We can conclude that, in the same experimental environment, DQL and DSBK have lower average read/write latency than CommonIP. For data write operations, DQL reduces the latency by 28.4% compared with DSBK.

B. IMPACT OF LEARNING RATE AND DISCOUNT RATE ON CONVERGENCE
We test the impact of the learning rate α and the discount rate γ on the reward in Q-learning. The learning rate determines how much of the difference between iterations is learned, and the discount rate is the attenuation applied to future rewards. To control the variables, we preset the learning rate and the discount rate to 0.075 and 0.7 respectively.
As shown in Fig. 9(a), when the learning rate is 0.06, the reward obtained by each action (storing data) taken by the master node agent fluctuates between 40 and 80; it changes little over time but rebounds at around 2600 s. When the learning rate is 0.08 or 0.075, there is a significant difference in reward within the first 1500 s of system operation: $ARA_{\alpha=0.08} > ARA_{\alpha=0.075} > ARA_{\alpha=0.06}$, where ARA is the reward per action. After the system has run for 1500 s, $ARA_{\alpha=0.08}$ and $ARA_{\alpha=0.075}$ are the same, and both grow faster than $ARA_{\alpha=0.06}$. According to Fig. 9(b), during the operation of the system, the master node continuously deploys and stores data and obtains the corresponding reward from the network environment, which it uses to optimize its deployment strategy. By changing the attenuation value of the reward, we can find the optimal value for our scheme. During the system operation time, $ARA_{\gamma=0.8} > ARA_{\gamma=0.9} > ARA_{\gamma=0.7}$. To sum up, when the learning rate is 0.08 and the discount factor is 0.8, the reward obtained by the master node for its actions is greatest.

C. CONVERGENCE COMPARISON
The faster the algorithm converges, the faster the data deployment scheme can be given. By comparing the cumulative distribution function (CDF) of the read/write latency of each scheme, we can see the convergence of the scheme. In other words, the faster the CDF curve converges to 1, the better the scheme. As a simple example, Fig. 10(a) shows that at a read latency of 100 ms the CDF of LFDS is about 0.95, which means that 95% of reads complete with a latency below 100 ms. The CDF of LFDS already reaches 1 at about 200 ms, while that of CommonIP reaches 1 at about 800 ms. This means that LFDS converges faster than CommonIP and can provide a data deployment scheme sooner.
By analyzing the CDF values of CommonIP, DSBK (k = 8), DSBK (k = 4) and LFDS, it is found that when the CDF value of CommonIP is 0.6, the read/write delay is 200 ms. When the CDF of DSBK (k = 8) is 0.6, the read and write latencies are 174.2 ms (as shown in Fig. 10(a)) and 177.7 ms (as shown in Fig. 10(b)). When the CDF of DSBK (k = 4) is 0.6, its read and write latencies are 85.4 ms and 112.2 ms. When the CDF of LFDS is 0.6, the delays are 81.4 ms and 42.6 ms. With the sparse input matrix and the specifically chosen learning and discount rates, LFDS outperforms its counterparts in convergence, which means high efficiency in data storage.
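The CDF values read off the figures can be reproduced from raw latency samples with an empirical CDF. The sketch below is illustrative only; the function name `empirical_cdf` and the sample values in the usage note are made up and are not measurements from the experiments.

```python
from bisect import bisect_right


def empirical_cdf(latencies_ms, threshold_ms):
    """Fraction of observed latencies that are at or below threshold_ms."""
    ordered = sorted(latencies_ms)
    return bisect_right(ordered, threshold_ms) / len(ordered)
```

For the hypothetical samples `[50, 80, 120, 200]`, the CDF at 100 ms is 0.5 and reaches 1.0 at 200 ms; a scheme whose CDF reaches 1 at a smaller latency serves all of its requests faster.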

VI. CONCLUSION
This paper studied a data center storage method for reducing read/write latency in the data center. An evolutionary Q-learning scheme, named LFDS (Low latency and Fast convergence Data Storage), was proposed. Reinforcement learning was used to obtain the optimal assignment, and a neural network was used to fit the input data. Furthermore, the input matrix of Q-learning was sparsified to shrink the dimensionality of the massive input data. Simulations based on real data show that LFDS can effectively reduce the read/write latency of the DCN.
JINGYU ZHANG (Member, IEEE) received the B.E. degree in communication engineering from Hunan Normal University, in 2008, the M.E. degree in computer applications from Chongqing Jiaotong University, in 2010, and the Ph.D. degree in computer science and technology from Shanghai Jiao Tong University, in 2017. He was a Visiting Ph.D. Student with Ohio State University, from 2014 to 2016. He is currently an Assistant Professor with the School of Computer and Communication Engineering, Changsha University of Science and Technology, China. His main research interests include high-performance architecture and big data performance optimization, blockchain performance, and consensus mechanism optimization. He is a member of the China Computer Federation.
JIN WANG (Senior Member, IEEE) received the M.S. degree from the Nanjing University of Posts and Telecommunications, China, in 2005, and the Ph.D. degree from Kyung Hee University, South Korea, in 2010. He is currently a Professor with the Changsha University of Science and Technology. He has published more than 300 international journal and conference papers. His research interests mainly include wireless ad hoc and sensor networks, network performance analysis, and optimization. He is a member of ACM.