Adaptive Call Center Workforce Management with Deep Neural Network and Reinforcement Learning

Workforce management is one of several critical issues in a call center. A call center supervisor must assign an adequate number of call agents to handle a high volume of time-varying incoming calls. Improper staff allocation can degrade service quality and reduce customer satisfaction. This paper presents a novel call center workforce management method based on a deep neural network and reinforcement learning (RL). The proposed method first uses a deep neural network to learn and predict call center traffic characteristics. The deep neural network consists of a Long Short-Term Memory (LSTM) network and a Deep Neural Network (DNN) that capture non-linear call traffic behaviors. The predicted traffic parameters are supplied to the Erlang-A model, which calculates important service metrics, including the call abandonment probability and the average response time. To handle dynamic call center traffic, an RL framework using the Q-learning algorithm then establishes the optimal starting times of call agent shifts and their associated call agent numbers by maximizing a defined reward function. The objective is to maintain the quality of service of a call center throughout working hours. In experiments on actual call center data, the proposed method surpasses experienced human supervisors and previous workforce management schemes in terms of achieved quality of service and average waiting time.


I. INTRODUCTION
Operating a call center is challenging because it runs 24 hours a day, seven days a week. Incoming calls from customers require live assistance day and night, and call agents must be present to provide services. Instead of hiring a large number of call agents, call center supervisors must design a workforce management scheme to schedule working hours and to allocate a proper number of call agents to each service-time interval [2], [3]. This task becomes more complicated when a call center offers several services. For example, in Thai Telecommunication Relay Services (TTRS) [1], a multimedia call center providing relay services to Thai deaf people, supervisors must allocate call agents to different types of services, including real-time video relay services, real-time text relay services, and short message relay services. In general, the objective of workforce management in a call center, including TTRS, is to maximize customer satisfaction under limited human resources. Designing a workforce management procedure for a call center must consider the following challenges.
• High turnover rates: Call centers around the world often have a high rate of employee turnover. High turnover affects call center operations since new call agents must be recruited and trained to replace staff who resign. The situation becomes much more challenging when a call center relies on highly skilled call agents. For example, TTRS call agents must manage relay services between deaf and hearing individuals using sign language. Replacing resigned employees would necessitate a lengthy period of training.
• Workload forecasting: When a call center is understaffed, customer satisfaction generally drops and call agents are overwhelmed, leading to high turnover rates. In contrast, when a call center is overstaffed, many call agents are idle and human resources are wasted. In some call centers, such as TTRS, workload forecasting is generally based on supervisors' experience, resulting in inaccurate forecasts and overloaded supervisors when a call center has several services and many call agents.
• Human errors: Overworked supervisors and call agents frequently make mistakes in planning call center operations, preserving client information, and providing services. The situation becomes even worse when all operations are performed manually. For instance, TTRS supervisors normally spend two or three consecutive days on human resource planning. During this period, they are unable to work on other things since they must focus on this laborious duty. Despite the considerable effort, human resource planning remains prone to errors.
• Service level agreement: The majority of call centers have their own set of quality-of-service targets, with some performance indicators recommended by industry standards. For example, TTRS adopts the recommendations from the International Telecommunication Union (ITU) [39], [40] as a guideline to set up its service level agreement.
In general, multiple performance metrics are used to assess call center performance, such as customers' call waiting time, call abandonment rate, and average speed of answer. For example, TTRS sets up different service level agreements for its video relay services, text relay services, and short message relay services. It is very difficult for human supervisors to incorporate all these metrics from different service types and to derive the best workforce management scheme that balances available call center resources against customer satisfaction.
In practice, human supervisors commonly employ trunking theory via Erlang-B and Erlang-C [4] together with their experience to manage human resources. However, these conventional tools do not consider some vital call center performance metrics such as abandoned calls. As a result, an improved version of Erlang-C called Erlang-A was proposed, which includes the rate of abandoned calls in its formulation. Unfortunately, these formulations rely on the call center's static statistics and may not represent actual dynamic traffic well [6]. Several studies have proposed alternative tools for workforce management and resource allocation in various applications. The work in [7] introduced a Decision Support System (DSS) that handles a workforce management problem using linear optimization. The DSS outputs an appropriate number of service intervals (shifts) and their corresponding agent numbers. However, the call center's parameters may not have a linear relation with the outputs. Hence, the DSS may not provide the best solution. In addition, the DSS is still based on static statistics, which may not be efficient in a dynamic environment. In practice, many factors affect call center operations, such as different periods within a working day, holidays, and the number of call agents. Integrating several factors to derive suitable operating measures is not straightforward since many variables must be considered, and they may not be linearly related. Machine learning is widely known for modeling non-linear relationships between many inputs and outputs [8]-[11], [37], [38]. More specifically, it can extract hidden patterns from available data that human supervisors may miss. A neural network with Tabu Search [12], [13] was proposed to derive a job schedule. However, this work operated under a small number of parameters and constraints and may not be suitable for a call center's workforce management problem.
In [14], reinforcement learning was introduced to manage a maintenance schedule. A neural network and correlative variables were adopted for workforce management in banking operation centers [15]. Even though the works in [13], [15] tackled practical problems containing many parameters and constraints, their formulations were not designed to handle dynamic statistics, and the derived schedules were considered suboptimal solutions. In [17], deep Q-learning was adopted to solve the problem of multi-resource multi-machine job scheduling. However, this approach was not fully automatic and required human experts to provide partial labels.
This paper presents a novel call center workforce management method based on a deep neural network and reinforcement learning (RL) to address the limitations of existing resource allocation and workforce management methods. A deep neural network is used to learn and predict call center traffic characteristics. It consists of a Long Short-Term Memory (LSTM) network and a Deep Neural Network (DNN) that capture non-linear traffic characteristics. The predicted traffic parameters are supplied to the Erlang-A model, which calculates important service metrics, including the call abandonment probability and the average response time. To handle dynamic traffic, this study applies RL with the Q-learning algorithm for automatic workforce allocation. The RL agent establishes the optimal starting time of each call agent shift and the number of its associated call agents by maximizing a defined reward function. The objective is to maintain the quality of service of a call center throughout working hours under various service types and multiple service level agreements. The contributions of this paper can be summarized as follows.
1) We propose a new method for estimating call center traffic using a combination of a recurrent neural network and a fully connected network.
2) We incorporate multiple constraints of actual call center operations into our proposed traffic prediction framework.
3) We formulate an adaptive workforce management method based on RL and employ the Q-learning algorithm to search for the optimal workforce management solutions.
This paper is organized as follows. Section II describes the proposed call center traffic forecasting with the proposed deep neural network and the techniques for preparing data for traffic forecasting. Section III describes dynamic workforce management with the reinforcement learning framework, including a policy searching technique for the formulated problem. Section IV presents the experimental results that assess the performance of the proposed method on actual call center data. Finally, concluding remarks are given in Section V.

A. DATA PRE-PROCESSING MODULE
In a call center, the number of available channels is restricted by the number of working call agents, $n_a(t)$, at a specific time $t$. If all call agents are occupied, incoming calls need to wait in a queue. Let $n_q(t)$ be the number of calls that a call center's queuing system can hold at time $t$. As a result, the maximum number of channels that a call center can provide is equal to $n_a(t) + n_q(t)$. If there are available channels in the queuing system, an incoming call triggers a ringing notification and is transferred to an available call agent. The call agent picks up the call and serves the customer. After the service is completed, the call agent and the customer release the occupied channel. Then, a call waiting in the queue can proceed to receive service. However, if all call agents and the queuing system are fully occupied, the customer receives a busy signal. The described call center procedure is illustrated in Fig. 1. Service time $T_s$ is defined as the time interval from when a call agent picks up a call to when both the call agent and the customer release the channel. Waiting time $T_w$ is the time interval from when a customer's call enters a call center channel to when a call agent picks up the call. The waiting time consists of three time intervals. The first time interval ($t_{w1}$) is the time that a customer spends calling the call center. The second time interval ($t_{w2}$) is the time that a customer waits in the queue before the call triggers a ringing notification. Finally, the third time interval ($t_{w3}$) is the time from when the ringing notification appears to a call agent to when the call agent picks up the call. As a result, the waiting time can be defined by
$$T_w = t_{w1} + t_{w2} + t_{w3}. \tag{1}$$
A call center dataset consists of the number of incoming calls, the number of received calls, each call's service time, and each call's waiting time. Suppose there are $k$ services in a call center. Then, we have a total of $4k$ datasets.
For example, TTRS has three real-time services: video relay service on mobile phones, text relay service, and video relay service on kiosks and video phones. Therefore, the TTRS call center has 12 datasets as shown in Table 1.

TABLE 1. Datasets of the TTRS call center.
$L_1$   Number of incoming calls of video relay service on mobile phones
$L_2$   Number of received calls of video relay service on mobile phones
$L_3$   Service time of video relay service on mobile phones
$L_4$   Waiting time of video relay service on mobile phones
$L_5$   Number of incoming calls of text relay service
$L_6$   Number of received calls of text relay service
$L_7$   Service time of text relay service
$L_8$   Waiting time of text relay service
$L_9$   Number of incoming calls of video relay service on kiosks and video phones
$L_{10}$  Number of received calls of video relay service on kiosks and video phones
$L_{11}$  Service time of video relay service on kiosks and video phones
$L_{12}$  Waiting time of video relay service on kiosks and video phones

To obtain meaningful data, we remove false and unintended calls by discarding calls with a waiting time $T_w$ of less than eight seconds and calls with no response from the caller. Then, the data are grouped into chunks of 30-minute time intervals. Dividing a whole day into 30-minute time intervals yields 48 intervals. Hence, if a dataset contains information for $n_d$ days, the total number of time intervals is calculated by
$$\text{total number of time intervals} = 48 \times n_d. \tag{2}$$
Table 2 shows an example of collected data from the TTRS video service grouped into 30-minute time interval chunks. In general, call center traffic is affected by several factors such as weather, time of day, and holidays. For example, the number of incoming calls of TTRS significantly increases during rainy days, break times, and holidays. Hence, we incorporate a set of time-related features, shown in Table 3, to characterize call center traffic and the collected call center data. In this paper, there are 44 time-related features. The first 32 features characterize the call center's operating day. The last 12 features (from the 33rd feature to the 44th feature) represent the 120-minute service-time interval within a day.
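The filtering and 30-minute grouping steps described above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation: the record layout (arrival timestamp, waiting time in seconds, and a flag indicating whether the caller responded) and the function names `interval_index` and `bin_calls` are assumptions made for this example.

```python
from collections import defaultdict
from datetime import datetime

MIN_WAIT_S = 8      # calls that waited less than this are treated as unintended
BIN_MINUTES = 30    # interval length used in the text (48 intervals per day)


def interval_index(ts: datetime) -> int:
    """Index (0..47) of the 30-minute interval that contains ts."""
    return (ts.hour * 60 + ts.minute) // BIN_MINUTES


def bin_calls(calls):
    """Count valid calls per (date, interval).

    calls: iterable of (arrival, wait_seconds, caller_responded) tuples.
    This record layout is hypothetical; a real call log will differ.
    """
    counts = defaultdict(int)
    for arrival, wait_seconds, caller_responded in calls:
        # Discard false and unintended calls, as described in the text.
        if wait_seconds < MIN_WAIT_S or not caller_responded:
            continue
        counts[(arrival.date(), interval_index(arrival))] += 1
    return dict(counts)
```

With $n_d$ days of data, the result holds at most $48 \times n_d$ keys, matching (2); intervals without valid calls are simply absent from the dictionary.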
These time-related features $F_i$, where $1 \le i \le 44$, and the traffic parameters $L_i$, where $1 \le i \le 12$, of each call center service are used as inputs to our proposed traffic forecasting module, which is described in the following section. Collected call center data normally possess various statistical properties. Different data may differ in data types, data formats, and data scales. We therefore normalize the data to the same scale. Normalization shortens the training period since the loss function becomes closer to symmetric, which makes it easier to optimize. To normalize the data, min-max normalization is utilized, which can be expressed as
$$z_n = \frac{z - z_{\min}}{z_{\max} - z_{\min}},$$
where $z$ is a data sample, $z_n$ is the normalized sample of $z$, $z_{\max}$ is the data sample with the maximum value, and $z_{\min}$ is the data sample with the minimum value. These normalized data are used as inputs to the traffic forecasting module.
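As a small worked example of min-max normalization, the following Python sketch rescales a list of samples to the range [0, 1]. The function name and the handling of a constant series (mapped to zeros) are assumptions for illustration; the source does not say how that corner case is treated.

```python
def min_max_normalize(samples):
    """Rescale samples to [0, 1] using (z - z_min) / (z_max - z_min)."""
    z_min, z_max = min(samples), max(samples)
    span = z_max - z_min
    if span == 0:
        # Assumption: a constant series carries no scale information.
        return [0.0 for _ in samples]
    return [(z - z_min) / span for z in samples]
```

For instance, `min_max_normalize([10, 20, 30])` returns `[0.0, 0.5, 1.0]`. In practice, $z_{\min}$ and $z_{\max}$ would usually be taken from the training set only and reused on the test set.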

B. CALL CENTER TRAFFIC FORECASTING MODULE
We use a deep neural network [38] to forecast call center traffic. The pre-processed data are fed to layer one, an LSTM network with 128 hidden units. A Rectified Linear Unit (ReLU) is used as the activation function for all network layers. The activations from layer one are fed as inputs to layer two, an LSTM network with 64 hidden units. The activations of layer two are fed to layer three, a fully connected layer with 64 hidden units. Layers four to seven are fully connected layers with the same structure as layer three, each taking the activations of the preceding layer as inputs. The activations of layer seven are fed to layer eight, a fully connected layer with 24 hidden units, which produces the prediction output. The dropout rate during the training process is set to 0.2, the same value as in layer one, to avoid over-fitting. These predicted call center statistics are used in the adaptive workforce management algorithm described in the next section.
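To make the layer sizes concrete, the sketch below lists the eight layers and counts their trainable parameters in plain Python. The input dimension of 56 (44 time-related features plus 12 traffic parameters) is an assumption; the text does not state how the inputs are arranged per time step. The counting formulas are the standard ones for an LSTM layer with four gates and for a fully connected layer with bias.

```python
def lstm_params(input_dim: int, units: int) -> int:
    """Four gates, each with input weights, recurrent weights, and a bias."""
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim: int, units: int) -> int:
    """Weight matrix plus one bias per unit."""
    return input_dim * units + units


INPUT_DIM = 56  # assumption: 44 time-related features + 12 traffic parameters

# Layers one to eight as described in the text.
LAYERS = [("lstm", 128), ("lstm", 64)] + [("dense", 64)] * 5 + [("dense", 24)]


def count_parameters(input_dim: int = INPUT_DIM) -> int:
    """Total trainable parameters of the stacked network."""
    total, width = 0, input_dim
    for kind, units in LAYERS:
        if kind == "lstm":
            total += lstm_params(width, units)
        else:
            total += dense_params(width, units)
        width = units  # the next layer consumes this layer's activations
    return total
```

Under the assumed input dimension, most parameters sit in the two LSTM layers, which is consistent with their role of learning the temporal structure before the fully connected layers map it to the forecast.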

III. DYNAMIC WORKFORCE MANAGEMENT WITH Q-LEARNING ALGORITHM
This section addresses a new workforce management method with reinforcement learning (RL). RL offers several advantages in the workforce allocation problem as follows.
• RL can handle a complex task involving several variables, including incoming traffic, the service level agreement (SLA), and the starting time of each service interval.
• Customer behaviors change dynamically over time. Conventional methods may not learn from changing data, resulting in a human resource policy that fails to achieve better call center services. For example, RL-based human resource management can outperform supervised learning-based human resource management under dynamic environments because the training data of the latter may not reflect the changing customer behaviors.
The following subsections present the formulation of the workforce management problem as a Markov decision process and demonstrate the use of the Q-learning algorithm to search for the best human resource allocation.

A. WORKFORCE MANAGEMENT AS MARKOV DECISION PROCESS
We formulate our workforce management problem as a Markov Decision Process (MDP). The MDP consists of a set of environment and agent states, a set of actions of the RL agent, the transition probability from state $S$ to state $S'$, and the immediate reward for moving from state $S$ to state $S'$ with action $a$. The objective in our problem is to learn the optimal or nearly optimal workforce management policy that maximizes the reward function. The reward function directly reflects call center service qualities, which are measured via Service Level Agreements (SLA), such as a dropped call rate or waiting time. An RL agent interacts with a dynamic call traffic environment in discrete time steps. At each time step $t$, the agent possesses state $S_t$ and reward $r_t$. The agent chooses an action $a_t$ from all possible actions. This action is sent to the environment, the agent moves to the next state $S_{t+1}$, and the reward $r_{t+1}$ corresponding to the transition $(S_t, a_t, S_{t+1})$ is computed. In our formulation, the agent observes and predicts the current state of the call traffic environment via our proposed deep learning network. The RL action $a_t$ is defined as an ordered pair of the starting time of a working shift and the number of call center agents working in that shift. For example, if consecutive starting times are 30 minutes apart, there are a total of 48 possible starting times within one day (i.e., 00.00, 00.30, ..., 23.30). The number of call center agents ranges from one to the total number of call agents in the call center. For instance, if a call center has 24 call agents, the number of call agents ranges from one to 24. Let $l_t$ be the starting time and $g_t$ be the corresponding number of call agents. The ordered pair $(l_t, g_t)$ represents an RL action. The number of possible actions depends on the number of possible starting times and the number of call center agents.
In our example, there are 48 possible starting times and 24 call agents. Hence, there are a total of 48 × 24 = 1152 actions.
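The action space of this example can be enumerated directly. The snippet below is a sketch under the stated example values (30-minute spacing, 24 agents); the variable names are illustrative.

```python
from itertools import product

# 48 starting times, 30 minutes apart, written as in the text (00.00 ... 23.30).
start_times = [f"{hour:02d}.{minute:02d}" for hour in range(24) for minute in (0, 30)]

# One to 24 call agents.
agent_counts = range(1, 25)

# Each action is an ordered pair (l_t, g_t).
actions = list(product(start_times, agent_counts))
```

Here `len(actions)` is 1152, the first action is `("00.00", 1)`, and the last is `("23.30", 24)`.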
Define the agent state at time step $t$ as $S_t(\gamma, \beta)$, characterized by working shift $\gamma$ and service type $\beta$. For example, suppose that we have five working shifts and three services. In this case, $\gamma$ is one of the working shifts $R_1$, $R_2$, $R_3$, $R_4$, or $R_5$, and $\beta$ is one of the services $C_1$, $C_2$, or $C_3$. There are a total of 15 states, as shown in Fig. 3. The dwell time of a state is equal to the length of the corresponding shift. The state only changes within the same service type. We preset the dwell time of each state based on its service time and its service type. Hence, it is certain that state $S_t(R_t, C_k)$ will move to state $S_{t+1}(R_{t+1}, C_k)$. For instance, in Fig. 3, we move from state $S_1(R_1, C_1)$ to state $S_2(R_2, C_1)$ after staying in state $S_1(R_1, C_1)$ for its dwell time $\tau_1(C_1)$. Since the starting time of each state or shift is dictated by the starting time of the selected action, consecutive working shifts may overlap. In this case, the number of agents working during the overlap is equal to the sum of the working call agents of the consecutive states. The number of working call agents for each service is thus a weighted sum of the call agent numbers over $K$, where $K$ denotes the states $S_t$ within the same service type $k$, that is, the states $(R_t, C_k)$ at each time step $t$, and the weight $w_k$ is set to one for a consecutive state and zero for a non-consecutive state. Fig. 3 shows an example of overlapping shifts in which state $S_1(R_1, C_1)$ overlaps with state $S_2(R_2, C_1)$. The number of working call agents during these overlapping states is equal to $g_1 + g_2$. Notice that when state $S_2(R_2, C_1)$ does not overlap with state $S_1(R_1, C_1)$, the number of working call agents is equal to $g_2$. Our proposed learning agent learns a policy $\pi(S_{t+1}(R_{t+1}, C_k)) = a_{t+1}(l_t, g_t)$, which maximizes the cumulative reward. The immediate reward is positive when the action yields call center performance above the preset SLA and negative when the performance falls below it.
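The overlap rule above (agents of overlapping shifts add up) can be illustrated with a short Python sketch. The shift representation as `(start_minute, length_minutes, agents)` tuples and the function name are assumptions for this example, and shifts that wrap past midnight are not handled.

```python
def agents_on_duty(shifts, minute):
    """Number of agents working at a given minute of the day for one service.

    shifts: list of (start_minute, length_minutes, agents) tuples, one per
    working shift of the same service type (hypothetical layout).
    A shift contributes its agents only while it covers the given minute,
    which plays the role of the 0/1 weight in the text.
    """
    return sum(
        agents
        for start, length, agents in shifts
        if start <= minute < start + length
    )


# Two shifts of service C1: R1 starts at 00.00 with g1 = 3 agents,
# R2 starts at 07.00 with g2 = 2 agents; both last eight hours.
example_shifts = [(0, 480, 3), (420, 480, 2)]
```

During the overlap (07.00 to 08.00), `agents_on_duty(example_shifts, 430)` gives $g_1 + g_2 = 5$; after the first shift ends, `agents_on_duty(example_shifts, 600)` gives $g_2 = 2$.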

B. REWARD FUNCTION AND SERVICE LEVEL AGREEMENT
To apply the immediate reward to an RL agent, the SLA of a call center must be defined. We measure the SLA of a call center based on the average speed of answer and the probability that the actual speed of answer is less than or equal to the required average speed of answer. To compute the SLA, define $A$ as the traffic intensity [19], computed as
$$A = \lambda H = \frac{\lambda}{\mu},$$
where $\lambda$ is the arrival rate of calls, $H$ is the average service time per call, and $\mu$ is the service rate, which is equal to $1/H$. Suppose that there are $n_a$ agents. The load per agent [19] can be calculated as $\rho = \frac{\lambda}{\mu \times n_a}$.
Define the ratio between the call abandonment probability, $P_{ab}$, and the average waiting time, $\bar{T}_w$, as $P_{ab}/\bar{T}_w$. The Erlang-A model [5], [20] is used to compute the performance metrics of a call center. Following [20], we first define the parameters $J$, $\varepsilon$, and $K$. These parameters are used to calculate the probability of waiting, $P\{T_w > 0\}$, and the probability of abandonment, $P_{ab}$. These probabilities lead to the performance metrics of a call center via the Erlang-A model, namely the average speed of answer, $T_{asa}$, and the probability that the speed of answer is less than or equal to the average speed of answer, $P\{t \le T_{asa}\}$.
In general, each call center sets its own SLA. The TTRS call center sets the required average speed of answer for video relay service on mobile phones, text relay service, and video relay service on kiosks and video phones to 15, 12, and 15 seconds, respectively. The required $P\{t \le T_{asa}\}$ of all three services is set to 0.85. With the call center traffic predicted by our proposed deep learning model, we can estimate $T_{asa}$ and $P\{t \le T_{asa}\}$ based on the Erlang-A model and the human resource strategy from the RL. If an RL action satisfies the preset SLA, a positive immediate reward is granted to the RL agent. In contrast, a negative reward is applied if the action cannot meet the SLA.
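As an illustration of the quantities in this subsection, the Python sketch below computes the traffic intensity, the load per agent, and Erlang-A style metrics. It does not reproduce the closed-form expressions of [20]. Instead, it numerically solves the stationary distribution of the M/M/n+M birth-death queue that underlies the Erlang-A model, a swapped-in but equivalent route for these probabilities. The patience rate `theta`, the truncation `max_queue`, and all function names are assumptions of this example, and the returned waiting time is the mean over all arriving calls, which may differ from the paper's definition of $T_{asa}$.

```python
def traffic_intensity(lam, mu):
    """A = lambda / mu (equivalently lambda * H with H = 1 / mu)."""
    return lam / mu


def load_per_agent(lam, mu, n_agents):
    """rho = lambda / (mu * n_a)."""
    return lam / (mu * n_agents)


def erlang_a_metrics(lam, mu, theta, n_agents, max_queue=200):
    """Return (P{wait > 0}, P{abandon}, mean wait) for an M/M/n+M queue.

    lam: arrival rate, mu: service rate per agent, theta: abandonment
    (patience) rate per waiting caller, all in the same time unit.
    The queue is truncated at max_queue waiting calls (assumption).
    """
    # Unnormalized stationary weights of the birth-death chain.
    weights = [1.0]
    for j in range(1, n_agents + max_queue + 1):
        death_rate = min(j, n_agents) * mu + max(j - n_agents, 0) * theta
        weights.append(weights[-1] * lam / death_rate)
    total = sum(weights)
    pi = [w / total for w in weights]

    # An arriving call waits when all agents are busy (PASTA property).
    p_wait = sum(pi[n_agents:])
    # Mean number of calls waiting in the queue.
    mean_queue = sum((j - n_agents) * pi[j] for j in range(n_agents + 1, len(pi)))
    # Abandonments per unit time divided by arrivals per unit time.
    p_abandon = theta * mean_queue / lam
    # Little's law applied to the queue, counting all arrivals.
    mean_wait = mean_queue / lam
    return p_wait, p_abandon, mean_wait
```

Note that `p_abandon / mean_wait` equals `theta`, which matches the ratio between the abandonment probability and the average waiting time defined above. An RL reward could then compare such metrics against the SLA targets for each candidate staffing level.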

C. Q-LEARNING ALGORITHM
We employ the Q-learning algorithm [21] to search for the optimal workforce management policy. The Q-learning algorithm computes the quality of a state-action combination via the mapping
$$Q : S \times A \rightarrow \mathbb{R},$$
where $S \times A$ is the Cartesian product of the set of states and the set of actions, and $\mathbb{R}$ is the set of real numbers. Before learning starts, the mapping $Q$ is initialized to an arbitrary value. When an RL agent chooses an action $a_t$, the Q-learning algorithm observes a reward $r_t$. Then, the state changes from $S_t$ to $S_{t+1}$. The Q-value is updated via [22], [23]
$$Q^{new}(S_t, a_t) = Q(S_t, a_t) + \alpha \left( r_t + \gamma \max_{a} Q(S_{t+1}, a) - Q(S_t, a_t) \right),$$
where $r_t$ is the reward when moving from state $S_t$ to state $S_{t+1}$, $\gamma$ is the discount factor, and $\alpha$ is the learning rate. The algorithm ends when $S_t$ is the terminal state, where the Q-value is no longer updated. We describe the use of the Q-learning algorithm in the following sections.
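The update rule above can be sketched as a tabular Python function. This is a minimal illustration: the dictionary-based Q-table keyed by `(state, action)` and the function name are assumptions, while the default values of `alpha` and `gamma` follow the settings reported in this paper.

```python
from collections import defaultdict


def q_update(q, state, action, reward, next_state, actions, alpha=0.0001, gamma=0.8):
    """Apply one Q-learning update in place and return the new Q(state, action).

    q: mapping from (state, action) to a Q-value, for example defaultdict(float).
    actions: iterable of all actions available in next_state.
    """
    best_next = max(q[(next_state, a)] for a in actions)
    td_error = reward + gamma * best_next - q[(state, action)]
    q[(state, action)] += alpha * td_error
    return q[(state, action)]


# Usage with an all-zero table and two illustrative actions (l_t, g_t).
q_table = defaultdict(float)
demo_actions = [("08.00", 3), ("08.30", 2)]
new_value = q_update(q_table, "S1", ("08.00", 3), 30.0, "S2", demo_actions, alpha=0.1)
```

With a zero-initialized table, a reward of 30, and `alpha=0.1`, the updated value is 0.1 × 30 = 3.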

D. Q-TABLE INITIALIZATION
To perform the Q-learning algorithm, a Q-table is constructed as shown in Table 4. The Q-values in the table are all set to zero. Each shift's starting time is set randomly, and its number of call agents is set to one. We also initialize important parameters, including the discount factor ($\gamma$) and the learning rate ($\alpha$). In this paper, we set them to 0.8 and 0.0001, respectively.

E. ACTION SELECTION
The epsilon-greedy algorithm [21] is used to search for the optimal policy in the Q-learning algorithm. The RL agent selects the action corresponding to the highest estimated reward (Q-value) with probability $1 - \epsilon$. Otherwise, with probability $\epsilon$, the RL agent selects an action from the rest. Define the initial value of $\epsilon$ as $\epsilon_s$, the final value of $\epsilon$ as $\epsilon_f$, and the decay rate of $\epsilon$ as $\epsilon_d$. The value of $\epsilon$ is updated for the next time step $\Delta t$ as in (18). The epsilon-greedy search for the optimal policy can be described step by step as follows.
• Step 1: Set $\Delta t = 0$. Initialize the number of call agents to one and set the starting times randomly for all actions.
• Step 2: For each iteration of $\Delta t$ and state $S_k$, compute $\epsilon$ from (18) and draw $\zeta$ uniformly at random between zero and one. If $\zeta > \epsilon$, select the action with the highest Q-value in state $S_k$; otherwise, select an action at random from the rest.
• Repeat Step 2 for $T_f$ iterations and return the optimal actions of all states.
The epsilon-greedy procedure is summarized in Algorithm 1. Note that even though the selected actions are optimal in the Q-learning sense, they may not meet some constraints imposed by a call center. In this case, the selected actions are not passed to the RL agent. Instead, the algorithm continues to the next step until it finds an optimal policy that does not violate the constraints. Examples of restrictions imposed by a call center are the total number of available call agents, the minimum number of call agents allocated to each service, and the starting time of each shift. If there are too many constraints, the algorithm may not converge. In this case, some constraints may need to be relaxed.
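The selection step described above can be sketched in Python as follows. Two parts are assumptions of this example rather than the paper's method: the exponential decay in `epsilon_at` stands in for the paper's update rule (18), which is not reproduced here, and infeasible actions are filtered out before selection instead of being rejected afterwards. The function names and the `is_feasible` callback are hypothetical.

```python
import math
import random


def epsilon_at(step, eps_start=1.0, eps_final=0.01, decay=0.001):
    """Assumed exponential schedule from eps_start down to eps_final."""
    return eps_final + (eps_start - eps_final) * math.exp(-decay * step)


def select_action(q_row, epsilon, is_feasible=lambda action: True, rng=random):
    """Epsilon-greedy choice among the feasible actions of one state.

    q_row: dict mapping each action to its Q-value in the current state.
    is_feasible: hypothetical callback encoding call center constraints.
    """
    feasible = [a for a in q_row if is_feasible(a)]
    if not feasible:
        raise ValueError("no action satisfies the call center constraints")
    greedy = max(feasible, key=q_row.get)
    # Exploit with probability 1 - epsilon (or when nothing else is available).
    if rng.random() > epsilon or len(feasible) == 1:
        return greedy
    # Otherwise explore: pick uniformly among the remaining feasible actions.
    return rng.choice([a for a in feasible if a != greedy])
```

A constraint such as "at most 20 agents in one shift" could be passed as `is_feasible=lambda action: action[1] <= 20`. If the constraints leave no feasible action, the sketch raises an error, which corresponds to the case where constraints need to be relaxed.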

IV. EXPERIMENTAL RESULTS
This section evaluates the proposed adaptive human resource management algorithm. We use actual call center data from TTRS ranging from 1 January 2018 to 31 January 2021 in our experiments [1]. The TTRS data are based on 24 call agents and three services: TTRS video relay service on mobile phones, TTRS text relay service, and TTRS video relay service on kiosks and video phones. Call center data from 1 January 2018 to 31 December 2020 are used as the training set, whereas the rest are used as the test set. Table 5 shows the hyperparameters utilized in both the deep learning prediction module and reinforcement learning. The hidden node number of each fully connected layer is 64, except for the last fully connected layer, which has 24 nodes.

Algorithm 1 The Epsilon-Greedy action selection
Initialize the time step to zero, the number of call agents to one, and the starting times randomly for all actions
for each iteration of ∆t and S_k do
    Compute ϵ as in (18)
    Select r uniformly at random between zero and one
    if r > ϵ then
        Select the action with the highest Q-value
    else
        Select an action at random from the rest
    end if
end for

A. CALL CENTER TRAFFIC PREDICTION
We first assess the performance of the proposed traffic prediction algorithm with the DNN in this section. We compare the experimental results of our algorithm with previous works, including Auto-Regressive Integrated Moving Average (ARIMA) [25], LSTM [26], Exponential Smoothing (ES) [27], and LSTM Deep Autoencoder (LSTM+DAE) [28]. We predict various call center traffic parameters: the number of incoming calls, the number of received calls, service time, and waiting time. The Mean Absolute Error (MAE) between the predicted and the actual traffic parameters is used as the metric for comparison among different algorithms. The MAE can be expressed as
$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} \left| p_i - \hat{p}_i \right|,$$
where $p_i$ and $\hat{p}_i$ are the $i$-th actual and predicted traffic values, and $N$ is the total number of traffic data. Tables 6, 7, and 8 compare the results obtained from the three services of TTRS: TTRS video relay service on mobile phones, TTRS text relay service, and TTRS video relay service on kiosks and video phones, respectively. The MAE results indicate that the proposed traffic forecasting based on deep learning achieves a lower MAE than all the compared predictive techniques. The superior prediction accuracy of our algorithm is based on the exploitation of LSTM and fully connected layers. The LSTM layers are utilized to learn temporal dependency among time-series call center traffic. Then, six fully connected layers are employed to derive a non-linear mapping between the LSTM outputs and the forecasted traffic. As a result, it outperforms the scheme that uses LSTM alone [26] and LSTM+DAE [28], which uses only three fully connected layers. Fig. 4 compares the actual call center metrics with their values predicted by deep learning. We can observe that call center metrics change dynamically across time intervals. Each time interval corresponds to 30 minutes of call center service.
From our experimental results, the proposed traffic forecasting method can track and predict call center metrics with high accuracy due to the non-linear properties of deep learning. The high accuracy of traffic prediction will enhance the performance of human resource management evaluated in the next section.
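For reference, the MAE used in the comparison above can be computed with a few lines of Python. The function name and the input validation are illustrative choices.

```python
def mean_absolute_error(actual, predicted):
    """MAE = (1 / N) * sum(|p_i - p_hat_i|) over N paired samples."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(p - p_hat) for p, p_hat in zip(actual, predicted)) / len(actual)
```

For example, actual call counts `[10, 20, 30]` and predictions `[12, 18, 33]` give absolute errors of 2, 2, and 3, so the MAE is 7/3, or about 2.33 calls per interval.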

B. CALL CENTER WORKFORCE MANAGEMENT WITH REINFORCEMENT LEARNING
We verify the proposed human resource management with reinforcement learning in this section. We divide the working hours within a day into 48 intervals of 30 minutes. We set the total number of call agents to 24, following TTRS. Therefore, we have a total of 24 × 48 = 1152 actions in the Q-learning algorithm. Each action represents the starting time and the number of call agents allocated to various services. For each episode of the Q-learning algorithm, we average the Q-values from all actions. First, we study the effects of different reward functions on the convergence of the Q-learning algorithm. We map the value $y$ of the SLA to a new number $x$ via
$$x = \frac{(y - y_{\min})(x_{\max} - x_{\min})}{y_{\max} - y_{\min}} + x_{\min},$$
where $x_{\max}$ and $x_{\min}$ are the upper and lower limits of the $x$ value, and $y_{\max}$ and $y_{\min}$ are the upper and lower limits of the SLA. The unit of the SLA is the percentage of calls for which call agents can provide the service within 15 seconds. Specifically, in this paper, $y_{\min} = 0$ and $y_{\max} = 100$. Based on the new value $x$, we compute a reward via the following functions [23]-[25].
• Sigmoid [26]: $r(x) = \frac{1}{1 + e^{-x}}$.
• SiLU [27]: $r(x) = \frac{x}{1 + e^{-x}}$.
• tanh [28]: $r(x) = \tanh(x) = \frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$.
• Softplus [29]: $r(x) = \ln(1 + e^{x})$.
• Step function [30]: $r(y) = 30$ when $y \ge t_{SLA}$, where $t_{SLA}$ is the preset threshold of the SLA, equal to 85.
If the Q-learning algorithm produces a human resource allocation that violates the preset conditions of a call center, the reward is set to -30. The constraints can be customized to match the requirements of different call centers. Fig. 5 shows the average reward per episode for the period from 01/01/2021 to 05/01/2021. We average over five working shifts and three services. We can see that the tanh function gives the fastest convergence. In contrast, the sigmoid reward function has the slowest convergence rate. Fig. 6 shows the effects of different $\epsilon$ values of the epsilon-greedy algorithm on the average reward. When the value of $\epsilon$ is set to 0.01, the search for the best strategy converges fastest. This implies that the tanh reward function should be employed in practice, and the value of $\epsilon$ should be set to 0.01 when the Q-learning algorithm is deployed for human resource management.
We next evaluate the performance of our proposed method for adaptive workforce management using RL. The results of workforce management are based on the call center traffic predicted by our proposed DNN. We compare the results of our proposed method with the outcomes of workforce management by experienced human supervisors and the Decision Support System (DSS) [7], as shown in Table 9. These three human resource management schemes produce different solutions. For example, on January 1, 2021, a human supervisor allocated three call agents for video relay service on mobile phones, whereas the DSS and our proposed algorithm suggested one and two agents, respectively. To assess how these solutions affect the call center's quality of service, we measure the percentage of calls for which call agents can provide services within the desired thresholds.
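The SLA-to-reward mapping and the smooth candidate reward functions compared in Fig. 5 can be sketched in Python as follows. The target range $[x_{\min}, x_{\max}] = [-5, 5]$ is an assumption made for this example, since the limits used in the experiments are not stated here; the function and dictionary names are also illustrative. The functions are the standard definitions of sigmoid, SiLU, tanh, and softplus.

```python
import math


def map_sla(y, x_min=-5.0, x_max=5.0, y_min=0.0, y_max=100.0):
    """Linearly map an SLA percentage y in [y_min, y_max] to x in [x_min, x_max]."""
    return (y - y_min) * (x_max - x_min) / (y_max - y_min) + x_min


# Candidate reward functions r(x) applied to the mapped value x.
REWARD_FUNCTIONS = {
    "sigmoid": lambda x: 1.0 / (1.0 + math.exp(-x)),
    "silu": lambda x: x / (1.0 + math.exp(-x)),
    "tanh": math.tanh,
    "softplus": lambda x: math.log1p(math.exp(x)),
}


def reward(sla_percent, kind="tanh"):
    """Reward for an achieved SLA percentage under the chosen function."""
    return REWARD_FUNCTIONS[kind](map_sla(sla_percent))
```

Under the assumed range, an SLA of 50 percent maps to $x = 0$, where tanh and SiLU return 0 and sigmoid returns 0.5. Only tanh and SiLU produce negative rewards for low SLA values, which is one way these candidates differ.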
Specifically, we adopt the service level agreement (SLA) of TTRS, under which call agents must answer the video relay service on mobile phones, the text relay service, and the video relay service on kiosks and video phones within 15, 12, and 15 seconds, respectively, after a customer starts making a call. Moreover, the average speed of answer (ASA) is also used as a comparison metric. In addition, we define a metric that measures how frequently the SLA is not met under the various human resource management schemes. Let $L_t$ denote the total number of 30-minute service intervals. Each operating day contains 24 × 2 = 48 intervals, so a 30-day month contains 48 × 30 = 1440 service intervals. We categorize a service interval as an SLA-failure interval if the SLA is not met at any point within it, regardless of how long the failure lasts. Let $L_f$ denote the total number of SLA-failure intervals. The SLA failure rate is defined as
$$\text{SLA failure rate} = \frac{L_f}{L_t} \times 100\%.$$
For example, if there are 144 SLA-failure intervals within one month, the SLA failure rate is 144/1440 = 10 percent. We use this metric to assess the SLA failure rates of the proposed algorithm, human supervisors, and the DSS. Table 11 compares the SLA failure rates of these three schemes. The proposed method fails the SLA less frequently than the other methods. This is because the human resource allocation from our proposed algorithm corresponds more closely to the actual call volume. Hence, the SLA failure rate decreases, since sufficient call agents are available to pick up incoming calls in every service interval. This also implies that the proposed algorithm maintains the SLA better throughout the service hours. Finally, we investigate the influence of traffic prediction algorithms on the proposed human resource management method. We adopt the above SLA failure rate as the evaluation metric. Table 12 shows the SLA failure rates of different traffic prediction algorithms.
When we combine the deep neural network with the RL, the SLA failure rate drops to 5.69 percent, which is much lower than the rates obtained with other prediction algorithms such as ARIMA, LSTM, and ES. The lower SLA failure rate arises because traffic prediction based on deep learning yields more accurate traffic parameters. As a result, the RL can allocate human resources to all services more precisely.
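The SLA failure rate used in Tables 11 and 12 is simple to compute once each 30-minute interval has been labeled. The sketch below is illustrative only: it assumes each interval has already been flagged as an SLA-failure interval or not, and the function name `sla_failure_rate` and the boolean-list input are choices made for this example, not part of the paper.

```python
from typing import Sequence


def sla_failure_rate(failed: Sequence[bool]) -> float:
    """Return the SLA failure rate in percent.

    `failed` holds one flag per 30-minute service interval: True if the SLA
    was not met at any point in that interval (an SLA-failure interval).
    """
    l_t = len(failed)  # total number of service intervals, L_t
    if l_t == 0:
        raise ValueError("at least one service interval is required")
    l_f = sum(1 for flag in failed if flag)  # number of SLA-failure intervals, L_f
    return 100.0 * l_f / l_t


# Worked example from the text: 144 failures out of 48 * 30 = 1440 intervals.
month = [i < 144 for i in range(48 * 30)]
print(sla_failure_rate(month))  # 10.0
```

Because an interval counts as a failure regardless of how long the SLA was violated within it, the metric penalizes brief understaffing as heavily as sustained understaffing.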

V. CONCLUSION
This paper presented a novel learning-based method for adaptive call center workforce management. The proposed system has two stages. The first stage is call center traffic prediction, and the second stage is automatic workforce management with reinforcement learning. The traffic prediction stage uses a deep neural network, consisting of an LSTM and fully connected networks, to predict call center traffic parameters. Because call center traffic data exhibit high temporal correlation, we applied the LSTM to extract temporal features and used the fully connected network to capture the correlation among features. The predicted traffic parameters were fed to the Erlang-A formulation to compute the expected service level agreement. Under changing traffic statistics, reinforcement learning with the Q-learning algorithm automatically found the best workforce allocation subject to constraints such as the total number of available call agents. In experiments with actual data, the proposed human resource management scheme with deep learning-based traffic prediction surpassed other methods in terms of the achieved SLA, the ASA, and the SLA failure rate. Without human intervention, the proposed method could maintain the quality of service, provide high customer satisfaction, and relieve human supervisors of tedious workforce management duties. The proposed framework can be used to replace the traditional human resource allocation based on trunking theory in all kinds of call centers.