Fast Simulation-Based Order Sequence Optimization Assisted by Pre-Trained Bayesian Recurrent Neural Network

This paper presents a fast optimization method for the picking order sequence of automated order picking systems in logistics warehouses. In this order sequencing problem (OSP), the fulfillment sequence of the given picking order set is determined to optimize the performance measures such as makespan and deadlock occurrence. Simulation is generally necessary to evaluate these measures for complex automated systems. However, their order sequence cannot be optimized quickly due to the long calculation time. It may make the system productivity and flexibility lower than expected because its picking schedules cannot be updated frequently. We, therefore, propose a fast optimization method to solve these simulation-based OSPs by taking a pretrained surrogate-assisted optimization approach. Firstly, we utilized a Bayesian recurrent neural network (BRNN) as a surrogate model to accurately learn the relationship between picking order sequence and performances. Secondly, we developed the surrogate-assisted optimization method based on simulated annealing (SA) and BRNN. Numerical experiments show that the surrogate model can evaluate about 10000 times faster than the simulation. The proposed method also obtains an optimized solution 8.9 times faster than simulation-based optimization by the original SA.


I. INTRODUCTION
S EQUENCE optimization problem (SOP) or sequencing problem is a combinatorial optimization problem (COP) to find an appropriate sequence of tasks to minimize or maximize the given objective function. SOP can be seen in various practical fields such as traveling salesman problem (TSP), vehicle routing problem (VRP), and other job scheduling problems. In logistics warehouses, SOP can take the form of the scheduling of order picking activity, where the largest workforce is allocated. For example, in the most common order picking system (picker-toparts), order pickers physically walk into a warehouse to pick items [1], and the associated labor costs account for about 55% of the total fulfillment operation cost [2]. Recently systems have been developed to automate order picking. Among these, automated storage and retrieval systems (AS/RS) [3], automated guided vehicle (AGV) picking systems [4], [5] and piece picking robots [6], [7] are commonly employed.
In this paper, we consider the AGV picking system with robot arms and conveyor modules shown in Fig. 1, which is entitled Pick-Place-Carry-Pick-Place (PPCPP) system. Fig. 2 shows the detailed layout. The PPCPP system contains multiple piece-picking robots that can deal with various picking orders in parallel. When the warehouse management system (WMS) sends picking order information to the PPCPP system, the picking order is allocated to one of the available piece-picking robots. Then AGVs start transferring the movable storage shelves where ordered items are stored inside the storage boxes. The feeding robots pick the transferred storage boxes from the shelves and place them on the feeding conveyor. Then the feeding conveyor carries the storage boxes in front of the piece-picking robots. The piece-picking robots pick items from storage boxes on the feeding conveyor and place them into shipping boxes on the shipping conveyor. When a shipping box contains all the requested items, the shipping conveyor transfers the box to the following process, and a new picking order is immediately assigned to the piece-picking robot. PPCPP systems can achieve high scalability since the number of AGVs, robot arms, and other components can be customized easily.
For achieving high productivity, appropriate job scheduling is vital [8]. In the PPCPP system, the operation sequence of the picking orders from the WMS strongly affects the makespan, This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ Fig. 2. PPCPP system layout. The system consists of AGVs, feeding robots, piece-picking robots, and conveyors. The system input (picking orders sequence) strongly affects the performances, such as the makespan and the deadlock occurrence. defined as the duration from the initial order placement to the end of the final order shipping. Generally, the makespan can be reduced by continuously picking similar orders with overlapped items since transportation of these items' storages by AGVs and conveyors can be aggregated. The order sequence can also affect the occurrence of deadlock, which is a situation where no task can be achieved because of stacked system modules. An example situation is when inputting picking orders that require more kinds of items than the feeding conveyor capacity at the same time. All feeding conveyors can be occupied before completing these orders in this situation. Once such a situation happens, it increases the operation time due to system stoppage and is hard to restore automatically. Therefore, the order sequence that can lead to deadlock must be avoided.
In this paper, we focus on the order sequencing problem (OSP) of PPCPP systems that determines the optimal order sequence X of N picking orders O = {O n |n = 1, 2, . . ., N} for minimizing the makespan under the deadlock avoidance constraint. For example, if N = 3, the possible X is Simulation is essential for solving the real SOPs (such as the OSP of PPCPP systems) because the performance measures are hard to formulate mathematically. Therefore, many researchers and practitioners have solved SOPs with simulation-based optimization (SBO) methods using metaheuristics [9]- [13]. However, these approaches are usually computationally expensive because they require iterative simulation runs, which generally take a few seconds to several minutes for one execution.
Surrogate-assisted optimization (SAO) is an alternative approach to solving computationally expensive SBO problems. Fig. 3 shows the flowchart of SAO, where the fundamental idea is to approximate the performance measures using a surrogate model or metamodel whose computational cost is relatively lower than simulation. Especially, Bayesian optimization (BO) is the most popular one, and several BO variants for binary optimization problems have been developed recently [14], [15]. However, it remains challenging for these methods to solve SOPs because enormous numbers of binary variables and constraints must be considered. Additionally, BO requires many simulations to train an accurate surrogate model during optimization. Thus, For quickly solving simulation-based SOPs, we have developed a new sequence optimization method that uses a surrogate model trained before optimization (pre-trained). In practical OSPs, most picking system configurations cannot be changed frequently, and the distribution of future picking orders can be forecasted using historical data. Therefore it is possible to train the surrogate model with a dataset randomly generated from the forecasted distribution. We utilize a Bayesian recurrent neural network (BRNN) as a surrogate model to learn the sequential nature of OSPs. Our proposed method, BRNN-SA, combines simulated annealing (SA) as the basic optimization strategy and BRNN as the surrogate model in the optimization.
The purpose of this paper is to answer the following research questions. r Evaluation of BRNN-SA demonstrating its superiority in solving the OSPs of PPCPP systems compared with conventional metaheuristics.
In Section II, we explain the background of this study. Section III describes the BRNN-SA for OSPs. In Section IV, we evaluate the performance of the proposed BRNN-SA. We conclude in Section V with a brief summary and mention of future work.

A. Order Sequencing Problem
The order sequencing problem (OSP) is a job scheduling process [16] that is categorized as a sequence optimization problem (SOP), which is a type of combinatorial optimization problem (COP). As OSP is a vital decision-making task for efficient picking operations in a logistics warehouse, many studies have investigated OSP for order fulfillment systems. Boysen et al. [17] formulated the OSP of the consolidation and packing area as a mixed-integer linear programming (MILP) model and solved it with a heuristic approach and dynamic programming. Pinto et al. [18] solved the order batching and sequencing problem by an iterative method of two Genetic algorithms (GA) that solve order batching and order sequencing separately.
In these OSPs and other ordinary job scheduling problems, the time needed for each operation and transportation is given to the planner as constant input values. However, in our problem setting, these values are difficult to determine mathematically since they can change depending on the AGV and conveyor status at a given time. In PPCPP systems, storage feeding for multiple picking orders can be aggregated in one AGV transportation, and it is nearly impossible to calculate the operation and transportation times in all possible situations. Therefore, instead of defining them as constant input values, we evaluate the makespan and the deadlock occurrence with the PPCPP simulator, whose input is an order sequence X.

B. Simulation-Based Combinatorial Optimization
SOPs like those above have been solved as simulation-based optimization (SBO) problems, where simulation models evaluate the system performance under a given configuration. Metaheuristic methods are widely utilized in this context, especially for solving simulation-based combinatorial optimization problems (SBCOP). Rouky et al. [10] proposed an ant colony optimization (ACO) hybridized with a variable neighborhood descent local search to solve the task sequences of quay cranes. Uman et al. [11] tackled the flow shop scheduling problem and proposed the tabu search (TS) process with a genetic algorithm (GA) to minimize makespan. Another popular metaheuristic to solve SOPs is simulated annealing (SA). Cheng et al. [12] proposed an extended SA algorithm to minimize the makespan of a permutation flowshop scheduling problem (PFSSP). Khurshid et al. [13] also tackled PFSSP by using an improved evolution strategy that harnesses local search abilities of SA.
While these SBO methods may effectively solve analytically intractable SOPs, they are usually computationally expensive because iterative simulation runs (generally at least 100 or more) are essential to calculate the performance measures of each set of parameters. Realistic simulations like the PPCPP simulator take anywhere from a few seconds to several minutes for one execution, which limits the application of metaheuristics for OSPs to situations with enough calculation time.

C. Surrogate-Assisted Optimization
Surrogate-assisted optimization (SAO) is an alternative approach to solving computationally expensive SBO. The fundamental idea of SAO is to approximate the performance measures by a surrogate model whose computational cost is relatively lower than simulation. As surrogate models, linear regression (LR) [19], support vector machine (SVM) [20], gradient boosting trees (GBT) [21], artificial neural networks (NN) [22], and Gaussian process (GP) [23] are utilized. Bayesian optimization (BO), which uses GP as a surrogate model, is one of the most popular SAO methods. Although the original BO can solve only continuous optimization problems, more recent BO variants for solving SBCOP have been developed [24]. For example, COMBO (Oh et al. [14]) and MerCBO (Deshwal et al. [15]) are state-of-the-art BO methods to solve binary optimization problems.
However, it is still challenging to use these methods for solving simulation-based SOPs. SOP can be reformulated as a binary optimization problem, but the reformulated problem typically contains an enormous number of binary variables and constraints to eliminate overlaps or sub-tours. Generally, such problems are difficult to solve with metaheuristics and BO. Additionally, when solving a large-scale problem, BO requires many simulations to train the surrogate model accurately during optimization. Thus, conventional approaches cannot quickly solve simulation-based SOPs enough to update picking schedules flexibly. For reducing optimization runtime, we propose a new SAO approach that uses a surrogate model trained before optimization.

A. Problem Setting and Formulation
We investigate the OSP of PPCPP systems, which determines the optimal sequence of the given N picking orders O = {O n |n = 1, 2, . . ., N} for minimizing the makespan without system deadlock. The makespan and deadlock occurrence are obtained as simulation results. The notations and decision variables are listed in Table I. The mathematical formulation of the OSP is as follows: min (1) shows the objective function F (X), which is the minimization of the makespan to finish the given order sequence X. (2) describes the constraint to avoid deadlock G(X). (3), (4), and (5) ensure that the decision variable X is a non-overlapped sequence of each customer's picking order O. As described in Section I, F (X) and G(X) are difficult to define as mathematical equations. We, therefore, calculate them with a PPCPP simulator.
In this paper, we investigate practical OSPs in logistics warehouses. In general, OSPs are solved with only the target OSP information, which is available just before solving the problem. In this case, the target problem information is the picking orders from customers (shipping orders) and the PPCPP system configuration. However, in real situations, we can utilize other information to solve the target OSP, such as the historical picking order information and the current system configuration. In this work, we assume the following two conditions; r Rather than assuming that the picking orders from customers are only available just before the picking operation begins, we assume that the distribution of historical picking orders is always available.
r The system configurations do not change from the preparation to the picking operations because the configurations of practical industrial systems cannot be changed so frequently. In the following sections, we discuss how to solve the OSPs of PPCPP systems under these assumptions.

B. Surrogate Pruning Optimization Framework
The overall flowchart of the proposed surrogate pruning optimization framework is shown in Fig. 4. This method consists of two phases: pre-training and optimization. In the pre-training phase, random picking orders are generated in accordance with the distribution obtained by the historical picking orders. Then, the surrogate models of the performance measures, namely, the makespan f (X) and the deadlock g(X), are trained to predict the simulated makespan F (X) and deadlock G(X) respectively. In the optimization phase, surrogate models are used to choose good candidate solutions quickly without simulations. The selected candidate solutions are then evaluated by simulation. This surrogate pruning enables us to reduce computational time to evaluate performance measures by simulation. These steps are repeated until the runtime reaches the time limit.

C. Selection of Surrogate Models
There are two requirements for the surrogate models for OSPs (f (X) and g(X)) as follows.
Requirement 1: The surrogate models must find the appropriate features to infer the performance measures automatically since manual feature engineering of a complex system such as PPCPP systems is complicated. Requirement 2: For optimizing the order sequence, the surrogate models must learn how the performance measures change depending on the order sequence.
To meet these requirements, we opted to use a multi-layer perceptron (MLP), which is a powerful machine learning method that automatically extracts features from input, which meets Requirement 1. For dealing with Requirement 2, we use a recurrent neural network (RNN), one of the MLPs that utilizes a recurrent layer to learn the sequential relationship in input features, such as order sequences. We also utilize a Bayesian neural network (BNN) to improve the model generalization. BNN is an MLP model that imposes a prior distribution on the weight parameters and aims to infer a posterior distribution instead of point estimates. Although the output of the PPCPP simulator is deterministic, we utilize BNN because it performs ensemble learning by marginalizing over this posterior for prediction.
Based on the above discussion, we select a Bayesian recurrent neural network (BRNN), which is a BNN consisting of a recurrent layer, for the surrogate models, as shown in Fig. 5. However, precisely computing the posterior of non-linear BRNNs is infeasible, so we utilized the Dropout variational inference [25] to train BRNN. It interprets Dropout regularization as approximate inference in BRNN models and learns a variational Dropout posterior over the weight parameters. Prediction results of BRNN are obtained by averaging N pred time inferences. In this work, we use N pred = 1 for training and N pred = 10 for prediction and optimization. In addition, we use the Adam optimizer [26] to train BRNN. The bidirectional long short-term memory (LSTM) architecture with 64 neurons (n r ) is utilized as the recurrent layer and connected to two dense layers, whose neurons are connected to all neurons in the next layer. Each layer consists of 192 neurons (n d ) with Leaky ReLU as the activation function and has a Dropout layer whose Dropout rate is set to 10% (r d ). All mentioned parameters (n r ∈ [32, 48, 64, 128], n d ∈ [64,128,192,256], and r d ∈ [5%, 10%, 30%]) were analyzed and the most accurate model on the test set was picked.

D. Proposed Optimization Method: BRNN-SA
In this section, we propose BRNN-SA, which implements the surrogate pruning optimization by combining the simulated annealing (SA) strategy and BRNN as surrogate models. The detailed algorithm is shown in Algorithm 1.
This optimization procedure starts when the target shipping orders from customers are confirmed. Let X t be the searching point in iteration t. First, we generate M 0 searching points randomly and select the initial searching point X 1 that has the minimum surrogate penalized objectives h(X) = f (X) + ωg(X), where f (X) and g(X) are surrogate models and ω is the penalty parameter. Then the M candidate points are sampled uniformly at random from the neighborhood S(X t ), which contains all points with a Hamming distance of at most one from X t . In the surrogate pruning, the top-K candidate points are chosen from M candidate points according to the surrogate penalized objectives h(X). This surrogate pruning can reduce the computational time by choosing good candidate solutions quickly without time-consuming simulations.
For the selected K candidate points, the penalized objectives H(X) = F (X) + ωG(X) are evaluated by using their simulation results (F (X) and G(X)). Then, the best candidate pointX t is chosen from the K candidate points. IfX t has a lower penalized objective than the best solution X best , X best is replaced withX t . The searching point is updated in the same way as SA. If H(X t ) is smaller than H(X t ), let X t+1 ←X t . Otherwise,X t is adopted as X t+1 with the following probability P : where T is the temperature parameter. T starts with a high T 0 to encourage exploration and then cools down by multiplying the cooling rate α ∈ [0, 1] to zoom in on a good solution. These processes continue until the computational time reaches the time limit. Finally, the best searching point is returned as the solution order sequence. In this paper, we set M 0 = M = 10000, by which the surrogate runtime in a loop is almost equal to one simulation runtime, and ω = 10 6 so that the deadlocked candidates always become worse than any candidates without deadlock. We also selected K = 10. In the preliminary experiments, we found that the optimization performance was excellent when 5 ≤ K ≤ 30, and it became unstable when K = 1. It implies that selecting multiple candidates can improve the robustness of finding good solutions under prediction errors. Additionally, we selected T 0 = 30 and α = 0.95 that showed the best optimization performance according to the preliminary experiments with the following parameter ranges; T 0 ∈ [10, 30, 100] and α ∈ [0.9, 0.95, 0.99].

IV. EVALUATION
We conducted numerical experiments to investigate RQ1 and RQ2 and compared the proposed optimization method with conventional metaheuristic methods.

A. Conditions
In our experiments, the sequence of the given picking orders is optimized for minimizing the makespan without the deadlock of the PPCPP system (Refer back to Fig. 2 for the system configuration.) All picking orders used in the experiments are generated randomly. One picking order set consists of 1 to 30 picking orders. Each picking order contains 1 to 5 items and the shipping quantity of each item is 1 to 700. The pre-training dataset is the PPCPP simulation results of 50,000 randomly generated picking order sets whose order sequence s are randomly sorted.
We formulate the deadlock occurrence prediction as a binary classification problem and the makespan prediction as a regression problem. In the following experiments, BRNN shown in Fig. 5 is compared with linear regression (LR), Gaussian process regression (GP), gradient boosting trees (GBT), and the Bayesian neural network without recurrent layer (BNN). Each picking order set is converted to the input features shown in Fig. 6. BRNN uses the sequential feature whose shape is Output: Best order sequence X best = {x i }. 1: Initialize t ← 1 and temperature T ← T 0 . 2: Generate M 0 candidate points X 0j (j = 1, 2, . . ., M 0 ) 3: Select an initial searching point X 1 from X 0j for minimizing h(X 0j ) = f (X 0j ) + ωg(X 0j ). 4: while Until reaching the time limit do 5: Select M candidate points X tj (j = 1, 2, . . ., M) from the neighborhood S(X t ) uniformly at random. 6: Evaluate h(X tj ) = f (X tj ) + ωg(X tj ).
where N s is the number of samples of the picking order sets, N t is the maximum sequence number, and N f is the number of the features. Other models, which cannot handle the sequential feature, use the 2D feature consisting of horizontally connected sequential features. This data shape is (N s , N t × N f ). In this experiment, we used N f = 34 features and set N t = 150 to cover all training dataset. In the pre-training of GP, only 200 randomly selected samples were used to shorten the training time. The prediction accuracies of makespans and deadlock occurrence were evaluated by 4-fold cross-validation. The computing environment is a Ryzen 3950X CPU (3.70 GHz, 32 threads), 128 GB RAM, and NVIDIA RTX3080 GPU.

B. Prediction Accuracy Comparison
In this experiment, we evaluate the prediction accuracies of the pre-trained surrogate models. These models are compared based on accuracy, defined as the number of correctly predicted data points out of all the data points and the area under the ROC curve (ROC-AUC) for the deadlock occurrence prediction. Mean absolute percentage error (MAPE) and mean absolute error (MAE) are used as metrics for the makespan regression.
The evaluation results of each metric are shown in Table II. For the deadlock occurrence prediction, the classification accuracy is highest for GBT (93.7%), BRNN (93.2%), and BNN (93.2%) in that order. However, the differences in classification accuracies between all models are not so significant (92.7-93.7%), which implies that the order sequence has little effect on the deadlock occurrence. On the other hand, for the makespan regression, BRNN shows a lower MAPE (2.68%) than GBT (3.45%), BNN (5.15%), and other methods. Additionally, we confirmed the BRNN inference time (0.89 msec/sample) was over 10000 times faster than the PPCPP simulation runtime (11.7 sec/sample).
We also evaluated how accurately each model can predict the makespan changes according to the order sequences of one order set. Fig. 7 shows the simulated and predicted makespans of all combinations of the seven randomly generated orders (5040 combinations). The horizontal axis represents each order sequence sorted by its simulated makespan. The vertical axis represents the makespan of each order sequence obtained by simulation and prediction with the pre-trained surrogate models. We can see that the BRNN prediction (blue line) is closer to the simulation result (orange line) than any other surrogate model. These results suggest that BRNN is suitable as a surrogate model to predict performances strongly affected by the sequential feature such as the order sequence, which is the decision variable of OSPs.
Based on these findings, we answer the RQ1 as follows. RQ1: Which surrogate model is most suitable for the OSPs of PPCPP systems? A1: In the performance measures of OSPs, makespan is strongly affected by the order sequences. To infer such performance measures precisely, surrogate models that can consider the sequential input, such as BRNN, are suitable.

C. Optimization Runtime Comparison
Next, we evaluated the total runtimes to find the optimal order sequence by the proposed surrogate pruning optimization method. In this experiment, we used three problem instances consisting of 20 picking orders that are randomly generated from the same distribution with training data. These problems were solved by the proposed optimization methods with each surrogate model (BRNN, BNN, GBT, GP, LR) and the original SA. All the experiments were executed ten times each, and their time limits were 900 sec. We then compared the averages of the objective values of each final solution (Final obj.) and the runtimes to find the solution whose objective value is equal to or lower than the original SA's final solution (LE runtime). Table  III shows the optimization results. As we can see, BRNN-SA can find the best solution to two problems and show the shortest   LE time of all methods (8.9 times faster than the original SA on average). BRNN-SA shows the best optimization performances since BRNN can predict the makespan changes according to the order sequence more accurately than other surrogate models as shown in Fig. 7. Fig. 8 shows the optimization trajectories of each optimization method in Problem 1. The horizontal axis represents the optimization runtime, and the vertical axis represents the makespan of the best solution X best at that moment. We can see here that BRNN-SA quickly finds a solution whose objective is lower than SA's final solution. This demonstrates that surrogate pruning by pre-trained BRNN can choose good candidate points more precisely than other surrogate models even at the beginning of the optimization.
On the basis of these findings, we answer RQ2 as follows; RQ2: How quickly can the proposed optimization method (BRNN-SA) solve the OSPs of PPCPP systems compared to conventional metaheuristics? A2: BRNN-SA can solve the 20 order OSPs by 8.9 times faster than the original SA. This is presumably because the surrogate pruning with the pre-trained BRNNs can find good order sequences successfully by inferring their performance measures without simulation.

V. CONCLUSION
This paper proposed the surrogate pruning optimization framework and its implementation (BRNN-SA) to quickly solve the order sequencing problem (OSP) of PPCPP systems, the automated order picking system in logistics warehouses. The proposed method utilizes the Bayesian recurrent neural network (BRNN) as surrogate models to infer sequencedependent makespans and deadlock occurrences. Then we proposed BRNN-SA, which combines simulated annealing (SA) with pre-trained BRNNs to choose good candidate solutions without simulation (surrogate pruning). Our experimental results demonstrated the proposed method can streamline solving OSPs thanks to surrogate pruning with the pre-trained BRNNs. For future work, we plan to implement the incremental learning step of surrogate models in this framework for solving largerscale problems and apply the proposed method to other sequence optimization problems such as job-shop scheduling problems.
ACKNOWLEDGMENT AI Bridging Cloud Infrastructure (ABCI) of National Institute of Advanced Industrial Science and Technology (AIST) was used for parameter optimization.