Layout Design for Intelligent Warehouse by Evolution with Fitness Approximation

With the rapid growth of the express industry, intelligent warehouses that employ autonomous robots for carrying parcels have been widely used to handle the vast express volume. For such warehouses, the warehouse layout design plays a key role in improving the transportation efficiency. However, this work is still done by human experts, which is expensive and leads to suboptimal results. In this paper, we aim to automate the warehouse layout designing process. We propose a two-layer evolutionary algorithm to efficiently explore the warehouse layout space, where an auxiliary objective fitness approximation model is introduced to predict the outcome of the designed warehouse layout and a two-layer population structure is proposed to incorporate the approximation model into the ordinary evolution framework. Empirical experiments show that our method can efficiently design effective warehouse layouts that outperform both heuristic-designed and vanilla evolution-designed warehouse layouts.


INTRODUCTION
The global express delivery industry has become a trillion-dollar market serving people's daily lives around the world. In 2017, the industry's revenue was 248 billion USD [17], and in China in particular, the annual gross express volume has exceeded 30 billion parcels since 2016 [13]. In the past two years, a new type of shipping warehouse has emerged, in which intelligent robots sort thousands of parcels per hour [23]. As shown in Figures 1a and 1b, autonomous robots carry parcels across the warehouse and unload them into target holes, which connect to vehicles heading to the corresponding destinations. The layout of the warehouse, i.e. the matching between holes and destinations, is usually designed by human experts. This is challenging, and the result is likely to be suboptimal, especially when the number of holes is large, as in Figure 1b. Moreover, the demand for such layout design is not one-off: since the distribution of parcel destinations is not fixed, the layout should adapt to achieve the best performance.
In this paper, we present an evolution-based method for automatically designing warehouse layouts. To tackle the efficiency issue arising from the time-consuming evaluation of each designed layout, we train a neural network to predict the outcomes of layouts without actually running agents in them, which is known as fitness approximation in the context of evolution [18]. We further propose a novel two-layer population structure to incorporate the prediction model into the evolution framework, which can be categorised as a multiple-deme parallel genetic algorithm [7]. Specifically, the higher layer consists of layouts that are actually evaluated and occupies a small fraction of the whole population, while the lower layer contains layouts whose fitnesses are predicted by the learned model. Compared to existing methods for combining fitness approximation with evolution [10,16], the proposed two-layer evolutionary algorithm explicitly manages evaluated and predicted individuals in two separate sub-populations and trains the approximation model online using the samples evaluated by the original fitness function. As such, the proposed method incorporates fitness approximation into the multiple-deme parallel genetic algorithm naturally. Moreover, within an evaluation of a designed layout, we observe not only the final outcome but also the agent trajectories, which contain hidden information about the causes of that outcome. To exploit this additional information and improve the quality of the prediction model, we construct an auxiliary objective: predicting the heatmap of the environment, where each value is the total number of visits to a point.
Our experiments on designing warehouse layouts demonstrate improved efficiency and better performance compared to both manual design and vanilla evolution-based methods without fitness approximation. Such a two-layer evolution-based environment optimization framework is promising for a variety of environment design tasks.

RELATED WORK
Many real-world scenarios can be regarded as environment design problems, ranging from game-level design with a desired level of difficulty [31] and shopping space design that encourages customer purchases and longer stays [22], to traffic signal control for improving transportation efficiency [8]. In a recent work, [32] formulates these environment design problems in a reinforcement learning framework. In this paper, we focus on a new environment design scenario, warehouse layout design, emerging from the rapidly growing express industry.
Traditional warehouse design problems can be categorised into three levels: strategic, tactical and operational [29]. At the strategic level, long-term decisions are considered, including the size of a warehouse [28] and the selection of component systems [19,21]. At the tactical level, medium-term decisions are made, such as the layout of a conventional warehouse [2,4]. At the operational level, detailed control policies are studied, e.g. batching [12] and storage policies [14]. The problem discussed in this paper concerns warehouse layout design, which traditionally sits at the tactical level. However, in the era of big data, the warehouse layout can adapt to changes in the external environment; specifically, it can be redesigned at intervals according to the changing destination distribution of the parcels. Thus, this problem is better categorised as an operational-level problem.
To solve this problem, we adopt evolutionary algorithms. However, obtaining a guiding signal means evaluating the designed objective in the target task, which requires unacceptable computational resources when evaluation is expensive. To reduce the number of expensive evaluations on real data needed before a satisfying result is obtained, some works propose learning a model that predicts the outcome of a designed objective without actually running it on real data [1,20]. A similar idea has been explored in the field of evolution and is known as fitness approximation [18]. Due to the inaccuracy of fitness approximation, it is essential to use the approximation model together with the original fitness function [15,25]. To incorporate the fitness model into simulation-based evolutionary algorithms, individual-based [5] and generation-based [25] methods have been studied. In contrast, our approach explicitly manages two sub-populations whose individuals are evaluated by the approximation model and the original fitness function respectively. Related approaches are known as multiple-deme parallel genetic algorithms [7]; our work can be classified as a multiple-deme parallel genetic algorithm with a two-layer sub-population topology that balances exploitation and exploration.

PROBLEM DEFINITION
In this section, we formulate the environment design problem and introduce the particular robotic warehouse environment. We fix the agent policy in the robotic warehouse environment and focus on the remaining task, assigning destinations to the holes, which can be viewed as an environment design problem.

Environment Design
In many scenarios, there are n agents taking actions in a designable environment, such as cars running in a transportation system, consumers shopping in a mall, and so on. Denote the i-th agent's policy as π_i, and parametrize the environment as M_θ = ⟨S, A, T_θ, R_θ, λ⟩, where S, A, T_θ, R_θ, λ denote the state space, action space, transition function, reward function and reward discount respectively. After the agents play an episode in the environment, a joint trajectory H = ⟨s_1, a_1, s_2, a_2, ...⟩ is produced and a cumulative reward G_i is given to the i-th agent, where s_t and a_t denote the state and joint action respectively.

Figure 1: (a) Real-world robotic warehouse for parcel sorting (screenshot from [23]). (b) Robotic warehouse environment. The triangles stand for the sources where parcels emerge. The circles stand for the robots carrying the parcels. The squares stand for the holes into which the agents put the parcels, colored according to the destination that parcels dropped into them will go to. The agents repeatedly take a parcel with a color (destination) from a source to a hole with the same color. The objective is to maximize the total number of parcels processed by the agents in a fixed period.

Moreover, the objective of the environment designer is given as O(H), whose functional form can be defined per scenario, and the designer intends to design an optimal environment that maximizes the expectation of this objective:

    θ* = argmax_θ E_H [O(H)]    (1)

Note that the randomness of H derives from the possible randomness of the policies π_i when selecting actions.

Robotic Warehouse Environment
In this paper, we consider a robotic warehouse environment abstracted from a real-world express system, as shown in Figure 1a: a warehouse sorts parcels from a mixed input stream into separate output streams according to their respective destinations. The sorting is done by robots carrying parcels from the input positions (sources) to the appropriate output positions (holes) in the planar warehouse, as Figure 1b illustrates. To maximize the sorting efficiency, we must choose the robots' cooperative pathfinding algorithm and assign the destinations to the holes. In this task, the agents share a common reward G and the environment also takes G as its design objective, i.e. O(H) = G. We set π_ϕ as a joint policy model shared by the agents. As such, the problem is formulated as

    ϕ*, θ* = argmax_{ϕ,θ} E_H [G]    (2)

To solve Eq. (2), we first fix a sound cooperative pathfinding algorithm π_{ϕ*} for the robots. We then focus on optimizing the environment parameter θ, i.e. the layout of the warehouse (the assignment of destinations to holes), via

    θ* = argmax_θ E_H [G | π_{ϕ*}]    (3)

Note that the demand for such layout design is not one-off. Since external variables (such as the destination distribution of the parcels) may change, the best layout of the warehouse changes accordingly. Thus, the layout should be redesigned at intervals, which motivates an efficient layout design approach.

Detailed Environment Description
The warehouse is abstracted as a grid of h × w cells. Among them, n_s cells are sources and n_h cells are holes, whose locations are given. There are n_r robots available to carry parcels from sources to holes. Each cell can be occupied by at most one robot at a time.
At each time-step, each robot may move to an adjacent cell. When an empty robot moves into a source, it loads a new parcel whose destination follows a distribution over the n_d destinations (cities) with proportions p_1, p_2, ..., p_{n_d}. Conversely, when a loaded robot moves into a hole whose destination matches that of the parcel it carries, it unloads the parcel into that hole. The rates of the input and output flows are not otherwise restricted in our setting: parcels are always available when a robot moves into a source.
Our objective is to sort as many parcels as possible in a given time period T. We achieve this by designing the layout of the warehouse, i.e. assigning the proper destinations to the holes. Specifically, we must determine the parameter θ = ⟨θ_1, θ_2, ..., θ_{n_h}⟩ of the environment M_θ, where θ_i ∈ {1, ..., n_d} for i = 1, ..., n_h. Intuitively, the assignment of destinations to holes affects the robots' paths and hence the efficiency of the whole warehouse.
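As a concrete illustration, a layout θ can be represented as a flat list of per-hole destinations. The sketch below (plain Python; the parameter values are illustrative, matching the small setting discussed later) samples a random layout:

```python
import random

def random_layout(n_h, n_d, rng=random):
    """Sample a layout theta: one destination in {1, ..., n_d} per hole."""
    return [rng.randrange(1, n_d + 1) for _ in range(n_h)]

# a random layout for a warehouse with 20 holes and 5 destinations
theta = random_layout(n_h=20, n_d=5, rng=random.Random(0))
```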
The notations defined in this section are listed in Table 1.

Problem Complexity
For the problem defined above, the size of the layout assignment space is n_d^{n_h}, where n_h denotes the number of holes and n_d the number of parcel destinations. Since the robot pathfinding algorithm acts as a black box for evaluating each layout assignment, it is hard to determine a global optimum without exploring the solution space completely. Thus, this optimization problem takes exponential time. Even for a small setting such as n_h = 20, n_d = 5, the number of assignments is about 95 trillion, which is infeasible to explore completely.
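The count follows directly from the independence of the hole assignments; a quick check of the small setting:

```python
def layout_space_size(n_h, n_d):
    # each of the n_h holes independently takes one of n_d destinations
    return n_d ** n_h

# the small setting from the text: 20 holes, 5 destinations
size = layout_space_size(20, 5)  # 95367431640625, about 95 trillion
```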

Robot Pathfinding Algorithms
In our problem, the robot pathfinding algorithm is fixed. As the robots are quite dense in real-world warehouses, jam prevention is the key concern. We considered two cooperative pathfinding algorithms with jam prevention. The first adopts WHCA* [30] as a planner, which searches the shortest path from an origin to a destination for each robot in turn while ensuring non-collision. The second is a greedy algorithm, which guides the robots by a look-up table at each position and reduces conflicts by setting one-way roads in the map, as illustrated in Figure 2a. We compared the two algorithms and found that the greedy one has a significant advantage in time complexity and only a minor disadvantage in performance. Due to the large number of simulations needed for testing environment parameters, we selected the time-saving greedy algorithm as the agent policy in our experiments. However, the proposed warehouse layout design solution works with other robot pathfinding algorithms as well.

SOLUTION
In this section, we first introduce an evolution framework for automatically designing warehouse layout, and then present the auxiliary objective fitness approximation and the two-layer population structure for improving the efficiency.

Evolution with Robot Policy Simulation
Under the evolution framework, we maintain a population of n warehouse layout individuals, i.e. assignments of destinations to holes (Figure 2b), and evolve the population for n_g generations. Within each generation, we perform crossover, mutation and selection in order:

• In the crossover phase, we randomly select c pairs of samples. For each pair, we flatten their hole matrices into two lines, randomly select a common breakpoint for both lines, and cross the two lines just like chromosomal crossover. Finally, we generate two square matrices by reshaping the two lines.

• In the mutation phase, we randomly select m_1 samples generated in the crossover phase. For each sample, we randomly select m_2 holes and randomly permute their destinations.

• In the selection phase, we evaluate the samples generated in the crossover and mutation phases by robot policy simulations, then merge the original and the generated samples. The best n are selected for the next generation.

Figure 3: An illustration of the process of evaluating an assignment sample θ. First, the latent representation X is learned via shared deep layers. Then, based on X, separate layers predict the heatmap Î and the reward Ĝ respectively. Two loss functions are calculated from the differences between the predictions and the simulated results.
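The crossover and mutation operators can be sketched on flattened layouts as follows. This is a minimal sketch; interpreting "permute their destinations" as shuffling the selected holes' values among themselves is our assumption.

```python
import random

def crossover(a, b, rng=random):
    """Single-point crossover of two flattened layouts at a common breakpoint."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:], b[:point] + a[point:]

def mutate(layout, m2, rng=random):
    """Pick m2 holes at random and permute their destinations among themselves."""
    idx = rng.sample(range(len(layout)), m2)
    values = [layout[i] for i in idx]
    rng.shuffle(values)
    child = list(layout)
    for i, v in zip(idx, values):
        child[i] = v
    return child
```

Note that both operators preserve the layout length, so the children can always be reshaped back into the hole matrix.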

Two-layer Evolutionary Algorithm with Fitness Approximation
In this section, we propose a novel evolutionary algorithm that trains an auxiliary objective fitness function to evaluate a large population for providing promising individuals to a small population evaluated by simulations.

Auxiliary Objective Fitness Approximation.
In practice, the simulation of robots acting in the environment is time-consuming. A promising way to reduce the simulation time is to use an approximation function to compute the fitness:

    Ĝ = f_G(θ)

where f_G is the fitness approximation function, θ is a sample of the environment parameter and Ĝ is the predicted fitness of θ, whose learning target is the expectation of the reward G. Moreover, since a simulation generates a trajectory H in addition to the reward G, we consider utilizing H to help train the fitness function f_G. Although G is the exact objective for the fitness function to learn, we may extract additional information I(H) from H that helps train the fitness function, under the assumption that G and I are correlated. We set an auxiliary training objective and use a neural network to capture it:

    X = f_X(θ),  Î = f_I(X),  Ĝ = f_G(X)

where f is a neural network consisting of three sub-networks: f_X is the bottom network that captures the common features and outputs X; f_I and f_G are two separate networks on top of X that predict Î and Ĝ respectively.

Figure 4: The process of the two-layer population evolutionary algorithm in a single generation. The yellow and grey squares stand for the populations that have been (or will be) evaluated by the simulation and the fitness model respectively.

In the robotic warehouse layout design problem, θ represents the assignment of destinations to holes and H represents the movements of the robots. Furthermore, we define I as the heatmap of the movements, as Figure 2c shows. Intuitively, the distribution of busy areas should be correlated with the sorting efficiency and hence the reward. The process of learning the fitness function for the warehouse layout problem is illustrated in Figure 3.
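In training, the two predictions can be combined into a single loss. A plausible form is given below; the relative weight λ_aux is our assumption, while the use of MSE for both outputs follows the Experiment Settings:

```latex
\mathcal{L}(\theta, H) \;=\; \big(f_G(f_X(\theta)) - G\big)^2
  \;+\; \lambda_{\mathrm{aux}} \,\big\lVert f_I(f_X(\theta)) - I(H) \big\rVert_2^2
```

Here the first term is the reward-prediction loss and the second is the auxiliary heatmap-prediction loss; both gradients flow through the shared bottom network f_X, which is how the heatmap signal helps the reward prediction.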
Since obtaining simulation samples is time-consuming, we train the fitness model online. Specifically, the fitness model is trained with the samples simulated along the process of the evolutionary algorithm. There is no pre-training in our approach.

Two-layer Population.
The fitness model provides a less accurate but much faster evaluation than the simulation. This suggests that the simulation is better suited to finding a local optimum exactly, while the fitness model is better suited to exploring the global space quickly. In standard simulation-based evolution, the mutation rate is usually set small enough to ensure convergence within an acceptable time, so the search is relatively local. Therefore, we incorporate the fitness model into the standard simulation-based evolution as an additional component for exploring the global space.
Specifically, we maintain two sub-populations. The first one is of the same size as the population set in the standard simulation-based evolution. Also, the individuals in the first sub-population are evaluated by simulations. The second sub-population is multiple times larger than the first one and the samples in it are evaluated by the fitness model. We view the second sub-population as a candidate population whose top individuals have a chance of joining the first sub-population. On the other hand, the bottom individuals in the first sub-population may be moved to the second sub-population. We name the first-layer sub-population noble and the second civilian. Noble population and civilian population evolve separately while keeping a channel for migration.
In detail, the two-layer population evolves as Figure 4 and Algorithm 1 show. In general, N and C maintain individuals evaluated by the simulation and the fitness model respectively. In each generation, migration takes place. Specifically, C 2 from the civilian layer go up to the noble layer and N 3 from the noble layer go down to the civilian layer. In addition, the civilian layer discards the worst population C 4 and absorbs randomly generated population R.
There are 9 parameters in the proposed two-layer evolutionary algorithm: the noble population size |N|, the civilian population size |C|, the crossover rates c_N, c_C, the mutation rates m_N, m_C, the number |C_2| of civilian individuals migrating to the noble layer, the number |R| of randomly generated individuals, and the number n_u of model updates per generation.

Algorithm 1: Two-layer Evolutionary Algorithm with Fitness Approximation (a literal explanation of Figure 4)
Require: noble population N, civilian population C, untrained fitness model f, empty simulation sample set S
1: for each generation do
2:   generate N_1 from N by crossover and mutation;
3:   generate C_1 from C by crossover and mutation;
4:   rank C ∪ C_1 by f to generate the top population C_2, middle population C_3 and bottom population C_4;
5:   evaluate N_1 and C_2 by simulation and add the results to S;
6:   rank N ∪ N_1 ∪ C_2 by the simulation score to generate the top population N_2 and bottom population N_3;
7:   generate a random population R and discard C_4;
8:   pass N_2 to the next generation as N;
9:   pass N_3 ∪ C_3 ∪ R to the next generation as C;
10:  update f using S.
11: end for

Other variables are determined by these parameters. In each generation, |N_1| + |C_2| simulations, n_u model updates and |C_1| model predictions are performed. Since the time cost of training the network and using it to predict is negligible compared to the simulations (see Table 4), the time complexity of the two-layer evolutionary algorithm for n_g generations is O(n_g(|N_1| + |C_2|)).
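One generation of this procedure can be sketched in plain Python as follows. This is a simplified sketch: the `simulate` and `f_model` callables stand in for the expensive simulation and the cheap fitness model, and the swap-based mutation, per-child mutation probability `m`, and scoring of fresh random individuals by the fitness model are our assumptions.

```python
import random

def evolve_generation(N, C, simulate, f_model, params, rng=random):
    """One generation of the two-layer EA. N and C are lists of
    (layout, score) pairs for the noble and civilian layers."""
    def offspring(pop, n_children):
        kids = []
        for _ in range(n_children):
            (a, _), (b, _) = rng.sample(pop, 2)
            p = rng.randrange(1, len(a))              # single-point crossover
            kids.append(a[:p] + b[p:])
        for k in kids:                                # mutation: swap two holes
            if rng.random() < params["m"]:
                i, j = rng.sample(range(len(k)), 2)
                k[i], k[j] = k[j], k[i]
        return kids

    N1 = offspring(N, params["n_children"])
    C1 = offspring(C, params["c_children"])

    # rank C and its offspring by the cheap fitness model:
    # top C2 migrates up, middle C3 survives, bottom C4 is discarded
    ranked_C = sorted(C + [(k, f_model(k)) for k in C1],
                      key=lambda x: x[1], reverse=True)
    C2 = [k for k, _ in ranked_C[:params["c2"]]]
    C3 = ranked_C[params["c2"]:len(ranked_C) - params["c4"]]

    # expensive step: simulate noble offspring and the promoted civilians
    evaluated = [(k, simulate(k)) for k in N1 + C2]
    ranked_N = sorted(N + evaluated, key=lambda x: x[1], reverse=True)
    new_N = ranked_N[:len(N)]                         # N2 stays noble
    N3 = ranked_N[len(N):]                            # demoted to civilian

    # fresh random individuals replace the discarded C4
    R = [[rng.randrange(1, params["n_d"] + 1) for _ in range(params["n_h"])]
         for _ in range(params["r"])]
    new_C = N3 + C3 + [(k, f_model(k)) for k in R]
    return new_N, new_C, evaluated                    # evaluated samples train f
```

The returned `evaluated` pairs are exactly the samples added to S in line 5 of Algorithm 1, i.e. the online training set for the fitness model.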

EXPERIMENT
We set up a virtual intelligent warehouse environment based on real-world settings and test our proposed approach against the baselines. Our experiments are reproducible and the source code is provided in the supplementary material.

Experiment Settings
Environment. We test our proposed approach on 20 × 20 maps. The positions of the sources and holes are set as in the real-world scenarios. The detailed parameters are given in Table 2. The destination distributions follow long-tail functions to reflect reality. In our experiments, the reward is defined as the total number of parcel loading and unloading events (roughly twice the number of parcels processed).
Robots. As introduced, we adopt a greedy algorithm as the cooperative pathfinding algorithm for the robots. First, we set one-way roads in the map, as Figure 2a shows, to avoid opposite-direction conflicts, while right-angled conflicts are avoided by setting priorities. On the one-way roads, the robots decide their moves by a look-up table containing h × w × (n_s + n_h) records, each of which indicates the first step towards a particular source or hole from a particular cell.
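Such a table can be precomputed by a backward breadth-first search per target over the one-way road graph; the sketch below illustrates the idea (the graph encoding via a `neighbors` callable is our assumption, not the paper's implementation):

```python
from collections import deque

def build_table(cells, neighbors, targets):
    """table[(cell, t)] = next cell on a shortest one-way path toward t."""
    table = {}
    for t in targets:
        dist = {t: 0}
        q = deque([t])
        while q:
            c = q.popleft()
            for p in cells:              # predecessors: p may step into c
                if c in neighbors(p) and p not in dist:
                    dist[p] = dist[c] + 1
                    table[(p, t)] = c    # first step from p toward t
                    q.append(p)
    return table
```

At run time, choosing a move is then a constant-time dictionary lookup, which is why this greedy policy is much cheaper than WHCA* replanning.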
Baselines. We test 5 baselines against our proposed two-layer evolutionary algorithm (TLEA). Random: the holes are assigned destinations at random. Heuristic: a layout designed by human heuristics. Simu: the evolutionary algorithm with simulations as introduced in the Solution section. SimuInd: an implementation of the individual-based evolution control algorithm [5]. This approach maintains a single large population whose individuals are evaluated by the fitness model; in each generation, the best individuals according to the fitness model are re-evaluated by the simulation. The fitness model is trained online with the samples produced by the simulations. SimuGen: an implementation of the generation-based evolution control algorithm [25]. It also maintains a single large population, as SimuInd does, but uses simulations intensively in one generation and the fitness model in the next several generations.
Hyper-parameters. To ensure fairness, for Simu, SimuInd, SimuGen and TLEA, the number of generations is set to 60 and the number of simulations per generation to 200. The numbers of model updates and predictions are also fixed, at 5000 and 10000 respectively, for SimuInd, SimuGen and TLEA. The population of Simu is 100; in each generation, 200 individuals are generated by crossover, 50 of which are mutated. For SimuInd and SimuGen, the populations are 5000; 10000 individuals are generated by crossover in each generation, 2500 of which are mutated. For TLEA, |N|, |C|, c_N, c_C, m_N, m_C, |C_2|, |R|, n_u are set to 100, 5000, 1, 1, 0.25, 0.25, 50, 2500, 5000 respectively.
Fitness model. Our network is composed of three sub-networks f_X, f_I and f_G; the output of f_X serves as the input of both f_I and f_G. f_X has two fully connected layers of 128 and 400 units, whose output is reshaped to match the size of the map, followed by a 2D transposed convolution layer with 16 filters. f_I has one transposed convolution layer with a single filter to generate the heatmap, and f_G contains three fully connected layers of 256, 128 and 1 units to predict the reward. All layers except the output layers use ReLU activations, and the loss functions for both outputs are MSE.
Hardware. We use two computers with an Intel core i7-4790k and an Intel core i7-6900k respectively. The one with 4790k also has an extra Nvidia Titan X GPU.

Results
We run the baselines and TLEA; the results are shown in Table 3. We find that Heuristic scores fairly high compared to Random but is inferior to the evolutionary algorithms. Moreover, TLEA outperforms all the baselines. Figure 5 shows the layouts designed by the baselines and TLEA together with their heatmaps. The tracks of the robots in the TLEA maps are better balanced, indicating fewer traffic jams. Figure 6a shows the learning curves. Since SimuInd and SimuGen mix individuals evaluated by the simulation with those evaluated by the fitness model, their current best individuals may be ones over-estimated by the inaccurate fitness model, which may lead to discarding the real best individuals. TLEA avoids this problem by separating the two populations, ensuring that the real best individual is always kept in the noble population.
In addition, TLEA and Simu are more stable than SimuInd and SimuGen, because in the latter two the temporary best individual may have been evaluated only by the fitness model and is then corrected by the simulation in later generations. The slight fluctuations of Simu and TLEA are caused by the variance of the simulations: the best samples can be over-estimated (though far less than by the fitness model), and this is averaged out by extra simulations in later generations.

Discussions
Time cost. The time costs of the tested algorithms are listed in Table 4. The fitness model accounts for less than 5% of the total time cost. In our experiments, we therefore ignore the time difference between Simu and the other algorithms.
Effectiveness of heatmap. We evaluate randomly generated samples by simulation and use them to train fitness functions with and without the heatmap as an auxiliary objective. We compare their MSE and Pearson correlation in Table 5, which shows that the heatmap significantly improves the fitness function.

Simulation allocation. Since simulations are a scarce resource when running the evolutionary algorithm, the allocation of simulations between the noble layer and the civilian layer matters; it also determines the migration rate between the two layers. We test different values of |N_1| / (|N_1| + |C_2|), the ratio of simulations allocated to the noble layer, and find 0.75 to be a proper setting (see Table 6): three quarters of the simulations ensure the accuracy of the noble layer, while one quarter gives chances to the civilian layer.
Impact of civilian population. We are interested in how much the civilian population contributes to the evolution of the noble population. We compute a quantity named purity that measures how much the evolved noble population inherits from the initial noble population. As Figure 6b shows, the purity of the noble population declines rapidly as the reward (fitness) increases. In the end, the civilian population contributes more than 70 percent of the noble population.
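The purity bookkeeping described in the Figure 6b caption is simple to reproduce: nobles start at purity 1, civilians at 0, and each child takes the mean of its parents' purities:

```python
def child_purity(parent_purities):
    """A child's purity is the mean of its parents' purities (Figure 6b)."""
    return sum(parent_purities) / len(parent_purities)

# a noble (1.0) crossed with a civilian (0.0) yields a half-noble child
p = child_purity([1.0, 0.0])  # 0.5
```

Averaging the purities of the final noble population then gives the fraction of noble ancestry; a value below 0.3 means the civilian layer contributed more than 70 percent.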

CONCLUSION
In this paper, we study the problem of automatic warehouse layout design. The proposed two-layer evolutionary algorithm takes advantage of a fitness approximation model augmented with an auxiliary objective of predicting the heatmap. Our approach enhances the exploration of the evolutionary algorithm with the help of the fitness model. The experiments demonstrate the superiority of our approach over the heuristic and the traditional evolution-based methods. For future work, we plan to apply the proposed two-layer evolutionary algorithm to other environment design scenarios, such as shopping mall design, game design and traffic light control.

Figure 6: (a) Learning curves averaged over 10 runs. The Y-axis is the reward received by the best individual in each population. (b) Impact of the civilian population for a particular run. Initially, the purity of each individual in the noble population is set to 1 and that of each civilian to 0. During the evolution, each child's purity is the mean of its parents' purities.