Meta-Heuristic Algorithms for the Generalized Extensible Bin Packing Problem With Overload Cost

In this paper, we consider a generalized extensible bin packing problem with overload cost, first proposed by Denton et al. in 2010, in which the total size of the items packed into a bin is allowed to exceed its capacity and the cost incurred by each bin equals a fixed cost plus an overload cost. The objective is to minimize the total cost of all bins. According to the characteristics of the problem, we first propose an improved ant colony optimization algorithm (IACO), which enhances the positive feedback effect of ACO by improving the pheromone update method and the adaptive adjustment of parameters. We also introduce a variable neighborhood search method into ACO to improve the convergence of the algorithm and to escape local extrema. Then, we present a discrete particle swarm optimization algorithm (DPSO) to solve the problem. To ensure that the initial particle swarm is uniformly distributed and of high quality, we use several heuristic methods in the initialization of the swarm, so that the initial particles cover the entire search space with high probability, which effectively improves the performance of the DPSO algorithm. Finally, we compare and analyze the performance of the proposed algorithms through two sets of computational experiments. The computational results indicate that the improved ACO algorithm and the MDPSO algorithm are more competitive than some other metaheuristic algorithms in the literature.


I. INTRODUCTION
A. RESEARCH MOTIVATION
In a multiprocessor system, the tasks of an order need to be scheduled on the processors in the shortest possible time. Each processor has a given capacity and is charged a fixed power-on/machine-wear cost. In addition, an overload cost is charged if the total size of the tasks scheduled on a processor exceeds its service capacity. We aim to minimize the total cost of processing all tasks. Motivated by such applications in production and daily life, we consider in this paper the load balancing problem on identical parallel machines, called the generalized extensible bin packing problem with overload cost (GEBPOC). This problem was first proposed by Denton et al. [1], [2], who were inspired by a healthcare delivery application in outpatient surgery centers that dynamically assign patients to operating rooms, with the goal of minimizing the total cost of opening operating rooms and using overtime to complete a day of surgery when both the number of scheduled patients and the duration of the procedures are unknown.

The associate editor coordinating the review of this manuscript and approving it for publication was Christian Pilato.

$$\sum_{i=1}^{m} x_{ij} = 1, \quad \text{for } j = 1, 2, \cdots, n. \tag{2}$$
$$x_{ij} = \begin{cases} 1, & \text{if job } J_j \text{ is assigned to machine } M_i, \\ 0, & \text{otherwise.} \end{cases} \tag{3}$$

C. LITERATURE REVIEW
It is known from [3] that GEBPOC is also NP-hard in the strong sense, so an optimal schedule is unlikely to be obtainable by a polynomial-time algorithm. Over the years, a great deal of research has been devoted to developing efficient approaches with performance guarantees for this problem. In GEBPOC, if c = 1, our problem is equivalent to extensible bin packing with unequal bin sizes (EBP-UBS), where an inexhaustible set of unequal bins is given. Unlike in traditional bin packing, the capacity of a bin in EBP-UBS may be exceeded if needed. Moreover, in the study of EBP-UBS, many researchers take the capacity and the fixed cost of each bin to be one, or take the unit cost of the part exceeding the bin capacity to be 1. This type of problem was first considered in [4], which studied both the offline and online versions under the assumption that the largest item size does not exceed the smallest bin size. The authors showed that the worst-case ratio of the longest processing time first algorithm (LPT) is $4 - 2\sqrt{2}$ in the offline case, and that in the online version the list scheduling algorithm (LS) has a tight upper bound of $\frac{5}{4}$. They also proved that no online algorithm has an approximation ratio smaller than $\frac{7}{6}$. The online version of this problem for m = 2, 3, 4 was studied in [5], where the designed online algorithms were proved to achieve tight bounds of $\frac{7}{6}$, $\frac{11}{9}$, and $\frac{19}{16}$ for these three cases, respectively; for m = 2, the algorithm is the best possible. An improved algorithm for m = 3 was also given there. The vector scheduling problem in asymmetric settings was studied by [6], who gave a polynomial time approximation scheme (PTAS) for EBP-UBS using dynamic programming techniques, where the state space is a vector whose dimension depends on $\frac{1}{\varepsilon}$. When c = t = 1 in GEBPOC, the extensible bin packing problem (EBP) is obtained. In [3], Dell'Olmo et al. proved that the worst-case ratio of the LPT algorithm is $\frac{13}{12}$.
When the number of bins m is fixed, it follows from the discussion in [7] that there is a fully polynomial time approximation scheme (FPTAS) for this problem. If m is not fixed, EBP can be solved using the idea of the (efficient) PTAS for the identical machine scheduling problem in [8]. For the online version of EBP, [9] proved that the upper and lower bounds of the LS algorithm are both equal to $\frac{5}{4}$, and designed a heuristic $H_x$ that depends on a parameter $x$ ($0 < x < 1$) to improve this bound. The algorithm $H_x$ assigns tasks to machines based on a load threshold determined by $x$, and its worst-case ratio is 1.228. For the lower bound of the online problem, they also gave an instance with m = 2 proving that the competitive ratio of any online algorithm is at least $\frac{7}{6}$. A fully polynomial time asymptotic approximation scheme (FPTAAS) for this problem was developed by [10]. On the other hand, for the case t = 1 in GEBPOC, [11] investigated the bin packing problem with overload cost (BPOC), where the number of identical bins is infinite, and presented lower and upper bounds for any deterministic online algorithm for BPOC according to the value of c. [12] considered a more general case of EBP, the generalized extensible bin packing problem (GEBP), where $c = \sigma_i$ and the machine capacities are allowed to differ. They developed an EPTAS based on the shifting technique followed by the solution of a polynomial number of n-fold programming instances. When the number of machines of each type is not part of the input but part of the solution, they presented an asymptotic fully polynomial time approximation scheme (AFPTAS) for a related bin packing type variant of the problem (denoted by GEBP-BPV), similar to the variant of EBP.
In the context of surgical scheduling, inspired by the practical application of dynamically assigning patients to operating rooms in outpatient procedure centers, [1] considered a more general case of the EBP problem, where the decision maker needs to choose the number of bins of size S to be opened. The fixed cost of each opened bin is $c_f$, and the overtime cost per unit time is $c_v$. The goal is to minimize the total cost of opening operating rooms and working overtime to complete a day of surgery. Afterwards, [2] investigated faster approximation algorithms for the online version of the problem (DEBP) and showed that every $(1 + \rho)$-approximation algorithm for the EBP problem yields a $(1 + \rho \cdot \frac{S c_v}{c_f})$-approximation algorithm in this more general setting. They also considered a two-stage randomized version of the problem, in which emergency patients need to be assigned to operating rooms along with preassigned elective patients. This is also the first attempt to account for randomness in the EBP problem. Based on this stochastic setting, [13] studied the stochastic extensible bin packing problem (SEBP).
A considerable body of literature focuses on the case where the fixed cost of each bin is equal to 0, called the late work minimization scheduling problem, which is equivalent to the early work maximization problem with respect to optimal solutions. The early work denotes the part of a job executed before the due date, while the late work represents the part of a job executed after the due date. The late work minimization problem was first proposed by [14]. For the offline late work minimization problem, [15] proved that it is binary NP-hard on two identical machines, and they also designed an optimal online algorithm. For the offline early work maximization problem, [16] proposed a polynomial time approximation scheme on two identical machines. [17] proved that the LPT algorithm has a worst-case ratio of $\frac{10}{9}$ on two identical machines, and proposed a branch-and-bound algorithm for the general case with arbitrary due dates. For m machines, [18] proposed a pseudo-polynomial time dynamic programming algorithm and a fully polynomial time approximation scheme. Recently, [19] proposed a more efficient dynamic programming algorithm and an FPTAS for the problem in [18]. Reference [20] presented an efficient polynomial time approximation scheme for the problem in [18]. Reference [21] considered four semi-online scheduling problems with a common due date to maximize the total early work. Next, [22] studied several online and semi-online early work maximization problems on two hierarchical machines. Reference [23] also considered three semi-online early work maximization problems on two hierarchical machines with partial information about the processing times.
The above algorithms tend to yield poor solution accuracy on large-scale problem instances. The rise of meta-heuristic algorithms has opened new research directions for solving instances of combinatorial optimization problems such as large-scale identical machine scheduling. Metaheuristics play an indispensable role in obtaining high-quality solutions within an acceptable time frame. They define a series of processing frameworks based on natural laws, biology, and other phenomena, which can be applied to a wide range of optimization problems. In recent years, they have attracted great research interest. Scholars have successively applied simulated annealing (SA) [24], particle swarm optimization (PSO) [25], ant colony optimization (ACO) [26], and the genetic algorithm (GA) [27] to the identical machine scheduling problem and achieved promising results. When solving a particular problem with a meta-heuristic algorithm, it is first necessary to define a representation of the problem. The set of feasible solutions is then iterated starting from an initial solution, poor solutions are eliminated, and high-quality solutions are retained. Once the algorithm satisfies the termination condition, the best solution found so far is output. Note that the solution obtained by these algorithms is not necessarily optimal, but it is expected to be a feasible solution of high quality.
An ant colony algorithm for the no-wait flow shop scheduling problem with the goal of minimizing the makespan was considered by [28]. Several studies (e.g., [29], [30], [31], [32], [33], and [34]) applied the ACO algorithm to single-machine scheduling problems with tardiness penalties. Later, [35] considered the problem of minimizing the maximum tardiness on m identical machines and applied an ant colony algorithm with four different specific heuristics in the construction of solutions. The ACO algorithm has many parameters, and different parameter settings lead to different results. Reference [36] treated the setting of ACO's parameters as a combinatorial optimization problem, solved it with the PSO algorithm, and proposed an adaptive parameter setting strategy. In addition, [37] proposed three population-based algorithms for the fair resource allocation problem, namely a discrete artificial bee colony algorithm, a discrete artificial fish swarm algorithm, and a discrete hybrid frog leaping algorithm, and verified their effectiveness through computational experiments. Chen et al. [38] used a DPSO algorithm for the problem of minimizing the total late work in a flow shop system with different due dates and learning effects. Reference [39] considered the two criteria of fairness and cost for the crew scheduling problem and proposed an improved honey badger optimization algorithm, based on a genetic algorithm and Levy flight, to solve it. Compared with other algorithms, this algorithm shows good performance.
Reference [40] proposed a DPSO algorithm for the makespan minimization criterion and also investigated the effectiveness of a hybrid DPSO that combines this algorithm with an efficient local search heuristic. Reference [41] considered a no-wait flow shop scheduling problem with the goals of minimizing the makespan and the total flow time. The authors proposed a hybrid DPSO algorithm with variable neighborhood search and explored the selection of control parameters and the effect of embedding variable neighborhood search on the optimization performance of the algorithm. Reference [42] used particle swarm optimization to optimize parameters in order to improve the ability of soil surface process models to simulate soil moisture. The same authors also used a particle swarm optimization algorithm in [43] to calibrate parameters related to turbulence in the surface layer in the source region of the Yellow River. Reference [44] took the total weighted earliness and tardiness penalty as the optimization goal on a single machine with a common due date, proposed a DPSO algorithm, and improved its local search ability and its ability to escape local extrema by embedding variable neighborhood search. Reference [45] proposed a DPSO with a new information sharing mechanism for the makespan minimization problem. Reference [46] proposed a DPSO based on the crossover and mutation operators of genetic algorithms and compared the effectiveness of several crossover and mutation operators, taking into account the objectives of makespan and total flow time. Reference [47] considered the effect of an artificial intelligence particle swarm optimization method on the calibration of freeze-thaw related parameters when improving the freeze-thaw process of a climate-vegetation model.
Reference [48] proposed a combined particle swarm optimization (CPSO) algorithm and employed a simulated annealing algorithm to enhance the ability of CPSO to escape local extrema. For the job shop scheduling problem, [49] presented a hybrid DPSO with a tabu search algorithm for the makespan minimization problem. Reference [50] presented a DPSO for minimizing the total weighted earliness and tardiness, and confirmed that this algorithm outperforms the results reported in the related literature.
Compared with other heuristic algorithms, the ACO algorithm is highly robust, that is, the basic ACO model can be applied to a given optimization problem with only slight modifications according to the specific characteristics of that problem. Moreover, the DPSO algorithm approaches the optimal solution relatively quickly, optimizes the parameters of the system effectively, and is also robust. Therefore, in this research, we solve GEBPOC with ACO and DPSO algorithms, and use the LPT algorithm as a baseline to evaluate our results. To the best of our knowledge, this is the first attempt to solve this problem using population-based algorithms.
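Since LPT serves as the baseline, a minimal Python sketch of the rule may be helpful: jobs are considered in non-increasing order of size and each is placed on the currently least loaded machine. The function names `lpt_schedule` and `machine_loads` and the 0-based machine indices are illustrative assumptions, not notation from this paper.

```python
import heapq


def lpt_schedule(sizes, m):
    """Assign jobs to m identical machines by the LPT rule.

    Returns a list `assign` where assign[j] is the 0-based machine of job j.
    """
    # Min-heap of (current load, machine index); the top is the least loaded machine.
    heap = [(0.0, i) for i in range(m)]
    heapq.heapify(heap)
    assign = [None] * len(sizes)
    # Consider jobs in non-increasing order of size (ties keep input order).
    for j in sorted(range(len(sizes)), key=lambda j: -sizes[j]):
        load, i = heapq.heappop(heap)
        assign[j] = i
        heapq.heappush(heap, (load + sizes[j], i))
    return assign


def machine_loads(sizes, assign, m):
    """Total size of the jobs placed on each machine."""
    loads = [0.0] * m
    for j, i in enumerate(assign):
        loads[i] += sizes[j]
    return loads
```

For the sizes (5, 4, 3, 3, 3) on two machines, this rule produces loads 8 and 10, whereas a perfectly balanced schedule with loads 9 and 9 exists, which illustrates why LPT is only an approximation.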

D. MAIN RESULTS
The main contributions of this paper are summarized as follows.
1) According to the characteristics of the problem, the corresponding ant colony model is defined, and the positive feedback effect of ant colony optimization is enhanced by improving the state transition rules and the dynamic adaptive parameters. To avoid premature convergence or stagnation of the ant colony algorithm during the search, a variable neighborhood search method is also introduced, which further improves the global search ability and convergence speed of the algorithm.
2) The DPSO algorithm is a computing technique based on swarm intelligence theory for solving discrete optimization problems. To ensure that the initial population is uniformly distributed and of high quality, several heuristic methods are adopted in the initialization of the particle swarm, so that the initial particles cover the entire search space with high probability and the inefficiency caused by blind search is avoided.
3) A large number of computational experiments are designed, and the proposed meta-heuristic algorithms and the LPT algorithm are compared in terms of performance ratio, running time, function value, Friedman rank, and convergence. The results show that the meta-heuristic algorithms have good stability and strong optimization ability when solving large-scale instances.
The remainder of the paper is organized as follows. In Section II, we briefly introduce the concepts and ideas of ant colony optimization and particle swarm optimization. Section III introduces the improved ant colony algorithm for GEBPOC. Section IV presents a discrete particle swarm optimization algorithm for GEBPOC. Section V compares and analyzes the results and effectiveness of the proposed algorithms through computational experiments. A summary of this paper and an outlook for future research are presented in Section VI.

II. BASIC ACO AND PSO
A. BASIC ACO
The ant colony optimization algorithm is a population-based evolutionary algorithm proposed by Dorigo et al. [26], which simulates the trail-finding behavior of ants foraging in nature. As shown in Fig. 1, the way real ants generate near-optimal trails can be explained in four steps. While moving, the ants leave pheromone on the trail to transmit information, and other ants can perceive this substance and use it to guide their direction of movement. When many ants are foraging, each ant initially chooses a trail at random and releases pheromone on it. Ants on a short trail reach the destination earlier than ants on a long trail and make round trips more frequently, so the pheromone left on the short trail becomes correspondingly more concentrated. Pheromone also evaporates over time. When the next generation of ants forages, they prefer the trail with the higher pheromone concentration, and the more ants choose this trail, the more pheromone is released on it. Therefore, the behavior of a colony composed of a large number of ants exhibits positive feedback: the more ants walk on a certain trail, the greater the probability that later ants choose it. ACO is characterized by distributed computing, positive information feedback, and heuristic search, and is essentially a heuristic global optimization algorithm within the family of evolutionary algorithms.
ACO was originally proposed for the traveling salesman problem (TSP). In the following, we introduce the basic ant colony system model by taking the TSP as an example. Let m and n be the number of ants in the colony and the number of cities in the TSP, respectively. Let $d_{ij}$ denote the distance between city i and city j, let $b_j(z)$ denote the number of ants in city j at time z, and let $\tau_{ij}(z)$ denote the residual pheromone concentration on the trail from city i to city j at time z. In ACO, the walking trail of an ant represents a feasible solution of the optimization problem, and the trails of the whole colony constitute the solution space of the problem. The general procedure of the ACO algorithm is as follows:
1) Set the number of ants according to the specific problem, and assume that the pheromone concentration on every trail is equal at the initial moment, i.e., $\tau_{ij}(0) = R$ (R is a constant); the ants then search in parallel. After each ant completes a trip, it releases pheromone on its trail in an amount proportional to the quality of the solution.
2) Construct the trail. This step includes the selection of the initial city and the determination of the next city to visit. Each ant randomly selects a city as its starting point and maintains a tabu table that stores the cities it has passed through in sequence. During the movement of ant k (k = 1, · · · , m), trail selection follows a random local search strategy, and the direction of transfer is determined by the pheromone concentration on each trail. The probability that ant k transfers from city i to city j at time z is denoted by $p^{k}_{ij}(z)$, i.e.,
$$p^{k}_{ij}(z) = \begin{cases} \dfrac{[\tau_{ij}(z)]^{\alpha}\,[\eta_{ij}]^{\beta}}{\sum_{s \in allowed_k} [\tau_{is}(z)]^{\alpha}\,[\eta_{is}]^{\beta}}, & j \in allowed_k, \\ 0, & \text{otherwise,} \end{cases}$$
where α is the importance factor of the residual pheromone on the trail (i, j), β is the importance factor of the heuristic information for transferring from city i to city j, and $\eta_{ij}$ is the prior knowledge, generally taken as $\eta_{ij} = \frac{1}{d_{ij}}$, which means that closer cities are more likely to be selected. Unlike an ant colony in real life, the artificial ant colony system has a memory function. The set of cities that ant k has visited is recorded in $tabu_k$, called the tabu table, which is adjusted dynamically as the search evolves. The set $allowed_k$ contains the cities to which ant k is allowed to transfer in the next step, that is, the complement of $tabu_k$.
3) After n moments, all ants have completed the traversal of the n cities. Then set $tabu_k$ to empty, calculate the length of the trail traveled by each ant, and record the shortest trail. All ants start the next round of search and traversal from their previous starting points.
4) To prevent excessive residual pheromone from drowning out the heuristic information, the residual pheromone on the trails needs to be updated after each ant walks a step or traverses the n cities. Note that ants of the same generation are not affected by the pheromone left by previous ants. The update includes both the evaporation of the previous pheromone and the increase of pheromone on the traversed trails. The pheromone update formula is as follows:
$$\tau_{ij}(z+1) = (1 - \rho)\,\tau_{ij}(z) + \sum_{k=1}^{m} \Delta\tau^{k}_{ij}(z), \qquad \Delta\tau^{k}_{ij}(z) = \begin{cases} \dfrac{Q}{L_k}, & \text{if ant } k \text{ traverses the trail } (i, j), \\ 0, & \text{otherwise,} \end{cases}$$
where $0 \le \rho \le 1$ is the pheromone evaporation coefficient, $\Delta\tau^{k}_{ij}(z)$ represents the pheromone increment left by the kth ant on the trail (i, j) in the current iteration, Q is a constant, and $L_k$ denotes the length of the trail traveled by the kth ant in the current iteration.
5) When the algorithm reaches a predetermined maximum number of iterations or a stagnant state occurs, the algorithm terminates and outputs the shortest trail found so far.
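The five steps above can be sketched in Python as follows. This is a minimal illustration of the basic ant system on a symmetric TSP, not the implementation used in this paper. The function names, the default parameter values, the fixed iteration count as the only stopping rule, and the choice to update pheromone once per iteration after all ants finish are assumptions made for the sketch.

```python
import math
import random


def tour_length(tour, dist):
    """Length of the closed tour that returns to its first city."""
    n = len(tour)
    return sum(dist[tour[a]][tour[(a + 1) % n]] for a in range(n))


def ant_system_tsp(dist, n_ants=10, n_iter=50, alpha=1.0, beta=2.0,
                   rho=0.5, Q=1.0, R=1.0, seed=0):
    """Basic ant system for a symmetric TSP given a distance matrix."""
    rng = random.Random(seed)
    n = len(dist)
    tau = [[R] * n for _ in range(n)]            # step 1: equal initial pheromone
    starts = [rng.randrange(n) for _ in range(n_ants)]  # fixed start city per ant
    best_tour, best_len = None, math.inf
    for _ in range(n_iter):                      # step 5: stop after n_iter rounds
        tours = []
        for k in range(n_ants):                  # step 2: build one trail per ant
            tour, tabu = [starts[k]], {starts[k]}
            while len(tour) < n:
                i = tour[-1]
                allowed = [j for j in range(n) if j not in tabu]
                # weight proportional to tau^alpha * eta^beta with eta = 1 / d_ij
                weights = [tau[i][j] ** alpha * (1.0 / dist[i][j]) ** beta
                           for j in allowed]
                j = rng.choices(allowed, weights=weights)[0]
                tour.append(j)
                tabu.add(j)
            tours.append(tour)
        lengths = [tour_length(t, dist) for t in tours]  # step 3: evaluate trails
        for t, length in zip(tours, lengths):
            if length < best_len:
                best_tour, best_len = t, length
        for i in range(n):                       # step 4a: evaporation
            for j in range(n):
                tau[i][j] *= 1.0 - rho
        for t, length in zip(tours, lengths):    # step 4b: deposit Q / L_k
            for a in range(n):
                i, j = t[a], t[(a + 1) % n]
                tau[i][j] += Q / length
                tau[j][i] += Q / length
    return best_tour, best_len
```

The sketch requires strictly positive distances between distinct cities and an evaporation coefficient below 1, so that the selection weights stay positive.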

B. BASIC PSO
The particle swarm optimization algorithm was proposed by Kennedy and Eberhart [25] and is a meta-heuristic algorithm based on swarm intelligence for optimizing continuous nonlinear functions. Its biological inspiration is the metaphor of social interaction and communication in a flock of birds or a school of fish. It is simple and easy to implement, requires few parameters to be adjusted, and has strong global convergence ability and robustness. The PSO algorithm has been widely used in function optimization, neural network training, fuzzy system control, and other fields. In the PSO algorithm, individuals are regarded as particles with positions and velocities, where the position of a particle represents a candidate solution to the problem. Starting from the initial population, the particles fly continuously through the search space. Each particle searches for the optimal solution individually, records the best solution it has found as its current individual extremum, and shares it with the other particles in the swarm. The best individual extremum found is then used as the current global best solution of the entire swarm. All particles adjust their velocities and positions according to their own current individual extremum and the current global best solution shared by the swarm, and thereby gradually approach the optimal solution.
Let $v_k = (v_{k1}, v_{k2}, \cdots, v_{kn})$ and $x_k = (x_{k1}, x_{k2}, \cdots, x_{kn})$ be the velocity and position of particle k, respectively, and let $pB_k = (pb_{k1}, pb_{k2}, \cdots, pb_{kn})$ and $gB = (gb_1, gb_2, \cdots, gb_n)$ denote the best positions of the individual and of the population, respectively. Then, in the (z + 1)th iteration, the velocity and position of particle k are updated as follows:
$$v_k^{z+1} = w\,v_k^{z} + c_1 r_1 \left(pB_k^{z} - x_k^{z}\right) + c_2 r_2 \left(gB^{z} - x_k^{z}\right),$$
$$x_k^{z+1} = x_k^{z} + v_k^{z+1},$$
where w is the inertia weight, which reflects the influence of the particle's previous velocity on its next velocity. The constants $c_1$ and $c_2$ are called the cognitive coefficient and the social coefficient, respectively, and reflect the extent to which a particle is affected by its own best solution and by the best solution of the population. Both $r_1$ and $r_2$ are random numbers in the interval [0, 1], $pB_k^{z}$ represents the personal best solution found by particle k in the first z iterations, and $gB^{z}$ represents the global best solution obtained by all particles in the first z iterations. The procedure of the basic PSO algorithm is shown in Fig. 2.
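To make the roles of w, c_1, c_2, r_1, and r_2 concrete, the following Python sketch applies the standard velocity and position update to a continuous minimization problem. It is only an illustration of the basic PSO scheme: the function name `pso_minimize`, the default parameter values, the zero initial velocities, and the absence of velocity or position clamping are assumptions, not settings from this paper.

```python
import random


def pso_minimize(f, dim, bounds, n_particles=20, n_iter=100,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Basic continuous PSO; returns the best position found and its value."""
    rng = random.Random(seed)
    lo, hi = bounds
    # Random initial positions inside the box, zero initial velocities.
    x = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    v = [[0.0] * dim for _ in range(n_particles)]
    pbest = [xi[:] for xi in x]                 # personal best positions pB_k
    pbest_val = [f(xi) for xi in x]
    g = min(range(n_particles), key=lambda k: pbest_val[k])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # global best position gB
    for _ in range(n_iter):
        for k in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # inertia term + cognitive term + social term
                v[k][d] = (w * v[k][d]
                           + c1 * r1 * (pbest[k][d] - x[k][d])
                           + c2 * r2 * (gbest[d] - x[k][d]))
                x[k][d] += v[k][d]
            val = f(x[k])
            if val < pbest_val[k]:              # update personal best
                pbest[k], pbest_val[k] = x[k][:], val
                if val < gbest_val:             # update global best
                    gbest, gbest_val = x[k][:], val
    return gbest, gbest_val


def sphere(p):
    """Simple test objective with its minimum 0 at the origin."""
    return sum(c * c for c in p)
```

A call such as `pso_minimize(sphere, dim=2, bounds=(-5.0, 5.0))` returns the best position found together with its objective value.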
In fact, particle swarm optimization (PSO) is a kind of evolutionary algorithm (EA). Like the genetic algorithm, PSO starts from random solutions, searches for the optimal solution through iteration, and evaluates the quality of solutions through fitness, but its rules are simpler than those of the genetic algorithm. PSO does not have the ''crossover'' and ''mutation'' operations of the genetic algorithm. Instead, it searches for the global optimum by following the best solutions found so far. The genetic algorithm first needs to encode the problem and, after finding a solution, needs to decode it. In addition, many parameters are needed to implement the three operators of selection, crossover, and mutation, such as the crossover rate and the mutation rate, and the choice of these parameters is mostly based on experience, which can seriously affect the quality of the solution.
Compared with the genetic algorithm, PSO has the advantages of a simple process that is easy to implement, fewer parameters to adjust, and stronger convergence ability and robustness. Since GEBPOC is strongly NP-hard, an optimal solution can be found in reasonable time only for small-scale problem instances, while the computational time of exact algorithms explodes when the problem is large. In this paper, we design a DPSO algorithm based on the discrete characteristics of the problem to find optimal or near-optimal solutions with acceptable time and memory requirements.

III. IMPROVED ANT COLONY OPTIMIZATION ALGORITHM FOR GEBPOC
The identical machine scheduling problem and the bin packing problem are typical combinatorial optimization problems, and they differ substantially from the TSP. Therefore, the representation and update method of the pheromone in the basic ACO algorithm must be adjusted according to the characteristics of the problem in order to make the algorithm suitable for it. In this section, we define the ant colony model of GEBPOC according to the characteristics of this problem, and enhance the positive feedback effect of ACO by improving the state transition rules and the dynamic adaptive parameters. To avoid premature convergence or stagnation of the ant colony algorithm during the search, we also introduce a variable neighborhood search method into ACO to further enhance the global search ability and convergence speed of the algorithm.

A. CONSTRUCTION OF ANT COLONY MODEL
Suppose there are d ants, m machines and n jobs in the system. In each step, an ant selects a target machine for one job. After n steps the ant completes a tour, that is, the scheduling of the n jobs is completed, which yields a scheduling scheme for the problem. The scheme is represented by an n-dimensional vector $(d_1, d_2, \cdots, d_n)$, whose jth component $d_j$ means that job $J_j$ is executed by machine $M_{d_j}$. For example, when m = 2 and n = 4, the vector (1, 2, 2, 1) corresponds to a feasible schedule in which jobs $J_1$ and $J_4$ are assigned to machine $M_1$, while $J_2$ and $J_3$ are scheduled on $M_2$.
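The vector encoding can be decoded into per-machine job lists with a few lines of Python. The helper name `decode` is an illustrative assumption; jobs and machines are numbered from 1, as in the example above.

```python
def decode(schedule, m):
    """Group 1-based job indices by machine for an assignment vector.

    schedule[j - 1] is the machine (1..m) that executes job J_j.
    """
    machines = {i: [] for i in range(1, m + 1)}
    for j, i in enumerate(schedule, start=1):
        machines[i].append(j)
    return machines


print(decode((1, 2, 2, 1), m=2))  # {1: [1, 4], 2: [2, 3]}
```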
For GEBPOC, if the loads of all machines are no less than the regular working time t, or the loads of all machines are no greater than t, then such a schedule is an optimal solution of the problem, and the corresponding objective value is the lower bound LB of the problem. In scheduling problems, the quality of a solution is generally evaluated by its fitness. According to the characteristics of the problem, we need to consider not only the overtime of each machine, but also the longest machine load in the solution space, and the fitness function (10) is defined accordingly. The ACO algorithm finally outputs the schedule corresponding to the solution with the minimal fitness value found by the ant colony after many iterations.
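The optimality condition above can be checked numerically with a small Python sketch. The cost model is an assumption: each of the m machines is charged a fixed cost plus c per unit of load beyond t, and the fixed cost defaults to t, which is consistent with the remark that c = t = 1 yields EBP but is not confirmed by the surviving text. The names `total_cost` and `lower_bound` and the parameter `fixed` are hypothetical.

```python
def total_cost(loads, t, c, fixed=None):
    """Assumed cost model: fixed cost per machine plus c per unit of overtime."""
    fixed = t if fixed is None else fixed
    return sum(fixed + c * max(0.0, load - t) for load in loads)


def lower_bound(sizes, m, t, c, fixed=None):
    """Bound reached when all loads are >= t or all loads are <= t."""
    fixed = t if fixed is None else fixed
    # The total overtime can never be less than the excess of total work over m * t.
    return m * fixed + c * max(0.0, sum(sizes) - m * t)
```

Under this model, a schedule with loads (8, 10) for t = 7 and c = 2 costs 22, which equals the bound for total work 18 on two machines. A schedule with loads (5, 9) and total work 14 costs 18, which is above the bound of 14 because one machine is below t while the other is above it.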

B. PHEROMONE EXPRESSION FOR GEBPOC
In GEBPOC, how to express and store the pheromone reasonably is the key to implementing the ant colony algorithm. Since all machines in the problem are identical, the processing of a job does not depend on which machine executes it; it is affected only by the other jobs assigned to the same machine. In this paper, we refer to this property as a job's matching degree (denoted by $\tau_{ij}$) and store it as the pheromone concentration to guide the ants in choosing an appropriate machine for each job.
If the set of jobs that has been assigned to machine $M_i$ is $S_i$, then whether the next new job $J_j$ should also be processed on $M_i$ is decided with reference to the pheromone left by the ants and the state transition probability. We denote the average degree of matching (amount of information) between job $J_j$ and the job subset $S_i$ assigned to machine $M_i$ at time z by $\sigma_{ij}(z)$, i.e.,
$$\sigma_{ij}(z) = \frac{1}{|S_i|} \sum_{J_l \in S_i} \tau_{lj}(z),$$
where $|S_i|$ refers to the number of jobs that have been assigned to machine $M_i$.
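As a rough Python sketch, the average matching degree can be computed from a job-by-job pheromone matrix. The matrix layout `tau[l][j]`, the helper name `avg_matching`, and the fallback value for an empty machine are assumptions made for illustration; the source does not state how an empty $S_i$ is handled.

```python
def avg_matching(tau, S_i, j, default=1.0):
    """Average pheromone between job j and the jobs already on machine i.

    tau[l][j] is the matching degree between jobs l and j (0-based indices).
    `default` is an assumed fallback used when the machine has no jobs yet.
    """
    if not S_i:
        return default
    return sum(tau[l][j] for l in S_i) / len(S_i)
```

For instance, with `tau = [[0, 2, 4], [2, 0, 6], [4, 6, 0]]` and jobs 0 and 1 already on the machine, the matching degree of job 2 is (4 + 6) / 2 = 5.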

C. STATE TRANSITION STRATEGY
To select a suitable machine, ant k chooses a machine $M_i$ for processing job $J_j$ according to the random probability rule of (4), where $q_0 = \frac{\log N_c}{\log N_{\max}} \in [0, 1]$ is the adaptive threshold of the ACO algorithm ($N_c$ is the current number of iterations of the algorithm and $N_{\max}$ is the preset maximal number of iterations), and q is a uniformly distributed random number in the interval [0, 1]. The state transition probability of the ant at time z is denoted by $p_{ij}(z)$, which reflects the probability that the ant assigns job $J_j$ to $M_i$ at time z according to the residual pheromone and the heuristic information during the search. It follows from the above rule that when $q \le q_0$, the ant chooses the next point according to prior knowledge; otherwise, it chooses the next point according to the random probability. The prior knowledge takes into account the load of the current machine. When the load of machine $M_i$ is small, the probability that the machine is selected is relatively large, and vice versa. Once the machine load exceeds the regular working time t, the probability that $M_i$ is selected is almost zero. Note that the value of $q_0$ is very small at the beginning of the iteration, which implies that the ants search randomly so that the explored search space is large enough. As the value of $q_0$ gradually increases, the ants conduct a deterministic search with large probability and then select the trail traveled by the elite ants, so that they gradually approach the region of the optimal solution.
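The effect of the adaptive threshold can be illustrated with a short Python sketch. Only the formula for $q_0$ is taken from the text. The function names and the `weights` argument are assumptions: `weights` stands in for whatever combination of pheromone and load-based heuristic information the selection rule uses, which is not reproduced here.

```python
import math
import random


def adaptive_threshold(n_c, n_max):
    """q0 = log(N_c) / log(N_max); grows from 0 to 1 as N_c goes from 1 to N_max."""
    return math.log(n_c) / math.log(n_max)


def select_machine(weights, q0, rng):
    """Pseudo-random proportional choice over per-machine weights.

    With probability about q0 the best-weighted machine is taken (deterministic
    search); otherwise a machine is drawn by roulette wheel (random search).
    """
    q = rng.random()
    if q <= q0:
        return max(range(len(weights)), key=lambda i: weights[i])
    return rng.choices(range(len(weights)), weights=weights)[0]
```

With `n_max = 100`, the threshold is 0 in the first iteration, 0.5 in iteration 10, and 1 in iteration 100, so exploration dominates early and exploitation dominates late. The sketch requires `n_max > 1` and at least one positive weight.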

D. VARIABLE NEIGHBORHOOD SEARCH
In the ant colony algorithm, the search process often becomes trapped in a local optimum. For this reason, we introduce a variable neighborhood search algorithm into ACO, which explores the best solution generated by the ant colony in each iteration through multiple neighborhood structures and thereby improves the search efficiency of ACO. For GEBPOC, the selection of the neighborhood structures needs to consider the permutation of jobs on different machines and the machine with the largest load. In this paper, we present the following three neighborhood structures, illustrated with m = 2, n = 5, t = 7.
• Move. In the solution $\pi$, the job $J_3$ processed on $M_2$ is moved to $M_1$ for execution, and a new solution $\pi_1$ is generated (cf. Fig. 3(a)).
• Symmetric swap (in short, Swap). In the solution $\pi$, a new solution $\pi_2$ is generated by exchanging the job $J_2$ on $M_1$ with the job $J_4$ on $M_2$ (cf. Fig. 3(b)).
• Asymmetric swap (in short, Aswap). In the solution $\pi$, the two jobs $J_3$ and $J_4$ on $M_2$ are exchanged with $J_2$ on $M_1$, and a new solution $\pi_3$ is generated (cf. Fig. 3(c)).

Many experiments have shown that the algorithm performs better when the neighborhoods are applied in the sequence Move, Swap, Aswap, so we introduce the three neighborhood structures in this sequence. Based on these three neighborhood structures, we design the variable neighborhood search algorithm (Algorithm 1).
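The three neighborhood moves can be sketched on an assignment list, where entry `j` holds the machine of job `j`. This is a minimal illustration rather than the paper's Algorithm 1: the list representation with 0-based job indices, the function names, and the sample solution `pi` are assumptions chosen to mirror the example with $J_2$ on $M_1$ and $J_3$, $J_4$ on $M_2$.

```python
def move(assign, job, target):
    """Move: reassign a single job to another machine."""
    new = list(assign)
    new[job] = target
    return new


def swap(assign, job_a, job_b):
    """Symmetric swap: two jobs on different machines trade machines."""
    new = list(assign)
    new[job_a], new[job_b] = new[job_b], new[job_a]
    return new


def aswap(assign, pair, single):
    """Asymmetric swap: two jobs on one machine trade places with one job
    on another machine."""
    new = list(assign)
    src, dst = assign[pair[0]], assign[single]
    new[pair[0]] = new[pair[1]] = dst
    new[single] = src
    return new


# Hypothetical solution: J1, J2, J5 on M1 and J3, J4 on M2.
pi = [1, 1, 2, 2, 1]
pi_1 = move(pi, 2, 1)        # J3 moves to M1
pi_2 = swap(pi, 1, 3)        # J2 and J4 exchange machines
pi_3 = aswap(pi, (2, 3), 1)  # J3, J4 exchange with J2
```

Each function returns a new list, so a search loop can evaluate a neighbor and keep the incumbent unchanged when the neighbor is not better.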

E. IMPROVED PHEROMONE UPDATE RULES
The pheromone needs to be updated immediately after the ant completes a tour, and the update rule is carried out according to the following equations,

$$\Delta\sigma_{ij}(z) = \begin{cases} R / L^{*}, & \text{if } J_i \text{ and } J_j \text{ are assigned to the same machine}, \\ 0, & \text{otherwise}, \end{cases} \tag{15}$$

where $L^{*}$ represents the total overtime in the best schedule found so far.
In this paper, we adopt the global pheromone update rule. After the ant completes a traversal, the pheromone is updated according to Eqs. (14) and (15), so the amount of information between the jobs assigned to the same machine will increase. Based on the above analysis, we design the following improved ant colony optimization algorithm (IACO) for solving the problem GEBPOC:

Algorithm 2 IACO
1: Initialization. Set the initial pheromone and $t = 0$. Let the current number of loops and the maximum number of iterations be denoted by $N_c$ and $N_{\max}$, respectively. Let the initial amount of information be $\sigma_{ij}(0) = R$. $m$ ants randomly select a job $J_j$ and assign it to a randomly chosen machine; then $tabu_k = \{J_j\}$ and $allowed_k = J \setminus tabu_k$, and the optimal solution is denoted by $\pi^{*}$;
2: Calculate the lower bound LB of the problem;
3: while $totalcost(\pi^{*}) \ne LB$ and $N_c < N_{\max}$ do
4:   for $k = 1$ to $m$ do
5:     Ant $k$ selects the target machine for each job according to the random probability selection principle of (12) until the set $allowed_k$ is empty;
6:     The schedule obtained by ant $k$ is evaluated using the fitness evaluation function (10);
7:     Let $\pi_0 = \pi^{*}$ and call Algorithm 1;
8:   end for
9:   The pheromone update is performed by (14);
10:  Update the current best solution found by the ant colony;
11: end while
12: Output the historical best found solution $\pi_0$.
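The deposit term can be sketched as follows, reading (15) as an increment of $R / L^{*}$ for every pair of jobs that share a machine in the best schedule. This is only a sketch under assumptions: the function name `pheromone_deposit`, the 0-based job indices, the exclusion of the diagonal, and the error raised when $L^{*}$ is not positive are all illustrative choices, and the remaining part of the update in (14) is not reproduced.

```python
def pheromone_deposit(assign, R, L_star):
    """Return an n x n matrix of pheromone increments.

    assign[j] is the machine of job j in the best schedule found so far,
    R is the initial pheromone amount, and L_star is the total overtime of
    that schedule. Pairs of distinct jobs on the same machine receive
    R / L_star; all other entries stay 0.
    """
    if L_star <= 0:
        # Assumption: the source does not say how zero overtime is handled.
        raise ValueError("L_star must be positive")
    n = len(assign)
    delta = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and assign[i] == assign[j]:
                delta[i][j] = R / L_star
    return delta


delta = pheromone_deposit([1, 2, 2, 1, 1], R=1.0, L_star=4.0)
```

A smaller total overtime $L^{*}$ yields a larger deposit, so better schedules reinforce their job groupings more strongly.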

IV. IMPROVED DISCRETE PARTICLE SWARM OPTIMIZATION ALGORITHM FOR GEBPOC
In Section 2.2, we have introduced the idea of the basic particle swarm optimization algorithm (PSO), which presents a feasible solution to a specific optimization problem through the position structure of particles, and then uses the iterative process of particle velocity and position changes to continuously evolve, so as to gradually approach the best position. However, the performance of PSO for some discrete variables is not very satisfactory, because the original PSO algorithm can only optimize problems in which the elements of the solution are continuous real numbers [40]. This section describes how the discrete particle swarm optimization (DPSO) algorithm can solve the problem GEBPOC. In order to obtain better quality initial particles, we employ some heuristics in the population initialization process, so that the particles can cover the entire search space with a large probability, and denote this modified algorithm as MDPSO.
It is acknowledged that the PSO algorithm provides a general framework for solving optimization problems. However, for a specific problem, the key lies in the representation of the solution, the definition of the operators, the construction of the initial solution and the setting of the termination conditions. Some related technologies of the MDPSO algorithm proposed for GEBPOC are introduced as follows:

1) Representation of the solution.
In order to establish a direct relationship between the solution space of GEBPOC and particles, the solutions corresponding to the assignment of jobs to machines are represented by an $n$-dimensional array $(d_1, d_2, \cdots, d_n)$ similar to that in Section 3.1. For example, when $m = 2$ and $n = 5$, the 5-dimensional array $(1, 2, 2, 1, 1)$ corresponds to a feasible schedule in which jobs $J_1$, $J_4$ and $J_5$ are all processed on machine $M_1$, while jobs $J_2$ and $J_3$ are both executed by $M_2$.

2) Definition of operators in particle update.
Based on the idea of the basic PSO algorithm in Section 2.2, we give the following update method for the new velocity and position of particle $k$ in the $(z + 1)$th iteration, combining the characteristics of the specific problem. In effect, we redefine what each operator in the update equations (16) and (17) expresses, where $R_1$ and $R_2$ are $n$-dimensional random vectors consisting of 0 and 1. The addition $\oplus$, subtraction $\ominus$, and multiplication $\otimes$ operations between tuples need to be performed during the algorithm iteration. Therefore, we have to redefine these three operators according to the nature of the problem. The technique of redefining operators in this paper follows the ideas in [40].

a) Redefinition of the subtract operator $\ominus$.
The subtract operator $\ominus$ is mainly used to express the difference between the current position ($X_k^z$) and the best position ($pB_k^z$ or $gB^z$) of particle $k$. The difference between two tuples is calculated by comparing whether the element at each position in $X_k^z$ is the same as the element at the corresponding position in $pB_k^z$ or $gB^z$. If so, the element at the corresponding position in the final tuple is set to zero; otherwise, the value at the corresponding position in the tuple $pB_k^z$ (or $gB^z$) is kept as the result. Note that the jobs corresponding to the positions with the same elements in $X_k^z$ and $pB_k^z$ (or $gB^z$) need to be arranged according to the longest processing time (LPT) strategy and assigned to a machine in turn when the machine is idle. The redefinition process of the subtract operator $\ominus$ is shown in Fig. 4 (still taking $m = 2$, $n = 5$ as an example).

b) Redefinition of the multiply operator $\otimes$.
The multiply operator $\otimes$ is mainly used for the operation between the random variables $R_1$, $R_2$ and the result of the subtract operation. It is also a process of data selection, which can improve the search ability of DPSO. This operator first defines $R_1$ and $R_2$ as two $n$-dimensional arrays, each element of which is a randomly generated value of 0 or 1. $A \otimes B$ is an element-wise multiplication of two $n$-dimensional tuples. If the element at the $j$th position in tuple $A$ is 1, then the element at the $j$th position of the result is equal to the element at the $j$th position in tuple $B$; otherwise, if the element at the $j$th position in $A$ is 0, then the element at the $j$th position of the result is also 0, where $j = 1, 2, \cdots, n$. The process of redefining the multiply operator $\otimes$ is shown in Fig. 5.
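The subtract and multiply operators described above can be sketched as element-wise list operations. This is an illustration under assumptions: machines are numbered from 1 so that 0 can mark "no change", the function names are hypothetical, and the LPT rearrangement of the unchanged jobs mentioned in the text is deliberately left out.

```python
def subtract_op(best, current):
    """Difference between a best position and the current position:
    0 where they agree, the entry of the best position where they differ."""
    return [b if b != c else 0 for b, c in zip(best, current)]


def multiply_op(mask, diff):
    """Element-wise selection: keep diff[j] where mask[j] == 1, else 0."""
    return [d if r == 1 else 0 for r, d in zip(mask, diff)]


# Hypothetical data for m = 2, n = 5.
current = [1, 1, 2, 2, 1]
personal_best = [1, 2, 2, 1, 1]
diff = subtract_op(personal_best, current)
selected = multiply_op([1, 0, 1, 1, 0], diff)
```

Here `diff` records which jobs the best position places on a different machine, and the 0/1 mask then randomly retains only some of those changes.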

c) Redefinition of the add operator ⊕.
The add operator $\oplus$ is the final operation to obtain the velocity and position of a particle in a new iteration, and this operator must guarantee that the obtained result is a reasonable solution. In this paper, we regard $\oplus$ as the crossover operator in genetic algorithms. Specifically, we randomly select two cut points on the particle chain and exchange the segments between these two cut points, which produces two new particle chains. One of the two chains is then selected at random as the result of the add operation. The process of redefining the add operator $\oplus$ is shown in Fig. 6.

FIGURE 6. Redefinition of add operator $\oplus$ (one of the two new particle chains generated will be randomly selected as the result of the $A \oplus B$ operation).
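The two-point crossover behind the add operator can be sketched as below. It is a minimal illustration assuming both operands are complete assignment lists of equal length; the function name `add_op` and the way cut points are drawn are assumptions, and how entries equal to 0 from the earlier operators are repaired is not covered.

```python
import random


def add_op(a, b, rng=random):
    """Two-point crossover of two particle chains.

    Two distinct cut points i < j are drawn, the segments between them are
    exchanged, and one of the two resulting chains is returned at random.
    """
    n = len(a)
    i, j = sorted(rng.sample(range(n + 1), 2))
    child_a = a[:i] + b[i:j] + a[j:]
    child_b = b[:i] + a[i:j] + b[j:]
    return rng.choice([child_a, child_b])


# Hypothetical chains for m = 2, n = 5.
a = [1, 2, 2, 1, 1]
b = [2, 2, 1, 1, 2]
child = add_op(a, b, random.Random(0))
```

Since every position of the child is copied from one of the two parents, the result is again a valid job-to-machine assignment whenever both parents are.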

3) Initialize the population.
The DPSO algorithm in this paper sets the population size to 20, that is, there are 20 particles in the system, all of which are $n$-dimensional arrays generated at random. In order to enable particles to cover the entire search space with a large probability, to prevent the algorithm from falling into a local optimum, and to avoid the loss of search efficiency caused by blind search, we adopt some simple heuristic algorithms (such as longest processing time first (LPT) and shortest processing time first (SPT)) in Algorithm 3 to obtain a higher quality initial population.
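One way to combine heuristic and random particles is sketched below. The text does not specify how many particles come from each heuristic, so seeding the population with one LPT and one SPT list schedule and filling the rest at random is purely an assumption, as are the function names.

```python
import random


def list_schedule(order, p, m):
    """Assign jobs in the given order to the currently least-loaded machine."""
    loads = [0] * m
    assign = [0] * len(p)
    for j in order:
        i = min(range(m), key=loads.__getitem__)
        assign[j] = i + 1  # machines are numbered 1..m
        loads[i] += p[j]
    return assign


def init_population(p, m, size=20, rng=random):
    """Build `size` particles: an LPT seed, an SPT seed, then random ones."""
    n = len(p)
    lpt_order = sorted(range(n), key=lambda j: -p[j])
    spt_order = sorted(range(n), key=lambda j: p[j])
    pop = [list_schedule(lpt_order, p, m), list_schedule(spt_order, p, m)]
    while len(pop) < size:
        pop.append([rng.randint(1, m) for _ in range(n)])
    return pop[:size]


# Hypothetical processing times for n = 5 jobs on m = 2 machines.
population = init_population([5, 3, 8, 2, 7], m=2, rng=random.Random(0))
```

The heuristic seeds give the swarm at least two reasonably balanced starting points, while the random particles keep the initial positions spread over the search space.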

4) The termination condition of algorithm.
The DPSO algorithm is essentially a process of gradually replacing solutions in search of the optimum. A particle $k$ combines the best solution $pB_k^z$ found by itself and the global best solution $gB^z$ found by the population to continuously update its velocity and position, so that it gradually approaches the position of the optimal solution. When the termination condition is triggered, the algorithm stops and outputs the schedule corresponding to the current global best solution $gB^z$ found by the population. In this paper, the MDPSO algorithm stops as soon as one of the following two situations occurs: the currently found global best solution $gB^z$ has not changed after 200 iterations, or the lower bound in (9) is attained.
Combined with the introduction of the above related technologies, we formally present the MDPSO algorithm for GEBPOC.

A. EXPERIMENTAL SETUP
In this section, we compare the performance of the algorithms proposed in the previous two sections through computational experiments. In addition, we compare the results of IACO and MDPSO with those of several other well-known meta-heuristics: variable neighborhood search algorithm (VNS) [51], simulated annealing algorithm (SA) [24], ant colony optimization algorithm (ACO) [26] and particle swarm algorithm (PSO) [25]. These algorithms have been reported to give excellent results on engineering optimization problems. We evaluate the ability of these algorithms to solve the GEBPOC problem on small-scale and large-scale instances, respectively. All the instances in this section are from [52] and [24]. Since the calculation of the objective function value involves the choice of the regular working time $t$, we set $t = h \cdot P$ to examine the influence of different values of $t$ on the algorithms, where $h \in \{0.2, 0.4, 0.6, 0.8\}$ and $P = \sum_{j: J_j \in J} p_j$. For convenience, throughout the experiments we assume that the unit time cost of overload is $c = 2$. Therefore, the following two experimental frameworks are finally generated (cf. Table 1), where $m$, $n$ and $p$ represent the number of machines, the number of jobs and the processing time of a job, respectively, and $U(a, b)$ means that the processing time of each job is distributed randomly and uniformly in the interval $[a, b]$.

Algorithm 3 MDPSO
1: Initialization. Set $z = 0$ and let the population contain $N = 20$ particles;
2: for $k = 1$ to $N$ do
3:   Generate particle $X_k^z$ at random;
4:   Let $pB_k^z = X_k^z$;
5: end for
6: $gB^z = \{X_l^z \mid l = \arg\min_k \{totalcost(X_k^z)\}\}$;
7: while $z < 200$ (the maximal number of iterations) and $totalcost(gB^z) \ne LB$ do
8:   for $k = 1$ to $N$ do
9:     The velocity of particle $k$ is updated by (16);
10:    The position of particle $k$ is updated by (17);
11:    if $totalcost(X_k^{z+1}) < totalcost(pB_k^z)$ then
12:      Let $pB_k^{z+1} = X_k^{z+1}$;
13:    else
14:      Let $pB_k^{z+1} = pB_k^z$;
15:    end if
16:  end for
17:  if $totalcost(gB^z) > \min_k \{totalcost(pB_k^{z+1})\}$ then
18:    Let $gB^{z+1} = pB_l^{z+1}$ with $l = \arg\min_k \{totalcost(pB_k^{z+1})\}$;
19:  else
20:    Let $gB^{z+1} = gB^z$;
21:  end if
22:  Set $z = z + 1$;
23: end while
24: Output the historical best found solution $gB^z$.
These two sets of experimental frameworks yield 3 × 3 × 2 × 4 + 6 × 4 = 96 different (m, n, p, h) combination settings in total. To increase the reliability of the results, 30 sets of test data were randomly generated for each (m, n, p, h) combination, and the results were averaged. In this way, there are a total of 2880 computing instances. In this paper, all algorithms are coded in Python 3.8 and tested on a laptop with a Ryzen Core R7-4800H 2.90 GHz CPU and 16 GB RAM.
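The instance generation and the instance count can be sketched as follows. This is a sketch under assumptions: processing times are drawn as integers from $U(a, b)$, which the text does not state, and the function name `make_instance` and the dictionary layout are hypothetical.

```python
import random


def make_instance(m, n, a, b, h, rng=random):
    """Draw n processing times from U(a, b) and set t = h * P,
    where P is the total processing time of all jobs."""
    p = [rng.randint(a, b) for _ in range(n)]
    t = h * sum(p)
    return {"m": m, "p": p, "t": t}


# Number of (m, n, p, h) settings and of generated instances.
n_settings = 3 * 3 * 2 * 4 + 6 * 4
n_instances = n_settings * 30

# One instance of the setting (3, 15, U(20, 50), 0.4).
instance = make_instance(3, 15, 20, 50, 0.4, random.Random(0))
```

With $t = h \cdot P$, the total regular capacity of $m$ machines is $m \cdot h \cdot P$, so for $m = 3$ overtime is unavoidable when $h = 0.2$ and may be avoidable when $h = 0.8$.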
We test the performance of each of these seven algorithms (LPT, VNS, SA, ACO, IACO, DPSO, and MDPSO) using the data generated by the above methods. The parameters of algorithm ACO are set as follows: the number of ants d = 20, α = 1, β = 5, and $N_{\max} = 200$. The population size in DPSO is equal to the number of ants in ACO, and the maximal number of iterations is also 200, a parameter that several of the other meta-heuristics also use.

B. ANALYSIS OF RESULTS: RATIO AND TIME
In this section, we test the performance ratio and required execution time of these meta-heuristics. The average results obtained from experiments E1 and E2 are shown in Tables 2-5. In each table, the columns ''Ratio'' give the ratio between the criterion value returned by the corresponding algorithm (optimal or near optimal) and LB, reported with 4-digit precision, while the columns ''Avg.time'' give the time consumption of the corresponding approach, in milliseconds for LPT and in seconds for the other algorithms.
The results in Table 2 show that, for the case of m = 3, the ratios of the intelligent optimization algorithms proposed in this paper are closer to 1.0000 than that of the LPT algorithm on the same problem instances, owing to the small scale of the problem, while the LPT algorithm generally runs much faster than the other algorithms. For a fixed n, the value returned by a given algorithm gets closer to the optimal value as h increases. In particular, when h = 0.8, the ratio for almost all instances is 1.0000, which means that when the regular working time is large, the jobs can be processed without overtime. However, the average running time does not differ much. Comparing ACO and IACO, IACO achieves better ratios than basic ACO, because the ability to find the global optimal solution is significantly improved once the variable neighborhood search algorithm is introduced in IACO; the price is a longer running time, with the average running time of IACO being about 7 times that of ACO. Compared with the DPSO algorithm, the ratio of MDPSO is closer to 1.0000, owing to the heuristic improvement strategies adopted in the population initialization process, but MDPSO is on average 2 times slower than DPSO. In most problems (except a few simple ones), the average solutions obtained by the IACO algorithm and the MDPSO algorithm are better than those of LPT, VNS, SA, ACO and DPSO. In addition, the running time of all seven algorithms becomes longer as the number of jobs increases.

Tables 3 and 4 report the comparison results for the cases m = 4 and m = 5, respectively. The variation in ratio and average running time in these two cases is similar to the case of m = 3. As the number of machines increases, the running time of each algorithm also increases, but the increase for LPT is larger than that for the other meta-heuristics.
Compared with the results in Table 2, the ratios in Tables 3 and 4 for h = 0.8 are rarely equal to 1.0000, which is also due to the increase in the number of machines. In Table 3, the average running time of IACO is about 2 times that of ACO, while the average running times of MDPSO and DPSO differ by only 0.02 s. In Table 4, the average running time of IACO differs from that of ACO by only 0.27 s, and MDPSO runs faster than DPSO on average. Although VNS, SA, ACO, and DPSO have smaller average execution times than IACO and MDPSO, the former are more prone to falling into local minima. Therefore, in terms of running time, the benefits of the IACO and MDPSO algorithms begin to emerge as the number of machines increases.
The second set of experiments, E2, was run on large-scale problems with longer job processing times. The experimental results are shown in Table 5. This set of experiments omits the setting of different regular working times, since experiment E1 has already verified their effect on the algorithms. The experimental results show that, for a given number of jobs, the time consumed by the LPT algorithm is basically not affected by the number of machines, which is due to the time complexity O(n log n) of LPT. As the problem size increases, the average performance of the LPT algorithm becomes weaker. The difference in running time between the LPT algorithm and the other meta-heuristics grows with the number of jobs. This implies that the bigger the instances, the larger the improvement in time consumption that can be observed, which is also a big advantage of the proposed meta-heuristic algorithms from the practical point of view. Secondly, compared with LPT, VNS, SA, ACO and DPSO, the improved IACO and MDPSO are better not only in terms of time efficiency, but also in terms of the ratio and the ability to solve problem instances in a reasonable time. In other words, the decrease in computational complexity of the improved IACO and MDPSO algorithms relative to the previous algorithms allowed us to predict a decrease in running time, and the computational experiments showed how significant this decrease is in practice. More precisely, we observed that both the theoretical and the practical improvements are significant for big instances and relatively minor when the scale of the instances is small, that is, the IACO and MDPSO algorithms have better stability when solving large-scale instances.

C. ANALYSIS OF RESULTS: FUNCTION VALUE AND FRIEDMAN RANK
To further illustrate the significant advantages of the proposed algorithms, we also select 8 instances from the two sets of experimental frameworks in Table 1, i.e., (3,15 Table 6. Four performance indicators (''Best'', ''Worst'', ''Mean'', and ''Standard deviation (SD)'') are used to validate the effectiveness of the proposed IACO and MDPSO in comparison with other state-of-the-art optimization algorithms. Moreover, a non-parametric statistical test called the Friedman ranking test is applied for a fair performance comparison with the other existing optimization methods. The Friedman ranking test is a non-parametric counterpart of repeated-measures ANOVA for multiple comparisons. It ranks the algorithms on each dataset and then calculates the mean rank of each algorithm over all datasets; if the algorithms have no performance difference, their average ranks should be equal.
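The mean-rank computation underlying the Friedman ranking can be sketched as follows. The function names and the small cost table are hypothetical, lower cost is assumed to be better, and tied values share the average of their ranks, which is the usual convention but is not stated in the text.

```python
def average_ranks(row):
    """Rank the values of one dataset in ascending order (1 = lowest cost);
    tied values receive the mean of the ranks they occupy."""
    order = sorted(range(len(row)), key=row.__getitem__)
    ranks = [0.0] * len(row)
    pos = 0
    while pos < len(order):
        end = pos
        while end + 1 < len(order) and row[order[end + 1]] == row[order[pos]]:
            end += 1
        mean_rank = (pos + end) / 2 + 1
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


def friedman_mean_ranks(table):
    """table[d][a] is the cost of algorithm a on dataset d;
    returns the mean rank of each algorithm over all datasets."""
    per_dataset = [average_ranks(row) for row in table]
    n = len(table)
    return [sum(r[a] for r in per_dataset) / n for a in range(len(table[0]))]


# Hypothetical costs of three algorithms on two datasets.
mean_ranks = friedman_mean_ranks([[3, 1, 2], [2, 1, 3]])
```

The algorithm with the smallest mean rank is placed first in the final ranking; equal mean ranks across all algorithms would indicate no performance difference.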
It can be clearly seen from Table 6 that our proposed IACO and MDPSO algorithms are better than LPT, VNS, SA, ACO and DPSO in terms of best fitness, worst fitness and average fitness. In particular, MDPSO is more competitive than IACO in all of these respects. MDPSO also outperforms more than half of the algorithms in terms of standard deviation. Therefore, a more balanced scheduling scheme can be obtained by the MDPSO algorithm. It is worth mentioning that, for each instance, the standard deviation reported in Table 6 is consistent with the Friedman ranking. Although VNS and DPSO rank first in the 4th and 5th groups of instances, respectively, in the final Friedman ranking the MDPSO algorithm is firmly in first place, followed by IACO and DPSO, while the LPT algorithm ranks last. This means that the quality of the solutions obtained by MDPSO and IACO is relatively high, that the solution accuracy of LPT decreases significantly as the scale of the problem instance increases, and that VNS, SA and DPSO also easily fall into local optima.

D. ANALYSIS OF RESULTS: SOLUTION COMPARISON AND CONVERGENCE
In order to present more comprehensively the number of good-quality solutions output by these algorithms, we also record how often each of the previous algorithms obtains a better solution than IACO and MDPSO, respectively. We only record the results of 30 independent runs for the 8 instances selected in the previous subsection. The experimental results are shown in Table 7, where we indicate how many times a given algorithm reports a better objective value than the other algorithm. For example, a value num 1 /num 2 in column DPSO/MDPSO implies that, out of the 30 problems generated by each combination, there are num 1 problems for which DPSO yields a better solution than MDPSO, num 2 problems for which MDPSO performs better, and the other 30 − num 1 − num 2 problems for which DPSO and MDPSO yield the same total cost.
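The pairwise counts of the form num 1 /num 2 can be computed as in the sketch below. The function name and the sample costs are hypothetical, and costs are compared exactly; with floating-point costs a small tolerance may be preferable.

```python
def win_counts(costs_a, costs_b):
    """Compare two algorithms run by run on the same problems.

    Returns (wins of A, wins of B, ties), where a win means a strictly
    smaller total cost on that problem.
    """
    wins_a = sum(a < b for a, b in zip(costs_a, costs_b))
    wins_b = sum(b < a for a, b in zip(costs_a, costs_b))
    ties = len(costs_a) - wins_a - wins_b
    return wins_a, wins_b, ties


# Hypothetical total costs of two algorithms on four problems.
num_1, num_2, same = win_counts([10, 12, 9, 11], [10, 11, 9, 12])
```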
It can be seen from the comparison results that, after 30 experiments on the same set of instances, LPT, VNS, SA, ACO and DPSO produce significantly fewer good solutions than IACO and MDPSO. Among them, DPSO performs best, with an average proportion of 26.6%, while LPT performs worst, with an average of only 10.0%. In addition, the larger the instance size, the fewer times the solutions output by the other algorithms beat those of IACO and MDPSO, which highlights the superiority of our improved algorithms in solving GEBPOC. The experimental results also show that IACO outputs fewer good solutions than MDPSO.
To further observe the convergence of the above meta-heuristic algorithms, Fig. 7 shows the curve of the minimum total cost over 200 iterations for the instances (3, 15, U (20, 50), 0.4), (4, 20, U (20, 50), 0.2), (5, 20, U (20, 50), 0.2) and (10, 100, U (100, 800), 0.1), respectively. In these four convergence curves, the algorithms IACO and MDPSO converge faster than the other meta-heuristic algorithms. The point to be made here is that the global optimization capability of IACO is stronger than that of VNS, SA, ACO and DPSO. For example, it can be seen from Fig. 7(d) that, because the basic ACO is not optimized and easily falls into a local optimum, it only converges after the 185th iteration, while IACO has already converged at the 170th iteration owing to the introduction of the VNS method, which implies that IACO escapes local optima better than the basic ACO algorithm. The basic DPSO algorithm and the improved MDPSO algorithm obtain their best solutions at the 179th and 161st iterations, respectively. Therefore, the VNS, SA, ACO and DPSO algorithms converge significantly more slowly than IACO and MDPSO, because their performance has not been optimized. ACO and DPSO can only obtain sub-optimal solutions to the problem, while the improved algorithms get close to optimal solutions. Meanwhile, IACO converges faster than DPSO but more slowly than MDPSO.

VI. CONCLUSION
In this paper, we consider the generalized extensible bin packing problem with overload cost using some meta-heuristics, based on the existing work of Denton et al. [1], [2]. To the best of our knowledge, this is the first attempt to solve this problem with a meta-heuristic algorithm. This model provides a powerful tool for management decision-making in outpatient operating room allocation in a healthcare setting and in servers handling large tasks. According to the characteristics of the problem, we use an improved ACO algorithm and discrete particle swarm optimization algorithms to solve it. We enhance the positive feedback effect of ant colony optimization by improving the state transition rules and dynamic adaptive parameters. In order to avoid premature convergence or stagnation of the ant colony algorithm during the search, a variable neighborhood search method is also introduced, which further improves the global search ability and convergence speed of the algorithm. In addition, in order to ensure the uniform distribution and high quality of the initial particle swarm, some heuristic methods are adopted in the initialization process of the particle swarm, so that the initial particles can cover the entire search space with a large probability. Computational experiments show that, for the same problem instance, the proposed IACO and MDPSO algorithms outperform the other meta-heuristic algorithms in most cases. Solving this model yields a number of results from which managers can make a reasonable choice according to the actual situation they are dealing with, which is well in line with the needs of economic, social, medical and green manufacturing.
For further studies, since the model studied has many uncertainties in practice, we need to consider more constraints; for example, the regular working time of each server may be different, and the cost of overloading each machine may also be different. Of course, in the actual operating room allocation process, since the number of operations to be completed becomes known dynamically, we must also make decisions dynamically. We will conduct further research on these cases. Moreover, the experimental results show that, in the DPSO algorithm, improving the quality of the initial particle population alone is not enough to free particles trapped in local minima, and the search for the best solution often struggles to balance ''detection'' and ''development'' at the same time. A new and improved DPSO algorithm will be a very interesting research direction in the future.

He is currently working as a Professor and a Doctoral Supervisor with the School of Mathematics and Statistics, Yunnan University. His current research interests include discrete optimization, theoretical computer science, computational economics, and algorithmic game theory and its applications.

VOLUME 10, 2022