An Improved Honey Badger Algorithm by Genetic Algorithm and Levy Flight Distribution for Solving Airline Crew Rostering Problem

Airline crew rostering problem contains a variety of rules and constraints, and there are almost countless possible scheduling schemes. It is the most complex and important link in the entire crew scheduling plan. In this paper, we build a model that includes qualification constraints. In this paper, we consider two models with qualification constraints with different objective functions, namely minimizing the total cost of the airline and balancing flight utility among pilots as much as possible. To solve this model, the Levy flight is used to improve the ability of the Honey Badger Algorithm (HBA) to jump out of local optima, and the crossover and mutation operators in the Genetic Algorithm (GA) are used to improve the quality of the solution. This improved HBA algorithm significantly improves convergence and solution accuracy. In addition to this, we verified the improved HBA algorithm on 6 instances, of which 4 instances do not contain any qualifications, and 2 instances contain high-qualification flight pairings. The good results of the improved HBA show that it has excellent performance in both objective functions.


I. INTRODUCTION
According to the statistics of major airlines in the past, the cost of crew is the second largest airline cost after fuel cost [1]. But the fuel cost is usually fixed, and the cost of the crew is largely manageable [2], [3], [4]. Table 1 lists pilot costs (annual crew fees, salaries, and benefits) for some of China's major airlines in 2019. Thus, the reasonable arrangement of the crew is very important for the airlines. In the past few decades, the problem of crew scheduling has attracted extensive research in both theory and practice, because a good scheduling scheme can save airlines hundreds of millions of dollars each year. However, the crew scheduling problem is NP-hard, which means that we cannot find the optimal solution to the problem in polynomial time [6]. Due to the large scale of flights involved and the need to strictly comply with complex work rules, in order to reduce the difficulty of solving, the crew scheduling problem can usually be divided into crew pairing The associate editor coordinating the review of this manuscript and approving it for publication was Shun-Feng Su . problem (CPP) and crew rostering problem (CRP) to be solved in turn. CPP constructs a set of flight loops that can cover all flight segments and depart from one base and return to the same base, while CRP assigns the constructed flight loops to each crew member to generate a personal schedule. In some surveys [4], [7], the contents of these two parts have been presented in detail. In this paper, we mainly study CRP, because CRP affects the fairness of the crew. The purpose of this study is to generate a set of low-cost and highly fair schedules.
CRP has been a problem that has puzzled researchers in the industry for many years. In combinatorial optimization, CRP is also divided into NP-hard problems [8]. This problem can be described as orderly assigning all flight pairings to pilots on the basis of satisfying laws and regulations and optimizing the objective function. The objective function of the airline crew roster problem usually has the following types: 1. Minimize the cost of the airline. 2. The income balance among pilots. 3. Minimize the amount of fatigue between pilots. In the method of solving non-deterministic polynomial (NP)-hard problems, some intelligent heuristic VOLUME 10, 2022 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/ TABLE 1. Cabin crew costs for major Chinese airlines in 2019 [5].
algorithms are widely used, such as genetic algorithm and particle swarm algorithm, etc. In the existing studies, most of the literature assumes that the assignment of flight pairings to pilots is not constrained by qualifications. However, in real life, the influence of qualifications is very large. For example, if a pilot lacks a certain qualification, he cannot assign the flight pairings that require this qualification to this pilot. Therefore, this paper proposes an improved honeypot optimization algorithm, that is, in the process of updating the solution, adding Levy flight, and then using the two algorithms in the genetic algorithm to further improve the quality of the solution. This reduces the chance of getting stuck in a local optimum during the solution. In addition to this, we have developed a mathematical model that takes into account the qualification issue. Compared with other scholars' research, this paper has the following contributions: 1. A mathematical model that is more suitable for the real world is proposed. 2. In order to solve CRP, we propose an improved HBA algorithm, which reduces the possibility of the algorithm falling into local optimum. At the same time, a discrete version of the HBA algorithm was also designed. 3. Verified the effectiveness and feasibility of the improved HBA algorithm in solving large-scale CRP.

A. RELATED WORK
In recent years, many scholars have studied CRP. In the following, we will review the literature in terms of mathematical models and solution methodology.
Freling et al. [9] formulated CRP as an ensemble partition model to minimize the total cost of airlines, and employed a branch-and-bound algorithm to solve a monthly instance. Gamache et al. [10] formulate CRP as an ensemble partition model to minimize total airline costs, and employ a branch-and-bound algorithm to solve a monthly instance crew scheduling problem. The crew scheduling problem of Kasirzadeh et al. [11] is formulated as a set coverage problem and solved with a column-generating algorithm on a real dataset. Medard and Sawhney [12] introduced the crew scheduling problem and the crew recovery problem, in order to solve the crew recovery problem, they proposed a new method based on tree search and column generation algorithm. Based on European airlines, Nissen et al. [13] formulated a new mathematical model for the airline crew rearrangement problem, which covered various laws and regulations in a resource-constrained manner. Then the model is solved by branch pricing, and the calculation results show that a new solution can be quickly provided when the scheduling is disturbed. Saddoune et al. [14] integrated the crew scheduling problem, that is, the crew scheduling problem is no longer divided into two stages of CPP and CRP to solve. To obtain shorter computation time, they propose a dual dynamic constrained aggregation method. This approach not only improves the solution quality of the scheme, but also reduces the average computation time by a factor of 2.3. Souai and Teghem [16] established a model and two scheduling networks using the real data of an airline in Taiwan, and solved this problem by using a column generation algorithm. Doi et al. [17] proposed a CRP based on fair working hours, and in order to satisfy various hard constraints, a two-level decomposition algorithm was proposed. The calculation results show that this method can effectively obtain the solution of this problem.
In the above studies, the column generation algorithm or the derivative algorithm of the column generation algorithm is used to solve the problem. However, in addition to column generation algorithms, some intelligent optimization algorithms are also widely used when solving CRP. Compared with the column generation algorithm, the intelligent optimization algorithm has the characteristics of fast solution speed. Souai and Teghem [16] described the disadvantages of dividing the crew scheduling problem into two sub-problems, CPP and CRP, therefore, they proposed 3 heuristics based on genetic algorithms to solve these two sub-problems simultaneously. Considering both CPP and CRP, Saemi et al. [18] proposed a new formula and designed a heuristic algorithm based on ant colony optimization to solve it. The calculation results show that the ant colony optimization algorithm has a good effect in solving the problem of synthesizing CPP and CRP.
In the previous studies, all focus on the single-target situation. Most of them set the objective function to minimize the airline's expenditure, and a few set the objective function to consider the balance of working hours. In recent years, some scholars have begun to turn their research direction to multiobjective situations. In CRP, Zhou et al. [19] considered both the workload of the crew and the satisfaction of the crew, and proposed a more useful multi-objective model. In order to solve this multi-objective model, a multi-objective ant colony algorithm was proposed to effectively solve the problem. Chutima and Arayikanon [20] optimized four objectives at the same time. In order to solve this model, a hybrid algorithm of multiobjective evolutionary algorithm based on decomposition(MOEA/D) and honey-bees mating optimization algorithm (HBMO) was proposed. The calculation results show that the hybrid algorithm outperforms these two algorithms.
It is clear from the literature review above that the CRP problem has received extensive attention from researchers over the past two decades. But only a few articles directly address this issue. Because the captain and co-pilot not only need to depend on basic laws and regulations, but also need to consider their various qualifications and matching, etc. For example, the qualification of the captain in charge is C1, only the co-pilot with C1 qualification can match with it.
As far as the objective function is concerned, we can know that the objective function can be divided into single objective and multi-objective. As far as constraints are concerned, most literatures only consider some of the most basic constraints, while a small number of literatures consider constraints such as language. However, in this paper, we further consider the qualification constraint, which is an essential part of real world airlines. In fact, most constraints can be described as qualification constraints, such as language constraints, etc. Therefore, in this paper, we propose a model that is closer to the real world. To solve this model, we propose an improved honey badger algorithm by genetic algorithm and levy flight distribution and we call this improved algorithm the Honey Badger Algorithm-Genetic Algorithm(HBA-GA). And for CRP, we also designed a discrete version of this hybrid algorithm. In addition, we conduct a comparative analysis of the proposed hybrid algorithm and various other well-known metaheuristics and hybrid algorithms on several different datasets. Simulation experiments show that HBA-GA is an effective hybrid heuristic algorithm for solving CRP.

II. AIRLINE CREW ROSTERING PROBLEM
In our research problem, the aim is to assign pilots various training tasks while identifying flight tasks while minimizing cost and maximizing fairness among pilots. Some important features of CRP are as follows: (1) The pilot's departure station and end station should be at the same airport; (2) If a flight pairing is assigned to a pilot, the pilot cannot assign other tasks during this time period.

A. RELATED DEFINITIONS
To make it easier for those who are less familiar with the aviation industry, we first introduce some related terms.
Time: Airline operations span time and space. The time expression in this article consists of the year, month, day, hour and minute, and the precision below the second is not counted.
Airport: The departure and arrival of flights, as well as the departure and arrival of crew members, are all based on the airport. Two flight loops can only be assigned to the same pilot if their landing and departure airports are the same.
Crew: The research in this paper is only for pilots, but is fully applicable to flight attendants and marshals. Piloting of modern civil aircraft usually requires two pilot qualifications: captain and co-pilot. Each crew pilot has the following attributes: 1. Each crew member has a fixed base, expressed by the airport.
2. Each crew member has one and only one primary qualification, but may also have other alternate qualifications.
Flight: Refers to a takeoff and landing of an aircraft. Also called a leg when the flight applies to crew scheduling. Each flight includes the following attributes: 1. Every flight has a given departure time, and an arrival time.
2. Every flight has a given departure and arrival airport.
3. Each flight has a given minimum crew composition. Duty: A duty consists of a series of leg (flight or boarding) and connecting times. A duty is shown in Figure 1, where A1, A2, A3 and A4 represent 4 airports respectively, and 6142, 6143, 4133 and 4134 represent 4 flights. These 4 flights and the interval between flights together form a duty. The following relationships are satisfied between legs on the same duty: 1. The arrival of the previous leg is the same as the departure airport of the next leg.
2. The interval between two adjacent flight segments must satisfy the shortest connection time constraint.
3. Each leg in the same duty must take off on the same day, but the arrival time is not limited by this.
4. Connection time between legs counts toward duty time, but not flight time. The start time of duty is calculated from the departure time of the first leg performed on the day, and the end time is calculated according to the arrival time of the last leg.
Pairing: A pairing consists of a sequence of duty and rest periods, starting from and eventually returning to one's own base.
Roster: Each crew has a roster in each scheduling cycle, which consists of a series of pairings and vacations, and a certain number of vacation days are met between the pairings.

B. OBJECTIVES AND CONSTRAINTS
Each flight pairing has start time, end time, duty time, flight time and duration, etc. In airlines, there are often a variety of pilots with different qualifications, and pilots with different qualifications can fly different flight pairings. In this section, we build a model that takes into account the following constraints: 1. In a scheduling cycle, the flight time and duty time of each pilot shall not exceed the maximum flight time and maximum duty time stipulated by the Civil Aviation Administration.
2. The interval between all flight pairings performed by the pilot is greater than the minimum rest period.
3. One captain and one co-pilot must be assigned to each flight loop. We first define the sets, parameters and decision variables that are needed in mathematical programming. The detailed symbol definitions are shown in Table 2.

1) OBJECTIVE
In this paper, we consider two objectives and will solve for each of them separately. Although there are many goals that airlines focus on, these two goals are the most concerned by airline decision makers. The detailed explanation and mathematical expression of these two goals can be described as follows.
Objective 1: Balance of benefits between pilots of the same qualification. This goal is mainly to minimize the salary difference between pilots, so as to prevent pilots' salaries from being divided into two levels and causing pilots to change jobs. Therefore, the objective function 2 can be formulated as the following: In the above formula, v * i and v * j are the total revenue in one scheduling period.
Objective 2: Minimize airline costs. Of all costs for an airline, in addition to fixed costs, it is often possible to change the overall cost of crew duty by adjusting the allocation of flight pairings. Therefore, the objective function 1 can be formulated as the following: 2) CONSTRAINT Constraint 1: Any flight pairing requires a captain and a co-pilot. (1)

Constraint 2:
In one period, the flight time of the captain and co-pilot does not exceed T 1 .
Constraint 3: In one period, the captain and co-pilot's duty time shall not exceed T 2 .

Constraint 4:
The qualifications of the captain and co-pilot are not lower than the minimum requirements of the flight pairing.
Constraint 5: Each pilot can only perform one pairing per day.
Constraint 6: In order to ensure that the pilot has plenty of energy and reduce flight accidents. In any 144-hour period, at least 48 hours of rest is scheduled for pilots. At the same time, 2 days off must be arranged after 4 consecutive days of flying. In Equation 10 and Equation 11, when d is the last days in a period, it can be connected to the next period

III. BACKGROUND FOR HONEY BADGER ALGORITHM AND LEVY FLIGHT
We discuss mathematical notation and formulas for honey badger algorithm (HBA) [21] and levy flight in this section.

A. HONEY BADGER ALGORITHM
Honey badger algorithm (HBA) is a new intelligent optimization algorithm developed by Egyptian scholars Hashim et al. [21] in 2021 based on the foraging behavior of honeypots. The algorithm simulates two foraging behaviors of honeypots: either digging or following the honeypot, the first is digging mode, and the second is honey-collecting mode. In digging mode, the honeypot uses its sense of smell to locate the food, and then selects a suitable location to catch the prey. In honey-picking mode, the honeypot finds food based on the location of the guided honeypot.

1) INSPIRATION
The main inspiration of the honey badger algorithm comes from the foraging behavior of honey badger. Like other swarm intelligence algorithms, HBA is also divided into 3 stages, namely initialization stage, digging phase and honey phase. The following subsections describe the impact of these two behaviors of the honeypot on the HBA optimizer.

2) INITIALIZATION STAGE
HBA randomly generates an initial solution in the initialization phase.
where m is the number of candidate solutions and n is the number of decision variables.
In addition to this, the odor intensity of the prey needs to be defined. The higher the odor intensity of the prey, the faster the movement speed. The odor intensity can be expressed by the following formula: where r 1 is a random number between 0 and 1 and x prey is the current optimal position. To ensure a smooth transition from exploration to exploitation. Use the formula below to define a decreasing factor α that decreases with the number of iterations to decrease randomization over time.
where t is the current number of iterations, T is the maximum number of iterations, and C is a constant greater than 1.

3) DIGGING PHASE
During the mining phase, the honeypot moves towards the food in a heart-shaped path, which can be expressed by the following formula: where r 2 , r 3 and r 4 are three random numbers between 0 and 1, and β ≥ 1 is the ability of the honeypot to get prey. F represents the search direction, which is determined by the following formula: −1, otherwise where r 5 is a random number between 0 and 1. During the digging phase, the location of the honeypot depends on the strength of the smell and the search impact factor α. VOLUME 10, 2022

4) HONEY PHASE
The behavior of the honeypot during the honey phase can be simulated by the following formula: where r 6 is a random number between 0 and 1. As can be seen from the above formula, the location of the new honeypot is near x prey .

5) PSEUDO-CODE FOR STANDARD HBA
The optimization of HBA starts from randomly generating an initial solution X . In each iteration, the digging phase is used as the exploitation phase of the algorithm, and the honey phase is used as the exploration phase of the algorithm. When r < 0.5, the algorithm approaches the optimal solution through Eq. 13, and when r ≥ 0.5, the algorithm converges toward the optimal solution through Eq. 14, where r is a random number between 0 and 1. Finally, when the stopping condition of HBA is satisfied, the algorithm will stop and find the current best solution. The pseudo-code of the standard HBA is shown in Algorithm 1.

B. LEVY FLIGHT DISTRIBUTION
The Levy distribution is a continuous probability distribution of a non-negative random variable. A Levy flight is a random walk with a Levy distribution for the step size, which is heavy-tailed. When defined as walking in a space of dimension greater than 1, the steps performed are isotropic random directions. Therefore, to prevent the population from getting stuck in a local optimum, we can use Levy flight to change the position of the solution. Eq.15 describes changing the position of the solution by Levy flight, where rand1 and rand2 are random numbers between 0 and 1. UB and LB are the upper and lower bounds, respectively. θ can be calculated by the Eq.16 where is the probability distribution function and β is a fixed constant of 1.5.

IV. THE PROPOSED HBA-GA FOR CRP
Although many population-based metaheuristics have emerged in recent years and have performed well in many Calculate the fitness of the solution set X . 6: Update density factor α.

9:
if r < 0.5 then 10: for j = 1 to N do 11: Update the position: x new,j = x prey,j + F · β · I i · x prey,j + F · r 2 · α · (x prey,j − x i,j )· | cos(2πr 3 ) · (1 − cos(2πr 4 )) | 12: end for 13: else 14: for j = 1 to N do 15: Update the position: x new,j = x prey,j + F · r 6 · α · (x prey,j − x i,j ) 16: end for 17: end if 18: end for 19: Calculate the fitness f new of the solution x new . 20: if f new < f i then 21: x i = x new , f i = f new 22: end if 23: if f new < f prey then 24: x prey = x new , f prey = f new 25: end if 26: end while 27: Returns the best solution x prey . continuous optimizations, a single algorithm does not perform well in large-scale discrete optimization problems. Therefore, we need to improve a single algorithm to achieve a balance between solution efficiency and solution quality. Therefore, this paper proposes an improved honey badger algorithm.
This section introduces a novel optimization algorithm, which we call HBA-GA, as shown in Fig. 2. The proposed algorithm is based on HBA, which combines several operators in GA and the Levy flight method. Our proposed HBA-GA method improves the ability to jump out of local optima when dealing with complex engineering problems. In HBA-GA, two search methods are added. First, several operators in the GA algorithm are used to speed up the convergence speed of the algorithm. Second, adding Levy flight increases the search space of the algorithm. Below we describe the proposed HBA-GA method in detail.

A. ENCODING AND DECODING
In any evolutionary algorithm, one of the first things we need to do is code the problem. A sensible coding method can make the problem much simpler.In CRP, each pilot has a name or number, however, since these numbers are all in the form of strings, we cannot use these numbers directly for encoding. Therefore, we need to convert these strings into real numbers that can participate in operations. We put all pilots into a list, and the position of each list represents a pilot. For example, we put 4 pilots into a list [A001, A002, A003, A004], and then we can get the corresponding pilot's code 1 ← A001, 2 ← A002, 3 ← A003 and 4 ← A004. Since the number of each pilot is unique, we can easily establish a one-to-one correspondence between the number and the real number. After establishing a one-to-one correspondence, we can get the encoding method of the candidate solution. Suppose we have two flight pairings, two captains [A001, A002] and two co-pilots [A003, A004], at this time, a solution that can get a candidate solution is [ [1,3], [2,4]]. Each position corresponds to a captain and a co-pilot, which means that a flight pairing is assigned a captain and a co-pilot.

B. DISCRETISATION OF HBA
As can be seen from the detailed description of the standard HBA and Levy flight, the two algorithms can be directly applied only when solving continuous optimization during the update process of the solution. In other words, the CRP solved in this paper is a discrete optimization problem, and these two algorithms cannot be applied to CRP. Therefore, we also need to develop a discrete version for these two algorithms.

1) GENERATE A NEW UNFEASIBLE SOLUTION
In HBA, for best solution x prey (t) in the population, we can generate a new solution x new (t + 1) according to Eq. 13 or Eq. 14.
Although a new solution can be generated in this way, the new solution may not be a feasible solution due to various constraints of the CRP problem. Therefore, we can take a series of further steps to make it feasible.

2) REPAIR INFEASIBLE SOLUTIONS
We can convert an infeasible solution to a feasible solution by following the steps below: STEP 1: Round down each position of the infeasible solution to produce a vector of integers. STEP 2: For each element that is out of bounds, we take x i,j = x i,j mod UB and the elements that are not out of bounds to form a new vector, where UB is the upper bound of the j-th position. STEP 3: For the resulting integer vector, keep the element at the first incompatible position, and set the element at other positions to 0. STEP 4: For each 0 position, find the element that is unassigned and that occurs most frequently in the population. If such an element exists, assign the element to this position, otherwise skip this step.
STEP 5: For the remaining positions with a value of 0, randomly assign unassigned elements to these positions.

C. CROSSOVER AND MUTATION OPERATORS
In genetic algorithm, crossover operator and mutation operator are two important operators in individual evolution. In the process of breeding the next generation, the exchange of genes in the same position of two different individuals is called the crossover operator. The transformation of a gene at a certain position in a single individual is called a mutation operator. In our proposed HBA-GA method, we will use these two operators to improve the candidate solutions. However, not all individuals will crossover and mutate, but the worst 10% individuals will crossover and mutate with a certain probability. In this process, we set the probability of crossover to 0.8 and the probability of mutation to 0.01. Suppose the father of the two operators selected is father = [ [1,4], [2,5], [3,6]] and the mother is mother = [ [2,6], [3,4], [1,5]]. The length of the cross is 1. The detailed operation steps are as follows.

STEP1:
The offspring first get all the genes of the father, that is, child = [ [1,4], [2,5], [3,6]] STEP2: The point where a cross is randomly generated is r = 2. Since the length of the cross is 1, the position that needs to be crossed is 2.

2) MUTATION OPERATOR
STEP4: The position where the mutation is generated randomly, here is set to 2. The position of the mutation is randomly generated, which is set to 2 here, that is, the second flight pairing is mutated. A Captain 2 and co-pilot 3 are randomly selected from all pilots. At this point, we can get child = [ [1,4], [1,3], [3,6]].
The individual produced after the crossover and mutation operations are completed may not be a viable solution, and we also need repair operation to fix it.

D. DETAILED DESCRIPTION OF HBA-GA
In the proposed HBA-GA algorithm, a set of feasible initial solutions are first randomly generated. Then use HBA and Levy flight with some probability to update the current solution. When rand < 0.5, Levy flight is executed, otherwise, HBA is executed. Finally, the quality of the solution is further improved by the crossover operator and mutation operator. Update the current best solution until the stop condition is met. A complete description of HBA-GA is shown in Algorithm 2.

V. EXPERIMENTAL RESULTS AND DISCUSSION
To demonstrate the performance of the previously proposed HBA-GA algorithm on CRP, we conduct extensive experiments on multiple instances. Our experiments are mainly conducted under two data types: instances without any qualifications and instances with qualifications. In addition to this, we also compare the results of HBA-GA with those of several other well-known meta-heuristics: genetic algorithm (GA), arithmetic optimization algorithms (AOA) [22], particle swarm optimization algorithm (PSO) [26], honey badger algorithm (HBA) [21], whale optimization algorithm (WOA) [23], grey wolf optimizer (GWO) [24] and aquila optimizer (AO) [25]. These algorithms have been proven to have excellent results when solving engineering problems. All experiments were coded in python on a laptop((Intel Core i7-1165G7 CPU @2.8 GHz on Windows 10)). Since these meta-heuristics are random, we run each algorithm 10 times independently on each instance for comparative accuracy. Each algorithm may have different parameters. if rand < 0.5 then 7: for i = 1 to M do 8: Updating the solution with Eq.15 produces a new solution x new .

9:
Repair the new solutions x new . 10: if f new < f i then 11: end if 13: if f new < f prey then 14: x prey = x new , f prey = f new 15: end if 16: end for 17: else 18: for i = 1 to M do 19: Update density factor α.

21:
if r < 0.5 then 22: for j = 1 to N do 23: Update the solution by Eq.13. 24: end for 25: else 26: for j = 1 to N do 27: Update the solution by Eq.14. Improve the worst 10% individuals by crossover and mutation operators.

40:
Repair X new2 and calculate its fitness. Find the minimum fitness f prey and its corresponding solution x prey . 41: end while 42: Returns the best solution x prey .
We use the parameter values recommended by the original paper during the experiment. In addition to this, we set the population size and the number of iterations to 20 and 2000 respectively, which are parameters that every algorithm has.

A. DATASET INFORMATION
In our simulation experiments, our basic data comes from question F of the 2021 Huawei Cup Graduate Mathematical Contest in Modeling. The dataset consists of two parts, Data A and Data B. Both Data A and Data B contain flight and crew information, of which Data A contains a total of 206 flights and 21 pilots, and Data B contains 13954 flights and 465 pilots. In flight information, it mainly includes 8 elements, namely flight number, departure date, departure time, departure airport, arrival date, arrival time, arrival airport, and minimum qualification configuration. In the crew information, it mainly includes 7 elements, namely the employee number, the captain, the deputy captain, the Deadhead, the base, the duty cost per unit hour, and the task pairing cost per unit hour. Data A is a small-scale dataset that spans a total of 15 days, as shown in Instance #1 in Table 3. However, Data B is a large-scale dataset that spans 30 days. In our experiments, flights within a scheduling period have been combined into flight pairs in advance. Since Data B is a super large-scale dataset, we divided flight pairings and pilots into datasets of different scales, as shown in Instance #2 to Instance #6 in Table 3. At the same time, we added qualification information in datasets Instance #5 and Instance #6. These 6 instances will be used to evaluate the effectiveness of our proposed algorithm. In this section, we will test the CRP problem with the objective function f 1 . The objective function f reflects the income balance of pilots of the same level and the same qualification in a scheduling cycle. We use this objective function to measure the fairness between pilots. To evaluate the performance of HBA-GA, 8 algorithms are tested on 6 instances. Each algorithm is run 10 times independently in each instance. Since evolutionary algorithms have a certain degree of randomness in each run, we separately recorded the best function value, worst function value, mean and standard deviation of each algorithm in 10 runs. In this paper, gains are made by improving two aspects and adding them to the HBA algorithm.
1. Since the global optimization ability of HBA is not strong, it is easy to fall into the local optimum. Therefore, we added Levy flight in the evolution stage, which significantly enhanced the global optimization ability of the algorithm.
2. After the evolution phase is completed, we also apply the crossover and mutation operators in the candidate solutions, which allows the algorithm to find better solutions. This search can improve the quality of the solution.
The statistical results of several classical algorithms are shown in Table 4. In Table 4, our proposed HBA-GA outperforms the other 7 well-known algorithms in best fitness,  worst fitness and average fitness. Our proposed algorithm provides the best fitness for all instances. As can be seen from Table 4, except for the standard deviation, several other metrics are better than the rest of the 7 algorithms. HBA-GA also outperforms more than half of the algorithms in terms of standard deviation. Therefore, we can see that a fairer crew scheduling scheme can be obtained significantly using the HBA-GA method.
In terms of the convergence of HBA-GA, the figure 4 gives box-plots for 6 instances, HBA-GA has better stability in 10 runs of each instance. We also compared the best performance of each algorithm out of 10 runs. As can be seen from the figure 3, the ability of HBA-GA to jump out of the local optimum is better than the other 7 algorithms.
In the process of convergence, HBA-GA continuously conducts global exploration, and converges to the global optimal value on several instances. Meanwhile, HBA-GA shows faster convergence than several other algorithms.
Instance #1 is a very small instance spanning 15 days, so we can list the results of the crew scheduling using the HBA-GA algorithm, as shown in the Gantt chart of Figure 5. In Figure 5, the ordinate represents the pilot (including the captain and co-pilot), and the abscissa represents the date of the flight pairings. Among them, A001 to A0011 are captains, and A0012 to A0021 are co-pilots. In Figure 5, each block represents a flight pairing, and each flight pairing needs to be assigned twice, i.e. blocks of the same color describe the captain and first officer performing the same flight pairing.   As can be seen from Figure 5, all pilots meet the rest time and duty time constraints, ie this is a feasible allocation. Due to the large amount of data in the other 5 instances, we only list the allocation Gantt chart of instance #1. Objective function 2 represents the cost minimization of airlines, which is also the objective function commonly used in most literatures. In this section, this objective is also optimized. In Table 5, the comparison of HBA-GA and several other algorithms is shown, as shown in Table 5, HBA-GA outperforms other algorithms on both the best solution and the worst solution. And it is also better than several other algorithms in terms of average fitness function. In terms of standard deviation, HBA-GA outperforms several other algorithms on instances #1, #5, #6. This shows that the stability of the algorithm of HBA-GA is only weaker than other VOLUME 10, 2022 TABLE 5. Results obtained from 10 independent runs of CRP with objective function f 2 in terms of mean, standard deviation and running time. algorithms in individual cases. This shows that the HBA-GA method performs well when solving the objective function to minimize the cost of CRP. This can be seen more clearly in the box-plot in Figure 6.

D. COMPARISON OF RUNNING TIME
In this section, we compare the computation time of several algorithms on two objective functions. As shown in Table 4 and Table 5, the running time of our proposed HBA-GA algorithm is in the middle of the 8 algorithms. From Table 4 and Table 5, we can clearly see that when the number of flight pairings and pilots in the dataset remains the same, while adding other highly qualified flight pairings and pilots, the running time of each algorithm increases significantly. Therefore, we can conclude that as flight pairing and pilot matching conditions are increased, the run time of the algorithm increases. At the same time, when the running time of other algorithms is not much different, the HBA-GA algorithm proposed by us can significantly improve the fitness value.

E. COMPARISON WITH BRANCH PRICING ALGORITHMS
In fact, our proposed model(objective function f 2 ) is an integer linear programming problem. At the same time, a large number of variables are included in our model. In this case, solving this model using branch pricing algorithm is a common method. Branch pricing (BP) algorithm is a hybrid optimization algorithm of branch and bound algorithm and column generation algorithm. When using column generation to solve an integer programming problem, the constraint main problem is usually relaxed into a linear programming problem, and after obtaining the optimal solution of the linear relaxation problem, integer programming is used to solve it [27]. However, the integer optimal solution is often not obtained by doing so, so it is necessary to use branch pricing to find the integer optimal solution. Therefore, the process of generating a new set of feasible solutions may take a considerable amount of time. In Table 6, we list the running  times of our proposed HBA-GA algorithm and BP algorithm on the objective function f 2 . At the same time, we also list the gap of the objective function value of HBA-GA algorithm and BP algorithm. As can be seen from Table 6, we greatly reduce the running time in large-scale instances while ensuring that the objective function value produced by the BP algorithm is not much different. When we adopt the objective function f 1 as the objective function value of the model, the model is no longer an integer linear programming, so the optimal solution of the model cannot be found using the BP algorithm. The HBA-GA algorithm can only get as close to the optimal solution as possible.

VI. CONCLUSION
Unlike most existing models, we consider a more realworld model that considers flights with high qualification requirements. Specifically, we formulate the objectives of the model from the perspectives of airlines (cost minimization) and pilots (equilibrium of benefits for pilots), taking into account qualifications. Combining HBA and GA algorithms to develop an improved HBA algorithm to solve the CRP model. Under the HBA algorithm framework, in order to prevent the algorithm from falling into local optimum prematurely, we use a Levy flight strategy with 4 schemes. It randomly selects a scheme to update individual information, which helps to explore the global optimum. In addition, after updating the individuals, we also use the GA algorithm to optimize the worst 10% individuals in the population. Since our model is a discretized model, we design a discretized population update scheme for the CRP.
Experiments are performed on data of different scales that simulate the characteristics of data in the real world. The results show that the computational time of the algorithm is significantly improved when the flight requires highly qualified pilots to perform. Experiments are performed on data of different scales that simulate the characteristics of data in the real world. It outperforms several other swarm intelligence algorithms in most cases. At the same time, it is faster than the branch pricing algorithm in the case of a small difference in fitness. The model we propose is more suitable for the real-world pilot scheduling problem, because most constraints are qualification-related constraints, such as language constraints can also be described as a qualification with language. Therefore, we can use these solutions to guide realworld crew rostering problem. In future work, we will further consider crew rostering problem for flight cancellations due to COVID-19.