A 2-stage Approach for the Nurse Rostering Problem

In this paper, we are addressing the NP-hard nurse rostering problem utilizing a 2-stage approach. In stage one, Monte Carlo Tree Search (MCTS) and Hill Climbing (HC) are hybridized in finding a feasible solution (satisfying all the hard constraints). We propose a new constant C value (which balances search diversification and intensification of MCTS) and tree policy/node selection function in the selection procedure of MCTS. In stage two, the feasible solution is further improved using Iterated Local Search (ILS) with Variable Neighbourhood Descent as the local search component. We introduce several unique neighbourhood structures for the ILS. In addition, we propose a novel perturbation strategy to allow the search to escape from local optimum. The proposed methodology is tested on the Shift Scheduling dataset (24 benchmark instances). New best results are reported for seven and two instances for the 10 and 60 minutes run respectively. An in-depth discussion on the attributes of the proposed methodology that lead to its good performance is provided.


I. INTRODUCTION
Combinatorial Optimization Problems (COP) involve finding the values for a set of variables from a discrete search space which maximizes or minimizes an objective function. Examples of these type of problems include vehicle routing [38], traveling salesman, bin packing, minimal spanning tree and timetabling. There are many types of timetabling problems e.g. educational timetabling [12], [41], [42], transportation timetabling [29] and personnel scheduling [46]. Nurse rostering is a specific type of personnel scheduling problem and plays an important role in healthcare management. It involves the assignment of shifts to nurses on a planning horizon (e.g. one month), satisfying a set of hard and soft constraints. The aim is not only to improve the operational efficiency of hospital wards by having an effective utilization of the limited resources, but also to focus on the well-being and job satisfaction of nurses.
Due to the number and nature of constraints, nurse rostering is complex and challenging for both researchers and administrators (personnel managers and head nurses) in hospitals. In fact, the nurse rostering problem is NP-hard [24]. Until recently, most nurse rosters were still constructed manually which can be tedious and time consuming. Having an effective automated nurse roster is crucial. Among issues that may be addressed by having a good roster in hospitals include: • Under or over staffing. The automated nurse rostering system will ensure that the right number (within a predefined range) of nurses will be assigned to a ward for each shift in a day. This will not only improve the operational efficiency but also reduce the operational cost of the ward. • Skills mismatch. Nurses with the right qualifications and skills will be assigned to shifts so that healthcare services can be delivered smoothly which will enhance the well-being and life span of the patients. • Job dissatisfaction of nurses. Shifts will be assigned to nurses according to their requirements spec-ified in contracts such as minimum/maximum work time, minimum/maximum consecutive shifts, preference/avoidance of shift pattern, night shift assignment, weekend assignment, days on/off, co-workers preference/avoidance etc. A nurse-centred roster will improve the life (health, family and social) quality and morale of nurses which in turn will improve the nursing service quality as well as the experience of patients. • Shortage of nurses. Nurse shortages in Malaysia is reported in [4]. The authors concluded that a supportive work environment is important, in addition to growing the workforce in addressing the issue. As the well-being of nurses is improved by having an automated rostering system, the nurse turnover rate and thus nursing shortage are expected to be alleviated. Subsequently, this will decrease staff management costs in terms of recruitment, retention and training. The coronavirus disease 2019 (COVID-19) was declared a pandemic by the World Health Organization (WHO) in March 2020. Since then, it has greatly impacted healthcare front liners in terms of psychological well-being, their morale and work performance due to long working hours under stressful conditions [20]. The urgency of having of an optimal roster is further underlined by the pandemic.
The contributions of this paper are as follows. We apply MCTS in nurse rostering problems for the first time. As far as we are aware, there is no such application found in the scientific literature. MCTS (as a relatively new search methodology) has become the focus of Artificial Intelligence (AI) due to its success in the games domain, particularly Go (where programs based on MCTS are competitive with the best human players) [6]. We propose a new C value (which balances the search diversification and intensification) and tree policy/node selection function in the selection procedure of MCTS. We introduce several unique neighbourhood structures and a novel perturbation strategy to allow ILS to escape from local optimum. The proposed methodology finds several new best results in a comparison to the current state of the art methodologies.
The remainder of this paper is organized as follows. The nurse rostering problem and its formal representation are described in Section II. Related work is reviewed in Section III. The proposed methodology is described in Section IV. We present the experimental results in Section V. An in-depth discussion is provided in Section VI. Finally, the conclusion and suggestions for future work are given in Section VII and VIII respectively.

II. PROBLEM DESCRIPTION
In this section, we describe the problem and the constraints involved. We present a formal representation of the problem using an integer programming model. The details of the studied problem can be found in [15].

A. FORMAL REPRESENTATION OF THE PROBLEM
y ds : total below the preferred cover for shift s on day d. z ds : total above the preferred cover for shift s on day d.

The objective function (Equation 3
) is to satisfy the shift requests of nurses (Equations 4 and 5) and minimize under and over staffing (Equation 6). These requirements are soft constraints which means they are optional and their degree of satisfaction will determine the quality of the roster. q nds and p nds in Equations 4 and 5 are weights that show the importance of shift on and off requests to a nurse. Meanwhile, w min ds and w max ds in Equation 6 are weights that underline the importance of under and over coverage respectively.

M inimize
This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. HC1: A nurse cannot be assigned more than one shift per day.
HC2: A minimum rest time is required after each shift thus certain shifts cannot follow others. For example, an early shift cannot follow a late shift.
The total number of a shift assigned to a nurse must not exceed the maximum allowed. For example, some nurses do not work night shift or work a maximum number of night shifts. d∈D x nds ≤ m max ns n ∈ N, s ∈ S HC4: The workload (in minutes) of a nurse must be between a minimum and a maximum.
HC5: The number of consecutive shifts assigned to a nurse must not exceed the maximum allowed.
HC7: The minimum number of consecutive days off. If the minimum is three, then the sequences {on-off-on} and {onoff-off-on} are not allowed.
..|D| − (e + 1)} (13) HC8: The maximum number of weekends. A weekend is considered worked if a worker has a shift on either Saturday or Sunday.

III. RELATED WORK
An overview of the nurse rostering problem can be found in [8], [11]. As policies, legalities and regulations differ between countries and hospitals, nurse rostering problems differ in terms of requirements (constraints) [16], [23]. The first international nurse rostering competition (INRC-I) was organized in 2010 with the goal to generate and compare new approaches to the problem in an objective manner [22]. The winner of INRC-I, Valouxis et al. [45] used a systematic two phase approach. In the first phase, the daily workload for each nurse was decided using an integer programming formulation with local search processes, while in the second phase, the specific shifts were assigned using an integer programming formulation. Other finalists of the competition were; Burke and Curtois [7] applied an ejection method for small instances and a branch and price algorithm for medium and long instances; Nonobe [33] employed a tabu search, with a mechanism to dynamically control the tabu tenure and constraint weights; Bilgin et al. [5] applied a hybrid approach where a greedy shuffle is deployed after running a hyper-heuristic for 80% of the computation time. Recent approaches for the nurse rostering problem include adaptive variable neghbourhood search [43], hybrid harmony search [3], randomized variable neighbourhood search [49], neutrality based iterated local search [32], population-based local search [2] and a hybrid of variable neighbourhood search and dynamic programming [1].
The second international nurse rostering competition (INRC-II) was organized in 2014 [9]. INRC-II is a simplified version of INRC-I. A multi-stage problem formulation was added. History information (border data such last worked shift of each nurse) has to be taken into account by solvers. Among the finalists of the competition were network flow based mixed integer linear programming [36], rotation based branch-and-price [28] and a sequence-based selection hyperheuristic [26]. Among the competitive approaches post competition were variable neighbourhood search [19], simulated VOLUME 4, 2016 annealing [10] and a hyper-heuristic based upon a hidden markov model [25].
Rahimian et al. [35] proposed a hybrid integer programming and variable neighbourhood search algorithm for the Shift Scheduling dataset (24 instances) introduced in [15]. A greedy heuristic was used to generate an initial solution. Then a variable neigbourhood search algorithm and integer programming (IP) based on a ruin and recreate framework was run alternately until the stopping criteria was met. In the framework, a scoring scheme was used to identify high penalty parts of the solution which were destroyed. The solution was then recreated by an IP solver. IP was applied again on the best found solution using the remaining time limit. Strong results were reported. Other approaches applied on these instances were branch-and-price, ejection chain heuristic and Gurobi IP solver as reported in [15]. New approaches applied on these instances were integer programming [39], linear programming based on a column generation heuristic [40] and a hybrid of mixed integer programming and simulated annealing [44]. The other popular NRP benchmark datasets (ORTEC and NSPLib) and the recent solution methodologies to address them are shown in Table 1.
MCTS is a relatively new technique and is being deployed in various application domains such as games [14], feature selection [30] and parameter tuning [17]. MCTS has rarely been used for combinatorial optimization problems. The application of MCTS algorithms on job shop scheduling problems can be found in [13], [37]. Matsumoto et al. applied Single Player MCTS (SP-MCTS) on reentrant scheduling problems [31]. The application of MCTS on variants of the traveling salesman problems can be found in [34]. Goh et al. applied MCTS to address course timetabling problems [18].

IV. PROPOSED METHODOLOGY
The overview of our proposed algorithm is presented in Algorithm 1. We employ a 2-stage approach. In stage 1, we attempt to find a feasible solution where all the hard constraints are satisfied. If a feasible solution is found, stage 2 is activated where we focus on decreasing the soft constraint violations without compromising any of the hard constraints. MCTS is depicted in Figure 1. Each node in the tree is a state and each link is an action leading to a state. Each node records an average value and a visit count. There are four main steps in MCTS which are selection, expansion, simulation and back-propagation. In the selection step, the tree is traversed from the root until a leaf node is reached. In the expansion step, a child node is added to the tree. In the simulation step, a simulation is run from the child node to produce an outcome. In the propagation step, the traversed nodes including the child node are updated with values from the outcome. These steps are repeated within available resources. In our MCTS implementation, in each exploration/iteration, we attempt to assign a shift s to nurse n on each day d (in the planning horizon) in a constructive manner. At the end of the exploration, nodes in MCTS are updated according to a reward value (based on the solution generated). These nodes will guide shift assignment for nurse n in the next exploration. The exploration stops whenever a feasible solution for nurse n is found or a certain number of iterations is reached. We maintain a record of the best solution bestSol for further processing by HC (in case we could not find a feasible solution for nurse n). The class diagram for the node class is shown in Figure 2. The approximate complexity for  The details of the MCTS procedure are presented in Algorithm 3. A root node, rootN ode is created (line 1). The diversification co-efficient, C is initialized with a value of 5.0 (based on computational experience). In each exploration, a current solution for nurse n, curSol n is initialized to an empty solution (line 5). Day d is set as 1 (line 6). The rootN ode is added to the list of visited nodes, visitedN ode (line 7). A tree is traversed in the TREE procedure which comprises of the selection and expansion steps (line 8). Shifts are assigned to nurse n in both the TREE and SIMULATION procedures. SIMULATION returns a reward value which is used to update the visited nodes in the BACKPROPAGA-TION procedure. C is updated using a decay rate of 0.9999 (line 11).
The details of the TREE procedure is shown in Algorithm 4. rootN ode is set as currentN ode. A tree is traversed through selected nodes until a leaf node is reached (lines 2-7). During tree traversal, the visited nodes are kept in the visitedN ode list (line 4), shifts (of the selected nodes) are assigned to nurse n, on day d (line 5) and d is incremented by 1 (line 6). At the leaf node, the tree is expanded by the EXPANSION procedure (line 8). One of the children of currentN ode is selected as childN ode (line 9). childN ode is added to visitedN ode (line 10). A shift (of childN ode) is assigned to nurse n, on day d (line 11). d is incremented by 1 (line 12). The SELECTION procedure is given in Algorithm 5. In this procedure, a node with the highest value among the child nodes of currentN ode is selected and returned. Instead of using the common Upper Confidence Bound (UCB) in Equation 16, we use Equation 17 as the node evaluation VOLUME 4, 2016 function or tree policy (line 4). In Equation 16, v i is the value and n i is the visit count of child node i while n p is the visit count of the current (parent) node. It can be observed that unvisited children are given the largest possible value so that all of them are considered at least once. C is a constant when set higher will promote search diversification as it prioritises less frequently visited nodes. In Equation 17, v i is the value of child node i. C is a co-efficient that determines the priority given to randomness (diversification) when evaluating the child nodes. The random component introduced in the node evaluation function/tree policy decreases the branching factor and effectively increases the tree depth. A deeper tree allows MCTS to make better decisions (in assigning shifts to a nurse) and is therefore more efficient.
The details of the EXPANSION procedure are shown in Algorithm 6. All the valid and feasible shifts for nurse n, on day d are added as the child nodes of the current node, currentN ode. The feasibility of a shift for a day can be fully tested for the hard constraints; minimum rest time (HC2), maximum total shift (HC3), maximum sequence (HC5), minimum sequence (HC6 and HC7) and pattern (HC8). For example, if a shift is feasible for a day, the violation of these constraints is zero. The workload hard constraint (HC4) is violated when the total time unit is more than a maximum value or less than a minimum value. Therefore, it can only be partially tested (for maximum value). A shift is considered feasible for a day even if the current total time unit is less than a minimum value.
In the SIMULATION procedure (Algorithm 7), a valid and feasible shift is assigned to nurse n on each remaining day d. All the valid shifts for nurse n are assigned to shif ts list (line 2). A shift is probabilistically selected from shif ts (line 5). If the shift is feasible for a particular day, it will be assigned to nurse n on day d (line 16). Otherwise, the The BACKPROPAGATION procedure is shown in Algorithm 8. We increment the visit count and the value (as a cumulative mean of reward) of each node in the visitedN ode list.

2) Hill Climbing (HC)
If MCTS could not find a feasible solution for nurse n, a HC procedure (Algorithm 9) is invoked. For each day d, we try to replace the shifts for a random block of days. We call this neighbourhood structure X-Extraction (Section IV-B1) where X is set to a random value between 1 to 4. We focus on minimizing the workload constraint (HC4) violations without compromising other hard constraints (zero violations). As previously noted, we have modified the violation calculation of this constraint. As the name of this procedure suggests, we only accept improving moves. If the candidate solution is worse than the current solution, the original shifts are kept. The process is repeated for 60 seconds. The procedure is exited when a feasible solution for nurse n is found (line 11). In stage 2, we focus on minimizing soft constraint violations. If a feasible solution is found, ILS (Algorithm 10) is executed. The procedures VND and PERTURBATION are run alternately until a time limit t is exceeded.

1) Varible Neighbourhood Descent (VND)
The VND procedure is shown in Algorithm 11. k neighbourhood structures are defined (line 1). k is initialised to 1 (line 2). At each iteration, a record of the current cost is kept (line 4) as record. Note that the f function is referring to soft constraint violations. Candidate solutions are generated using the neighbourhood structure k. The candidate solution is accepted as the current solution if the candidate cost is less than or equal to the current cost (lines 7-9). The best solution is updated if the current cost is better than the best cost (lines 11-13). If the current cost is equivalent to record, and replaced with new shifts, see Figure 5. In VND, we set the X variable to a random value in the range of 1 to 4.

2) Perturbation
The PERTURBATION procedure is presented in Algorithm 12. step and k are initialised to 1 (lines 2 and 3). A record of the current cost is kept as record (line 4). We attempt to gradually perturb the current solution until the cost changes VOLUME 4, 2016 ∆f ≥ 0.05 (5%). In each iteration, the current solution is perturbed by utilising a neighbourhood structure, N k . The value of k is incremented by 1 and the current solution is perturbed again if ∆f < 0.05. When k = k max , step is incremented by 1 and k is reset to 1. The current solution is repeatedly perturbed if necessary.
The PERTURB procedure is shown in Algorithm 13. A candidate solution is generated using neighbourhood structure N k . The candidate solution is accepted as the current solution if the candidate cost is less than or equal to record + record * step * 0.01. Obviously, the acceptance criterion is relaxed. The best solution is updated if the current cost is better than the best cost. This process is repeated for |N | × |D| × |S| iterations. We utilise the same set of neighbourhood structures in VND. However the X variable is set to a random value in the range of 5 to 7. The approximate complexity for this procedure is O(|N ||D||S|).

V. EXPERIMENTAL RESULTS
The experiments in this paper are conducted on a machine (Intel Xeon 3.3 GHz with 16 GB RAM) running Windows Server 2019. We tested the proposed methodology on the Shift Scheduling dataset introduced by Curtois and Rong Qu [15]. The dataset consists of 24 instances which were designed to reflect real world requirements. They vary from small (8 nurses, 14 days, 1 shift) to large (150 nurses, 364 days, 32 shifts) as shown in Table 2. These instances were designed to represent real world requirements (ranging from easy to challenging) and yet simple to use. The increment of |N |, |D| and |S| from instance 1 to 24 (Table 2) indicates the increasing size of the problem instances. Shift is referring shift types e.g., early, late, night shifts etc. We run the algorithm for a total of 30 times for each instance. Its important to note that all the solutions generated are validated using the software RosterViewer provided by the dataset owner. Table 3 shows the Hard Constraint Violations (HCV) and time to feasibility in seconds when using different C values in the SELECTION procedure of MCTS. Note that we are using the original Equation 16 as the tree policy. We propose to set C to 5.0 and gradually decrement it according to C i+1 = C i × α (decay rate) in each exploration. This geometric cooling schedule is adopted from the Simulated Annealing (SA) algorithm. We omitted the easy instances 1 to 12 as their HCV and time to feasibility are all zero. For the hardest instance 21, setting C too low (1.0) or too high (10.0) is 8 VOLUME 4, 2016 This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. bestSol ← curSol 7 end 8 end 9 until this end condition equally bad. A low C value promotes search intensification, while a high C value promotes search diversification. It seems a balance is achieved when C is fixed to 5 but it is not sufficient to find even a feasible solution for this instance. On the other hand, our proposed C value with decay rate α = 0.9999 allows MCTS and HC to attain 100% feasibility. We perform t-tests to compare the means (HCV) between C value of 5.0 and the proposed C value with decay rate α = 0.9999. The p value of 0.000 (less than 0.05) reveals a significant difference between the means for the instance 21. Note that p value cannot be generated for the rest of the instances because the mean and therefore standard deviation for both groups are zero.

2) Comparing Tree Policies/ Node Evaluation Functions
In this section, we compare the HCV and time to feasibility using different tree policies in the SELECTION procedure of MCTS. Note that we are using the proposed C value in the previous section due to its effectiveness. The UCB tree policy Instance |N | |D| |S|  1  8  14  1  2  14  14  2  3  20  14  3  4  10  28  2  5  16  28  2  6  18  28  3  7  20  28  3  8  30  28  4  9  36  28  4  10  40  28  5  11  50  28  6  12  60  28  10  13  120  28  18  14  32  42  4  15  45   is shown in Equation 16, with the proposed tree policy is given in Equation 17. As evident in Table 4, the proposed tree policy allows the hybridisation of MCTS and HC to attain feasible solutions in shorter average times for instance 21.
The p value of 0.000 (less than 0.05) reveals a significant difference between the means (time to feasibility) for this instance. In addition, a lower overall average time (1.12 vs 1.25) is achieved.

3) Comparing Simulation and MCTS
Here, we compare the simulation component of MCTS with the fully fledged MCTS (with the proposed C value and tree policy). As shown in Table 5, simulation alone fails to find a feasible solution for instance 21 Table 6 shows the feasibility and time to feasibility of the solutions. The combination of MCTS and HC managed to find a feasible solution in every run for each instance. A feasible solution is found in < 1 second for instances 1 to 20. It takes ≤ 12 seconds for instances 22 to 24. Instance 21 seems to be the most challenging as it takes between 7 and 15 seconds to find a feasible solution.

1) Comparing the Influence of Neighbourhood Structures
To observe the influence of the proposed neighbourhood structures, we run each with a combination of neighbourhood structure(s) iteratively in a sequential manner using hill climbing acceptance criterion, with a runtime of 1 minute. The mean soft constraint violations are shown in Table 7. From observation, combinations of neighbourhood structures give better results than any individual neighbourhood structure. The combination of <NS1+NS2+N3> is good for most instances. The search may not have encountered local optimums yet for these instances where this combination of heuristics is not that effective.

2) Convergence Behaviour of the Proposed ILS
We run the proposed ILS on instance 8 using a random seed of 1. A hill climbing (HC) is run using the same seed for comparison purpose. Figure 6 shows the convergence behaviour of both the proposed ILS and HC. It appears that the HC stagnates relatively early at around 2200 (soft constraint violations). HC is assumed to be trapped in a local optimum. Meanwhile, the proposed ILS went through a series of fluctuations in terms of current cost (due to perturbations) and finally achieves a lower soft constraint violations (approx. 1500). It shows the perturbations in the proposed ILS are effective in helping the search to escape from local optimums.
3) Comparing with the State-of-the-art Methodologies Table 9 shows the quality of the solutions generated by the proposed algorithm with a runtime of 10 minutes, in comparison with other state of art methods (the details are given in Table 8). The dash "-" symbol indicates that the algorithm could not find a feasible solution within the time limit. The optimal solutions are underlined. The bold values indicate the best known result for an instance. Our proposed methodology obtained the best results for 10 instances (1, 2 This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. (solver D and E) worked effectively for small and medium sized instances but was lacking in terms of performance for large instances. Our proposed methodology produces strong results for large instances. We believe it is due to its ability in finding a feasible solution faster and it therefore has more time to focus on improving the solution quality. However, its performance is mediocre for certain small and medium instances where an optimal solution is not guaranteed unlike the exact method such as Gurobi (integer programming). Note that the solver E did not include instances 20-24 in their experiments as they thought the data (planning horizon of six months or one year) is not practical in the real world application.  As the other state of the art methods were compared using the runtime of 60 minutes, we run the proposed algorithm for the same time limit. Table 10 compares the solutions generated by the proposed algorithm with those produced by other state of art methods. The proposed algorithm reduces the mean results for all the instances. It attains the best results for instances 1, 2, 3, 4, 21 and 24. Note that we did not include solvers F and G here as their runtime exceeded the time limit of 60 minutes which hinders an objective comparison.

VI. DISCUSSION
In MCTS, feasibility is tested before assigning a shift to a nurse on a particular day. All the hard constraints (HC3, HC3, HC5, HC6, HC7 and HC8) can be fully tested except workload constraint (HC4). We can only prevent a shift assignment that will cause the time unit (of a nurse) to exceed the maximum. A shift is considered feasible even though the time unit is less than a minimum after its assignment. To alleviate work underloading, the probability of selecting a vacancy is set relatively lower than selecting any shift. This setting also applies to HC, VND and perturbation procedures where the neighbourhood structure X-Extraction is utilized. MCTS is utilized in finding a feasible solution. Due to the We propose to set the C (Equation 17 in the SELECTION procedure) to a fixed value and gradually decrement it using a geometric cooling schedule. Setting C to a relatively high value in the beginning allows search diversification as nodes are randomly selected. As the search progress, C is decremented as well as the priority given to the random component. Effectively, priority is increasingly given to the value v i component when selecting nodes. This allows the search to intensify based on the information collected earlier towards the end of the search. Therefore, our proposed C value is effective.
A tree policy (Equation 17) for node evaluation in the SELECTION procedure is proposed. The commonly used UCB tree policy (Equation 16) gives unvisited nodes largest possible values so that they are considered at least once as a way to promote search diversification. However, we think that it is unnecessary to consider every unvisited node and replace the visit component in the equation with a random component instead. Therefore, the proposed tree policy manages to find feasible solutions in shorter times as shown in  Note that we modify the calculation of workload constraint (HC4) violations. The original calculation returns a penalty of 1 for a nurse whose total time unit is either more than a maximum or less than a minimum. The modified calculation returns a penalty of [minimum-time unit] if time unit < minimum and [time unit-maximum] if time unit > maximum. For example, at one time point, the time unit for a nurse is 5 and maximum is 1. The violation is 1 (based on original calculation) and 4 (based on modified calculation). In the original calculation, it is seen an as equal cost move. In the modified calculation, it is seen as a worsening move. This difference affects the acceptance and rejection of moves. This modification is important in providing the accuracy required in the calculation of the reward value (Equation 19) in the SIMULATION procedure so that MCTS can work effectively. Furthermore, this modification is crucial for our HC (Algorithm 9) to work properly.
Several new best results are achieved for the runtimes of 10 and 60 minutes as presented in Tables 9 and 10. We hold a competitive advantage as our proposed MCTS and HC combo can find a feasible solution quickly. This leaves us with plenty of remaining time to focus on improving the solution quality.
We opt to utilise VND as the local search for ILS because of its diversification capability. It can perform effective search through the employment of multiple neighbourhood structures. In addition, VND serves as a natural indicator of local optimum when it stops running (after considering all the neighbourhood structures).
From observation, a stagnating current cost indicates that the search is stuck in a local optimum. Therefore, when the execution of VND stops, we perturb the solution until the current cost changes at least 5%. We gradually (step by step) relax the acceptance criterion in case the current cost is still stagnant. The difference between the neighbourhood structures applied in VND and perturbation is the size range of block X. We hope the variation will make the search space more connected and allow the search to diversify and escape from local optimum in case it is trapped.

VII. CONCLUSION
We presented the implementation details of utilising a hybrid of MCTS and HC algorithms in finding feasible solutions. Specifically, the proposed C value (SELECTION procedure of MCTS) was compared with other fixed values. MCTS and HC perform best by gradually decrementing an initial C value based on a geometric cooling schedule. We also compared the proposed tree policy with the UCB tree policy commonly utilized in MCTS. The proposed tree policy allows feasible solutions to be found in shorter times. In addition, we demonstrated the importance of the learning component in MCTS. MCTS and HC manage to achieve 100% feasibility quickly. Furthermore, an ILS (VNS as local search) was proposed in improving solution quality. The proposed neighbourhood structures were shown to be suitable for the problem instances. A convergence curve of the proposed ILS was presented which shows its effectiveness in escaping from local optimums. Finally, a comparison was made between the proposed methodology and the state-ofthe-art methodologies, in terms of soft constraint violations. New best results are found for a number of instances for 10 and 60 minutes run.

VIII. FUTURE WORK
We notice that given the same amount of time (300 minutes), shorter runs (10 minutes × 30 runs) are better than longer runs (60 minutes × 5 runs) in obtaining the best results for instances 9, 12 and 13. This implies that the search becomes trapped in the longer runs for these instances. In future, we seek to improve the diversification of the proposed algorithm (ILS) to benefit from longer runs. We may consider devising additional neighbourhood structures, embedding tabu mechanisms and even hybridising the proposed methodology with population-based algorithms, which are often used for their diversification capability. The algorithms we would consider include genetic algorithms and particle swarm optimization. PROFESSOR DR. SALWANI ABDULLAH She obtained her BSc in Computer Science from The University of Technology Malaysia and her master's degree specializing in Computer Science from The National University of Malaysia. She did her Ph.D. in Computer Science at The University of Nottingham, United Kingdom. Now, she is a Professor in Computational Optimization at the Faculty of Information Science and Technology, The National University of Malaysia. Her research interest falls under Artificial Intelligence and Operation Research, particularly computational optimization algorithms (heuristic and meta-heuristic, evolutionary algorithms, local search) that involve different real-world applications for single and multi-objective continuous and combinatorial optimization problems such as timetabling, scheduling, routing, nurse rostering, dynamic optimization, data mining problems (feature selection, clustering, classification, time-series prediction), intrusion detection and search based software testing.
PROFESSOR DR. GRAHAM KENDALL is a Senior Vice President of the PETRA Group and the Chief Executive of the Good Capitalism Forum (GCF). Prior to this, he worked at the University of Nottingham for 21 years and, prior to this, he worked in the IT sector, holding several senior positions.
Prior to joining the PETRA Group, Graham worked in Malaysia for 10 years, being based at the University of Nottingham's campus in Semenyih where he was the CEO & Provost, and also a Pro-vice Chancellor of the global University.
He is (or has been, prior to leaving academia) a fellow of both the British Computer Society and the Operational Research Society. He holds honorary and distinguished professorial positions at universities in Hong Kong and India. He is an Emeritus Professor at the University of Nottingham.
He is a world recognised academic, with his research focussing on solving real world problems through the use of Operations Research, utilising Artificial Intelligence methods which are based on Charles Darwin's principles of natural evolution.
He has written over 250 peer reviewed papers and has served on several editorial boards including being the Editor-in-Chief of the IEEE Transactions of Computational Intelligence and AI in Games and an Associate Editor of the Journal of the Operational Research Society. VOLUME 4, 2016 15 This article has been accepted for publication in IEEE Access. This is the author's version which has not been fully edited and content may change prior to final publication. Citation information: DOI 10.1109/ACCESS.2022.3186097