Sequence- and Time-Dependent Maintenance Scheduling in Twice Re-Entrant Flow Shops

Industrial and academic interest converge on scheduling flow shops with sequence- and time-dependent maintenance. We posit that anticipatory, integrated scheduling of operational and maintenance tasks outperforms purely ‘wait-then-fix’ handling of the maintenance tasks. Motivated by an industrial problem with (sequence-dependent) setup times, maximum separation constraints, and a combination of sequence- and time-dependent maintenance tasks, this paper introduces an integer programming solution, a constraint programming solution and a heuristic solution based on list scheduling. The motivating use case provides a unique combination of concerns that, to the best of our knowledge, has not yet been studied in the literature. We build on existing work where we can by extending models for sequence-dependent maintenance scheduling to accommodate sequence- and time-dependent maintenance scheduling, and we also propose new models. We show the relative performance of our methods through empirical evaluations and also show significant improvements (up to a 25% reduction in makespan) when compared to a reactive scheduling approach that does not consider maintenance in its planning. Based on our evaluations of exact methods, constraint programming models scale better than mixed integer programming models for this problem.


I. INTRODUCTION
We consider a sequence- and time-dependent maintenance scheduling problem. Our problem is motivated by an industrial use case of a large-scale printer (LSP) and is modelled as a flow shop. The operations in this problem have ordering constraints that enforce precedence and also maximum separation constraints that limit the delay between some of the operations. We also face setup time considerations. There are maintenance tasks which depend on the schedule: different sequences of operations have different deterioration effects on the machines. Additionally, the contribution of an operation to the total deterioration effect in a sequence is dependent on the timing of operations. Thus, our maintenance planning problem is both sequence- and time-dependent. The key question is how to handle both operational and maintenance tasks. This is a challenging question because deteriorated machines produce low-quality jobs; there are thresholds beyond which deterioration of machines must be fixed by carrying out a maintenance activity. The overall objective is to find a feasible schedule that minimises the makespan.
The associate editor coordinating the review of this manuscript and approving it for publication was Yu Liu.
Integrated production and maintenance planning is a challenge in many industries such as in wind farms [1], [2], in the capital goods industry [3] and the pulp and paper industry [4]. In many cases, machine deterioration is dependent on use, i.e., the maintenance required depends on how production operations have been scheduled. Sometimes, this dependence can be ignored and solutions can focus on preventive or policy-based maintenance [5], [6], [7]. Recent work in this direction has used reinforcement learning to come up with these policies [8]. In other cases, the maintenance and production planning problem can be so integrated that the effect of use patterns on maintenance cannot be ignored. Previous work along these lines has considered different ways in which maintenance planning is integrated with production planning such as time-dependent maintenance [9] and position-dependent maintenance [10], [11]. The literature also considers different models of maintenance activities, with one of the most popular models being that maintenance activities affect the processing time of operations [12]. While similar problems have been tackled in the literature, our work deals with a unique combination of maximum separation constraints and a deterioration effect not on the processing times of jobs but on the quality of work produced.
We present three solutions to this problem, namely (i) a mixed integer programming solution, (ii) a constraint programming solution, and (iii) a list-scheduling based heuristic solution that extends the capabilities of existing schedulers to handle the kind of maintenance activities presented in this problem.
Through empirical evaluations, we show that in comparison to the reactive approach of scheduling only production operations and then performing maintenance activities when deterioration thresholds are crossed during a production run, our proactive approach achieves significant improvements in the makespan.
Parts of this paper were presented in a non-archival workshop [13]. In the workshop paper [13], we introduced the list-scheduling based heuristic (Section VI). The current paper presents the work in archival form, introduces two other solution methods, and also expands the scope of the evaluation to include more list-schedulers from the literature.
This paper is organised as follows: Section II discusses related work, Section III provides the background and problem definition, and Sections IV, V and VI present mixed integer programming, constraint programming and heuristic solutions respectively. We perform empirical evaluations in Section VII, and Section VIII concludes the paper.

II. RELATED WORK
The literature has investigated the dynamic relationship between machine deterioration and production scheduling from multiple angles, ranging from ways to accurately determine the deterioration of a machine [14], [15] to actually generating schedules. We group the research themes in this field based on two categories, namely (i) the way deterioration is modelled and (ii) the way maintenance activities are modelled.
Based on the deterioration model, existing research can be split into three main categories or approaches [16]. The time-dependent approach relates deterioration to the time at which a job is scheduled, i.e., scheduling a job later in the schedule incurs some additional deterioration which typically leads to longer processing times compared to scheduling it earlier. Closely related to this is a position-dependent approach, where the deterioration effect of an operation is dependent on the number of preceding completed operations. Finally, there is the sequence-dependent approach in which the deterioration depends on the ordering or sequence of the preceding operations on the machine. As a result of the industrial challenge addressed in this paper, we focus on the sequence-dependent case with an additional challenge that the deterioration effect of an operation on a machine is not known a priori and is itself time-dependent.
The survey of Gawiejnowicz [9] into the state of time-dependent scheduling problems has shown that the problem has been studied for single machine, parallel machine and dedicated machine use cases with a wide range of solution methods. However, situations where time-dependence of maintenance activities is coupled with sequence-dependence are unaddressed.
Yang [17] considers the position-dependent maintenance scheduling problem on a set of parallel machines, assuming that machines can only be maintained once within the planning horizon and with a constant maintenance duration. References [10], [11] and [18] all consider position-dependent maintenance on a single machine with varying considerations such as the impact of time-dependent improvements in machine conditions, constraining job processing times to lie within an interval, and a combination of time- and position-dependent deterioration respectively. References [12] and [19] also consider the position-dependent case but both add due-window considerations for just-in-time scheduling.
The sequence-dependent approach is a more recent addition to the literature and can be considered as a generalisation of the time- and position-dependent approaches. Notably, [20] and [21] study sequence-dependent deterioration on a set of parallel machines without and with maintenance activities respectively. Reference [22] considers iterated greedy heuristics for a similar problem and [23] considers the case where the parallel machines are not identical and processing time is based on a combination of deterioration and the speed of the assigned machine. Recently, [16] explored multiple integer programming models for solving the sequence-dependent maintenance problem on parallel machines and provided a heuristic approach for larger instances. The combination of sequence-dependent maintenance with other approaches and its effect in more complex manufacturing systems has not yet been studied.
Based on the model of maintenance activities, there are also different approaches in the literature. Some works such as [20] do not consider the presence of maintenance activities at all and aim to schedule in a way that deterioration is minimized. Other works such as [16] consider maintenance activities that reset the status of a machine to full health or 0% deterioration, while a third category [12] considers rate-modifying maintenance activities that restore machine health by modifying the rate such that machines are able to perform work faster after maintenance. The authors of [24] and [25] classify maintenance activities into those that completely reset the state of the machines and those that only restore the machines to some better deterioration state. Additionally, the maintenance activities can be of fixed duration or can also have varying types based on how deteriorated a machine is.
A core assumption in many scenarios is that deterioration makes machines slower, thus increasing processing times of operations. Our work differs fundamentally in this regard in that using deteriorated machines does not have an effect on processing times, but instead affects the quality of the jobs produced. Our problem defines deterioration thresholds beyond which maintenance activities must be carried out to meet the quality requirements of future jobs. We consider maintenance activities that reset the state of the machine, and additionally that there exist different classes of maintenance activities, each with their own deterioration thresholds and incurring different costs.
An additional complication in our problem is the presence of maximum separation constraints, which impose additional feasibility requirements on the problem. Exact solutions are able to easily model these additional requirements but heuristics run the risk of generating infeasible solutions in some cases. We therefore consider it necessary to design a solution for schedules that become infeasible due to the incorporation of maintenance activities. This concept of re-organising or repairing a changed schedule has been studied with various heuristics such as left and right shift [26], [27]. Reference [28] combines several of these heuristics with a genetic schedule repair algorithm to build a solution that caters to multiple classes of schedule disturbances in a prefabrication plant.
In the context of flow shops, an example of schedule repair algorithms can be found in [29], which considers re-scheduling in a two-machine flow shop where schedules are disrupted by machine breakdowns. Additionally, [30] considers re-scheduling due to inserting new jobs in already planned schedules and [31] considers re-scheduling due to a wider range of disruptions in flow shop schedules at runtime. These cases all consider unexpected interruptions and do not have the combination of precedence and maximum separation constraints which provide an additional challenge for our problem.
In summary, there is a gap in the literature for sequence-dependent maintenance scheduling where deterioration effects of operations are not known a priori but are themselves time-dependent. The particular industrial challenge we consider has additional requirements of maximum separation that add to the complexity of the problem. Further, the schedule repair that is needed for heuristic schedulers, which may produce infeasible schedules when we introduce maintenance activities, also requires new techniques.

III. PROBLEM DEFINITION
We consider a maintenance-aware re-entrant flow shop with setup times and relative due dates inspired by an industrial use case of a large-scale printer (LSP). The LSP prints different types of duplex sheets that need to be processed twice by the same print head at a speed of 100 or more pages per minute. In this setting, jobs to be scheduled refer to sheets to be printed.
In three-field or Graham notation [32], the base problem without maintenance is defined as $F \mid s_i, s_{ij}, \text{limited-wait} \mid C_{\max}$, indicating that it is a flow shop with both sequence-dependent and sequence-independent setup times, with maximum separation constraints between operations of the same job (also known as limited-wait constraints), and with an objective to minimise makespan $C_{\max}$. There is no preemption allowed and all jobs are released at time 0.
We represent the $n$-job $m$-machine maintenance-aware problem as the tuple $(M, J, O, P, S, D, \delta, X, O_M)$ where $M = \{\mu_1, \ldots, \mu_m\}$ is the set of machines and $J = \langle J_1, \ldots, J_n \rangle$ is the sequence of jobs. The set $O$ represents the set of operations for every job $j_i \in J$ where each operation $o_{ij}$ has a processing time $P_{ij}$. Each job has the same number $r$ of operations, as in a standard flow shop. Moreover, $S : O \times O \to \mathbb{R}_{\ge 0}$ refers to setup times, which represent the required delay between the completion of an operation and the start of another operation. Setup times can exist between operations of the same job to model travelling time of a job for instance, or between operations on the same machine to model any machine preparation step that is needed between operations. Operations of the same job also have maximum separation constraints between them represented as $D : O \times O \to \mathbb{R}_{>0}$, i.e., the maximum delay between the start times of two consecutive operations of the same job. Such constraints model the fact that operations of a job can often not be delayed indefinitely due to physical constraints in the plant like the buffer size. In a situation where such constraints do not apply, separation constraints can simply be set to infinity and setup times to zero.
VOLUME 11, 2023 103463 Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
The solution to the problem is a schedule, i.e., a sequence of both maintenance activities and production operations where each production operation is assigned a start time such that $\omega_{ij}$ represents the start time of operation $o_{ij}$.
In addition to being a flow shop with the above properties, we have a situation with re-entrancy such that the sequence of machines for each job is $\langle \mu_1, \ldots, \mu_k, \mu_k, \ldots, \mu_m \rangle$, i.e., there is one re-entrant machine that all jobs go through twice. Operations on the re-entrant machine are referred to as first and second pass operations. Re-entrancy occurs in many production processes, e.g., semiconductor production, where wafers revisit machines at different stages of the production process, and painting processes, where a job may revisit a machine for multiple coats of paint. Simple re-entrant setups have been shown to be NP-hard [33], [34]. Our motivating use case of production printing has a twice re-entrant setup arising from duplex printing.
We further have the constraints that (i) jobs are not allowed to overtake each other, (ii) the required completion order of jobs is the same as the index of the jobs, and (iii) all setup times and due date constraints are hard constraints that must be obeyed.This situation means that the only scheduling freedom is in the sequence of operations on the re-entrant machine, i.e., first and second passes of the same jobs do not necessarily have to follow each other on this machine.This means we can also think of this as a single machine scheduling problem with precedence and maximum separation constraints.
In the same vein as only needing to schedule the re-entrant machine, we limit our maintenance planning to maintaining the re-entrant machine. While other machines also require maintenance, only the re-entrant machine requires maintenance on the same time scale as the operations being carried out, creating a very tightly coupled problem compared to maintenance of other machines. Additionally, the re-entrant machine is often a key machine of concern, either for cost reasons (re-entrant machines are too expensive to simply duplicate in order to remove the re-entrancy) or for quality reasons (some products, chemical products for example, need to be handled in a delicate state, and moving the product from one machine to another would change its state).

A. DETERIORATION MODEL
In our motivating industrial problem, there is a deterioration model $\delta$ that maps a scheduled sequence of operations on a machine to a non-negative real-valued deterioration state, i.e., given a sequence of operations on a machine with their corresponding start times (a schedule), $\delta$ informs us of the machine state at the end of the sequence. Here, $\delta$ is both sequence- and time-dependent in the sense that deterioration is measured by idle time of a machine part, i.e., the longer a machine part has been left idle, the more deteriorated it is. These idle times follow directly not only from the sequences themselves, but also from the assigned start times of operations in these sequences. We do not explicitly model machine parts and instead depend on the fact that different types of jobs use different machine parts, so it can be inferred which machine parts have been idle based on how long it has been since a certain job type has been scheduled. We assume that there is a set of job types $T = \{\tau_1, \ldots, \tau_n\}$ that can be presented to the machine and that there is a lexicographic ordering of job types such that every set of machine parts used by a job type $\tau_x$ is contained in the set of machine parts used by a job type $\tau_y$ with $y > x$. It then follows that at the start of an operation of type $\tau_x$, idle time is reset to 0 for all operations of type $\tau_y$ with $y \le x$. Note that while there could be other kinds of problems where different jobs use completely different machine parts and such a lexicographic ordering of types is not possible, it is still a realistic assumption for many scenarios, e.g., scenarios where jobs come in different sizes and bigger sizes simply use more machine parts for production, or scenarios where jobs can be customised with different add-on properties processed by additional machine parts.
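The reset rule described above can be illustrated with a short sketch. It is not the deterioration model used in the paper: the function name `idle_times_at`, the integer encoding of job types and the assumption that every idle clock starts at time 0 are all hypothetical choices made for illustration.

```python
from typing import Dict, List, Tuple


def idle_times_at(
    schedule: List[Tuple[float, int]],
    job_types: List[int],
    end_time: float,
) -> Dict[int, float]:
    """Return the idle time per job type at `end_time`.

    `schedule` holds (start_time, job_type) pairs for the operations on the
    machine. Assumption: job types are integers ordered as in the text, and
    all idle clocks start at time 0.
    """
    last_reset = {t: 0.0 for t in job_types}
    for start, op_type in sorted(schedule):
        # An operation of type x resets the idle time of every type y <= x.
        for t in job_types:
            if t <= op_type:
                last_reset[t] = start
    return {t: end_time - last_reset[t] for t in job_types}


if __name__ == "__main__":
    # Types 1 < 2 < 3; the type-3 operation at time 5 resets all three clocks,
    # the type-2 operation at time 8 resets only types 1 and 2.
    print(idle_times_at([(0.0, 1), (5.0, 3), (8.0, 2)], [1, 2, 3], 10.0))
```

Moving the same operations to other start times changes the result even when the order is unchanged, which is the time-dependence the text refers to.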
Finally, we also take as input a maintenance policy $X$. In our problem, the policy has a set of maintenance activity classes $C$. For every class $c \in C$, there is a corresponding maintenance duration $P_c$ similar to processing times of production operations. The maintenance policy further maps half-open intervals of deterioration values, each starting at a lower threshold $\theta_c$, to classes of maintenance activities such that whenever the deterioration falls in the interval for class $c$, a maintenance activity of at least class $c$ is required before further production. Thus, each interval defines the deterioration thresholds for a maintenance activity. We assume that these intervals are non-overlapping and that maintenance activities triggered by higher thresholds, i.e., harsher deterioration, are more intense and require longer durations. The deterioration thresholds serve to capture the limits at which the quality of a job would be too low if production carries on without a maintenance activity. In this context, a low-quality job refers to a poor print, typically with colours bleeding into each other, blurry prints or unintended lines running across a page.
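As a minimal sketch of such a policy, the lookup below maps a deterioration value to the class whose interval contains it. The class names, thresholds and durations are invented for illustration and do not come from the industrial use case.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class MaintenanceClass:
    name: str        # hypothetical label for the class c
    lower: float     # lower deterioration threshold (theta_c), inclusive
    upper: float     # upper end of the interval, exclusive
    duration: float  # maintenance duration P_c


def required_class(
    policy: List[MaintenanceClass], deterioration: float
) -> Optional[MaintenanceClass]:
    """Return the class whose half-open interval contains `deterioration`.

    Returns None when the value lies below every threshold, i.e., when no
    maintenance is needed. Intervals are assumed to be non-overlapping.
    """
    for c in policy:
        if c.lower <= deterioration < c.upper:
            return c
    return None


# Illustrative policy: harsher deterioration needs a longer activity.
EXAMPLE_POLICY = [
    MaintenanceClass("light", 30.0, 60.0, 5.0),
    MaintenanceClass("heavy", 60.0, float("inf"), 20.0),
]

if __name__ == "__main__":
    print(required_class(EXAMPLE_POLICY, 45.0))
```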
An example problem is shown in Figure 1. The problem is represented as a constraint graph where the due dates and setup times are treated as a system of difference constraints. Operations are represented as circles and each column of operations belongs to one job, while each row of operations is mapped to the same machine. Solid arrows represent the minimum separation between operations and are made of the sum of processing and setup times, while dashed arrows represent the maximum separation between operations and can be thought of as relative due dates. Minimum separation edges are represented with positive values and maximum separation edges are represented with negative values, as they connote at least and at most constraints on the difference between the start times of the operations they connect.
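A system of difference constraints of this kind can be checked with a standard longest-path (Bellman-Ford style) pass. The sketch below is a generic illustration rather than part of the proposed methods: an edge `(a, b, w)` is read as "start of b minus start of a is at least w", so a maximum separation of `d` from `a` to `b` becomes the edge `(b, a, -d)`. The function name and the example numbers are assumptions.

```python
from typing import List, Optional, Tuple

Edge = Tuple[int, int, float]  # (a, b, w) encodes start[b] - start[a] >= w


def earliest_starts(n: int, edges: List[Edge], source: int = 0) -> Optional[List[float]]:
    """Earliest start times satisfying all difference constraints.

    Returns None if the constraints are infeasible, which corresponds to a
    positive-weight cycle in the constraint graph. Nodes that cannot be
    reached from `source` keep the value -inf.
    """
    start = [float("-inf")] * n
    start[source] = 0.0
    for _ in range(n - 1):
        changed = False
        for a, b, w in edges:
            if start[a] + w > start[b]:
                start[b] = start[a] + w
                changed = True
        if not changed:
            break
    # If any edge can still be tightened, there is a positive cycle.
    for a, b, w in edges:
        if start[a] + w > start[b]:
            return None
    return start


if __name__ == "__main__":
    # Minimum separations 0->1 (4) and 1->2 (3); maximum separation 0->2 of 10.
    print(earliest_starts(3, [(0, 1, 4.0), (1, 2, 3.0), (2, 0, -10.0)]))
```

Tightening the maximum separation to 5 in this example makes the three constraints contradictory, and the function then returns `None`.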

IV. INTEGER PROGRAMMING APPROACH
Mixed Integer Programming (MIP) is one of the most popular exact solving paradigms and has been applied to other maintenance planning problems in the literature [16], [35] with some success. Due to the existence of a wide variety of commercial solvers, mixed integer formulations of a problem are valuable as solutions can be provided by these solvers. Furthermore, MIP models can give additional insight into the structure of a problem. Thus, we also consider such a solution for our problem.
In this section we present an exact MIP model for this problem. The model uses the concept of event-based formulation as introduced by [16] and extends this concept to accommodate the kind of maintenance policies in this problem. The key idea here is the notion of blocks, where a block is defined as a sequence of operations uninterrupted by a maintenance activity, i.e., a block is a sequence of operations separated from other operations in the sequence by at least one maintenance activity. For our model, we extend this idea to also include the effect of job types. A block is then defined as a sequence of operations uninterrupted by either a maintenance activity or an operation with a higher type than any other operation within the block.
We define binary variables to mark whether an operation starts a block or not.These variables are further indexed by job type and maintenance activity class, i.e., an operation can be the start of a block delineated by a class of maintenance activities and/or the start of a block delineated by a job type.Blocks of different types are allowed to overlap with additional constraints added to ensure that maintenance is not triggered more than necessary.
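The block notion can be made concrete with a small sketch that splits a given machine sequence into blocks. This is an illustration only and not the MIP encoding: the representation of a maintenance activity as `None`, the function name `split_blocks`, and the reading that a block delineated by job type `tau` closes after any operation of type at least `tau` are assumptions.

```python
from typing import List, Optional, Tuple

Operation = Tuple[str, int]  # (operation name, job type)


def split_blocks(
    sequence: List[Optional[Operation]], tau: Optional[int] = None
) -> List[List[str]]:
    """Split a machine sequence into blocks.

    `None` entries stand for maintenance activities and always close the
    current block. If `tau` is given, an operation of type >= tau also closes
    the block (assumed reading: it resets the idle time tracked for tau).
    """
    blocks: List[List[str]] = []
    current: List[str] = []
    for item in sequence:
        if item is None:
            if current:
                blocks.append(current)
                current = []
            continue
        name, op_type = item
        current.append(name)
        if tau is not None and op_type >= tau:
            blocks.append(current)
            current = []
    if current:
        blocks.append(current)
    return blocks


if __name__ == "__main__":
    seq = [("a", 1), ("b", 2), None, ("c", 1), ("d", 3), ("e", 1)]
    print(split_blocks(seq))         # blocks delineated by maintenance only
    print(split_blocks(seq, tau=2))  # blocks delineated for job type 2
```

The two calls give different partitions of the same sequence, which mirrors the statement that blocks of different kinds are allowed to overlap.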
The model retains all variables defined in the problem definition in Section III. Indices of variables corresponding to operations are either of the form $x_{ij}$ when both the job and operation identifier are important or of the form $x_a$ when it is only necessary to differentiate one operation from the other. For ease of modelling, we also define a binary machine-assignment indicator for every operation $o_a \in O$ and machine $\mu_m \in M$. The indicator is set to 1 if operation $o_a$ is assigned to machine $\mu_m$. It is not a decision variable and is part of the problem description.
Additionally, since we only plan maintenance on the re-entrant machine $\mu_k$, many constraints only apply to operations on this machine; these operations are denoted as $R$, the set of operations $o_a \in O$ whose assignment indicator for $\mu_k$ is 1. Furthermore, a dummy operation $o_{dummy}$ of processing time 0 is defined and constrained to be the first operation on each machine. We also extend the use of a job type $\tau$ to serve as a function that returns the type of an operation when written as $\tau(o)$.
The following additional variables are developed for the MIP model: $\omega_{ij}$ refers to the start time of operation $o_{ij} \in O$, and $B_{ab}$ is a binary variable relating to the precedence constraints between operations $o_a$ and $o_b$. Note that $B_{ab}$ refers only to direct precedence and not the general notion of $o_a$ being scheduled sometime before $o_b$. We discretize job types such that $\tau_a$ refers to the type of $o_a$ and assume that there is a lexicographic ordering of job types such that processing a job type with a higher value is sufficient to reset the machine for lower job types according to the maintenance policy described in Section III. Some of the constraints are linearised using big-M values, namely $M_\tau$ and $M_\omega$. We define some bounds for these values in Section IV-B below.

A. THE INTEGER PROGRAMMING MODEL
In this section, we define the integer programming model made up of an objective function, decision variables and constraints.
Objective Constraints

Decision variables
The objective of the model is to minimise makespan, denoted by Equation (1). The constraints in Equations (2b) to (2h) apply to all operations while the constraints in Equations (2j) to (2r) only apply to operations scheduled on the re-entrant machine. All non-binary variables are constrained to be non-negative, i.e., start times, idle times and deterioration values all have a lower bound of 0.
Equation (2a) enforces the fixed-order relationship between operations at the same level of the flow shop. Equations (2b) and (2c) enforce setup times and maximum separation constraints between operations of the same job respectively, while Equation (2d) enforces that the dummy operation has no predecessors. Equation (2e) ensures that every operation has exactly one predecessor and Equation (2f) enforces that every operation has at most one successor. Equation (2g) enforces that operations only follow each other if they are mapped to the same machine and Equation (2h) enforces that there is no overlap between operations, leaving room for setup times. These make up the constraints that specify the problem without maintenance.
The maintenance constraints follow below. Equation (2i) enforces that there is no overlap between operations while leaving enough room for any maintenance activities that may have been triggered. Equations (2j) and (2k) specify constraints on the minimum time elapsed since an operation of a certain type has come through the machine. Similarly, Equations (2l) and (2m) specify the minimum time elapsed since a maintenance activity of a certain class has been scheduled. The constraints represented by Equations (2j) to (2m) are defined in a cumulative way based on predecessor operations. Equations (2k) and (2m) are activated depending on the presence of a job type or a maintenance activity respectively. This toggle is implemented by big-M values that are activated based on the binary variables $Z$ and $\zeta$.
The actual deterioration value is computed by Equation (2n) from the idle time so far and is set to the minimum of $K$ and $L$ so that maintenance is only triggered when necessary. Equations (2o) and (2p) specify that maintenance activities are triggered whenever deterioration thresholds are crossed, thereby starting a new block, while Equations (2q) and (2r) similarly start a new block based on the relationship between types of operations, i.e., a new block is triggered whenever an operation's predecessor has a higher type. Note that this model allows multiple maintenance classes to be triggered simultaneously if the deterioration crosses multiple thresholds. However, Equation (2i) means that the gap left for maintenance corresponds to the largest processing time of all triggered maintenance activities, thus not paying unnecessary maintenance costs.
Finally, Equation (2s) calculates the makespan which, in a fixed-order problem, is the finishing time of the last operation of the last job.

B. BOUNDS FOR BIG-M VALUES
1) $M_\omega$
Throughout the model, $M_\omega$ is used as a big-M value in two instances. The first is in Equations (2m) and (2k), to sum up the minimum times since the last maintenance or the last occurrence of a type of job, and the second is in Equations (2o) and (2p), to toggle maintenance if deterioration thresholds are crossed. In each of these cases, the upper bound is the maximum possible deterioration value that can occur. Because our deterioration deals with idle times, we are then looking for a value that is larger than or equal to the maximum time the machine can be left idle.
An idea for this bound is to use an upper bound on the makespan as there always exists a solution with a better makespan than one in which the machine is left idle for the upper bound on the makespan.
This upper bound assumes the worst case, which is that every operation incurs the maximum possible setup time and the maintenance activity with the longest duration occurs before every operation. Thus, the bound is
2) $M_\tau$
The tightest bound for $M_\tau$ is the largest job type available in the problem. This holds because:
- $M_\tau$ is an upper bound on the types of jobs,
- we assume that job types are all given integer values corresponding to their quality requirements,
- we assume there is a lexicographical ordering of these types that corresponds with the order of the integers representing each job type.
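One expression consistent with the worst case described for $M_\omega$ in 1) above is sketched below. It is offered under assumptions and is not necessarily the exact bound used in the model: it sums, over all operations, the processing time $P_a$, the largest setup time given by $S$, and the longest maintenance duration $P_c$ over the classes $C$ of Section III.

```latex
M_\omega \;=\; \sum_{o_a \in O} \Bigl( P_a \;+\; \max_{o_b,\, o_c \in O} S(o_b, o_c) \;+\; \max_{c \in C} P_c \Bigr)
```

Any schedule in which all operations are executed one after another with these worst-case gaps finishes no later than this value, so it is a valid, if loose, upper bound on the makespan and hence on any idle time.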

C. LINEARISING THE MODEL
Some equations are still quadratic, namely Equations (2h), (2i), (2k) and (2m). However, they all involve the product of a binary variable and a non-negative continuous variable and can therefore be linearised via the big-M method. The corresponding big-M value is also $M_\omega$. A detailed explanation of how this linearisation is achieved can be found in [36].
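For reference, the textbook big-M linearisation of such a product is sketched below. The symbols $z$, $b$ and $x$ are generic placeholders introduced for illustration and do not appear in the model: $b$ is binary, $x$ is continuous with $0 \le x \le M_\omega$, and $z$ replaces the product $b \cdot x$.

```latex
\begin{aligned}
z &\le M_\omega \, b, \\
z &\le x, \\
z &\ge x - M_\omega \, (1 - b), \\
z &\ge 0 .
\end{aligned}
```

With $b = 0$ the first and last inequalities force $z = 0$, and with $b = 1$ the second and third force $z = x$, so $z$ equals the product in both cases.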
Additionally, Equation (2n) requires us to compute the minimum, which is also non-linear. We define one more auxiliary binary variable $\gamma_a$ that is set to 1 if $K_a$ is less than $L^{\tau(o_a)}_a$ and linearise the minimum constraint by replacing it with a set of linear constraints, Equations (4a) to (4f). Equations (4a) and (4b) set the deterioration value $\delta_a$ to be upper bounded by the minimum of $K_a$ and $L^{\tau(o_a)}_a$. However, this is not enough, as $\delta_a$ is still free to take any value less than this, which can lead to violations of maintenance constraints. We further use Equations (4c) and (4d), which set $\delta_a$ to be lower bounded by the minimum of $K_a$ and $L^{\tau(o_a)}_a$. This lower bound is also achieved via big-M constraints which activate either equation based on $\gamma_a$. The combination of the lower and upper bounds ensures that $\delta_a$ is exactly set to the minimum of the two values $K_a$ and $L^{\tau(o_a)}_a$. Finally, Equations (4e) and (4f) set $\gamma_a$ to 0 if $K_a$ is less than $L^{\tau(o_a)}_a$ and 1 otherwise.
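A standard way to linearise a minimum with one binary selector is sketched below. It is a generic formulation under assumptions, not a reproduction of Equations (4a) to (4f): $L_a$ abbreviates $L^{\tau(o_a)}_a$, $M_\omega$ is assumed to bound both $K_a$ and $L_a$, and the convention $\gamma_a = 1$ when $K_a$ is the smaller value is chosen here for illustration.

```latex
\begin{aligned}
\delta_a &\le K_a, &\qquad \delta_a &\le L_a, \\
\delta_a &\ge K_a - M_\omega \, (1 - \gamma_a), &\qquad \delta_a &\ge L_a - M_\omega \, \gamma_a, \\
K_a - L_a &\le M_\omega \, (1 - \gamma_a), &\qquad L_a - K_a &\le M_\omega \, \gamma_a .
\end{aligned}
```

The first row bounds $\delta_a$ from above by both values, the second row activates exactly one lower bound depending on $\gamma_a$, and the third row ties $\gamma_a$ to the comparison of $K_a$ and $L_a$. Swapping $\gamma_a$ and $1 - \gamma_a$ throughout gives the opposite convention with the same effect on $\delta_a$.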

V. CONSTRAINT PROGRAMMING APPROACH
Constraint programming (CP) has recently been shown to perform well for scheduling problems [37]. This motivates us to also explore a constraint programming solution. In this section, we present a CP model.
Our CP model uses the idea of interval variables and sequence variables. These are known constraint programming concepts [37] with the following definitions. Interval variables refer to operations to be scheduled and are declared with a length equal to the processing time of the operation. The goal of the solver is to assign a start time to each of these variables. An additional characteristic of interval variables is that they can either be compulsory, i.e., they must exist in any schedule produced by the solver, or be optional. Sequence variables, on the other hand, represent orderings of interval variables. The solver receives these as a set of interval variables, with its goal being to decide on a sequencing of these interval variables.
Apart from constraints and variables, constraint programming also provides some auxiliary functions such as startOf and typeOf which help us access variable properties, in this case their assigned start times and types respectively.
We define two classes of interval variables: (i) operations, which are always present and each retain the representation of $o_a$, and (ii) maintenance activities, which are optional and referred to as $m^c_a$, where $m^c_a$ is a maintenance activity of class $c$ that precedes an operation $o_a$. The variables $K$, $L$ and $R$ retain their definitions from the MIP model in Section IV.
Given that we have $|A|$ maintenance classes and $|R|$ operations in total on the re-entrant machine, we define $|A||R|$ maintenance activities since the worst case is that there is one maintenance activity of a class before every regular operation. We add constraints such that the maintenance activities are included in the sequence only when deterioration thresholds are violated.
Sequence variables are defined per machine and referenced as Sequence m for corresponding machine µ m where Sequence m contains all operations mapped to µ m including the optional maintenance activities.For the re-entrant machine, we define an additional sequence variable SequencePlain m as a sequence of only production operations -excluding maintenance activities -and constrain this sequence to follow the same ordering as Sequence m .The purpose of this duplicate sequence is to ensure that sequence-dependent setup times are respected.The details of how we achieve this follow in the constraint definitions below.
Objective: $\min(C_{max})$

Decision variables
C6: noOverlapDirect(Sequence_m, P, S)
C7: noOverlapDirect(SequencePlain_k, P, S) for re-entrant machine $\mu_k$

In this model, Constraint C1 enforces an ordering between the first operations of each job. The before constraint enforces precedence relationships between two operations in a sequence. We enforce the order of the first operations of each job, which we know will be on the first machine (as we have a flow shop). Constraint C2 builds on C1 to then enforce that this ordering is respected across all other sequences using the sameSubsequence constraint. Note that we only enforce a subsequence because the re-entrant machine has operations on multiple levels of the flow shop. The sameSubsequence constraint is set up such that only operations at the same level are constrained with the fixed ordering, which is in line with the requirements of our problem. Constraint C3 uses sameSubsequence in a similar way to constrain the duplicate sequences (with and without maintenance activities included) to have the same ordering.
Next, Constraints C4 and C5 enforce the sequence-independent setup times and maximum separation constraints respectively. Both of these apply to operations of the same job, as is seen from the index of operations in the constraints.
Sequence-dependent setup time and no-overlap constraints are handled by Constraints C6 and C7, which ensure that the separations required by both processing times and sequence-dependent setup times are obeyed. Since maintenance activities are also included in our sequences, we ensure correctness of Constraint C6 by extending the processing and setup times accordingly, with setup times set to 0 for operations before or after maintenance activities. The noOverlapDirect constraint works such that the separation denoted by sequence-dependent setup times applies only between direct successors, i.e., if an operation $o_a$ is followed by $o_b$ with a maintenance activity $m^c_b$ in between, the setup time between $o_a$ and $o_b$ will not be enforced. Thus, setting the maintenance setup time to 0 can lead to constraint violations, as the problem is now under-constrained. It is worthy of note that there exists a noOverlapIndirect constraint, which applies sequence-dependent setup time constraints to all successors; however, this over-constrains the problem.¹ We use the noOverlapDirect constraint and circumvent under-constraining the problem by using the supporting Constraint C7 on a duplicate sequence without maintenance.
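The need for the duplicate sequence can be illustrated outside of any solver. The sketch below is a plain Python check, not CP Optimizer code: it tests setup times only between direct successors, which mimics the behaviour described for noOverlapDirect. The function name `setup_violations`, the operation labels, and all numeric values are assumptions made for this illustration.

```python
def setup_violations(sequence, start, duration, setup):
    """Return adjacent pairs whose idle gap is shorter than the required setup.

    Only direct successors are checked; missing setup entries count as 0,
    which is how maintenance activities are treated in this sketch.
    """
    violations = []
    for prev, nxt in zip(sequence, sequence[1:]):
        gap = start[nxt] - (start[prev] + duration[prev])
        if gap < setup.get((prev, nxt), 0):
            violations.append((prev, nxt))
    return violations


# Hypothetical schedule: o_b starts right after a maintenance activity m.
start = {"o_a": 0, "m": 10, "o_b": 12}
duration = {"o_a": 10, "m": 2, "o_b": 10}
setup = {("o_a", "o_b"): 5}  # sequence-dependent setup from o_a to o_b

with_maintenance = ["o_a", "m", "o_b"]
plain = [op for op in with_maintenance if op != "m"]

print(setup_violations(with_maintenance, start, duration, setup))  # []
print(setup_violations(plain, start, duration, setup))             # [('o_a', 'o_b')]
```

The sequence that contains the maintenance activity passes the direct-successor check even though only 2 time units separate the two production operations, while the same check on the maintenance-free copy exposes the missing setup time.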
Constraint C8 enforces the presence of a maintenance activity whenever the minimum deterioration is within the limits of threshold violations. We do not explicitly calculate a deterioration variable $\delta$ in this model, but this is essentially the left-hand side of Constraint C8. We depend on the fact that our problem defines non-overlapping maintenance threshold intervals to ensure that at most one maintenance activity is triggered before an operation.
Constraints C9 and C10 deal with the computation of the minimum time elapsed since a maintenance activity has occurred. Since we are guaranteed to trigger at most one maintenance activity per operation, we do not maintain different minimum elapsed times per maintenance class as was done in the MIP model. Similarly, Constraints C11 and C12 compute the minimum time elapsed since a job of a certain type has been through the machine.
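The role of non-overlapping threshold intervals can be sketched as a simple lookup. The class names, interval bounds, and the half-open interval convention below are assumptions for illustration only; the source does not list the actual thresholds in this section.

```python
def triggered_class(deterioration, thresholds):
    """Return the maintenance class whose interval contains the deterioration
    value, or None if no threshold is violated.

    Intervals are treated as half-open [low, high); this is an assumption.
    """
    hits = [name for name, (low, high) in thresholds.items()
            if low <= deterioration < high]
    # With non-overlapping intervals, at most one class can match.
    assert len(hits) <= 1, "threshold intervals must not overlap"
    return hits[0] if hits else None


# Hypothetical thresholds in seconds of accumulated deterioration.
THRESHOLDS = {"short": (60.0, 300.0), "long": (300.0, float("inf"))}

print(triggered_class(30.0, THRESHOLDS))   # None
print(triggered_class(120.0, THRESHOLDS))  # short
print(triggered_class(900.0, THRESHOLDS))  # long
```

Because every deterioration value falls in at most one interval, at most one maintenance class is selected before any operation, which is the property the model relies on.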
Finally, C13 calculates the makespan, which we again know to be the finishing time of the last operation of the last job.
Worthy of note is that Constraints C10 and C12 are cumulative constraints that could be expressed using the cumulFunction constraint, which keeps track of each interval variable's contribution to a function [37]. However, many implementations of this function within available solvers require that the contribution of each interval variable be known a priori whereas, in our case, the contribution of each interval variable is itself based on decision variables [37] due to maintenance also being time-dependent.²

VI. HEURISTIC SOLUTION APPROACH
While exact approaches such as those presented in Sections IV and V have many advantages, they often do not scale well. In this section, we present an alternative heuristic solution approach to handle larger problem instances. The work presented in this section has appeared earlier in a workshop paper [13].
Our heuristic approach is based on extending list schedulers to integrate maintenance activities in the schedule. Heuristic list schedulers have been developed for the industrial problem we consider [38], [39], [40] and are also suitable for online scheduling. Thus, we look into extending them to handle integrated production and maintenance scheduling. The typical flow of a list scheduler is to order operations according to some metric and insert them in a schedule one after the other until all operations are scheduled [40].

Algorithm 1 (listing fragment): ← selectHighestRanked(′′, rank) 12: ′′ ← ∅ return

A. MAINTENANCE-AWARE LIST SCHEDULING
To make a list scheduling approach maintenance-aware, we propose to evaluate the effect of any operation placement on maintenance triggering before making a decision. This leads to a schedule with the necessary maintenance activities triggered by the operation sequence already included. This is shown in Algorithm 1. In Line 1, the scheduler takes as input the flow shop to be scheduled, the chosen ordering of the operations order, and the ranking of decisions rank. Lines 2-6 initialise the variables used in the algorithm, i.e., an empty schedule that is filled with operations by the algorithm, empty sets of schedules ′ and ′′ used to keep track of scheduling options, and an operation $o_p$ to track the last operation that was inserted in the schedule. Specifically, $o_p$ is initialised to a dummy operation for the first run, where no insertions have occurred yet. In Line 7, the scheduler loops through each operation $o_c$ in the chosen order, and Line 8 finds positions to place the operation in the schedule being built, with each possible option resulting in a different schedule stored in the set ′. For every one of these schedules, we trigger predicted maintenance in Line 10, which updates the schedules with predicted maintenance activities included. We keep track of the last regular operation placed in the schedule, $o_p$, to reduce the amount of work it takes to trigger maintenance, as the schedule is already evaluated up to that operation $o_p$. Eventually, we pick the best option in Line 12, where the 'best' is as determined by the supplied ranking rank.

The steps shown in Algorithm 1 are generic and can be customised to any list scheduler of choice. However, evaluating maintenance is performed according to the steps described in Algorithm 2.

Algorithm 2 (listing fragment): ← insertMaintenanceOperation(a_c, ) ← updateStartTimes(f, ) 8: feasible ← checkFeasibility(f, ) 9: if ¬feasible then 10: ← repairSchedule(f, ) return
For a given schedule, we first go through the operations in the schedule from the last inserted operation $o_p$ to the current operation being inserted $o_c$ in Line 2. For each operation, we evaluate the deterioration state in Line 3. If a maintenance activity is triggered at any point in the schedule, the action is then inserted and the schedule is re-evaluated in Lines 5-9. We approach this by creating an operation $a_c$ to represent the maintenance activity and adjusting the edges in the graph such that the constraints of the original problem remain intact after the insertion of the new operation. This is illustrated in Figure 2, where we show the edges added after inserting a maintenance activity. Since we have hard timing constraints between operations, inserting a maintenance activity can lead to a previously feasible schedule becoming infeasible. In such a case, a schedule repair action is triggered to return the schedule to a feasible state in Line 11. Algorithm 2 assumes that a schedule is always repairable, and in Section VI-B2 below we show the necessary conditions for this to be true.
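The loop structure described for Algorithm 1 can be sketched on a toy single-machine problem. This is a simplified illustration under stated assumptions and not the authors' implementation: it has no setup times or due dates, it re-evaluates each candidate from scratch instead of resuming from the last placed operation, and the names `WEAR`, `LIMIT`, `DURATION`, `find_options`, `trigger_maintenance`, `makespan` and `mals` are all hypothetical.

```python
# Hypothetical data: wear added by each operation, the wear limit that
# triggers maintenance, and durations ("M" is a maintenance activity).
WEAR = {"a": 2, "b": 2, "c": 3}
LIMIT = 4
DURATION = {"a": 2, "b": 2, "c": 2, "M": 5}


def find_options(schedule, op):
    """Every way to insert op into the production operations of schedule."""
    ops = [x for x in schedule if x != "M"]  # drop old maintenance, re-trigger later
    return [ops[:i] + [op] + ops[i:] for i in range(len(ops) + 1)]


def trigger_maintenance(ops):
    """Insert "M" before any operation that would push wear above LIMIT."""
    out, wear = [], 0
    for op in ops:
        if wear + WEAR[op] > LIMIT:
            out.append("M")
            wear = 0
        out.append(op)
        wear += WEAR[op]
    return out


def makespan(schedule):
    """Ranking used in this sketch: total duration, maintenance included."""
    return sum(DURATION[x] for x in schedule)


def mals(order, rank=makespan):
    """Place operations one by one, ranking candidates after maintenance
    has been triggered on each of them."""
    schedule = []
    for op in order:
        candidates = [trigger_maintenance(s) for s in find_options(schedule, op)]
        schedule = min(candidates, key=rank)
    return schedule


result = mals(["a", "c", "b"])
print(result, makespan(result))  # ['c', 'M', 'b', 'a'] 11
```

When the last operation is placed, the candidate that puts it first would require two maintenance activities (total duration 16), so the ranking selects a placement that needs only one. This is the effect of evaluating maintenance before committing to a placement.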
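The edge adjustment can be sketched with a dictionary-based constraint graph. The representation below (edges keyed by operation pairs, with a weight standing for the required separation) and the name `insert_maintenance` are assumptions for illustration; the weights are invented, and the sketch assumes zero setup time after the maintenance activity.

```python
def insert_maintenance(edges, pred, succ, maint, maint_duration):
    """Return a new edge map with maint placed between pred and succ.

    The constraint that used to link pred to succ now links pred to the
    maintenance activity, and a new edge forces succ to wait until the
    maintenance activity has finished.
    """
    updated = dict(edges)                  # leave the input untouched
    weight = updated.pop((pred, succ))     # original pred -> succ constraint
    updated[(pred, maint)] = weight        # now applies to pred -> maint
    updated[(maint, succ)] = maint_duration
    return updated


# Hypothetical graph fragment using the operations named in Figure 2.
edges = {("o_21", "o_22"): 3, ("o_22", "o_32"): 4}
new_edges = insert_maintenance(edges, "o_22", "o_32", "a_c", 6)
print(new_edges)
```

After the update, any longest-path computation over the graph pushes the later operation back by at least the maintenance duration, which is why a feasibility check must follow the insertion.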

B. SCHEDULE REPAIR
Flow shop schedules generally need to obey a certain ordering of operations to be valid. However, re-entrant flow shops with due dates have an additional validity criterion, which is the due date between operations. In a case where operations that are not completely part of the set of input operations (such as maintenance activities) have to be scheduled, due date violations become even more likely. Since these operations are only known when schedules are evaluated, we always have the possibility that a schedule becomes infeasible as a result of these insertions. Furthermore, deciding on the repaired version of the schedule that minimises the makespan after an event that causes infeasibility is itself a combinatorial problem. We therefore need to develop a schedule repair strategy for this problem.

Algorithm 3 (listing fragment): ← removeSecondPassOp(o_{i,k+1}, ) ← insertSecondPassOp(fp′, o_{i,k+1}, ) 14: 15: ← updateStartTimes(f, ) 18: feasible ← checkFeasibility(f, ) 19: return

1) OUR STRATEGY
Schedule repair entails reorganising a schedule to obtain a state where the schedule is valid again [41]. Since we start from a valid schedule that is rendered infeasible by inserting new operations, the infeasibility is due to a due date violation, i.e., an operation has been delayed too long after its preceding operation. Therefore, the fix is to systematically bring operations closer to their predecessors. However, it is not immediately obvious which operations need to be brought forward and how far this needs to go. As such, we define a recursive strategy where we take small steps forward and re-evaluate the fix until the schedule is feasible again. Additionally, moving operations around can violate the maintenance policy, so after re-organisation it is necessary to re-evaluate the schedule. This solution falls under the class of proactive-reactive dynamic scheduling [42].

As shown in Algorithm 3, every time we reorganise the operations in the schedule, we first identify three key operations, namely, the penultimate first pass operation from the point where the schedule was broken, the last second pass operation from the point where the schedule was broken, and finally the last second pass operation that has been included in the schedule. This is shown in Lines 4-6, where we identify these key operations and their positions in the schedule. We then move all scheduled second pass operations belonging to jobs ranging from the last second pass to the ultimate first pass in the schedule; this occurs in the remove and insert calls on Lines 13-17. This way, the schedule has been reorganised such that second pass operations from the point of failure are at least a step closer to their first pass operations. We repeat this process until the schedule becomes feasible,³ moving the point of failure a step backward each iteration; this is seen on Line 18, where the point of failure is updated ahead of the next iteration. After the schedule is deemed feasible, a last step is taken to trigger maintenance again in Line 20, as re-ordering operations could have invalidated or triggered maintenance activities. This re-ordering works because due dates exist only between consecutive operations of the same job.

Figure 3 shows an example of the schedule repair process. In Step 1, the schedule is infeasible after the insertion of a maintenance activity highlighted in green. The ultimate first pass is identified as $o_{42}$, the penultimate first pass as $o_{32}$ and the last second pass as $o_{13}$. The operations after the maintenance activity are then brought forward, as can be seen in the new placement of $o_{23}$ in Step 2. This continues in Steps 3 and 4 until the schedule is evaluated to be feasible.
It is valuable to point out that the overall algorithm proposed is flexible enough to adopt other repair strategies depending on the use case. An alternative example could be the strategy of reducing the rate of production to prevent or delay maintenance activities. A host of possible rescheduling and repair strategies are surveyed in [41].

2) SAFE MAINTENANCE POLICIES
A maintenance policy X maps a deterioration state of the machine to an appropriate maintenance activity. The policy in use determines when and where maintenance activities are necessary. As discussed above, inserting a maintenance activity in a schedule may make the schedule infeasible. We define a safe maintenance policy as a policy that ensures that there exists at least one maintenance-aware solution to the flow shop, provided there is a feasible schedule for the flow shop alone without considering maintenance activities. Since a schedule becoming infeasible after a maintenance insertion is a result of a violated due date, there should be enough room between consecutive first and second passes of the same job to fit a particular maintenance activity, unless the policy is such that the maintenance activity cannot be triggered between first and second passes of the same job. Concretely, this means that the processing time of any maintenance activity $a_c$ that can be triggered between passes of the same job $o_{ik}$ and $o_{i(k+1)}$ should fit in the available time between them, i.e.,

$$P_c \le D(o_{ik}, o_{i(k+1)}) - P_{ik} - S(o_{ik}, o_{i(k+1)}). \tag{6}$$

Theorem 1: Given an infeasible schedule, the schedule repair strategy defined in Algorithm 3 is always able to return it to a state of feasibility in at most $|J|$ iterations, where $|J|$ is the number of jobs in the schedule, provided that a solution exists for the problem and the maintenance policy in use is safe.
Proof: For the insertion of a maintenance activity $a_c$ between operations $o_{ik}$ and $o_{i(k+1)}$ to make the schedule infeasible due to a due date violation, $o_{i(k+1)}$ must have been delayed too long, i.e., $\omega_{i(k+1)} - \omega_{ik} > D(o_{ik}, o_{i(k+1)})$. To avert this, the maintenance activity must be able to fit in the slack between both operations. Bearing in mind that other operations could be placed between $o_{ik}$ and $o_{i(k+1)}$, the slack left between $o_{ik}$ and $o_{i(k+1)}$ is given by Equation (7), where operations $o_a, \ldots, o_b$ represent operations possibly placed between $o_{ik}$ and $o_{i(k+1)}$ (see Figure 4). The repair algorithm progressively brings operations closer to their direct predecessors by at least one step per iteration. In the last possible iteration of the schedule repair, each operation $o_{i(k+1)}$ follows its direct predecessor $o_{ik}$. It follows that this occurs in at most $|J|$ iterations of the schedule repair, as the re-entrant machine can only have $|J|$ higher pass operations to be re-ordered. At this point, the slack in Equation (7) reduces to $D(o_{ik}, o_{i(k+1)}) - P_{ik} - S(o_{ik}, o_{i(k+1)})$. For the schedule to still be infeasible, $a_c$ must not fit in this slack, i.e., $P_c > D(o_{ik}, o_{i(k+1)}) - P_{ik} - S(o_{ik}, o_{i(k+1)})$, which violates the condition for a safe maintenance policy shown in Equation (6).
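The safety condition used at the end of the proof can be checked mechanically. The sketch below assumes, for simplicity, that every maintenance activity may be triggered between the passes of any job; the function name `is_safe`, the dictionary keys, and the numbers are hypothetical.

```python
def is_safe(maintenance_durations, pass_pairs):
    """Check that the longest maintenance activity fits between every pair of
    consecutive passes of the same job when the passes are adjacent.

    Each entry of pass_pairs holds the due date D between the two passes,
    the processing time P of the first pass, and the setup time S between them.
    """
    longest = max(maintenance_durations)
    return all(longest <= pair["D"] - pair["P"] - pair["S"] for pair in pass_pairs)


# Hypothetical job data: 20 - 5 - 3 = 12 time units of slack.
pairs = [{"D": 20, "P": 5, "S": 3}]

print(is_safe([4, 10], pairs))  # True: both activities fit in 12 units
print(is_safe([4, 15], pairs))  # False: a 15-unit activity does not fit
```

A policy that fails this check can trigger a maintenance activity that no amount of re-ordering can accommodate, which is the case the theorem excludes.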

VII. EXPERIMENTAL RESULTS
This section evaluates the empirical performance of the three solution approaches we propose. We apply the heuristic approach as an add-on to three existing list schedulers in the literature to evaluate the applicability of this approach to list scheduling. We compare our heuristics against the two exact approaches (integer and constraint programming) to evaluate their accuracy and scalability. We generate benchmarks according to the types of jobs typically presented in our industrial use case, as described in Table 1. We generate benchmarks with patterned arrivals of job types such that jobs of a type appear in repeated blocks, e.g., a set of 50 jobs can be made of 20 type 1 jobs followed by 10 type 2 jobs and then 20 type 3 jobs. We randomise the length of the blocks and the number of times these blocks repeat to mimic arrival patterns of jobs in practice. We generate 50 instances for each job size in {5, 10, 50, 100, 150, 200, 300, 500, 1000}.
Our heuristic approach is implemented as an extension to three schedulers from the literature: the Bounded Heuristic Constraint Scheduler (BHCS) [39], the As Soon As Possible (ASAP) Scheduler, and the Modified Nawaz-Enscore-Ham (MNEH) Heuristic [43]. BHCS is a list scheduler developed specifically for our use case, while the ASAP scheduler is also a list scheduler that uses the same ordering requirements as BHCS but places operations as soon as possible (ASAP). MNEH is a modification of the popular NEH heuristic [44], [45] that is suitable for re-entrancy. Maintenance-incorporated versions of these schedulers are referred to as MIBHCS, MIASAP, and MINEH respectively, where the MI prefix refers to "maintenance incorporated". In each of these experiments, we tune the heuristic approach to include a maintenance activity if a deterioration threshold is crossed or if 90%⁴ of the upper bound of a threshold that affects the quality of an operation further down the line is crossed. Since we insert maintenance between two operations, we always have complete information about the next operation. We can also reliably infer what operations are further down the line for the entire planning window based on which operations have already been scheduled.
In the basic schedulers (BHCS, ASAP, and MNEH), maintenance is reactive and interrupts the schedule during production runs. We simulate the behaviour of reactive maintenance in these schedulers by evaluating the completed schedules they produce for maintenance and compare these with versions of the scheduler that incorporate our proactive maintenance heuristic.

MNEH has the least performance improvement because it is not a pure list scheduler. With MNEH, only relative positions of operations are decided in each iteration and there is no partial sequence that is guaranteed to remain the same from one iteration to the next; as such, the evaluation of the deterioration of a machine loses some meaning from one iteration to the next, since sequences change at each iteration. The exact approaches (CP and MIP) should ideally never be worse than the heuristic approaches, but they sometimes are because they do not always solve to optimality within the timeout.
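A block-patterned arrival sequence of this kind can be produced with a few lines of code. The sketch below is only one plausible reading of the description and not the generator used in the experiments; the function name `generate_job_types`, the `max_block` parameter, and the uniform choice of type and block length are assumptions.

```python
import random


def generate_job_types(n_jobs, n_types, max_block, seed):
    """Return a list of job types in which each type arrives in blocks.

    Block type and block length are drawn uniformly at random; the final
    block is truncated so that exactly n_jobs jobs are produced.
    """
    rng = random.Random(seed)  # seeded for reproducible instances
    jobs = []
    while len(jobs) < n_jobs:
        job_type = rng.randrange(1, n_types + 1)
        block = rng.randint(1, max_block)
        jobs.extend([job_type] * min(block, n_jobs - len(jobs)))
    return jobs


instance = generate_job_types(n_jobs=50, n_types=3, max_block=20, seed=1)
print(len(instance), sorted(set(instance)))
```

Varying the seed yields different instances of the same size, which is one way to obtain several benchmark instances per job size.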

B. PERFORMANCE EVALUATION
Figure 6 shows the distribution of the time spent on maintenance. We see that with maintenance-included versions, we spend up to 70% less time on maintenance. This is because considering deterioration allows us to perform maintenance before machines deteriorate to a state where we have to pay larger maintenance costs. The difference is also this significant because there is up to one order of magnitude difference between the durations of different maintenance activities for this use case (see Table 1). This difference translates to shorter makespans for the schedulers.
In both Figures 5 and 6, there are instances where the heuristic approach worsens the results, particularly for smaller job sets. The instances that are worsened by the heuristic are a result of (i) scenarios where the heuristic maintenance trigger is too conservative and performs maintenance even though the job set could be completed without it, and (ii) scenarios where the list scheduler picks a sequence that triggers shorter maintenance activities.
Neither of the exact solutions is able to scale to provide solutions for larger job sets within the 30-minute timeout; this accounts for the missing columns in Figures 5 and 6. In Table 2 we show the performance of the CP and MIP solutions. We see that the CP model is able to solve more instances than the MIP model, but for the instances where the MIP model is able to provide solutions, the optimality gap is smaller.
The runtime increases with the number of jobs as expected, and Table 3 shows the average runtime per job size of the different schedulers compared in this evaluation. The exact approaches are given a 30-minute timeout, and the solutions with the worst runtimes for a job size are shown in bold. Instances where no solution was provided by a method before the timeout are left unfilled and are the worst for that job size. The heuristic solutions are able to provide solutions in runtimes below 350 ms for job sizes up to 500. Above that, the runtime grows to 1800 ms. The biggest time sink for the heuristic solutions is how often the maintenance evaluation, and consequently schedule repair, is triggered.⁵ This is based on the operation of the base scheduler itself. MNEH evaluates whole sequences, while ASAP and BHCS evaluate partial sequences at every decision point, thus triggering maintenance evaluations more often and leading to higher runtimes.
In summary, we find that the heuristic approach is scalable and can produce competitive results compared to exact solvers even for small instance sizes. In general, we also find that apart from improving the actual goal of reduced makespan, integrated production and maintenance planning can also reduce the total time spent on maintenance, which can result in reduced costs in some cases.

VIII. CONCLUSION
Efficient maintenance scheduling is important for sustained productivity of industrial processes. This paper studied the problem of sequence- and time-dependent maintenance and presented three solution methods, namely mixed integer programming, constraint programming and a heuristic solution.
As the problem is motivated by an industrial use case, we have evaluated all the methods on jobs in this case. We show that list scheduling heuristics can be extended to include proactive maintenance with significant performance gains over reactive approaches.
This paper considers maintenance activities that are on the same time scale as the jobs themselves. An interesting future direction is to include longer-term maintenance planning in the scope and to investigate the combined problem of production and maintenance planning over multiple time scales.
Additionally, we solve the problem from a predictive maintenance perspective, i.e., where maintenance actions are carried out based on the health status of machines. However, this requires knowledge of how machines deteriorate, and this information is not always available. Many other papers consider a preventive maintenance perspective, where the challenge is either scheduling around a set maintenance schedule or determining what the maintenance schedule itself should be. While we know that preventive maintenance runs the risk of maintaining machines either too little or too often compared to the needs-based approach of predictive maintenance, and both preventive and predictive maintenance have been shown to outperform reactive maintenance approaches, it is still interesting to compare both approaches and determine what problem properties make it necessary to use one or the other. This is because even when complete information on the health status of machines is available, the gains made by integrating it in the decision-making process may not necessarily be worth the increased runtime.

FIGURE 1. Sample re-entrant flow shop where the operations are represented by circles. Column-wise, we have operations of the same job and row-wise, we have operations on the same machine, with one of these being the re-entrant machine that appears on rows 2 and 3. Operations with the same colour or boundary lines are mapped to the same machine. Setup times and maximum separation constraints are shown by solid and dashed edges respectively.
Block starts are marked by binary variables $Z^c_a$ and $\zeta^\tau_a$, where $Z^c_a$ determines if operation $o_a$ starts a block of operations delineated by a maintenance activity of class $c$ and $\zeta^\tau_a$ determines if operation $o_a$ starts a block of operations delineated by a job type $\tau$. Idle time values are held by the variables $K^c_a$ and $L^\tau_a$, which correspond to the minimum time elapsed since a maintenance activity of class $c$ preceding $o_a$ and the minimum time elapsed since an operation of type $\tau$ preceding $o_a$ respectively. Furthermore, deterioration values at the start of an operation $o_a$ are held by the variable $\delta_a$ and are determined by the deterioration values $K$ and $L$.

Variable | Description | Domain
$L^\tau_a$ | Minimum time elapsed since operation of a type $\tau$ preceding $o_a$ | $L^\tau_a \in \mathbb{R}$
$K_a$ | Minimum time elapsed since any maintenance activity preceding $o_a$ | $K_a \in \mathbb{R}$
$\delta_a$ | Deterioration of machine at start of $o_a$ | $\delta_a \in \mathbb{R}$
$Z^c_a$ | $o_a$ starts a block delineated by a maintenance activity of class $c$ | $Z^c_a \in \{0, 1\}$
$\zeta^\tau_a$ | $o_a$ starts a block delineated by a job of type $\tau$ | $\zeta^\tau_a \in \{0, 1\}$

¹ Given a sequence of operations $o_a \to o_b \to o_c$, sequence-dependent setup times will be considered from $o_a \to o_b$, $o_b \to o_c$ and $o_a \to o_c$, whereas the only sequence-dependent setup times that should be considered are from $o_a \to o_b$ and $o_b \to o_c$.
² Start times of operations are decision variables.

103468 VOLUME 11, 2023

Algorithm 1 Maintenance Aware List Scheduling (MALS)
1: function MALS(flow shop f, operation ordering order, ranking rank)
← triggerMaintenance(o_c, o_p, f, ω)

FIGURE 2. Edge update after inserting a maintenance activity. The original constraints between operations $o_{22}$ and $o_{32}$ are now between operation $o_{22}$ and the maintenance activity, with new edges added to connect the maintenance activity to operation $o_{32}$. This ensures the original constraints of the problem are present and the maintenance activity is scheduled before operation $o_{32}$.

FIGURE 3. Schedule repair strategy showing progressive steps in the algorithm. In the first step, the schedule is infeasible because of the maintenance activity (highlighted in green). From this point on, the future steps re-organise the schedule until we achieve a feasible schedule in Step 4. In Step 5, a last step is taken to trigger maintenance again, as re-ordering operations could have invalidated existing or triggered new maintenance activities. Operations encircled in dotted lines are the ultimate first pass from the point of failure, the ones circled in a thin line are the penultimate first pass, and the ones circled in a thick line are the last higher pass operation.

A. EXPERIMENTAL SETUP
All experiments are performed on a 16-core 1.9 GHz AMD machine running Ubuntu 20.04 with 32 GB RAM. Algorithms are implemented in C++ and the MIP and CP models are solved by CPLEX version 22.1 and CP Optimizer version 22.1, respectively. The exact approaches are all given a 30-minute timeout.

FIGURE 5. Makespan improvement of maintenance-included versions over base versions. Instances where the solver timed out without providing any solution are marked with *.

Figure 5 compares the makespan of the schedules produced by MIBHCS, MIASAP, and MINEH to the makespan of schedules produced by BHCS, ASAP, and MNEH respectively. We also compare the exact solutions CP and MIP with the best solutions provided by MIBHCS, MIASAP, and MINEH.

FIGURE 6. Duration of maintenance activities. Instances where the solver timed out without providing any solution are marked with *.

TABLE 1. Properties of jobs in use case. All timings are in seconds and job travelling times are treated as setup times between operations of the same job.

TABLE 2. Performance of CP and MIP solutions.

TABLE 3. Average runtime of solution methods (s).