Modeling and Scheduling Methods for Batch Production Systems Based on Petri Nets and Heuristic Search

An approach is proposed for modeling and scheduling batch production systems based on Petri nets. First, a group of jobs to be carried out by a batch production system is modeled by a transition-timed Petri net according to logical relations between operations and time requirements on operations. Second, conflict relations between operations are obtained by operation-resource diagrams that describe how operations compete for resources, and are formally expressed by linear constraints. Third, monitor places are designed to enforce linear constraints on the transition-timed Petri net, and a plant net is consequently built and made free of conflicts between operations, which can well emulate a real batch production system. Fourth, an optimal schedule problem, where the optimization performance index is to minimize a makespan, is formalized based on a plant net. Finally, a new variant of filtered beam search method is proposed to solve the problem based on an ad hoc function and a dynamical timed extended reachability graph of a plant net. A typical chemical production plant illustrates the theoretic results.


I. INTRODUCTION
In a batch production process, operations require a large number of resources, such as valves and reactors. These resources are often shared by different operations such that some operations cannot be executed at the same time. However, certain operations that do not compete for resources can be performed simultaneously. Therefore, there are various possible sequences of operations to accomplish a given group of jobs, and they have different makespans. It is a valuable issue to express the relations between resources and operations in order to model a batch production process and to compute an optimal schedule. A batch production system can be viewed as a discrete event system (DES) since its evolution is driven by various events such as starting and ending an operation, opening The associate editor coordinating the review of this manuscript and approving it for publication was Ayaz Ahmad . and closing a valve, and switching on and off a heater. Petri nets are a graphical and mathematical tool that provides a uniform framework for modeling, analysis, and control of DESs [1]. Tittus and Akesson [2] design Petri net models for plant resources and production recipes, where recipes consist of elementary tasks, such as carrying, mixing, adding and splitting of material. Furthermore, general Petri net building blocks are used to represent elementary tasks, and, in turn, to formally describe a recipe. Falkman et al. [3] propose a process algebra Petri net that is utilized to represent desired specifications of a batch process. Ferrarini and Piroddi [4] employ a hierarchical approach to model complex fluid transportation operations, where a higher level is for allocating resources and synchronizing operations, and a lower one is for executing operations. Wu et al. [5], [6] design a hybrid colored Petri net for modeling a crude-oil refining process such that a scheduling problem can be formulated in the framework of the control theory of hybrid systems. The above mentioned methods fail to formally determine and represent conflict relations among operations by Petri net structures since it is a challenging issue due to the large number of devices and their complicated relations defined by recipes. However, to do so is valuable since a well developed Petri net model makes it possible to address the control and scheduling problems via the Petri net theory.
Furthermore, the processing times of operations need to be taken into account if a scheduling issue is addressed. Timed Petri nets (TPNs) that encode time specifications are used for modeling batch production systems [7]. Tsinarakis and Tsourveloudis [8] present a transition-timed Petri net (T-TPN) for the modeling, analysis, synthesis and performance evaluation of batch production systems. Ghaeli et al. [9] model batch plants with place-timed Petri nets (P-TPNs), where time constants associated with places represent durations of operations. In general, the size of a T-TPN model is smaller than a P-TPN one for a batch production system. Baruwa et al. [10] apply timed colored Petri nets (TCPNs) to model flexible manufacturing systems, where each operation has a certain number of preconditions, an estimated duration, and a set of postconditions. Yang et al. [11] extend a resource-oriented Petri net to model a cluster tool in wafer fabrication. They define colors of transitions, and associate both transitions and places with time. In addition, under uncertain environment, TPNs can be extended to associate transitions with stochastic transition firing times [12].
For batch production systems, a schedule issue can be represented by a mixed integer linear programing (MILP) [9]. However, the number of constraints and variables is large and MILP may become untractable. Since TPNs have powerful graphical and algebraic representations and asynchronous and parallel capabilities, TPNs can model systematically behaviors of batch production systems. Furthermore, firing rules of the enabled transitions of TPNs play the roles of complex time constraints in MILP. Pioneer works try to tackle schedule issues by using timed extensions of reachability graphs of T-TPNs, as the state class graph [13], [14], timed aggregate graphs [15] and modified state class graphs [16]. Based on a relaxed mixed semantics model [17], a state class method is presented for the verification and analysis of temporal properties. More recently, control and schedule issues are considered with timed reachability graphs [18] and timed extended reachability graphs (TERGs) at the earliest firing policy [19].
An alternative research direction is to develop informed graph search algorithms, such as Dijkstra, A* and beam search ones, with untimed reachability graphs of Petri nets. Lee and Dicesare [20] design an A* algorithm to heuristically search for an optimal branch in a reachability graph. Xiong and Zhou [21] propose two hybrid scheduling methods that integrates a heuristic best-first strategy and a controlled backtracking one. Mejia and Odrey [22] present an improved heuristic search, where an aggressive node pruning strategy and an improved evaluation function are used.
Li et al. [23] design an admissible heuristic function based on available periods. Huang et al. [24] adopt admissible and non-admissible heuristic functions to improve the searching efficiency. Xing et al. [25] propose a scheduling method integrating a deadlock control policy and a genetic algorithm to obtain a strategy. Luo et al. [26] redefine an existing deadlock prevention policy and embed it into a scheduling method. Zhang et al. [27] model an assembly process by TPNs, and dispatch tasks with a dynamic programming algorithm, where a depth-first search and a heuristic policy are utilized to select a most promising path. In [28], a filtered beam search (FBS) algorithm is presented where an evaluation function is used to pre-valuate and filter out nodes that seem not promising. Mejia and Nino [29], [30] introduce a filtering mechanism to limit the number of nodes at each expanding step such that the complexities in space and time are reduced significantly. These methods need to evaluate a cost function. Since time information is absent in reachability graphs, it is difficult to design and improve heuristic functions.
The main contents and contributions of this work are summarized as follows.
1) A method is presented to model batch production systems by T-TPNs, where productive processes and conflicts between operations are represented by Petri net structures. Compared with [2]- [6], conflict relations among operations are accurately described by Petri net structures, and this is helpful to formulate and solve scheduling problems of batch production systems. In detail, an operation-resource diagram is defined for valves and tanks to formally describe the relationship between operations and resources, and is used to correctly derive conflict sets which are sets of operations that cannot be performed in the same time.
In turn, linear constraints are used to express conflict sets, and are enforced on the T-TPN model by designing their monitor places. As a result, a plant Petri net is derived to accurately describe and to imitate dynamical behaviors of batch plants. 2) A dynamical timed extended reachability graph (D-TERG) is introduced to preserve the timed properties of T-TPNs on specific regions. In contrast to other existing timed graphs [15], [16], [19], the D-TERG encodes only the part of the timed language that is necessary for a particular schedule issue. The D-TERG is a dynamical graph that grows up according to a sequence of states that are successively explored. 3) Based on the D-TERG, a variant of the FBS is proposed with an improved heuristic cost function. This function uses the characteristics of the states of the D-TERG and refines the estimation of the remaining time required to complete the schedule compared to the usual cost functions that search the shortest paths from the marked places to a reference one. In particular it takes into account the residual times of enabled transitions and extra waiting times of non enabled transitions due to conflicts about resources.
The main challenge of the proposed approach is certainly to combine both the D-TERG design with the new variant of the FBS. On the one hand, the proposed heuristic cost function is based on the D-TERG obtained at a certain point. On the other hand, the iterative construction of the D-TERG is driven by the FBS that prunes non-promising branches. This new scheduling strategy is successfully applied to the T-TPN models of the batch production system.
The remaining parts of this paper are organized as follows. In Section II, the concepts and notations of Petri nets are reviewed. In Section III, a scheduling problem of batch production systems is formulated. In Section IV, a method is given for modeling a batch production system. Section V shows how to solve a scheduling problem with D-TERG and a new variant of FBS algorithm. In Section VI, examples are presented to illustrate the proposed methods. Section VII concludes this paper.

A. PETRI NETS
A Petri net structure is N = (P, T , Pre, Post), where P is a set of m places and T is a set of n transitions with P ∪ T = ∅ and P ∩ T = ∅; Pre := P × T → N, and Post := P × T → N are the pre-incidence and post-incidence functions that specify the arcs, where N is the set of nonnegative integers. Given a place p i ∈ P and a transition t j ∈ T , the element of matrix Pre at row i and column j is denoted by Pre(p i , t j ) or Pre(i, j). The same holds for Post. A marking is a function m : P → N that assigns to each place of a Petri net structure a non-negative integer number of tokens, represented by black dots. The number of tokens in p ∈ P at m is denoted by m(p).
A Petri net is (N , m 0 ) with a Petri net structure N and an initial marking m 0 . Given a marking m, P(m) ⊆ P denotes the set of places with a non-zero number of tokens at m, i.e., the support of m. A transition t is enabled at marking m if m ≥ Pre(:, t), which is denoted by m[t . The enabled degree of transition t at m is defined as n t (m) = min{ m(p)/Pre(p, t) , m(p) ∈ • t}, where • t stands for the set of input places of t: • t = {p ∈ P|Pre(p, t) > 0}, and x stands for the largest integer smaller than or equal to real number x. The set of all enabled transitions at m is denoted as m • . The fact that a Petri net reaches m from m via the firing of t is denoted by m[t m . A firing sequence σ is defined as σ = t σ 1 t σ 2 . . . t σ h where σ 1 , . . . , σ h are the indexes of the transitions, the length of σ is h = |σ |, and σ = ε stands for an empty sequence. A marking m is reachable from an initial marking m 0 if there exists a firing sequence σ such that m 0 [σ m, and σ is enabled at m 0 . R(m 0 ) is the set of all markings reachable from m 0 , and, for simplicity, we denote this set by R when no ambiguity exists. A Petri net is bounded if and only if (iff) there exists a natural integer k such that the number of tokens in each place does not exceed k for any reachable marking. The considered Petri nets in this paper are assumed to be bounded. A consequence of boundedness is that R is of finite cardinality.

B. TRANSITION-TIMED PETRI NETS
A transition-timed Petri net (T-TPN) is defined as (N , m 0 , D), where N is a Petri net structure, m 0 is the initial marking, and D : T → R + (R + is the set of nonnegative real numbers) is the function that associates each transition t ∈ T with a duration D(t), which is the minimal delay that t should take since it is enabled. If D(t) = 0, t can fire immediately once it is enabled. The time semantics of TPN depends on D(t), t ∈ T , and the server, memory and choice policies [13]. In this paper, an infinite server policy, and enabling memory policy are adopted. With the enabling memory policy, at the entrance in a marking, the residual durations associated with still enabled transitions are decremented and the residual durations associated with disabled transitions are forgotten. The choice policy is a preselection performed by an external agent (for example, a scheduler): in case of concurrency or conflict, the selection of transitions that will fire next is decided by the scheduler. These specifications are discussed in Section V.
A timed firing sequence is written as σ = (t j 1 , τ 1 )(t j 2 , τ 2 ) · · · (t j h , τ h ), where j 1 , j 2 , . . . , j h are the indexes of the transitions, τ 1 , τ 2 , . . . , τ h are the firing instants, and 0 ≤ τ 1 ≤ τ 2 ≤ · · · ≤ τ h . The timed trajectory associated with σ and starting at m 0 is defined by (1) such that m(0) = m 0 . We call m(h) the final marking of (σ, m 0 ). A path ph = x 1 x 2 . . . x K in a T-TPN structure is defined as an orderly and oriented succession of K nodes with x k ∈ T ∪P for x 1 , x K ∈ P and x k+1 ∈ x k • for k = 1, . . . , K − 1. The duration of ph is defined as: For any places p, p ∈ P, let us define PH (p, p ) as the set of paths from the place p to the place p and ph * (p, p ) ∈ PH (p, p ) as the path of shortest duration within PH (p, p ). In order to define the states of a T-TPN system, it is necessary to define iteratively the residual times δ(t) of each transition t enabled at marking m(k), k = 0, . . . , h of a given trajectory of Eq. (1).
• For k = 0, a set of residual times associated to m(0) = m 0 is defined as a multiset (m 0 ) on an ordinary set where the number of occurrences of the element (t, D(t)) in (m 0 ) is n t (m 0 ), i.e., for each transition t enabled at m 0 with degree n t (m 0 ), the residual time D(t) is repeated n t (m 0 ) times in (m 0 ).
• For k > 0, a set of residual times associated to m(k), conditioned by a timed trajectory (σ, m 0 ), is defined as a multiset (m(k)) on an ordinary set {(t, δ) ∈ (T × R + ), t ∈ (m(k)) • }, where the number of occurrences of the element (t, δ) in (m(k)) is Num(σ, m 0 ) that Num : (T × R + ) * × N m → N + is a function that gives the number of occurrences of the element (t, δ) in (m(k)) when the timed trajectory (σ, m 0 ) is executed; δ is the minimal residual time before firing t and computed as δ = max 0, δ − (τ k − τ k−1 ) (with τ 0 = 0) if (1) (t, δ ) ∈ (m(k − 1)) and (2) t is not disabled by the firing of t j k ; otherwise δ = D(t). It is noted that the number of occurrences of the element (t, δ) in (m(k)) is at most n t (m(k)) when t is enabled n t (m(k)) times at m(k) (due to the infinite server policy). Observe that some interactions exist between the infinite server policy and the enabling memory policy. At the entrance in m(k), the enabling degree of some transitions t may decrease to a non-zero value. In that case the n t (m(k − 1)) − n t (m(k)) latest minimal residual times δ are used to update (m(k)). In addition to the time semantics previously defined, we consider in this paper a subclass of TPN where the firing times are computed with the earliest firing policy (EFP). When a transition is preselected for the next firing, it will fire as soon as its residual time has elapsed and its firing cannot be deferred. More formally, a feasible timed trajectory (σ, m 0 ) of the form (1) is an EFP-trajectory if each transition t j k , k = 1, . . . , h, of the trajectory satisfies (t j k , τ k − τ k−1 ) ∈ (m(k − 1)) and for all (t j k , δ) ∈ (m(k − 1)), δ ≥ τ k − τ k−1 . Note that the set of feasible timed trajectories is the timed language of the TPN system and that the subset of EFP-trajectories can be viewed as the sublanguage of the TPN under EFP.

C. EXTENDED TIMED REACHABILITY GRAPH
In this section we give some basic notions about the extended timed reachability graph (TERG) for T-TPN systems that behave under EFP. The TERG has been defined and studied in [38] for T-TPN systems that satisfy the following assumptions.
• A1: the TPN system is bounded, i.e., there exists a positive constant k such that, for all m ∈ R(m 0 ) and for all p ∈ P, m(p) ≤ k; • A2: the minimal firing time D(t) of any transition t ∈ T is a multiple of a common time stamp. In particular in [38], the TERG is proved to represent the timed language of a T-TPN system that behaves under EFP, and each state of the TERG coincides to a state of the T-TPN under EFP. Let (N , m 0 , D) be a TPN system. The TERG of (N , m 0 , D) is defined as a four-tuple ( is defined as an orderly and oriented succession of K states with The duration of ph is defined as: For any states S, S ∈ S E , let us define PH E (S, S ) as the set of paths from the state S to the state S and ph * (S, S ) ∈ PH E (S, S ) as the path of shortest duration within PH E (S, S ).

III. PROBLEM DESCRIPTION
In a batch production system, each type of product corresponds to a job that consists of a series of operations. The execution of an operation requires a set of resources, such as valves and tanks, and takes a certain time. This indicates that multiple operations may compete for the same resource. The executive logic of operations depends not only on orders defined by jobs but also on conflicts due to operations competing for limited resource. Thus, it is complicated to describe processes of a batch production system. Given a considered batch production system, O =  Some operations cannot be concurrently executed since they compete for the same resource. For a same job, there exist different processing sequences that cost different time. VOLUME 8, 2020 A typical scheduling problem is to search for a processing sequence that is carried out in a minimal time. In order to illustrate the proposed concepts and notations, a typical chemical plant is adopted as a running example.
Example 1: A typical chemical plant is shown in Fig. 1. Seven processing units, which are three supply tanks: u 1 , u 2 and u 3 , two reactors u 4 and u 5 , and two storage tanks u 6 and u 7 , are connected by a pipeline system including 13 valves v 1 , v 2 , . . . , v 13 .  Two jobs of the batch production system shown in Fig. 1.
and d(o 1.1 ) = 20 minutes, which means that o 1.1 requires valves v 1 and v 3 to be opened, valves v 2 and v 5 to be closed, and tanks u 1 and u 2 are filled by fluid from tank u 5 for 20 minutes. For this production system, we assume that J 1 and J 2 are executed twice, thus ρ(J 1 ) = ρ(J 2 ) = 2. Obviously, some operations cannot be executed simultaneously, such as , which are in conflict since o 1.2 requires v 3 to be opened and o 1.2 requires v 3 to be closed. Operations that do not compete for resources can be executed simultaneously. For example, o 1.1 can be executed with o 2.2 simultaneously. Different sequences have different processing time, and the key of this scheduling problem is to find an appropriate one to minimize the processing time.

IV. MODELING METHOD FOR BATCH PRODUCTION SYSTEMS A. JOB MODELING
In essence, a job is composed of a series of operations in a certain order. For the scheduling of a batch production system, Algorithm 1 is presented to model a job.

Algorithm 1 Modeling a Job by Petri Nets
, and the number of times ρ(J j ) of the job J j manufactured; Output: The Petri net model (P J j , T J j , Pre J j , Post J j , m 0,J j , D J j ) of the job J j ; 1: The starting place p j.0 is designed, and its initial marking equals times by which J j is to be manufactured, i.e., P J j = p j.0 and m 0,J i p j.0 = ρ J j ; 2: for all 1 ≤ i ≤ I do 3: A place p j.i is designed, which is the buffer place of the operation o j.i , i.e., P J j = P J j ∪ {p j.i };

5:
The arcs (p j.i−1 , t j.i ) and (t j.i , p j.i ) are designed; 6: end for Algorithm 1 is presented to design a Petri net model of a job J j including the ordering relationship, the processing time of operations, and the initial marking.
In Algorithm 1, Step 1 is to design the starting place for a job, and to mark it with ρ(J j ); Steps 2-6 are to design the place, transition and arc sets of operations, and to determine the minimal duration D(t i.j ) of each transitions.  Similarly, the Petri net model of J 2 is designed based on Algorithm 1.

B. MODELING CONFLICTS BETWEEN OPERATIONS
The competition of resources among different operations results in conflicts. To accurately express conflicts is important for finding an optimal sequence to execute operations. In this section, the specific definition and model of conflicts are designed. Definition 4: Given a considered batch production system, its operation-resource diagram G OR = (N O , N R , E OR ) is an oriented graph, where N O and N R are the sets of nodes that represent operations and resources, respectively and E OR is the set of monodirectional edges that represent the relationship between operations and resources (valves and tanks).
The procedure for designing an operation-resource diagram is presented in Algorithm 2. A rectangular node n is designed, i.e., N R = {n }∪N R ; 9: end for 10: for all jobs J i do 11: for all operations o i.j of J i do 12: for all resources states of s do 13: if v s ∈ r(o i.j ) (or u s ∈ r(o i.j )) then end if 20: end for 21: end for 22: end for VOLUME 8, 2020 In Algorithm 2, Steps 2-6 are to derive the set N O of nodes for all operations, denoted by triangles; Steps 7-9 are to obtain the set N R of nodes for all resources, denoted by rectangles; and Steps 11-23 are to draw monodirectional edges among these triangle and rectangular nodes.
Example 3: Let us design the operation-resource diagram for the batch production system in Example 1, as shown in Fig. 3. According to Steps 2-6 of Algorithm 2, ten triangle nodes are designed, which correspond to operations o 1.1 , o 1.  An operation-resource diagram correctly shows the relationship between operations and resources. The conflicts between different operations come from their competition for resources with different states. For representing and obtaining conflicts among operations, an operation-conflict graph is given in Definition 5. The procedure for designing an operation-conflict graph is presented in Algorithm 3.  In Algorithm 3, Step 1 is to design the nodes, and Steps 2-9 are to determine undirected edges among these nodes.

Algorithm 3 Design of an Operation-Conflict Graph
Example 4: For the operation-resource diagram in Fig. 3, the operation-conflict graph is obtained by Algorithm 3, as shown in Fig. 4. An operation-conflict diagram clearly represents the relationship between operations. Actually, a maximal conflict operation set O max corresponds to a subgraph of G OC . The corresponding definition and theorem are given as follows.
Definition 6: A clique of an operation-conflict graph G OC is a subgraph of G OC , denoted by Cl, if any two nodes of it are connected by one edge. Theorem 1 converts the computation of O max into a classic problem in the graph theory. There exist several algorithms to find maximal clique in the graph theory. In this paper, Bron-Kerbosch Algorithm [31] is used. In Algorithm 4, monitor places are designed based on maximal cliques of G OC in order to prevent a Petri net model from entering any conflict state.
In Algorithm 4, by Steps 2-5, the Petri net model for all jobs of the batch production system is designed; by Steps 6-7, the operation-resource diagram G OR and the operation-conflict graph G OC are drawn; by Steps 8-9, all maximal cliques and all maximal conflict operation sets O max are obtained; by Steps 10-21, monitor places are designed.
Example 5: Consider the batch production system in Fig. 1, Petri net models of J 1 and J 2 are modeled by Algorithm 1 and The job Petri net model (P J j , T J j , Pre J j , Post J j , m 0,J j , D J j ) is designed by Algorithm 1; 4: is designed by Algorithm 2; 7: The operation-conflict graph G OC = N OC , E OC is designed by Algorithm 3; 8: Find all maximal cliques of G OC based on Bron-Kerbosch Algorithm. 9: Find all maximal conflict operation sets O max , where one maximal conflict operation set corresponds to one maximal clique of G OC ; 10: for all maximal conflict operation sets O max do 11: The linear constraints of places corresponding to operations in O max are got based on Theorem 1, and the monitor place p c and corresponding arcs are designed based on those linear constraints [32]- [37]: 12: A monitor place p c is designed, and m 0 (p c ) = 1, i.e., P = {p c } ∪ P; 13: if all operations o j.i of O max belong to a sub-job then 14: Let o j.i and o j.i be the first and last operations of this sub-job, respectively; 15: The arcs (p c , t j.i ) and (t j.i , p c ) are designed; 16: else 17: for all operations o j.i of O max do 18: The arcs (p c , t j.i ) and (t j.i , p c ) are designed; 19: end for 20: end if 21 last operations of this sub-job, respectively. Arcs (p c1 , t 1.1 ) and (t 1.3 , p c1 ) are designed by Steps 12-20. All monitor places and corresponding arcs are designed by Steps 12-20, which are p c1 , p c2 , . . . , p c6 .
Finally, the plant Petri net of the batch production system is obtained, as shown in Fig. 5.

V. A NEW VARIANT OF FILTERED BEAM SEARCH BASED ON D-TERG
The determination of the optimal path ph * (S 0 , S ref ) for any state S ∈ S E can be obtained with global optimization algorithms [39], for example the well-known Dijkstra algorithm. Observe that such a method requires the computation of the complete TERG. One important problem with such a computation is that the size of the graph may become rapidly large (the increase in size is a critical limitation for all variants of state class graphs that are based on time). Intuitively, the price to pay, when S E is used instead of R, depends on the number of different earliest firing times that may occur for a given marking. In some particular cases, the state space becomes infinite even if the untimed net has very few reachable markings. Consequently, to obtain an applicable and tractable approach it is necessary to ensure that the number of explored states remains finite and as small as possible.
Note first that the infinite expansion of S E can be avoided by formulating all earliest firing times as multiples of a given period dt. More precisely, each transition t is associated with an earliest firing time D(t) = α(t) · dt and α(t) ∈ N + (N + is the set of all positive integers) then S E is of finite cardinality [38]. In that case, the number of states is bounded even if it may be large. Note that this simplification is reasonable from a practical point of view because any real number can be approximated by a rational number with a given precision (i.e., the multiple of a given period dt that depends on the precision). Then, the computation complexity to design the complete graph of the TERG can be measured by the ratio N E /N that equals at least 1.
In order to make the method more tractable, we propose to expand only the part of the TERG that will be required to design the expected schedule by pruning the non promising branches of the graph. We refer to this subgraph as a dynamical TERG (D-TERG). • S 0 is the initial state.
The following remarks hold • as far as each state S(k), k = 1, . . . , K has a predecessor within {S(1), . . . , S(k − 1)}, there exists at least one path from S 0 to S(k). Consequently, the D-TERG can be viewed as a subgraph of the TERG, • the D-TERG is completly defined by the sequence .

C. FILTERED BEAM SEARCH METHOD FOR T-TPN
The filtered beam search (FBS) is an aggressive and efficient pruning method based on original beam search (BS) [30]. The different variants of BS algorithms expand only a predefined number of nodes (the beam width β g ) at each iteration according to a cost function that will be discussed in the next section. The FBS is an improvement of the basic BS that combines the global filter β g with a local filter β l that allows the expansion of up to β l nodes from any parent node [30]. In the FBS method, the selection of nodes to expand is based on a cost function f composed of the actual cost g from the initial node to the current node, and an estimate h of the remaining cost to the reference node. The search uses a list containing at most β g nodes ranked with the cost function f . The search stops when one or several solutions are found with a certain cost g * and when all candidates in the list satisfy g > g * . The search principle of FBS as explained in [30], [42] is illustrated in Fig. 6 for parameters β g = 3 and β l = 2, and the explored nodes are colored in grey. At step 1, the list of nodes contains only the initial state 1. At step 2, the list is limited by β l and only 2 nodes are selected: 3 and 5. At step 3, the list of nodes is limited by β g and 3 nodes are selected: 7, 8 and 10. Finally at step 4, the list of nodes is limited by both parameters β l and β g and is computed as 12, 13 and 16. The search continues up to the verification of the stopping criteria. The core of the beam search approaches is the definition of the heuristic function h that is used to select at each step the next node to be expanded. To be efficient, this function should have two important properties: • in order to ensure that the algorithm converges to a solution, h should never overestimate the true cost to the reference, • in order to avoid to remove promising candidates, the difference between the true cost and its estimation computed by h should be as small as possible.
In [40], [41],  m[σ 2 m ref ). The cost function may be rewritten as (5): The function g(σ 1 , m 0 ) gives the cost, i.e., the duration in our case, of the already computed trajectory (σ 1 , m 0 ) and the function h(m, m ref ) approximates the residual duration of (σ 2 , m). For the part of the trajectory (σ 1 , m 0 ) already computed, the method in [40] computes the duration g(σ 1 , m 0 ) of (σ 1 , m 0 ) by transforming any untimed trajectory into a timed one. This algorithm uses the chronological firing order of the transitions in σ 1 and the earliest firing policy to update, at each new marking m, the residual durations of the transitions enabled at m. These transitions and their residual durations are stored in a calendar that is updated at each intermediate marking of the trajectory. Observe that the transformation previously mentioned and the use of the calendar are required because the usual method is based on untimed trajectories and adds a posteriori timing information. In this paper, on the contrary, we derive directly g(σ 1 , m 0 ) from the D-TERG.
For the unknown part of the trajectory, several estimations h(m, m ref ) of the residual duration from the current marking m to the reference m ref have been studied [30], [42]. In particular, Luo et al. in [41] propose a heuristic function based on the search of resource and operational places. Lefebvre proposes an estimation based on residual firing count vector [43]. The previous methods are based on the following schema: • search for the paths that exist in the T-TPN structure from all places that have at least one token at marking m, • compute the durations of these paths with Eq. (2), • estimate the residual duration to the reference marking with Eq. (6): where t 1 is the first transition in ph. In this paper, we refine the heuristic function h based on the D-TERG.

D. FILTERED BEAM SEARCH BASED ON D-TERG
Based on D-TERG( ) as previously defined, the cost function defined in Eq. (5) is reformulated for any state S ∈ S as: The B (S(k), S(k + 1))}, (8) where PH (S, S ) is the set of paths from state S 0 to those S in D-TERG( ). Observe that all paths that exist in TERG from S 0 to S are not in general reported in D-TERG( ) (as far as N < N E ). For this reason, depending on , the final results obtained with the combined use of the D-TERG and the FBS is not, in general, the optimal schedule. But, compared with the method in [40], the use of D-TERG has two advantages: • once D-TERG( ) is computed, g(S 0 , S, ) becomes easy to compute by Eq. (8) because timing information is included in D-TERG( ), and there is no need to use the transformation of untimed firing sequence into timed sequence as proposed in [40], • more important, all timing information available at state S is also explicitly encoded in (S) and will be used to Consequently h(S, S ref ) is refined as: +δ(S, t 1 ) + WT (S, t 1 )}}. (10) where t 1 is the first transition in ph. The heuristic function h(S, S ref ) never overestimate the true duration to the reference.

VI. APPLICATION TO A CASE STUDY
In this section, we aim to schedule the operations of the batch production system in order to minimize the total duration C max of the operations according to the T-TPN model of the system, the D-TERG and the new variant of FBS introduced in the previous section. Dijkstra algorithm, FBS algorithm [40], [41], and a variant of FBS based on D-TERG in this paper are applied to the case study. For these methods, solutions and performance are discussed with respect to the initial number of products to  be processed m 0 (p 1.0 ) = m 0 (p 2.0 ) = ρ(J 1 ) = ρ(J 2 ) = k. Table 2 sums up the results obtained for Dijkstra algorithm based on the computation of the TERG. For this purpose, a period g * = 10 minutes has been considered. In this table we report the number of markings in the usual reachability set R(m 0 ) (i.e., the size of R(m 0 ) denoted by N ), the number of states in TERG (i.e., the size of S E denoted by N E ), the average number of states in TERG for one batch (i.e., N E /k), the size (as a number of firings) of the control sequence that is found, the corresponding makespan C max and the computation time required to obtain the solution with an Intel(R) Core(TM) i7-6600U CPU @ 2.60GHz-2.80 GHz. In particular for k = 1, the solution σ 1 is found using Dijkstra algorithm in TERG: σ 1 = (t 6 , 30)(t 1 , 50)(t 7 , 70)(t 2 , 80)(t 8 , 110) (t 3 , 140)(t 9 , 160)(t 4 , 180)(t 10 , 220)(t 5 , 220) One can notice that Dijkstra algorithm fails rapidly when k increases due to the numerical complexity (we limit the search for TERG with a maximal size of 15,000 states). On the contrary, when a solution is found, it is of minimal duration. Table 3 sums up the result obtained for local optimization based on the use of FBS with parameters β g = 20 and β l = 20. In this table, the number of markings expanded by FBS is also reported, denoted by N . In particular for k = 1, two solutions σ 2 and σ 3 are found: σ 2 = (t 1 , 20)(t 2 , 50)(t 6 , 50)(t 3 , 80)(t 7 , 90) (t 4 , 120)(t 8 , 130)(t 5 , 160)(t 9 , 180)(t 10 , 220) σ 3 = (t 6 , 30)(t 7 , 70)(t 1 , 70)(t 8 , 110)(t 2 , 110) (t 3 , 140)(t 9 , 160)(t 4 , 180)(t 10 , 220)(t 5 , 220) Table 4 sums up the result obtained by the proposed variant of FBS based on D-TERG with parameters β g = 20 and β l = 20. The number of states in D-TERG (i.e., the size of S ), denoted by N , is reported in Table 4. In particular for k = 1, the solution σ 4 is found using the proposed variant of FBS: σ 4 = (t 6 , 30)(t 1 , 50)(t 7 , 70)(t 2 , 80)(t 8 , 110) (t 3 , 140)(t 9 , 160)(t 4 , 180)(t 5 , 220)(t 10 , 220) The size of D-TERG in Table 4 is 476 that is less than the size of TERG in Table 2 when k = 3. The size of D-TERG is 3456 when k = 20, it is still much less than 15000. One can notice that FBS and the variant of FBS based on D-TERG find always a solution when k increases but the optimality cannot be ensured. It is obvious that C max reflects the performance of solutions obtained by different algorithms, and that N E , N and N reflect the complexity of the different algorithms. However, these indicators strongly depends on k. In order to obtain performance and complexity indicators only depending on the Petri net structure and on the specific algorithm, but not on k, we further compute C max /k, N /k and N /k in Tables 3 and 4. Observe that values of C max /k, N /k and N /k converge to asymptotic values as k increases. It is better to use C max /k, N /k and N /k as the performance and complexity indicators, respectively. From Tables 3 and 4, we can find that the proposed variant of FBS has better performance indicators (i.e., C max /k) than classical FBS. At the same time, classical FBS and the new variant of FBS have better complexity indicators (i.e., N /k and N /k) than Dijkstra algorithm. Table 5 provides the duration of the solution found with respect to k as the required computational time to compute the solution for larger values of k. Note that the performance of the variant of FBS based on D-TERG also depends on the parameters β g and β l . By decreasing β g and β l , the size of D-TERG also decreases as the computation time, but the makespan increases. In particular, using a small number of global or local beams can lead to degraded performances: one can notice from Tables 4 and 5 that the makespan obtained for k = 5 is 880 min. when β g = β l = 20 in Table 4 and increases to 910 min. when β g and β l decrease to 5 in Table 5. This is a typical characteristic of the beam search method: The performance is obviously better when numerous candidates are expanded (i.e. large values of β g and β l ) but rapidity is penalized, whereas performance is weaker when only few candidates are expanded (i.e. small values of β g and β l ) but rapidity is improved.
To further illustrate the application and complexity of our method, we apply it to a more complex batch process with more jobs. This example is inspired by the model proposed in [9]. The model in [9] is first transformed and expanded into a T-TPN model shown in Fig. 7 with T = {t 1 , t 2 , . . . , t 15 }, P = {p 1 , p 2 , . . . , p 23 }, m 0 = (1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 0, 0, 0, 1, 1, 1 The T-TPN models a batch production system with five jobs J 1 , J 2 , J 3 , J 4 , and J 5 . Transitions t 1 to t 3 , t 4 to t 6 , t 7 to t 9 , t 10 to t 12 , and t 13 to t 15 define J 1 , J 2 , J 3 , J 4 , and J 5 , respectively. Minimal durations of transitions are shown in Table 6.    Tables 2, 3, and 4, r affects C max , N Pn , N E , N and N , in Table 7. Normalized indicators C max /N Pn , N E /N Pn , N /N Pn and N /N Pn tend to asymptotic values and may be used as the performance and complexity indicators, respectively. The distribution of these indicators for different algorithms is shown in Fig. 8. We can find that the distribution of the performance and complexity indicators of the new variant of FBS tend to smaller values, and it has better TABLE 7. Results of different algorithms for the first r jobs of Fig. 7.  Table 7.
performance and complexity than the other two approaches. This verifies that the new variant of FBS is suitable for solving the scheduling problem of complex batch production systems.

VII. CONCLUSION
In this paper, modeling and scheduling methods for batch production systems are proposed. The approach is based on T-TPNs and a new variant of FBS methods. A method to model a batch production system with multiple resource units as T-TPN with conflicts resulting from the use of resource units is proposed. Then a D-TERG is presented according to a sequence of states that are successively explored for the schedule issue. Based on T-TPNs and D-TERG, a variant of FBS is designed with an improved heuristic cost function to obtain optimal or suboptimal solutions. The application of this method to the batch production systems illustrates the proposed scheduling approach.
In the next work, fault diagnosis such as equipment failure and deadlock control will be considered for the scheduling issue of production systems and multi-agent systems [44]- [48]. In addition, maximal time constraints will be also introduced to consider chemical reactions that should respect maximal processing times.
JIAZHONG ZHOU received the M.S. degree from the Institute of Information Science and Engineering, Huaqiao University, Xiamen, China, in 2016. He is currently pursuing the Ph.D. degree in control theory and engineering with Xidian University, Xi'an, China. His research interests include Petri net theory and applications, the scheduling of discrete event systems, and the space abstraction of time Petri nets. ZHIWU LI (Fellow, IEEE) received the B.S. degree in mechanical engineering, the M.S. degree in automatic control, and the Ph.D. degree in manufacturing engineering from Xidian University, Xi'an, China, in 1989China, in , 1992, and 1995, respectively. He joined Xidian University, in 1992. He is currently with the Institute of Systems Engineering, Macau University of Science and Technology, Taipa, Macau. He has published two monographs with M. Zhou in Springer, in 2009, and Y. Chen in CRC Press, in 2013. His current research interests include Petri net theory and applications, the supervisory control of discrete event systems, workflow modeling and analysis, system reconfiguration, game theory, and data and process mining. He is listed in Who's Who in the World (27th Edition, Marquis, 2010). He is the Founding Chair of the Xi'an Chapter of the IEEE Systems, Man, and Cybernetics Society. He serves as a frequent Reviewer for over 80 international journals, including Automatica, and a number of IEEE TRANSACTIONS and many international conferences. VOLUME 8, 2020