Analysis of a Retrial Queue With Two-Type Breakdowns and Delayed Repairs

This article studies an M/G/1 retrial queue with two types of breakdowns. When the server is idle, it is subject to breakdowns according to a Poisson process with rate $\delta$, and it cannot be repaired immediately. When the server is busy, it may break down according to a Poisson process with rate $\theta$, and it can be repaired immediately. First, using the embedded Markov chain technique and the probability generating function (PGF) method, we present the necessary and sufficient condition for the system to be stable and the PGF of the orbit size at departure epochs. Second, we derive the steady-state joint queue length distribution by the supplementary variable method and present some important performance measures and reliability indices. Third, we analyze the sojourn time of an arbitrary customer in the system in steady state. Finally, some numerical examples are presented to illustrate the effect of some system parameters on important performance measures and reliability indices.


I. INTRODUCTION
The associate editor coordinating the review of this manuscript and approving it for publication was Roberto Sacile.
Retrial queues with unreliable servers have been investigated extensively owing to their applications in various fields, such as telephone switching systems, call centers, computer communication and telecommunication networks, and manufacturing systems. On one hand, retrial queues reflect the characteristics of customer service requirements, i.e., arriving customers who find the server unavailable may join a retrial group (orbit) and request service again some time later. For survey papers, books, bibliographical information and recent literature on retrial queues, readers are referred to Falin [10], Falin and Templeton [11], Artalejo and Gómez-Corral [4], Gómez-Corral [17], Artalejo [2], [3], Gao and Zhang [15], Zhang et al. [31] and references therein. On the other hand, owing to unexpected factors in reality, such as the limited lifetime of the server, external interference, malfunctions of the server, starting failures, etc., the server may break down and need repair during an idle period or a busy period. Server failures and repairs were introduced by Aissani [1] and Kulkarni and Choi [21]. Since then, related studies of retrial queues with unreliable servers and repairs have been carried out from queueing and reliability viewpoints. In earlier relevant papers, the types of server breakdowns may be divided as follows:
(1) active breakdowns, i.e., the server is subject to breakdowns when it is busy. In this case, the server's lifetime is often assumed to be exponentially distributed. Wang [23] studied both queueing characteristics and reliability issues for an M/G/1 retrial queue with server breakdowns and general retrial times. Falin [12] dealt with an unreliable M/G/1 retrial queue in which the server's lifetime follows an exponential distribution and the repair time is generally distributed. Unlike classical retrial queues with only one orbit queue, the retrial queue in Falin [12] has two waiting queues: a normal waiting queue formed by arriving primary customers who find the server unavailable at their arrival epochs, and an orbit queue formed by customers whose services are interrupted by server failures. Chang et al. [6] considered a multi-server retrial queue with customer feedback and impatience, in which a server's breakdown is caused by an exponentially distributed lifetime while it is working. Yang et al. [29] considered an unreliable retrial queue with J optional vacations, where the server is subject to random breakdowns and repairs while it is working. Gao et al. [13] treated an M/M/1 retrial queue with an unreliable server from an economic viewpoint.
(2) passive breakdowns, i.e., when the server is idle, it may break down and need immediate repair. Taleb and Aissani [22] considered the performance measures and reliability indices for a new unreliable M/G/1 retrial queue in which persistent and impatient customers, active and passive failures, and preventive maintenance are all taken into account. Krishna Kumar et al. [19] carried out a performance analysis of a Markovian retrial queue with passive and active breakdowns.
(3) catastrophic failures, i.e., server breakdowns caused by external attacks or shocks (called negative customers). In such retrial queues, when a negative customer arrives at the system, it removes one or all of the customers present in the system at once (called individual or complete removal) and causes the server to break down and undergo repair. Many studies of such retrial queues have been carried out from queueing, reliability and economic viewpoints. Interested readers are referred to Wang et al. [24], Wang and Zhang [25], Wu and Lian [26], Wu and Yin [27], Gao and Wang [14] and references therein.
(4) starting failures, i.e., when the server is idle, an arriving (new or returning) customer must start the server. If the server is started successfully, which happens with a certain probability, the customer receives service immediately. Otherwise, the server undergoes repair immediately. Yang and Li [28] presented an M/G/1 retrial queue with the server subject to starting failures. Krishna Kumar et al. [18] addressed the performance analysis of an M/G/1 retrial queue with feedback and starting failures. Atencia et al. [5] developed a discrete-time Geo/G/1 retrial queue with general retrial times, Bernoulli feedback and starting failures. Recently, Yang et al. [30] generalized the model of Krishna Kumar et al. [18] to a multi-server retrial system with feedback and starting failures. For more retrial queues with breakdowns and repairs, readers are referred to the recent survey paper by Krishnamoorthy et al. [20].
Most unreliable retrial queues assume that the server can be repaired immediately when it breaks down. For example, Zhang [32] studied an M/M/1 retrial queue with passive and active breakdowns from an economic viewpoint, in which the server enters a repair stage immediately whenever either type of breakdown occurs, and the repair times for the two types of breakdowns are identically and exponentially distributed. Zirem et al. [33] dealt with a batch-arrival retrial queue with active breakdowns, where the server can be repaired immediately when a breakdown happens and a reserved service schedule is considered for the interrupted customer. However, in many realistic situations, such as computer communication networks and flexible manufacturing systems, it may not be possible to start the repair process immediately, because the repairman or the apparatus needed for the repair is unavailable or because the failure is not detected in time. Recently, Choudhury and Tadj [9] studied the steady-state behavior of an unreliable retrial queue with a second optional service phase and delayed repair. Choudhury and Ke [7], [8] studied, respectively, a batch-arrival and a single-arrival unreliable retrial queue with general retrial times under a Bernoulli vacation schedule, in which the server is subject to active breakdowns and delayed repair, i.e., when a server failure occurs, the repair begins only after some delay time. For such retrial queues, the authors obtained some important performance measures and reliability indices.
In this article, we analyze an M/G/1 retrial queue with passive and active breakdowns and delayed repairs for passive breakdowns. To the best of the authors' knowledge, no studies of such a retrial queue exist yet. One motivation for this work is that such a retrial queue arises in various practical fields, such as communication networks and manufacturing systems; it not only characterizes the retrial phenomenon of customers but also takes the delayed repairs for passive breakdowns into consideration. Another motivation is to obtain an analytical solution in terms of closed-form expressions by the supplementary variable technique and to evaluate the performance measures and the reliability of the considered system, which may be suited to many communication networks. The basic findings of the paper and their significance are outlined as follows:
• We introduce a new repairable M/G/1 retrial queue with passive and active breakdowns, in which passive breakdowns are subject to delayed repair. Such a model has potential applications in packet-switching networks.
• We give the stability condition of the system and the stationary analysis of the joint distribution of the orbit size and the server's state. Based on this analysis, we give the expressions of important performance measures of the system.
• The sojourn time of an arbitrary customer reflects the quality of service of the system, so we present the expression of the Laplace transform of the sojourn time of an arbitrary customer and prove that Little's law still holds in our model.
• Reliability indices, including the steady-state availability of the server, the failure frequency of the server and the mean time to first failure of the server, are provided.
The rest of this article is organized as follows. Section II gives the system description and a practical example. Section III presents the stability condition of the system and the steady-state analysis, and gives some system measures. Section IV focuses on the reliability indices of the system. Section V studies the distribution of the sojourn time of an arbitrary customer in the system. Section VI gives some numerical examples to illustrate the features of our model.

II. MODEL FORMULATION AND A PRACTICAL EXAMPLE

A. MODEL DESCRIPTION
In this section, we consider an unreliable retrial queue with two types of breakdowns and delayed repairs for passive breakdowns. The assumptions of the queueing system are as follows.
• Arrival process and general service times. Customers from outside arrive at the system according to a Poisson process with rate $\lambda$. The service time $B$ of each customer follows an arbitrary distribution with cumulative distribution function (c.d.f.) $B(x)$, probability density function (p.d.f.) $b(x)$, and finite first two moments $\beta_1$, $\beta_2$. If an arriving customer finds the server idle, the customer obtains service immediately. Otherwise, an arriving customer who finds the server busy or inoperative because of a failure joins a pool of unsatisfied customers, who may retry several times for service. Such unsatisfied customers are said to be in "orbit" and form a queue according to the FCFS discipline.
• Two types of breakdowns and delayed repairs. The server is subject to passive breakdowns in idle periods and to active breakdowns in busy periods. When the server is idle, it breaks down at an exponential rate $\delta$ (called a passive breakdown). When the server is busy serving a customer, it breaks down at an exponential rate $\theta$ (called an active breakdown). When an active breakdown occurs, the server is repaired immediately, and the repair time $R$ follows a general distribution with c.d.f. $R(x)$, p.d.f. $r(x)$, and finite first two moments $\nu_1$, $\nu_2$. However, because the server is not monitored during idle periods, when a passive failure happens the server cannot be repaired immediately and remains failed until a customer arrives at the service station from outside or from the orbit (if any). The repair time $G$ for a passive failure follows a general distribution with c.d.f. $G(x)$, p.d.f. $g(x)$, and finite first two moments $\mu_1$, $\mu_2$. It is assumed that, when the service of a customer is interrupted by an active breakdown, the customer in service waits at the server and receives its remaining service as soon as the repair is completed. The customer who triggers the repair of a passive failure does not leave the service facility and obtains service immediately after the repair is completed.
• Constant retrial policy. Under this policy, only the first customer in the orbit is permitted to apply for service when the server becomes idle, and the retrial time follows an exponential distribution with rate $\alpha$.
• All random variables defined above are assumed to be mutually independent.
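Throughout the rest of the paper, for a c.d.f. $F(x)$, we denote
$$\bar F(x) = 1 - F(x), \qquad \tilde F(s) = \int_0^\infty e^{-sx}\,dF(x), \qquad F^*(s) = \int_0^\infty e^{-sx}\bar F(x)\,dx.$$
Obviously, $F^*(s) = \frac{1-\tilde F(s)}{s}$. Define the functions $\beta(x)$, $\nu(x)$ and $\mu(x)$ as the conditional completion rates of the service time, the repair time for an active breakdown and the repair time for a passive failure, respectively, i.e.,
$$\beta(x) = \frac{b(x)}{1-B(x)}, \qquad \nu(x) = \frac{r(x)}{1-R(x)}, \qquad \mu(x) = \frac{g(x)}{1-G(x)}.$$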

B. A PRACTICAL APPLICATION EXAMPLE
Besides its theoretical interest, our retrial queue has potential applications in packet-switching networks, in which messages are divided into IP packets before they are sent. For instance, most modern Wide Area Network (WAN) protocols, including TCP/IP, X.25, and Frame Relay, are based on packet-switching technologies. A router is an interconnection device through which a packet is transmitted from a source host to a destination host in a packet-switching network. If the source host wishes to send a packet to a destination host, it first sends the packet to the router to which it is connected, and the packet is then transmitted to the destination host. Assume that packets arrive at the source host from outside according to a Poisson process. Upon receiving a packet, the host immediately sends it to its router. If the router is available, the packet is accepted and transmitted immediately, and the transmission time is assumed to be generally distributed. Otherwise, the packet is blocked by the router because of limitations in the TCP/IP network path MTU (Maximum Transmission Unit) or active breakdowns; in this case, the blocked packet is stored in the buffer of the source host (called the orbit) and has to be retransmitted some time later according to FCFS. Besides, owing to external attacks or other technical faults, the router may break down during an idle period or during a packet transmission period. We assume that the network administrator, who is responsible for failure management of the network, attends to secondary auxiliary jobs while the router is idle, until a packet arrives at the router, and is always on duty while the router is busy. If the router fails while it is transmitting a packet, it can be repaired immediately by the network administrator, and it resumes the transmission of the interrupted packet as soon as the repair is completed. If instead the router breaks down while it is idle, the repair is delayed until the arrival epoch of the next packet from outside or from the orbit, at which point the network administrator returns and immediately begins the repair of the router. The time interval from the epoch at which the passive failure occurs to the epoch at which the next packet arrives is called the delayed period.
Here, the packet whose arrival ends the delayed period is transmitted immediately after the completion of the repair for the passive failure. This scenario can be modeled by our retrial queueing system with two-type failures and delayed repairs.

III. STABILITY CONDITION AND STEADY-STATE ANALYSIS
This section investigates the stability condition of the system by the embedded Markov chain technique and derives the steady-state distributions of the system by the supplementary variable method.

A. STABILITY CONDITION
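Let $S_B$ be the generalized service time of a customer, i.e., the interval from the beginning of his service to the end of his service, with c.d.f. $S_B(x)$ and LST $\tilde S_B(s)$. Taking into account the possible occurrence of active breakdowns during the service process, we have
$$\tilde S_B(s) = \tilde B\big(s + \theta - \theta\tilde R(s)\big), \qquad E[S_B] = \beta_1(1+\theta\nu_1).$$
In the following, we give some useful notation:
$$a_k = \int_0^\infty \frac{(\lambda x)^k}{k!}e^{-\lambda x}\,dG(x), \qquad h_k = \int_0^\infty \frac{(\lambda x)^k}{k!}e^{-\lambda x}\,dS_B(x), \qquad k \ge 0,$$
where $a_k$ is the probability that $k$ customers enter the orbit during the repair time for a passive failure, and $h_k$ is the probability that $k$ customers join the orbit during the generalized service time. Let
$$c_k = \sum_{i=0}^{k} a_i h_{k-i}, \qquad k \ge 0;$$
then $c_k$ is the probability that $k$ customers enter the orbit during the passive repair time and the generalized service time. We also write $\rho = \lambda\beta_1(1+\theta\nu_1)$ and $\rho_1 = \lambda\mu_1$ for the mean numbers of arrivals during a generalized service time and during a passive repair time, respectively.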
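As a sanity check on the generalized service time, the following sketch simulates a service that is interrupted by active breakdowns and resumed after each repair, and compares the sample mean with $\beta_1(1+\theta\nu_1)$, the standard mean of a service interrupted at rate $\theta$ under preemptive-resume repairs. The exponential service and repair times, the function names and the parameter values are illustrative assumptions; the model itself allows general distributions.

```python
import random

def sample_generalized_service(rng, service_rate, theta, repair_rate):
    """Draw one generalized service time S_B (service plus active repairs)."""
    remaining = rng.expovariate(service_rate)   # required service time B
    total = 0.0
    while True:
        to_failure = rng.expovariate(theta)     # time until the next active breakdown
        if to_failure >= remaining:
            return total + remaining            # service finishes before the breakdown
        # Work done so far plus one repair time R; service then resumes.
        total += to_failure + rng.expovariate(repair_rate)
        remaining -= to_failure

def mean_generalized_service(n, service_rate, theta, repair_rate, seed=1):
    """Sample mean of n independent generalized service times."""
    rng = random.Random(seed)
    draws = (sample_generalized_service(rng, service_rate, theta, repair_rate)
             for _ in range(n))
    return sum(draws) / n

# Hypothetical values: beta_1 = 1, theta = 0.95, nu_1 = 0.25.
beta1, theta, nu1 = 1.0, 0.95, 0.25
estimate = mean_generalized_service(200_000, 1.0 / beta1, theta, 1.0 / nu1)
closed_form = beta1 * (1.0 + theta * nu1)
print(estimate, closed_form)
```

The two printed numbers should agree up to Monte Carlo noise; a large gap would indicate that the interruption logic does not match the preemptive-resume assumption.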
To develop the necessary and sufficient condition for the system to be stable, we first establish the embedded Markov chain of the system at departure epochs.
Let $T_k$ ($T_0 = 0$) be the epoch at which the $k$-th customer leaves the system and $N_k = N(T_k)$ be the orbit size at the time of the $k$-th departure; then the process $\{N_k, k \ge 0\}$ is a Markov chain with state space $\mathbb{N}$. Then we have the following theorem.
The same inequality is also necessary for ergodicity. Assume that $\rho + \frac{\delta}{\lambda+\alpha+\delta}\rho_1 \ge \frac{\alpha}{\lambda+\alpha}$, which implies that $x_m \ge 0$ for all $m \ge 0$. Furthermore, according to the one-step transition probabilities, the down drift sequence $\{D_m, m \ge 0\}$ is bounded below, which implies that the Markov chain $\{N_k, k \ge 0\}$ satisfies Kaplan's condition. Thus the Markov chain $\{N_k, k \ge 0\}$ is not ergodic, which proves the necessity of the condition.
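The stability inequality is easy to evaluate numerically. The sketch below assumes $\rho = \lambda\beta_1(1+\theta\nu_1)$ and $\rho_1 = \lambda\mu_1$, i.e., the mean numbers of arrivals during a generalized service time and during a passive repair time; this reading of $\rho$ and $\rho_1$ is our assumption. The function names and the values of $\lambda$ and $\beta_1$ are hypothetical, while $\delta = 0.25$, $\theta = 0.95$, $\mu_1 = 1/2$ and $\nu_1 = 1/4$ follow the base case of the numerical section if $\mu$ and $\nu$ there are read as repair rates.

```python
def stability_margin(lam, alpha, delta, theta, beta1, nu1, mu1):
    """Return alpha/(lam+alpha) - [rho + delta/(lam+alpha+delta) * rho1]."""
    rho = lam * beta1 * (1.0 + theta * nu1)   # assumed definition of rho
    rho1 = lam * mu1                          # assumed definition of rho_1
    lhs = rho + delta / (lam + alpha + delta) * rho1
    rhs = alpha / (lam + alpha)
    return rhs - lhs

def is_stable(lam, alpha, delta, theta, beta1, nu1, mu1):
    """The chain is ergodic exactly when the margin is strictly positive."""
    return stability_margin(lam, alpha, delta, theta, beta1, nu1, mu1) > 0.0

base = dict(lam=0.3, alpha=5.0, delta=0.25, theta=0.95,
            beta1=1.0, nu1=0.25, mu1=0.5)
print(is_stable(**base))                    # light load
print(is_stable(**{**base, "lam": 2.0}))    # heavy load
```

Setting `delta=0.0` recovers the condition $\rho < \alpha/(\lambda+\alpha)$ of the model without passive breakdowns.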

Remark 1 (Special Case):
Suppose that no passive breakdown occurs in the retrial system, i.e., $\delta = 0$; then our system reduces to the M/G/1 retrial queue with active breakdowns and constant retrial times, which is the special case of Wang [23] obtained by taking $A(x) = 1 - e^{-\alpha x}$, $x > 0$.

B. STEADY STATE ANALYSIS
In this subsection, we study the steady-state distribution of the system by using the supplementary variable method.
At time $t$, the state of the system can be described by the Markov process $\{(N(t), J(t), \xi_1(t), \xi_2(t), \xi_4(t)), t \ge 0\}$, where $N(t)$ is the number of customers in the orbit and $J(t)$ denotes the state of the server, defined as
$$J(t) = \begin{cases} 0, & \text{the server is idle},\\ 1, & \text{the server is busy},\\ 2, & \text{the server is under repair for an active breakdown},\\ 3, & \text{the server is in the delayed period},\\ 4, & \text{the server is under repair for a passive breakdown}. \end{cases}$$
When $J(t) = 1$, $\xi_1(t)$ is the elapsed service time; when $J(t) = 2$, $\xi_2(t)$ is the elapsed repair time for an active breakdown; when $J(t) = 4$, $\xi_4(t)$ denotes the elapsed repair time for a passive failure.
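To make the state description concrete, the sketch below simulates the pair $(N(t), J(t))$ in the special case where service and both repair times are exponential, so that no supplementary variables are needed and the process is a continuous-time Markov chain. This Markovian restriction, the function name and the parameter values are assumptions made for illustration only. The run estimates the fraction of time spent in each server state; it can be compared with the relation $P_2 = \theta\nu_1 P_1$ of Theorem 3.4 and with the throughput argument $P_1 = \lambda\beta_1$.

```python
import random

def simulate_occupancy(lam, alpha, delta, theta, b_rate, r_rate, g_rate,
                       n_events=400_000, seed=7):
    """Time fractions spent in server states J = 0..4 (all times exponential)."""
    rng = random.Random(seed)
    n, j = 0, 0                    # orbit size N(t) and server state J(t)
    time_in = [0.0] * 5
    for _ in range(n_events):
        retry = alpha if n > 0 else 0.0   # only the head of a nonempty orbit retries
        if j == 0:     # idle: outside arrival, retrial, passive breakdown
            moves = [(lam, (n, 1)), (retry, (n - 1, 1)), (delta, (n, 3))]
        elif j == 1:   # busy: arrival joins orbit, service ends, active breakdown
            moves = [(lam, (n + 1, 1)), (b_rate, (n, 0)), (theta, (n, 2))]
        elif j == 2:   # active repair: arrival joins orbit, repair ends, service resumes
            moves = [(lam, (n + 1, 2)), (r_rate, (n, 1))]
        elif j == 3:   # delayed period: next customer (outside or orbit) starts the repair
            moves = [(lam, (n, 4)), (retry, (n - 1, 4))]
        else:          # passive repair: arrival joins orbit, repair ends, service starts
            moves = [(lam, (n + 1, 4)), (g_rate, (n, 1))]
        total = sum(rate for rate, _ in moves)
        time_in[j] += rng.expovariate(total)      # sojourn time in the current state
        u = rng.random() * total                  # pick the next transition
        for rate, target in moves:
            if u < rate:
                n, j = target
                break
            u -= rate
    horizon = sum(time_in)
    return [t / horizon for t in time_in]

lam, alpha, delta, theta = 0.3, 5.0, 0.25, 0.95
fractions = simulate_occupancy(lam, alpha, delta, theta,
                               b_rate=1.0, r_rate=4.0, g_rate=2.0)
print([round(p, 4) for p in fractions])
print("theta * nu_1 * P_1 =", theta * 0.25 * fractions[1])
```

Because service resumes after an active repair, the exponential (memoryless) service time lets the sketch return to state 1 without tracking the elapsed service time.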

Theorem 3.3: (1) The marginal PGF of the orbit size when the server is busy is given by
αP 0,0 .
(2) The marginal PGF of the orbit size when the server is under repair for an active breakdown is given by

) The marginal PGF of the orbit size when the server is under repair for a passive breakdown is given by
Let $\Phi(z)$ denote the PGF of the number of customers in the orbit, and let $N_S$ be the number of customers in the system at an arbitrary time under the stability condition, with PGF $\Psi(z) = E[z^{N_S}]$. Then, by $\Phi(z) = \sum_{j=0}^{4} P_j(z)$ and $\Psi(z) = P_0(z) + zP_1(z) + zP_2(z) + P_3(z) + zP_4(z)$, we can obtain the following corollary.

C. PERFORMANCE MEASURES OF THE SYSTEM
Based on the results given in Section III-B, this subsection provides the main performance measures of the queueing system. By direct calculation using L'Hôpital's rule and routine differentiation, we obtain the following Theorem 3.4.

Theorem 3.4: (1) Under the steady-state condition, we have the following results:
• The probability $P_0$ that the server is idle is given by
• The probability $P_1$ that the server is busy is given by
• The probability $P_2$ that the server is under repair for an active breakdown is given by $P_2 = P_2(1) = \theta\nu_1 P_1$.
• The probability $P_3$ that the server is in the delayed period is given by
• The probability $P_4$ that the server is under repair for a passive breakdown is given by
Next, we analyze a cycle of the system. A cycle is defined as the period that starts at an epoch when the server completes a service and leaves the orbit empty, and ends at the next epoch at which the server becomes idle with an empty orbit once again. Obviously, the cycle length $\Gamma$ satisfies $\Gamma = \Gamma_{0,0} + \Gamma_{0,1} + \sum_{j=1}^{4}\Gamma_j$, where $\Gamma_{0,0}$ is the length of the server's idle period with empty orbit, $\Gamma_{0,1}$ is the length of the server's idle period with nonempty orbit, $\Gamma_1$ is the length of the server's busy period, $\Gamma_2$ is the length of the possible repair period for an active breakdown, $\Gamma_3$ is the length of the possible delayed period, and $\Gamma_4$ is the length of the possible repair period for a passive breakdown. Taking into account the possible occurrence of a passive failure in the server's idle period, we have $E[\Gamma_{0,0}] = \frac{1}{\lambda+\delta}$. By the argument of an alternating renewal process, we know that
$$P_{0,0} = \frac{E[\Gamma_{0,0}]}{E[\Gamma]}, \qquad P_0 - P_{0,0} = \frac{E[\Gamma_{0,1}]}{E[\Gamma]}, \qquad P_j = \frac{E[\Gamma_j]}{E[\Gamma]}, \quad j = 1, 2, 3, 4.$$
Then the expressions for $E[\Gamma]$, $E[\Gamma_{0,1}]$ and $E[\Gamma_j]$, $j = 1, 2, 3, 4$, are given as follows:
$$E[\Gamma] = \frac{1}{(\lambda+\delta)P_{0,0}}, \qquad E[\Gamma_{0,1}] = \frac{P_0 - P_{0,0}}{(\lambda+\delta)P_{0,0}}, \qquad E[\Gamma_j] = \frac{P_j}{(\lambda+\delta)P_{0,0}}, \quad j = 1, 2, 3, 4.$$

IV. RELIABILITY ANALYSIS
In this section, we provide some important reliability indices of the queueing model based on the results obtained in Section III. Suppose that the system is stable, and let $A$ be the steady-state availability of the server and $W_f$ be the failure frequency of the server. Since the server is up exactly when it is idle or busy, we have
$$A = P_0 + P_1, \qquad W_f = \delta P_0 + \theta P_1. \tag{14}$$
Next, we study the mean time to first failure (MTTF) of the server.
At the initial time $t = 0$, the system is assumed to be empty and the server is idle, i.e., $P_{0,0}(0) = 1$. Let $Y$ be the time to the first failure of the server; then the reliability function of the server is $U(t) = P\{Y > t\}$. The expressions of $U^*(s)$ and MTTF are given in the following theorem.
, where $\zeta(s)$ is the root with the minimum absolute value of the equation in the unit circle and $\mathrm{Re}(s) > 0$.
(2) The expression of MTTF is given by .
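Proof: To find $U(t)$, let the failure states $J = 2, 3, 4$ of the server be absorbing states. For the new system with absorbing states, using the same notation as in Section III, we know that
$$U(t) = \sum_{n=0}^{\infty} P_{n,0}(t) + \sum_{n=0}^{\infty}\int_0^\infty P_{n,1}(t,x)\,dx,$$
and we have the following set of differential equations at time $t$:
$$\left(\frac{d}{dt} + \lambda + \delta + \alpha(1-\delta_{0,n})\right)P_{n,0}(t) = \int_0^\infty P_{n,1}(t,x)\beta(x)\,dx, \quad n \ge 0,$$
$$\left(\frac{\partial}{\partial t} + \frac{\partial}{\partial x} + \lambda + \theta + \beta(x)\right)P_{n,1}(t,x) = \lambda(1-\delta_{0,n})P_{n-1,1}(t,x), \quad n \ge 0,\ x > 0, \tag{16}$$
$$P_{n,1}(t,0) = \lambda P_{n,0}(t) + \alpha P_{n+1,0}(t), \quad n \ge 0,$$
where $\delta_{0,n}$ is the Kronecker symbol.
By Rouché's theorem, the denominator of (25) has exactly one zero $z = \zeta(s)$ inside the unit circle, and it is also a zero of the numerator of (25), from which the result follows.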
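The absorbing-state construction used in the proof can also be mimicked by simulation. The sketch below starts from an empty system with an idle server and runs until the first passive or active breakdown, assuming exponential service times so that the up-states form a Markov chain (an illustrative restriction; function names and parameter values are hypothetical). It is a Monte Carlo estimate, not the closed-form MTTF of the theorem. A convenient check is the case $\delta = \theta$: the failure rate is then the same in both up-states, so $Y$ is exponential and MTTF $= 1/\delta$.

```python
import random

def time_to_first_failure(rng, lam, alpha, delta, theta, b_rate):
    """One sample of Y, starting from an empty system with an idle server."""
    n, busy, t = 0, False, 0.0
    while True:
        if busy:
            total = lam + b_rate + theta
            t += rng.expovariate(total)
            u = rng.random() * total
            if u < lam:
                n += 1                   # arrival joins the orbit
            elif u < lam + b_rate:
                busy = False             # service completed
            else:
                return t                 # active breakdown: absorbed
        else:
            retry = alpha if n > 0 else 0.0
            total = lam + retry + delta
            t += rng.expovariate(total)
            u = rng.random() * total
            if u < lam:
                busy = True              # outside customer takes the idle server
            elif u < lam + retry:
                n, busy = n - 1, True    # head of the orbit is served
            else:
                return t                 # passive breakdown: absorbed

def estimate_mttf(runs, lam, alpha, delta, theta, b_rate, seed=3):
    """Average of `runs` independent times to first failure."""
    rng = random.Random(seed)
    return sum(time_to_first_failure(rng, lam, alpha, delta, theta, b_rate)
               for _ in range(runs)) / runs

# With delta == theta the time to first failure is Exp(delta), so MTTF = 1/delta.
mttf = estimate_mttf(50_000, lam=0.3, alpha=5.0, delta=0.5, theta=0.5, b_rate=1.0)
print(mttf)
```

Note that no repair-time parameter enters the sketch, since the run stops at the first failure.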

V. ANALYSIS OF THE SOJOURN TIME IN THE SYSTEM
The sojourn time of an arbitrary customer reflects the quality of service of the system. This section is therefore devoted to the distribution of the sojourn time $T$ of an arbitrary tagged arriving customer, which is the length of the time interval from the epoch at which the tagged customer arrives at the system to the epoch at which the tagged customer leaves the system upon service completion. Let $T(s) = E[e^{-sT}]$; $T(s)$ is obtained by conditioning on the system's state at the tagged customer's arrival epoch. To derive the explicit expression of $T(s)$, it is necessary to introduce two auxiliary random variables. One is the random variable $T_1$, which denotes the length of the time interval from the epoch when the server becomes idle and the tagged customer is at the head of the orbit to the epoch when the tagged customer leaves the system. The other is the random variable $T_d$, which denotes the length of the time interval from the epoch when a passive breakdown of the server occurs and the tagged customer is at the head of the orbit to the epoch when the tagged customer leaves the system.
With the help of the auxiliary variable $T_d$, we can derive the expression of $T_1(s)$.
Lemma 5.1: The Laplace transform $T_1(s)$ of $T_1$ and its mean value are given by (30) and (31), respectively.
Proof: The equation (32) for $T_1(s)$ is obtained by considering the order of the new arrival from outside, the passive failure and the retrial of the tagged customer, who is at the head of the orbit. Similarly, the equation (33) for $T_d(s)$ is obtained by conditioning on whether the repair for the passive breakdown begins at the arrival epoch of a customer from outside or from the orbit. From (32) and (33), we obtain the result (30). Differentiating $T_1(s)$ with respect to $s$ and letting $s \to 0$, i.e., $E[T_1] = -\frac{d}{ds}T_1(s)|_{s=0}$, we get (31). Now we can derive the expressions of $T(s)$ and the mean value $E[T]$, which are given in Theorem 5.1.
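The first-step argument in the proof can be illustrated at the level of means. The sketch below is our own reading of the verbal description, not a transcription of equations (30)-(33). It assumes that, from an idle server with the tagged customer at the head of the orbit, the next event is an outside arrival (rate $\lambda$), the tagged customer's retrial (rate $\alpha$) or a passive breakdown (rate $\delta$), and that a delayed period ends with an outside arrival or the tagged retrial, followed by a repair of mean $\mu_1$ and a generalized service of mean $E[S_B]$. The function name and numerical values are hypothetical.

```python
def mean_t1(lam, alpha, delta, mean_sb, mu1):
    """Solve the two first-step equations for E[T_1] and E[T_d].

    E1 = 1/a + (alpha*S + lam*(S + E1) + delta*Ed) / a,   a = lam + alpha + delta
    Ed = 1/b + mu1 + S + (lam / b) * E1,                  b = lam + alpha
    """
    a = lam + alpha + delta     # total event rate while the server is idle
    b = lam + alpha             # total event rate during a delayed period
    # Substitute Ed into the first equation and collect the E1 terms.
    const = 1.0 / a + (b * mean_sb + delta * (1.0 / b + mu1 + mean_sb)) / a
    coeff = 1.0 - lam / a - delta * lam / (a * b)
    e1 = const / coeff
    ed = 1.0 / b + mu1 + mean_sb + lam / b * e1
    return e1, ed

# Hypothetical values; mean_sb = beta_1 * (1 + theta * nu_1) with beta_1 = 1.
e1, ed = mean_t1(lam=0.3, alpha=5.0, delta=0.25, mean_sb=1.2375, mu1=0.5)
e1_no_passive, _ = mean_t1(lam=0.3, alpha=5.0, delta=0.0, mean_sb=1.2375, mu1=0.5)
print(e1, ed, e1_no_passive)
```

Under these assumptions the linear system collapses to $E[T_1] = \frac{1}{\alpha}\left[1 + (\lambda+\alpha)E[S_B] + \frac{\delta(\lambda+\alpha)\mu_1}{\lambda+\alpha+\delta}\right]$, which for $\delta = 0$ reduces to $(1 + (\lambda+\alpha)E[S_B])/\alpha$.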

Theorem 5.1: The Laplace transform $T(s)$ of the sojourn time $T$ and its mean value $E[T]$ are as follows
Proof: Recall that in reliability theory, if a nonnegative random variable $X$ denotes the lifetime of a unit, with p.d.f. $f(x)$ and c.d.f. $F(x)$, then the random variable $X_x = (X - x \mid X > x)$ is called the residual lifetime, and the p.d.f. $f_x(y)$ of $X_x$ is given by $f_x(y) = \frac{f(x+y)}{1 - F(x)}$. In the following, we first consider $T_{k,1}(x; s) = E[e^{-sT} \mid N = k, J = 1, \xi_1 = x]$. Given that the tagged customer finds the system in the state $(N, J, \xi_1) = (k, 1, x)$ at its arrival epoch, the tagged customer joins the $(k+1)$-th position of the orbit, and its sojourn time is the sum of three random variables: the residual service time $B_x$; the total repair time $R^{(M)}$ for the $M$ active breakdowns occurring during $B_x$; and the sum of $k+1$ independently and identically distributed (i.i.d.) random variables with generic random variable $T_1$. Adopting a similar line of analysis, we can obtain the expressions for $T_{k,2}(x, y; s)$ and $T_{k,4}(x; s)$.
Remark 3: Eq. (36) shows that Little's law still holds in our retrial queueing system, which is also illustrated by the following numerical examples.

VI. NUMERICAL EXAMPLES
Under the stationary condition $\rho + \frac{\delta}{\lambda+\alpha+\delta}\rho_1 < \frac{\alpha}{\lambda+\alpha}$, the base case of the system parameters is set as follows: $\delta = 0.25$, $\theta = 0.95$, $\mu = 2$, and $\nu = 4$. We assume that the retrial rate $\alpha$ varies from 1 to 10 in Figures 1-4 and Tables 1-4, and that each of the system parameters $\delta$, $\theta$, $\mu$, and $\nu$ in turn varies over a certain range while the other parameters are kept fixed at their base-case values. The purpose of this section is to illustrate the effect of these parameters on some important reliability indices, including the steady-state availability $A$, the failure frequency $W_f$ and the mean time to first failure of the server MTTF, and on queueing measures, including the mean system length $L_s$, the expected length of a cycle and the mean sojourn time of an arbitrary customer $E[T]$.
• An increase in the passive failure rate $\delta$ or the active failure rate $\theta$ makes the server break down more frequently, and thus decreases the steady-state availability $A$ and the mean time to first failure MTTF, but increases the failure frequency $W_f$, the system length $L_s$, the expected length of a cycle and the mean sojourn time of an arbitrary customer $E[T]$, as shown in Figs. 1 and 2 and Tables 1 and 2.
• An increase in $\mu$ or $\nu$ shortens the repair time of the server and makes the server more available, which increases the steady-state availability $A$ but decreases the failure frequency $W_f$, the system length $L_s$, the expected length of a cycle and the mean sojourn time of an arbitrary customer $E[T]$, as shown in Figs. 3 and 4 and Tables 3 and 4. However, changes in $\mu$ and $\nu$, which govern the repair times of passive and active failures, have no effect on MTTF, because MTTF only concerns the time until the server fails for the first time, before any repair begins; this can be seen from Tables 3 and 4.

VII. CONCLUSION
In this article, we have conducted a detailed study of an unreliable M/G/1 retrial queue with two types of breakdowns: passive failures with delayed repairs and active breakdowns with immediate repair. Such a delayed repair process is different from that incurred by starting failures. The feature of starting failures is that the server may break down at the arrival epoch of a customer who arrives from outside or from the orbit and finds the server idle; in this case, the customer must start the server to receive service. If the server fails to start, which happens with some probability, it undergoes repair immediately; otherwise, with the complementary probability, it starts successfully and immediately renders service to the customer. In our delayed repair process, however, when the server breaks down in an idle period, i.e., a passive breakdown occurs, the server can begin its repair only at the arrival epoch of a customer from outside or from the orbit. That is to say, the repair process of a passive failure is started by the next arriving customer (new or returning). For this model, we analyzed the necessary and sufficient condition for the system to be stable, the stationary queueing indices and the sojourn time in the system from the queueing viewpoint, and obtained reliability measures such as the availability, the server failure frequency and the mean time to first failure from the reliability viewpoint. Some numerical examples were given to study the effect of some parameters on the important performance measures and reliability indices of the model. As one direction of future research, it would be interesting to develop the discrete-time counterpart of our continuous-time retrial queue, because discrete-time queueing systems are better suited to modeling computer and telecommunication systems. As another direction, one can consider the equilibrium balking policy for the Markovian counterpart of our retrial queue from an economic viewpoint.