Coordination Protocol and Admission Control for Distributed Services in System-of-Systems With Real-Time Requirements

System-of-Systems (SoS) offer unprecedented potential for new types of emerging services that significantly exceed the capabilities of the constituent systems. SoS in safety-critical domains (e.g., medical applications, smart grid, disaster recovery, defense) are prominent examples, but they have stringent real-time and reliability requirements. Therefore, a suitable temporal and spatial allocation of resources is required both within each constituent system and in the wide area networks between them. This paper introduces an algorithm for admission control and resource allocation that considers these requirements and the autonomy of the constituent systems. To simulate a realistic admission control and resource allocation process of a typical SoS network, a simulated case study with eight constituent systems, six services, and twenty-five processes/requests is developed. The performance of the suggested admission control and resource allocation process is measured in terms of the gain in execution time and the blockage probability. A sensitivity analysis is carried out to evaluate the influence of the number of constituent systems and the number of services sought by the received processes/requests on the efficacy of the proposed process. The results show that the proposed admission control and resource allocation process has a very low blockage probability, a high gain in execution time, and high resource utilization.

applications and services driven by a mixture of these technologies, has been introduced to cope with our information age; thus, research on SoS network architecture and paradigms has come more into focus. Electronic health care services [6], remote security and monitoring [7], precision agriculture [8], aviation [9], and industrial automation [10] are some examples of these technology-driven applications and services, which can be effectively integrated and implemented in the context of SoS. However, to establish a reliable and effective SoS infrastructure for these services, the conventional information technology (IT) infrastructure, which solely depends on the resources available in one single network, should be revamped to support the diverse resources needed by these services. Thus, the SoS infrastructure requires the integration and collaboration

The associate editor coordinating the review of this manuscript and approving it for publication was Amjad Ali.

introduced. In [17] and [18], we focused on security and network resource reservation between different CSs within the SoS utilizing the Multiprotocol Label Switching (MPLS) paradigm. In [19], the SoS network architecture, taking into account the service provisioning concept for SoS, was described. Furthermore, a preliminary description of a proposed distributed admission control algorithm was given in [20]. This paper extends our previous work in the field of SoS, mainly the one presented in [20], by expanding the proposed admission control algorithm with a more detailed description and by presenting a detailed simulation and performance evaluation of the proposed algorithm for the SoS network architecture, which addresses the problem of performing admission control for service provisioning utilizing the distributed SoS resources.
Further, the paper provides a more holistic view of the network architecture, considering an SoS paradigm in which the resources must first be checked for availability and then reserved and utilized by the requesting service. The proposed algorithm has been evaluated by simulation, which showed its effectiveness. In summary, the original contributions of this work are as follows:
• The development of an admission control and resource allocation algorithm that considers the requirements of the application under consideration and the autonomy of the CSs.

• The verification of the proposed algorithm with respect to a simulated case study, properly designed to mimic a realistic admission control and resource allocation process of a typical SoS network.

• The evaluation of the proposed algorithm's effectiveness by computing various performance metrics from the literature, such as the gain in execution time and the blockage probability.

• The investigation, by means of a sensitivity analysis, of the influence of the number of CSs and the number of services sought by the received processes/requests on the efficacy of the proposed algorithm.

Furthermore, most of the published works in the literature propose admission control algorithms for a single system, whereas this work focuses on proposing an admission control algorithm for distributed services in SoS with real-time requirements.

The rest of the paper is organized as follows: Section II summarizes the state-of-the-art papers. Section III describes the SoS network architecture. Section IV discusses the proposed admission control and resource allocation algorithm. The performance evaluation of the proposed algorithm is presented in Section V. Finally, Section VI concludes the paper and states future recommendations.

The SoS literature covers a wide range of applications, types, and research issues. We limit ourselves to admission control and scheduling. There have been numerous works in the field of SoS; however, little work focuses on solving issues related to coordinated and distributed admission control and scheduling for SoS networks.

VOLUME 10, 2022

An aircraft fleet planning and architecture framework assessment for air mobility and distribution using a system-

The work of Roy et al. [30] proposed two integer linear programming (ILP)-based schemes, namely ILP with explicit time reduced (ILP-ETR) and ILP with non-overlapping constraints (ILP-NC), for optimally scheduling real-time precedence-constrained task graphs (PTGs) on platforms composed of heterogeneous processing elements interconnected through a set of heterogeneous shared buses, in contrast to conventional schemes that deal with homogeneous elements and communication channels. The suggested schemes were shown to be efficient on a realistic automotive cruise controller case study. In the same context of heterogeneous distributed platforms, Roy et al. [31] proposed two low-overhead heuristic algorithms, namely the global slack aware quality-level allocator (G-SLAQA) and the total slack aware quality-level allocator (T-SLAQA), for scheduling real-time directed acyclic task graphs (DTGs) with multiple quality-level tasks at modest computational cost. The proposed schemes were shown to be more effective than the traditional ILP scheme when tested in an automotive traction controller case study. Similarly, Roy et al. [32] proposed a low-overhead heuristic algorithm, namely the contention cognizant task and message scheduler (CC-TMS), for scheduling real-time DTGs at modest computational cost. The proposed scheme was shown to be more effective than the traditional ILP scheme when tested in an automotive traction controller case study. The authors in [33] illustrated the use of AI to sort space habitation sub-systems for NASA technological groups and to classify applicable sources of data for these sub-systems. They demonstrated how AI agents can support the recovery and retrieval of composite information needed to feed existing SoS analytic tools, and discussed possible challenges and future steps.

The authors in [34] discussed the problem of managing volatile and unpredictable variations in SoS networks. They proposed a dynamic reconfiguration scheme that aims at enhancing SoS agility to respond quickly to failures. The proposed scheme employs an approximate dynamic programming technique to compute the dynamic reconfiguration decisions that allow failed or degraded sub-systems to be detached and the allocation of new resources to be changed rapidly.

As shown in Fig. 1, a typical SoS consists of several CSs, where each CS can deliver certain services provided by its end

delays, network reliability). Confirmation messages will transform the short-term admission into a long-term admission. Short-term admissions without a confirmation from the initiator will expire at the target-CS.
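The short-term/long-term admission handshake described above can be illustrated with a minimal Python sketch. It assumes a fixed confirmation window at the target-CS; the class name `TargetCS`, its methods, and the 100 ms timeout are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field


@dataclass
class TargetCS:
    """Hypothetical admission bookkeeping at a target-CS (names and timeout are assumptions)."""

    timeout_ms: int = 100                            # assumed confirmation window
    short_term: dict = field(default_factory=dict)   # process ID -> expiry time (ms)
    long_term: set = field(default_factory=set)      # confirmed process IDs

    def pre_admit(self, process_id: str, now_ms: int) -> None:
        # Tentatively admit the process until now_ms + timeout_ms.
        self.short_term[process_id] = now_ms + self.timeout_ms

    def confirm(self, process_id: str, now_ms: int) -> bool:
        # A timely confirmation from the initiator turns a short-term admission into a long-term one.
        expiry = self.short_term.get(process_id)
        if expiry is None or now_ms > expiry:
            return False
        del self.short_term[process_id]
        self.long_term.add(process_id)
        return True

    def expire(self, now_ms: int) -> list:
        # Drop short-term admissions whose confirmation did not arrive in time.
        stale = [pid for pid, expiry in self.short_term.items() if now_ms > expiry]
        for pid in stale:
            del self.short_term[pid]
        return stale


cs = TargetCS(timeout_ms=100)
cs.pre_admit("P1", now_ms=0)
cs.pre_admit("P2", now_ms=0)
print(cs.confirm("P1", now_ms=50))   # True: confirmed within the window
print(cs.expire(now_ms=150))         # ['P2']: never confirmed, so it expires
print(cs.long_term)                  # {'P1'}
```

In this sketch, a late confirmation is simply rejected; a real target-CS would also release the resources held by the expired short-term admission.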

E. LONG TERM ADMISSION AND EXECUTION OF SoS
The selected target-CS will execute a distributed algorithm for the incremental resource reservation of these CSs. Thereafter, the SoS-application is executed based on the allocated resources and the service contracts. Service revocations can occur in case of resource conflicts with SoS-applications of higher criticality.

IV. ADMISSION CONTROL AND RESOURCES ALLOCATION
To establish dependable SoS applications, a coordination protocol responsible for scheduling resources among CSs in a distributed manner needs to be defined. In the proposed SoS network architecture, several requests

different constraints increases the complexity of the admission process. As such, heuristic resource allocation algorithms must be employed in all CSs to deal with the problem efficiently.

To achieve an optimal distributed resource allocation between different CSs, a distributed resource allocation protocol is proposed. The protocol runs on all CSs of the SoS and implements the following tasks:
1) Resource Discovery. Each CS must have up-to-date knowledge about the available resources at its ESs and their current status. To achieve that, a resources allocation manager (RAM) is proposed, which runs a periodic resource discovery process to detect newly added resources and to exclude resources that have disappeared. While running the discovery process, a resources allocation

a resource reservation process will take place. To achieve that, a resource reservation protocol (RRP) is proposed. The RRP has the following main functionalities:

• Process Pre-admission: When a process arrives at the SoS, the RRP will admit it temporarily until it checks whether its constraints can be fulfilled. The RRP will save the received requests in a temporary queue, extract their requirements, and check with the other CSs for resource availability.

• Path Determination: The RRP will consult the CSM of the CS in order to determine all possible paths toward the next CS. Among these paths, the one that can fulfil the process constraints is chosen.

• Resources Reservation: Once the path to the next CS is determined, the RRP will send a resource reservation request (RRR) with the following fields: the Process ID, which identifies the process asking for a resource; the Source CS ID, which identifies the CS the process belongs to; the Resource ID, which identifies the resource required by the process; the Priority Level, which defines the process priority level used to assess its criticality; and the Constraints, which define the process constraints in terms of E2E delay, fault tolerance, security, and reliability.

• Process Admission. The RRR will be sent to all the CSs needed by the process. If all the needed CSs can fulfill the process requirements, then the needed resources in all CSs will be reserved, and the process is moved from the pre-admission queue in the receiving CS to the admitted queue. All the CSs will update their RAT to reflect the newly admitted processes. However, if the reservation process is unable to fulfill the process requirements and constraints, then another admission process called priority-based admission (PBA) will take place, which is used as a mitigation procedure for the potential failure of the normal admission process. This may happen if the admission process is unable to admit a process because other processes are occupying the available resources. In this case, the process priority should be considered. The flow chart and the unified modeling language (UML) sequence diagram of the admission control algorithm are depicted in Fig. 1, and Fig. 1

and it needs the following resources in order: R1, R2, and R5. Currently, it is scheduled as follows:

It will be served first by CS1:R1, with a busy time slot from 0 to 75 ms, and the CS-route is as follows: CS1, CS2, and then CS2 again. One can notice that the needed resources for this process have been reserved in the respective CSs. For instance, after the process is served by CS1:R1, it will be routed to CS2 and served by R2; the busy time slot for CS2:R2 will be from 75 to 125 ms. Notice that the processing time for CS2:R2 is equal to 50 ms according to the RPT entry. Further, to simplify the scheduling problem, we ignore the communication and transmission delay between CSs. Finally, the process will be delivered to CS2 again, where it will occupy R5 from 125 to 175 ms. Here, the processing time for CS2:R5 is equal to 50 ms.
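Because communication and transmission delays are ignored, the busy slots above follow from chaining the processing times of the route back to back. The short Python sketch below reproduces them; the function name `busy_slots` and the (CS, resource, processing time) tuple layout are illustrative assumptions.

```python
def busy_slots(route, start_ms=0):
    """Chain (cs_id, resource, rpt_ms) steps back to back, ignoring inter-CS delays."""
    slots, t = [], start_ms
    for cs_id, resource, rpt in route:
        slots.append((cs_id, resource, t, t + rpt))   # busy from t to t + rpt
        t += rpt
    return slots


route = [("CS1", "R1", 75), ("CS2", "R2", 50), ("CS2", "R5", 50)]
for cs_id, resource, begin, end in busy_slots(route):
    print(f"{cs_id}:{resource} busy {begin}-{end} ms")
# CS1:R1 busy 0-75 ms
# CS2:R2 busy 75-125 ms
# CS2:R5 busy 125-175 ms
```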

Once a resource reservation request arrives, the admission control manager at the receiving CS checks the resource utilization trees for all the resources in all CSs and assigns the required resources to available ones such that the assignment meets the process constraints. In this example, the admission control manager of CS1 will first attempt to assign P1 to CS1, since CS1:R3 is free; then it will attempt to find a suitable CS for the second needed resource (R1). In this case, there are two available free resources for R1: one in CS1 with an RPT of 75 ms, and another in CS4 with an RPT of 100 ms. The admission control manager will choose the one with the lowest RPT (i.e., CS3); after that, it will try to allocate the last resource (R4) to P1, which is allocated to CS2.
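The lowest-RPT selection rule described above can be sketched in Python as a greedy pass over the required resources, followed by a check against the E2E constraint. The `ResourceRequest` record loosely mirrors the RRR fields listed earlier (process ID, source CS, priority level, E2E constraint). The class and function names, the candidate table, and all numbers in the usage example are hypothetical, and communication delays are ignored as in the example.

```python
from dataclasses import dataclass


@dataclass
class ResourceRequest:
    """Hypothetical RRR-like record; field names are illustrative."""

    process_id: str
    source_cs: str
    resources: list   # required resource IDs, in execution order
    priority: int     # 1 = highest priority level
    e2e_ms: int       # end-to-end delay constraint


def assign_lowest_rpt(req, free):
    """Greedily map each required resource to the free CS with the lowest RPT.

    `free` maps a resource ID to (cs_id, rpt_ms) candidates that are unreserved.
    Returns a list of (resource, cs_id, rpt_ms), or None if some resource has no
    free candidate or the summed processing time exceeds the E2E constraint.
    """
    plan, total = [], 0
    for res in req.resources:
        candidates = free.get(res, [])
        if not candidates:
            return None
        cs_id, rpt = min(candidates, key=lambda c: c[1])
        plan.append((res, cs_id, rpt))
        total += rpt
    return plan if total <= req.e2e_ms else None


p1 = ResourceRequest("P1", "CS1", ["R3", "R1", "R4"], priority=3, e2e_ms=300)
free = {"R3": [("CS1", 40)], "R1": [("CS4", 100), ("CS3", 80)], "R4": [("CS2", 60)]}
print(assign_lowest_rpt(p1, free))
# [('R3', 'CS1', 40), ('R1', 'CS3', 80), ('R4', 'CS2', 60)]
```

With a tighter E2E constraint of 150 ms, the same call returns None (40 + 80 + 60 = 180 ms), which is the situation in which the priority-based admission (PBA) would take over.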

FIGURE 6. Demonstration example of the admission control process and resources allocations.
After discussing the RAT tables and entries, we now discuss the process of admitting a new request (P2), which is assumed to arrive at the same time as the previous two processes. The request table of P2 shows that the process originates from CS0, requests the following resources: R1, R2, and R5, and has a 250 ms E2E delay constraint. Finally, the process priority is equal to 1.

In order to decide whether the process can be admitted or

Note that the assignment process may become more complicated, especially if the goal is to perform the resource allocation in an optimal manner. Another challenge appears when more than one request arrives at the same time with different priorities. The admission process then has to take into account not only the available resources but also the process priorities. In some cases, it may issue a resource revocation command to a lower-priority process assigned to a specific resource, to allow higher-priority processes to be served. However, this revocation should be done without jeopardizing the constraints of the lower-priority process. For example, suppose a new request (P3) arrives, as shown in Table 1, with a high priority (PL = 1) and a very strict E2E delay (150 ms), requiring two resources (R1, R2). If the admission control manager assigns the free resources available on CS4 and CS1, respectively, then the process will miss its E2E delay constraint, since R1 and R2 on these CSs require 200 ms of processing time, which exceeds the 150 ms E2E delay constraint of P3. However, owing to the priority level of P3, the admission control manager will give precedence to P3 over P0 and P1 (whose priority levels are 2 and 3, respectively), so the resources requested by P3 take priority over those of P0 and P1. Thus, P3 may be admitted, and both P0 and P1 may be executed later (if they still meet their E2E delays) or blocked.
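The fallback described above, admitting P3 only by revoking lower-priority reservations, can be sketched as a two-pass greedy check in Python: first with free resources only, then also with resources held by lower-priority processes. This is a hypothetical illustration, not the paper's PBA procedure. The function name and candidate layout are assumptions, the split of the 200 ms into two 100 ms steps is assumed, and the reserved entries are assumed to belong to P0 with the CS1:R1 and CS2:R2 processing times of the earlier example. Re-scheduling the revoked processes, which the text requires to respect their own constraints, is left out.

```python
def priority_based_admission(resources, e2e_ms, priority, candidates):
    """Hypothetical two-pass sketch of priority-based admission (PBA).

    candidates maps a resource ID to (cs_id, rpt_ms, holder) tuples, where holder
    is None for a free resource or (process_id, priority_level) for a reserved one.
    Lower numbers mean higher priority (PL = 1 is the highest level).
    Returns (plan, revoked_process_ids) or None if the request must be blocked.
    """

    def attempt(allow_revocation):
        plan, total, revoked = [], 0, []
        for res in resources:
            usable = [c for c in candidates.get(res, [])
                      if c[2] is None or (allow_revocation and c[2][1] > priority)]
            if not usable:
                return None
            cs_id, rpt, holder = min(usable, key=lambda c: c[1])
            plan.append((res, cs_id, rpt))
            total += rpt
            if holder is not None and holder[0] not in revoked:
                revoked.append(holder[0])
        return (plan, revoked) if total <= e2e_ms else None

    # Normal admission first; revoke lower-priority reservations only if that fails.
    return attempt(False) or attempt(True)


candidates = {
    "R1": [("CS4", 100, None), ("CS1", 75, ("P0", 2))],
    "R2": [("CS1", 100, None), ("CS2", 50, ("P0", 2))],
}
print(priority_based_admission(["R1", "R2"], e2e_ms=150, priority=1, candidates=candidates))
# ([('R1', 'CS1', 75), ('R2', 'CS2', 50)], ['P0'])
```

A request with a lower priority than the holder (e.g., PL = 3 against P0's PL = 2) cannot revoke anything in this sketch and is blocked when the free resources alone miss its deadline.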

In this section, an SoS network architecture embedded with the proposed admission control process and resource allocation has been designed and evaluated. The SoS network architecture has been simulated to mimic an SoS scenario

• The number of services to be requested by each process/request is selected randomly from the interval [1, 6], where 1 and 6 are the minimum and maximum number of services that a process could request, respectively.

• The order of services to be requested by each process is randomly initialized.

• The priority level of each request is selected randomly from the interval [1, 3], where 1 and 3 are the minimum and maximum priority levels, respectively, that a process/request can have.

• The E2E delay of each process/request is selected randomly from the interval [300, 450] ms with a step size of 10 ms, where 300 and 450 ms are the minimum and maximum E2E delay constraints, respectively, of each process/request.

• The number of processes/requests arriving at a time is selected randomly from the interval [1, 2], where 1 and 2 are the minimum and maximum number of processes/requests, respectively, that can arrive at a time instant.

• The processes/requests arrive according to a Poisson distribution with a parameter (λ) equal to 20.
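The workload parameters listed above can be reproduced with a small Python generator based on the standard `random` module. Two interpretations are assumptions rather than statements from the paper: the services within a request are taken to be distinct, and λ = 20 is read as the mean inter-arrival gap (in ms) of a Poisson arrival process, implemented with exponentially distributed gaps. The function name and dictionary keys are hypothetical.

```python
import random

SERVICES = [f"R{i}" for i in range(1, 7)]   # the six considered services R1..R6


def generate_requests(n=25, mean_gap_ms=20.0, seed=None):
    """Hypothetical generator of n processes/requests using the listed ranges."""
    rng = random.Random(seed)
    requests, t = [], 0
    while len(requests) < n:
        batch = rng.randint(1, 2)                       # 1 or 2 arrivals per time instant
        for _ in range(min(batch, n - len(requests))):
            k = rng.randint(1, 6)                       # number of requested services
            requests.append({
                "id": f"P{len(requests) + 1}",
                "arrival_ms": t,
                "services": rng.sample(SERVICES, k),    # distinct services in random order
                "priority": rng.randint(1, 3),          # 1 = highest priority level
                "e2e_ms": rng.randrange(300, 451, 10),  # 300..450 ms in 10 ms steps
            })
        # Exponential gaps give Poisson arrivals; at least 1 ms keeps instants distinct.
        t += max(1, round(rng.expovariate(1.0 / mean_gap_ms)))
    return requests


for req in generate_requests(n=3, seed=7):
    print(req)
```

Fixing the seed makes a generated workload reproducible, which is convenient when comparing admission policies on the same set of requests.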

For clarification purposes, Table 2 and Table 3 show the randomly generated CSs and processes/requests, respectively. For instance, Table 2 shows that the first CS (CS1) can deliver all six considered services (i.e., R1–R6), each with a particular processing time; for example, the processing time needed for R1 by CS1 is 25 ms. In contrast, the last CS (CS8) can only deliver five services (i.e., all except R5), each with a particular processing time; for example, the processing time needed for R1 by CS8 is 40 ms.

Furthermore, Table 3 reports the simulated random processes/requests received at the SoS under consideration and their requested constraints, i.e., execution order, priority, and E2E delay. For instance, the second process (P2), which is received either individually or together with the previous or the subsequent process (i.e., P1 or P3, respectively), requires the execution of the following services in chronological order: R1, R3, R6, then R4. It requires these services to be executed with a maximum E2E delay of 300 ms. Last, the priority level of P2 is low (i.e., equal to the lowest level, 3), which entails that, when two processes are received at the same time (i.e., P2 and P3), P3 is executed first, since its priority level is high (i.e., 1).

680
The admission control process and resources' allocation

In summary, Table 4 reports the above-mentioned SoS performance metrics used in the simulation, which show a considerable average gain in time (computed as per Eq. (1)) and a low blocking probability (computed as per Eq. (2)), thus illustrating the effectiveness of the proposed admission control algorithm.
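Eq. (1) and Eq. (2) are not reproduced in this part of the text. Assuming Eq. (2) takes the usual form of the ratio between blocked and received processes/requests, the blocking probability can be computed as follows; the function name, the input format, and the example outcome are hypothetical, and only the blocking probability is sketched.

```python
def blocking_probability(admitted):
    """Share of received requests that were blocked (assumed form of Eq. (2)).

    `admitted` maps each process/request ID to True (admitted) or False (blocked).
    """
    if not admitted:
        raise ValueError("no processes/requests were received")
    blocked = sum(1 for ok in admitted.values() if not ok)
    return blocked / len(admitted)


# Hypothetical run: 25 requests, of which only one is blocked.
outcome = {f"P{i}": i != 24 for i in range(1, 26)}
print(blocking_probability(outcome))   # 0.04
```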

• The process/request P24 is blocked because the CSs available to fulfil its services (i.e., six services, as per Table 3) require more time (i.e., 411 ms) than its required E2E delay (i.e., 400 ms).

• At t = 0, the first process/request (P1) is received.

It requires the execution of the following services in order (Table 3): R1, R3, R6, R5, with a priority level of 3 and an E2E delay of 410 ms. The CSs that fulfil the process/request's services with minimum execution time are occupied; they are CS1, CS2, CS3, and CS3, respectively. It is worth mentioning that these CSs will be occupied by the requested services of P1 until these services have been fully executed.

• Two processes/requests, P2 and P3, were received later at time t = 14 ms and executed on time. However, since P3 has a higher priority level (i.e., 1) than P2 (i.e., 3), it will be executed first, as shown in the figure. Notice, for instance, that P3 requires the execution of service R1 first; thus, the CS that offers this service with the minimum execution time, CS1, will now be reserved for P3. At the same time, P2 also requires service R1, but the next best option for P2/R1 is CS3.

• The remaining processes/requests continued to arrive according to the Poisson arrival process and were executed or blocked in a timely manner, depending on the CSs' resource availability and their ability to fulfil the E2E delay constraints of the processes/requests.

• Last, it is worth mentioning that no time slots have been reserved for the blocked process/request P24.
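The timeline above, in which simultaneous arrivals are served in priority order and a request that cannot meet its E2E delay reserves no time slots, can be sketched in Python. The per-step choice uses an earliest-finish-time greedy rule, which is an assumption standing in for the paper's exact selection logic. The function name, the data layout, and all processing times except CS1:R1 = 25 ms (Table 2) are hypothetical, and inter-CS communication delays are ignored.

```python
def schedule_batch(batch, rpt, busy_until, now_ms):
    """Hypothetical sketch: admit requests arriving together, highest priority first.

    batch: dicts with "id", "services" (in order), "priority" (1 = highest), "e2e_ms".
    rpt: {(cs_id, service): processing_time_ms} for every CS offering a service.
    busy_until: {(cs_id, service): time_ms}, updated in place only for admitted requests.
    Returns {request_id: [((cs_id, service), finish_ms), ...] or None if blocked}.
    """
    decisions = {}
    for req in sorted(batch, key=lambda r: r["priority"]):
        t, plan, tentative = now_ms, [], dict(busy_until)
        for svc in req["services"]:
            # Earliest-finish-time choice among all CSs offering this service.
            options = [(max(t, tentative.get(key, 0)) + p, key)
                       for key, p in rpt.items() if key[1] == svc]
            if not options:
                plan = None
                break
            finish, key = min(options)
            tentative[key] = finish
            plan.append((key, finish))
            t = finish
        if plan is not None and t - now_ms <= req["e2e_ms"]:
            busy_until.update(tentative)        # commit the reservations
            decisions[req["id"]] = plan
        else:
            decisions[req["id"]] = None         # blocked: no time slots are reserved
    return decisions


rpt = {("CS1", "R1"): 25, ("CS3", "R1"): 35, ("CS2", "R3"): 30}
busy = {}
batch = [
    {"id": "P2", "services": ["R1", "R3"], "priority": 3, "e2e_ms": 300},
    {"id": "P3", "services": ["R1"], "priority": 1, "e2e_ms": 300},
]
print(schedule_batch(batch, rpt, busy, now_ms=14))
# {'P3': [(('CS1', 'R1'), 39)], 'P2': [(('CS3', 'R1'), 49), (('CS2', 'R3'), 79)]}
```

Sorting by priority before assignment is what lets P3 take CS1 for R1 while P2 falls back to CS3, mirroring the t = 14 ms case; planning on a copy of the reservations and committing only on success mirrors the fact that a blocked request such as P24 reserves nothing.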

To further study the performance of the proposed admission control process and resource allocation, a sensitivity test has been carried out. Specifically, the influence of two parameters of the SoS network architecture on the gain in execution time (in ms) and the blockage probability has been investigated. The two parameters are:

2) The number of resources required by each received