Reinforcement Learning Based Fault-Tolerant Routing Algorithm for Mesh Based NoC and Its FPGA Implementation

Network-on-Chip (NoC) has emerged as the most promising on-chip interconnection framework in Multi-Processor System-on-Chips (MPSoCs) due to its efficiency and scalability. At deep sub-micron technology nodes, NoCs are vulnerable to faults, which lead to the failure of network components such as links and routers. Failures in NoC components diminish system efficiency and reliability. This paper proposes a Reinforcement Learning based Fault-Tolerant Routing (RL-FTR) algorithm to tackle the routing issues caused by link and router faults in the mesh-based NoC architecture. The efficiency of the proposed RL-FTR algorithm is examined using a System-C based cycle-accurate NoC simulator. Simulations are carried out by increasing the number of link and router faults in meshes of various sizes. Following the simulations, the real-time functioning of the proposed RL-FTR algorithm is observed using an FPGA implementation. The simulation and hardware results show that the proposed RL-FTR algorithm provides an optimal routing path from the source router to the destination router.


I. INTRODUCTION
Network-on-Chip (NoC) has become a popular on-chip communication paradigm for many-core systems. NoCs facilitate parallel processing by providing high bandwidth and low latency, which helps to deliver high computational power for real-time as well as safety-critical applications [1]. NoCs are designed using regular or irregular topologies. Mesh is a basic regular topology formed by interconnecting neighboring routers in a grid. The structure of the mesh topology is simple and easy to explore.
In NoC, routers route the packets received from the source cores towards the destination cores via links. The routing algorithm embedded inside the router is responsible for forwarding the packets towards the destination and plays a vital role in the successful delivery of packets [2]. Miniaturization of transistors and technology scaling help to integrate more transistors in a small chip area. NoC serves the requirements of more processing elements (PEs) or cores, resulting in high switching activity and heat dissipation. As a consequence, network components are likely to malfunction [3]. Failure of the network components adversely affects the performance and reliability of the system. So, to improve reliability and efficiency, NoCs require fault-tolerant approaches to tackle faults.
The associate editor coordinating the review of this manuscript and approving it for publication was Nitin Nitin.
In general, conventional fault-tolerant or shortest-path routing algorithms make decisions based on predefined rules. Because these algorithms lack intelligence, packets always go through the same nodes on the path to the destination, which creates congestion and queuing problems. The rules are user-defined, based on frequently occurring routing problems observed by the programmer, and every new scenario requires human intervention to update the rules [4]. As the number of routing problems increases, more rules must be defined to address all of them accurately, resulting in a loss of efficiency or accuracy. In Machine Learning (ML), algorithms are programmed to learn to perform the task [5]. During the learning process, the ML algorithm acquires knowledge of different routing scenarios, which helps to handle complex situations efficiently and accurately. This motivated us to use ML to propose a fault-tolerant routing algorithm for mesh-based NoC.
Machine Learning (ML) is one of the most in-demand techniques in the current market. It can perform intelligent tasks such as classification, recognition, advising, optimization, and prediction. Reinforcement Learning (RL) is a class of ML algorithms that performs goal-oriented learning based on interaction with the environment. The environment is a set of states or tasks. In RL, the agent takes actions in order to minimize or maximize the cumulative reward, depending on the reward policy [6]. The cumulative reward helps to find the solution. In this paper, a Reinforcement Learning based Fault-Tolerant Routing (RL-FTR) algorithm is proposed to address both link and router faults in mesh topology based NoC. The proposed RL-FTR algorithm uses a multi-agent reinforcement learning (MARL) algorithm to find the optimal routing path between the source router and the destination router. MARL is the area that focuses on the implementation of autonomous, self-learning systems with multiple agents. Conceptually, MARL is a deep learning discipline that focuses on models that include multiple agents that learn by dynamically interacting with their environment.
The proposed RL-FTR algorithm is tested on software and hardware platforms to observe its functionality and its efficiency in terms of latency and packet delivery. As part of software platform testing, the functionality of the proposed RL-FTR algorithm is tested by implementing it in a System-C based cycle-accurate NoC simulator. As part of hardware testing, the real-time functionality of the proposed RL-FTR algorithm is observed using an FPGA-based NoC prototype. FPGA implementation helps to identify and solve timing and functional issues and also reduces the pre-silicon design verification time. The significant contributions of this paper are listed below:
1) An RL based fault-tolerant routing algorithm is proposed to tackle link and router faults in mesh based NoC.
2) The proposed RL-FTR algorithm is implemented in a System-C based NoC simulator, and a detailed analysis of average network latency and packet loss is reported.
3) The real-time behavior of the proposed RL-FTR algorithm is observed through case studies by implementing it on FPGA.
4) Resource utilization and power analysis are reported for the proposed RL-FTR algorithm and compared with the algorithms proposed in [7], [8] and [9].
The structure of the paper is as follows: A brief literature review of related works is reported in Section II. An outline of the mesh topology and reinforcement learning is given in Section III. In Section IV, formulations for the fault-tolerant routing algorithm are discussed. Simulation results analysis and FPGA implementation with case studies of the proposed RL-FTR algorithm are discussed in Section V. Section VI concludes the paper.

II. RELATED WORK
Many researchers have proposed fault-free and fault-tolerant routing algorithms for mesh topology based NoCs. In [10], the authors proposed a fault-tolerant routing algorithm using virtual channels (VCs). The algorithm requires a knowledge base to get fault-related information, which increases processing overhead. The authors in [11] developed a reconfigurable router architecture called DSPIN (Distributed Scalable Predictable Interconnect Network) and a reconfigurable routing algorithm for the developed architecture. The DSPIN router architecture uses VCs as well as a turn-around model, which increases the area overhead. The authors in [12] proposed the MinFT routing algorithm for mesh topology, which is partially fault-tolerant. In [13], the authors proposed the Improved-Fault-Tolerant-Algorithm (i-FTR), which uses VCs to pass the faulty region and provides deadlock-free routing. A fault-tolerant NoC architecture is proposed in [14] by adding spare links and control units to the mesh topology. The proposed architecture has a new routing algorithm that accesses the required information from the control units to provide routing paths, and it works only for faulty routers. The authors in [9] developed a dynamic fault-tolerant XY-YX routing algorithm. In fault-free situations, it serves as a traditional XY routing algorithm, but it switches from XY to YX routing in the event of failure. Further, in [15], the authors added extra switches and links to the router architecture. Switches at the failed node transfer the data to spare links. The additional switches and links improve the latency while increasing hardware overhead. The authors in [16] introduced a new concept, un-routing, in the proposed Dn-FTR routing algorithm. Un-routing allows the packet to traverse back to the previous node whenever the forward path is not available. The authors in [8] proposed a routing algorithm to address link faults and modified the packet header to include fault-related information. The packet header has a 2-bit field to define the fault direction. Based on the fault information, the packet is routed towards the destination.
A Q-routing based self-regulated routing algorithm for NoC is proposed in [17]. Based on the congestion, the proposed algorithm dynamically changes the NoC routing scheme to improve the packet latency. In [18], the authors proposed RL based routing for adaptive traffic optimization to improve the performance of NoC. In [19], the authors proposed a control policy using RL to enhance the performance of NoC. The proposed policy optimally operates the error detection, error correction and re-transmission of the packet to reduce power consumption and latency. An RL based control policy is proposed in [20] to improve the energy efficiency of NoC by observing and optimizing the usage of different components such as caches and buffers. The authors in [21] proposed an intelligent NoC design framework (IntelliNoC) using RL. It manages the complexity of the design while optimizing energy efficiency, performance, and reliability. In [22], the authors proposed a learning-based NoC design, CURE. It has a reversible multi-function adaptive channel, enhanced fault-tolerant router circuitry, ten unique operation modes, and a Deep Reinforcement Learning (DRL) based dynamic control policy. The DRL-based control policy acquires knowledge of the NoC behavior and operates the NoC in the optimal mode to improve energy efficiency, performance, and reliability. A Q-learning based fault-tolerant routing algorithm is proposed in [7]. Q-learning uses a Q-table, which helps to take routing path decisions. In practice, systems with a high number of routers require more memory to accommodate the Q-table, and the learning agent requires more iterations to explore the network.
In the literature, most of the works address a limited number of faulty routers and links in NoC. In some works, RL-based techniques have been used to improve the NoC design and to enhance the energy efficiency, performance, and reliability of the NoC. Previous research shows that RL has the potential to tackle the routing problems in NoC. This motivated us to use RL techniques to propose an efficient fault-tolerant routing algorithm for mesh topology based NoC. The proposed RL-FTR algorithm uses decentralized MARL [23] with networked agents. In decentralized MARL, each agent makes its own decision, based only on local observations and information transmitted from its neighbors, without coordination by a central controller. This helps multiple agents perform sequential decision-making in a common environment and improves the scalability of the Q-table. The proposed RL-FTR algorithm addresses link and router faults present in mesh topology based NoC.

III. OVERVIEW OF MESH TOPOLOGY AND REINFORCEMENT LEARNING
A. MESH TOPOLOGY
NoC designs have regular and irregular (application-specific) topologies. Mesh is one of the regular topologies in NoC. In general, the size of a mesh topology is represented as m × n, where m and n indicate the number of rows and columns, respectively. In a mesh, corner routers are connected with two neighbour routers and one core, edge routers are connected with three neighbour routers and one core, and center routers are connected with four neighbour routers and one core.
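The relation between a router's position and its number of neighbour routers can be checked with a short sketch. The function names and the zero-based (row, column) coordinates below are illustrative assumptions, not part of the paper.

```python
from collections import Counter

def neighbour_count(row, col, m, n):
    """Number of neighbouring routers of the router at (row, col) in an m x n mesh."""
    candidates = [(row - 1, col), (row + 1, col), (row, col - 1), (row, col + 1)]
    return sum(0 <= r < m and 0 <= c < n for r, c in candidates)

def degree_histogram(m, n):
    """Map neighbour count -> number of routers that have that many neighbours."""
    return Counter(neighbour_count(r, c, m, n) for r in range(m) for c in range(n))

# 4 x 4 mesh: 4 corner routers, 8 edge routers, 4 center routers.
hist = degree_histogram(4, 4)
print(dict(hist))
```

In a 4 × 4 mesh this gives four routers with two neighbours, eight with three, and four with four.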

B. REINFORCEMENT LEARNING
RL is a sub-area in ML, where an agent learns to perform a task in an environment by taking sequential actions and observing the rewards received for actions [6]. Depending on the reward policy, every action gets a reward. RL agent learning is an iterative process, which updates the reward of every state-action pair. As a result of the learning process, the cumulative reward provides the optimal solution. The procedure of RL agent and environment interaction is presented in Figure. 2.
In general, an RL problem is formulated as a Markov Decision Process (MDP) [24]. An MDP is a tuple denoted by (S, A, T, R, γ), where S is the set of states, A is the set of actions available at each state, T is the state transition function T : S × A × S → [0, 1], R is the reward function R : S × A × S → ℝ that provides the reward for the actions, and γ is the discount factor that decides the significance of future rewards. In RL, the environment is entirely observable, which means that the agent interacts with the environment at every action. Based on the current state and the action taken, the agent receives a reward as feedback for the action and moves to the next state.

IV. PROPOSED ROUTING ALGORITHM
Routing in mesh topology is modeled as a Markov Decision Process (MDP) [25]. In general, the RL agent learns using updating policies to find the optimal solution. Similarly, the RL agent explores the NoC environment and finds the optimal routing path between the source and destination routers. The MDP for routing in mesh topology is as follows:
• State (S): Routers in the mesh topology are modeled as the states. The source router, destination router, and current router are considered as the initial state, target state, and current state, respectively.
• Action (A): Links connected to the routers in mesh topology are treated as the actions. Depending on its position, each router has a different number of adjacent routers. Correspondingly, every state has a different number of actions.
• Transition (T): Since the routers are modeled as states, the movement from one router to its neighbour router is treated as a state transition from the current state to the next state.
• Reward (R): Based on the reward policy, the RL agent receives a positive or negative reward for each action performed at every state. The cumulative reward of the state-action pairs (s_i, a_i) helps to obtain the solution.
• Learning rate (α): The learning rate determines how much of the newly learned value is used to update the old value. It varies between 0 and 1. If the learning rate is 0, the agent will not acquire any new knowledge. If the learning rate is 1, the agent will discard the old learned value and keep only the newly learned value.
• Discount factor (γ): The discount factor determines the importance of future rewards relative to the immediate reward received by the agent at the current state. It is a real value that ranges from 0 to 1. If the discount factor is 0, the agent considers only the immediate reward. A discount factor of 1 enables the agent to aim for a high long-term reward.
In the deep learning ecosystem, multi-agent reinforcement learning (MARL) is the area that focuses on the implementation of autonomous, self-learning systems with multiple agents [23]. Conceptually, MARL is the deep learning discipline that focuses on models that include multiple agents that learn by dynamically interacting with their environment. While in single-agent reinforcement learning scenarios the state of the environment changes solely as a result of the actions of one agent, in MARL scenarios the environment is subjected to the actions of all agents. Routing on large-scale interconnect networks is by nature a MARL problem because routers in the system can be considered as independent agents. In this work, we consider the mesh routing problem as a cooperative independent MARL process in which routers behave as independent learners with a common objective, that is, to deliver messages along the shortest path.
The proposed RL-FTR has three major components: (1) two-level Q-table [...] memory as the Q-table used in Q-routing. New techniques are adopted for RL-FTR to ensure timely update of values in the two-level Q-table and hence guarantee fast and stable model convergence. Figure 3 shows the structure of the two-level Q-table in a mesh of size 4 × 4. The routers in the mesh topology are grouped based on the row. The first level of the table contains the group information. The second level holds the Q-values of the ports that are associated with the router.
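One possible reading of the two-level table can be sketched as a nested dictionary per router: the first level is keyed by the destination's group (its mesh row) and the second level holds one Q-value per port. The layout, the port names, the zero-based row-major router numbering, and the helper names are all assumptions made for illustration; the structure in Figure 3 may differ.

```python
PORTS = ("north", "east", "south", "west")

def group_of(router_id, cols):
    """First-level key: routers are grouped by mesh row (assumed row-major ids from 0)."""
    return router_id // cols

def make_two_level_qtable(rows, cols):
    """Per router: {destination group -> {port -> Q-value}}, all initialised to 0."""
    return {
        router: {grp: {port: 0.0 for port in PORTS} for grp in range(rows)}
        for router in range(rows * cols)
    }

def best_port(qtable, router, destination, cols):
    """Select the port with the highest Q-value for the destination's group."""
    q_values = qtable[router][group_of(destination, cols)]
    return max(q_values, key=q_values.get)

table = make_two_level_qtable(4, 4)
table[0][group_of(15, 4)]["east"] = 5.0   # hypothetical learned value
print(best_port(table, 0, 15, 4))
```

With this layout a lookup first resolves the destination's group and only then compares the port Q-values stored for that group.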

B. Q-TABLE UPDATE
In RL, many methods are available to obtain the optimal solution. Q-learning is one of the model-free approaches in RL [26]. It is a value-based learning algorithm, which updates values based on the actions of the learning agent in the environment. The proposed RL-FTR algorithm uses Q-learning to obtain the shortest routing path from the source router to the destination router in mesh topology. Q-learning has a Q-table to hold the Q-value V(s_i, a_i) of every state-action pair in the environment. The Q-value update policy equation for Q-learning, in its standard form, is as follows:
$$V(s_i, a_i) \leftarrow V(s_i, a_i) + \alpha \left[ R(s_i, a_i) + \gamma \max_{a} V(s_{i+1}, a) - V(s_i, a_i) \right] \tag{1}$$
In the proposed RL-FTR algorithm, the mesh topology is modeled as the environment, routers as states, and the links associated with the routers as the actions. As shown in the corresponding figure, the agent starts from the source router as the initial state (s_0). At every state (s_i), the agent performs an action, receives a reward accordingly, and makes a transition to the next state (s_{i+1}). Based on the received reward and state transition, the Q-value for the current state-action pair V(s_i, a_i) is updated using equation (1). The whole procedure is repeated for a limited number of iterations until the Q-table converges. Algorithm 1 describes the pseudo-code for the Q-learning algorithm to find the shortest routing path in mesh topology. Algorithm 1 works for mesh topology based NoC without any faults or with link and router faults because the rewards are given based on the condition of the NoC.
The reward policy for updating the Q-value is taken from the structure of the mesh topology (fault-free and faulty), and every action has a predetermined reward value specified by the programmer. An action leading to a state transition, i.e., from router to router, receives a reward of 100 if both routers are connected. If the two routers are not connected or are faulty, no reward is given, and if the next state is the destination, the action receives a reward of 1000.
Q-values are updated during the learning process based on the interaction of the agent with the environment as well as with the other agents. At the end of the learning process, the Q-table holds the cumulative Q-values V(s_i, a_i) for every state-action pair. To find the optimal routing path, the action with the highest Q-value is picked at the initial state, and the state transition occurs based on that action. This process is repeated at every current state until the packet reaches the destination router.
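The learning and path-selection steps described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the paper's Algorithm 1: it uses one flat Q-table for a single destination instead of the two-level table, replaces the agent's exploration with exhaustive sweeps over all state-action pairs so that the result is deterministic, treats the destination as a terminal state, and picks ALPHA and GAMMA arbitrarily. Only the reward values (100, 1000, and no reward for unusable links) come from the text.

```python
ALPHA = 0.5   # assumed learning rate (not taken from the paper)
GAMMA = 0.8   # assumed discount factor; see the note after the code
MOVES = {"north": (-1, 0), "east": (0, 1), "south": (1, 0), "west": (0, -1)}

def build_transitions(rows, cols, faulty_links=(), faulty_routers=()):
    """Map every router id to {port: neighbour id} over usable links only."""
    broken = {frozenset(link) for link in faulty_links}
    dead = set(faulty_routers)
    trans = {}
    for state in range(rows * cols):
        r, c = divmod(state, cols)
        trans[state] = {}
        if state in dead:
            continue
        for port, (dr, dc) in MOVES.items():
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nxt = nr * cols + nc
                if nxt not in dead and frozenset((state, nxt)) not in broken:
                    trans[state][port] = nxt
    return trans

def learn_q_table(trans, dest, sweeps=200):
    """Apply the Q-learning update to every usable state-action pair, repeatedly."""
    q = {s: {p: 0.0 for p in MOVES} for s in trans}
    for _ in range(sweeps):
        for s, ports in trans.items():
            if s == dest:
                continue
            for port, nxt in ports.items():
                reward = 1000 if nxt == dest else 100
                future = 0.0 if nxt == dest else max(q[nxt].values())
                q[s][port] += ALPHA * (reward + GAMMA * future - q[s][port])
    return q  # ports without a usable link keep Q = 0 (no reward)

def greedy_path(q, trans, src, dest):
    """Follow the highest-Q port from src until dest is reached (or no port is usable)."""
    path = [src]
    while path[-1] != dest and len(path) <= len(trans):
        here = path[-1]
        port = max(q[here], key=q[here].get)
        if port not in trans[here]:
            break  # isolated router: no usable port was learned
        path.append(trans[here][port])
    return path

# 4 x 4 mesh, routers numbered 0..15 row by row; router 5 and link 0-1 are faulty.
trans = build_transitions(4, 4, faulty_links=[(0, 1)], faulty_routers=[5])
q = learn_q_table(trans, dest=14)
print(greedy_path(q, trans, src=0, dest=14))
```

In this sketch the discount factor has to stay below 0.9: with a reward of 100 per hop, endlessly wandering is worth 100 / (1 − γ), which must remain below the destination reward of 1000 for the greedy path to be a shortest path.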

C. ROUTING WITH Q-TABLE
In the proposed RL-FTR, the router first identifies the group of the destination. Based on the group in which the destination is present, the port with the best Q-value is selected at the source router. Through the selected port, the data reaches the next router, and the process continues until the data reaches the destination. During the learning process, all the agents in RL-FTR explore the complete mesh topology. Figure. 5 shows an example of finding the shortest path between the routers R1 and R16 using the proposed RL-FTR algorithm. Here, R1 is the source router and R16 is the destination router. Figure. 5a shows the updated table after the learning process. Out of all available ports at the source router R1, the east port has the highest accumulated Q-value. So, the east port is selected, and the data is transferred to the next router, i.e., router R2. Similarly, the port with the highest Q-value at router R2 transfers the data to R6. This process continues until the data reaches the destination router R16. Figure. 5b shows the obtained shortest routing path between the routers R1 and R16.
Similar to the example shown in Figure. 5, the RL-FTR algorithm has control over the complete routing path for every source-destination pair; hence, the proposed RL-FTR algorithm is free from deadlock and livelock.

V. EXPERIMENTAL RESULTS AND ANALYSIS
In this section, the performance of the proposed RL-FTR algorithm is observed using the NoC simulator. In addition, its real-time functionality is observed using FPGA implementation.

A. IMPLEMENTATION ON NoC SIMULATOR
In this section, the scalability and efficiency of the proposed RL-FTR algorithm are tested using the System-C based cycle-accurate NoC simulator [27]. The simulations are performed on different sizes of mesh topology. The algorithm is tested for various conditions of the NoC by increasing the number of faulty links, faulty routers, or a combination of faulty links and routers present in the mesh network.

1) EXPERIMENTAL SETUP
The proposed RL-FTR algorithm is implemented in the System-C language and integrated with the System-C based cycle-accurate NoC simulator [27]. The routing algorithm takes the connection file as the input during the initial setup process. It updates the Q-tables, which contain information about routing in mesh topology. The fault-related information is provided to the routing algorithm through the connection file. Table 1 shows the NoC simulator configuration parameters used for all the experiments. The simulations are performed on a machine with an Intel Xeon E5-1650 v3 processor and 32 GB RAM.
Figure. 6 presents the effect of link faults on the packet delivery in different sizes of the mesh network. As the percentage of faulty links in the mesh network increases, the percentage of packets delivered decreases. The failed links isolate some of the destinations and sources from the mesh network, which results in packet loss. The proposed RL-FTR algorithm delivered more packets than the algorithms proposed in [7], [8] and [9]. Figure. 7 presents the effect of link faults on the average network latency. The average network latency increases until 25-30 percent of the links in the mesh network are faulty, because the algorithms deliver packets using fault-tolerant routing paths that are often longer than the fault-free routing paths. In this range, the average network latency of the proposed RL-FTR algorithm is less than that of the algorithms proposed in [7], [8] and [9]. Beyond 30 percent link faults, the average network latency decreases because the algorithms deliver packets only to the nearest destinations. In this range, however, the latency of the proposed RL-FTR algorithm is higher than that of the algorithms proposed in [7], [8] and [9], because the proposed RL-FTR algorithm delivers more packets to the longer distance destinations.
The corresponding figure presents the effect of faulty routers on the packet delivery in mesh topologies of different sizes.
As with faulty links, the packet delivery decreases with an increase in the percentage of faulty routers. However, the packet delivery is lower for faulty routers than for faulty links. This is because a single faulty router can block up to four links and may also disconnect the destination from the mesh network. In this case also, the proposed RL-FTR algorithm delivers more packets than the algorithms proposed in [7], [8] and [9]. Figure. 9 presents the effect of faulty routers on the average network latency in different mesh topology sizes. Similar to the faulty links case, the average network latency of the proposed fault-tolerant algorithm increases until 20-25 percent of the routers in the mesh topology are faulty. The algorithms deliver packets through fault-tolerant routing paths, which are typically longer than fault-free routing paths, resulting in increased latency. Beyond 25 percent faulty routers, the average network latency decreases because the algorithms deliver packets only to the nearest destinations. However, in comparison to the algorithms proposed in [7], [8] and [9], the average network latency of the proposed RL-FTR algorithm is high. This is because the proposed RL-FTR algorithm delivers more packets to the longer distance destinations.
The change in the percentage of packets delivered and in the average network latency with the percentage of faulty links and routers combined is presented in Figure. 10 and Figure. 11, respectively.
From Figure. 10, it is observed that the percentage of packets delivered is lower than in the faulty links only and faulty routers only cases, because the mesh topology has more faulty components (links and routers). From Figure. 11, it is evident that the average network latency is lower than in the faulty links only and faulty routers only cases, because fewer packets are delivered to longer distance destinations.
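The observation that faults isolate sources and destinations can be illustrated independently of any routing algorithm by counting how many source-destination pairs remain connected. The sketch below is not the simulator's packet-delivery metric; it is a breadth-first search over the surviving links that gives an upper bound on what any routing algorithm could deliver. Function names and the zero-based row-major router numbering are assumptions.

```python
from collections import deque

def mesh_links(rows, cols):
    """All bidirectional links of a rows x cols mesh as frozensets of router ids."""
    links = set()
    for r in range(rows):
        for c in range(cols):
            rid = r * cols + c
            if c + 1 < cols:
                links.add(frozenset((rid, rid + 1)))
            if r + 1 < rows:
                links.add(frozenset((rid, rid + cols)))
    return links

def deliverable_fraction(rows, cols, faulty_links=(), faulty_routers=()):
    """Fraction of ordered source/destination pairs that are still connected."""
    dead = set(faulty_routers)
    broken = {frozenset(link) for link in faulty_links}
    adj = {rid: [] for rid in range(rows * cols)}
    for link in mesh_links(rows, cols) - broken:
        a, b = tuple(link)
        if a not in dead and b not in dead:
            adj[a].append(b)
            adj[b].append(a)
    total = rows * cols * (rows * cols - 1)
    connected = 0
    for src in adj:
        if src in dead:
            continue  # a faulty router can neither send nor receive
        seen, queue = {src}, deque([src])
        while queue:
            for nxt in adj[queue.popleft()]:
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        connected += len(seen) - 1
    return connected / total

print(deliverable_fraction(4, 4, faulty_routers=[5]))
print(deliverable_fraction(4, 4, faulty_links=[(0, 1), (0, 4)]))
```

In a 4 × 4 mesh, one faulty router removes four links at once and has the same effect on connectivity as the two link faults that cut off a corner router, which is consistent with router faults being the more damaging kind.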
In this section, the performance of the proposed RL-FTR algorithm was evaluated using the NoC simulator. The percentage of packets delivered and the average network latency are reported for all the experiments.

B. FPGA IMPLEMENTATION
The proposed RL-FTR algorithm is implemented on FPGA to observe its real-time functioning. For the FPGA implementation, the router architecture is taken from [27] and the routing logic is modified as per the proposed algorithm. Figure. 12 shows an overview of the 5-port NoC router architecture. Out of the 5 ports, one port serves the core and the other ports are used to connect with the neighbouring routers. Depending on the position of the router in the mesh topology, the number of links associated with the router varies.
The router transfers the messages in the form of flits. Flits are flow control units formed by dividing the packet into small parts. Each packet contains header, payload, and tailer flits. The header flit includes the source and destination addresses. Payload and tailer flits contain the actual message. Figure. 13 shows the structure of the different flits. Each flit is 32 bits in size, and the type of the flit is defined by the end-of-packet (EOP) and begin-of-packet (BOP) bits.
The structure of the header flit for mesh topology is shown in Figure. 14. It has source and destination address fields of 8 bits each. Following the address fields, 11 bits are unused or reserved for future requirements. The next 2 bits are used for virtual channel identification (VCID). The next 2 bits are used for EOP and BOP, and the last bit is unused. We have used 8 bits to address each core. The core address field has two parts of 4 bits each: the first part holds the column index, and the second part holds the row index. For example, a 4 × 4 mesh has four rows and four columns, and the row and column indices range from 0 to 3. So, the address of core 1 is 0000_0000 (0x00). Similarly, the address of core 15 is 0011_0010 (0x32).
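The field widths above add up to 32 bits (8 + 8 + 11 + 2 + 2 + 1), and the address encoding can be reproduced with a small sketch. The text gives the order of the fields but not their bit positions, so the positions below (source address in the least significant byte, unused bit at the top, EOP below BOP) are assumptions, as are the function names.

```python
def core_address(column, row):
    """8-bit core address: upper 4 bits = column index, lower 4 bits = row index."""
    if not (0 <= column < 16 and 0 <= row < 16):
        raise ValueError("column and row must each fit in 4 bits")
    return (column << 4) | row

def pack_header(src, dst, vcid=0, eop=0, bop=1):
    """Pack a 32-bit header flit; the bit positions are assumed, not from the paper."""
    if not (0 <= src < 256 and 0 <= dst < 256 and 0 <= vcid < 4):
        raise ValueError("field out of range")
    flit = src                  # bits  7..0 : source address
    flit |= dst << 8            # bits 15..8 : destination address
    # bits 26..16 are reserved and left at zero
    flit |= vcid << 27          # bits 28..27: virtual channel id
    flit |= (eop & 1) << 29     # bit  29    : end of packet
    flit |= (bop & 1) << 30     # bit  30    : begin of packet
    # bit 31 is unused
    return flit

src = core_address(column=0, row=0)   # core 1 in the text's example
dst = core_address(column=3, row=2)   # core 15 in the text's example
header = pack_header(src, dst)
print(hex(src), hex(dst), hex(header))
```

The two example addresses from the text come out as 0x00 and 0x32.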

1) EXPERIMENTAL SETUP
The proposed RL-FTR algorithm is developed in Verilog and integrated in place of the routing logic of the router architecture [27]. The algorithm takes as input the connection file describing the mesh topology structure (without any faults or with link and router faults). It produces a Q-table with the routing path information for the mesh topology based NoC. Figure. 15 shows the experimental setup of the FPGA.
Mesh based NoC design of size 4 × 4 is implemented using a 5-port router. The router architecture is taken from [27]. The complete NoC design is developed using Verilog HDL programming and implemented on Xilinx Kintex-7 FPGA KC705 [28] Evaluation Kit using the Vivado tool [29]. Kintex-7 FPGA kit has a limited number of physical input/output ports. So, we have used the Vivado IP Virtual Input/Output (VIO) core to drive the inputs and to observe the output signals in real-time [30]. The output of VIO is input to design, and the input of VIO is the output of the design. Accordingly, the input core link of NoC is connected to the output of VIO to inject the packets into NoC. Similarly, the output core link of NoC is connected to the input of the VIO to observe the data received at the core link. A DIP switch present on the FPGA evaluation kit is used to manually shift the routing algorithm from fault-free to fault-tolerant and vice versa, which is shown in Figure. 16. NoC uses XY as the routing algorithm when the switch is in OFF state, and NoC uses the proposed RL-FTR algorithm for the routing when the switch is in ON state.
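For reference, the fault-free XY routing selected by the OFF position of the switch can be sketched as follows. This is the textbook dimension-order scheme, not the paper's Verilog: the packet first moves along the row until the column matches, then along the column. The (row, column) coordinates, the port names, and the convention that columns grow eastward and rows grow southward are assumptions.

```python
STEP = {"east": (0, 1), "west": (0, -1), "south": (1, 0), "north": (-1, 0)}

def xy_next_port(current, destination):
    """Dimension-order (XY) routing: correct the column first, then the row."""
    cur_row, cur_col = current
    dst_row, dst_col = destination
    if cur_col != dst_col:
        return "east" if dst_col > cur_col else "west"
    if cur_row != dst_row:
        return "south" if dst_row > cur_row else "north"
    return "local"  # packet has arrived; deliver to the attached core

def xy_path(source, destination):
    """List of (row, col) routers visited by XY routing in a fault-free mesh."""
    path = [source]
    while path[-1] != destination:
        dr, dc = STEP[xy_next_port(path[-1], destination)]
        path.append((path[-1][0] + dr, path[-1][1] + dc))
    return path

print(xy_path((0, 0), (3, 2)))
```

Because the path is fixed by the coordinates alone, XY routing cannot avoid a faulty link or router on that path, which is why a fault-tolerant mode is needed.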

2) EXPERIMENTAL RESULTS ANALYSIS
The proposed RL-FTR algorithm is tested on FPGA with case studies that consider various conditions of the NoC. We have used a mesh topology of size 4 × 4 for the case studies, and the results are compared with the algorithms proposed in [7], [8] and [9]. The case studies are as follows.
For the fault-free case study, routers R1 and R15 are considered as the source and destination routers, respectively. Figure. 17 shows the routing path obtained from the source to the destination router using the proposed RL-FTR algorithm and the algorithms proposed in [7], [8] and [9]. All algorithms take five hops to reach the destination from the source, but the traversal paths are different. Figure. 18a shows the post-implementation timing simulation output of the proposed RL-FTR algorithm and [7]. In this, the data is sent from the router R1 to the router R15, as shown in Figure. 17. The router R1 started sending the header flit at time 150.10 ns and router R15 received the header at time 450.00 ns, i.e., it took 299.90 ns to reach router R15 from router R1. Figure. 18b shows the run-time inputs/outputs of the FPGA implemented NoC design in VIO for the case taken in Figure. 17. In Figure. 18b, the input data to the source router R1 is the output of VIO and the data received at the destination router R15 is observed as the input to the VIO. Similarly, Figure. 19a shows the post-implementation timing simulation output and Figure. 19b shows the run-time inputs/outputs of the FPGA implemented NoC design in VIO for the algorithms proposed in [8] and [9].
[Figure: FPGA implementation of algorithms proposed in [8] and [9] for mesh topology of size 4 × 4 without any faults.]
In the link fault case study, a link fault is injected in the routing path between the source and destination. Here, we have considered the same source-destination pair as in the fault-free case, to compare the obtained result with the fault-free case. Figure. 20 shows the routing path obtained using the proposed RL-FTR algorithm and the algorithms proposed in [7], [8] and [9]. The routing path generated by the algorithms in [8] and [9] is the same. The proposed RL-FTR algorithm and [7] require five hops to reach the destination, whereas the algorithms proposed in [8] and [9] require seven hops. Compared with the algorithms in [8] and [9], the proposed RL-FTR algorithm requires two hops fewer, which is the same hop count as in the fault-free case. Figure. 21a shows the post-implementation simulation result of the proposed RL-FTR algorithm. From Figure. 21a, it is observed that the time taken to reach R15 from R1 is equal to the time taken in the fault-free case, i.e., 299.90 ns. Figure. 21b and Figure. 22b show the real-time inputs/outputs of the FPGA implemented NoC design in VIO. In Figure. 22a, the post-implementation simulation result shows that the time taken to reach R15 from R1 using the algorithms proposed in [8] and [9] is 399.90 ns, which is 100 ns more than that of the proposed RL-FTR algorithm and [7].
[Figure: FPGA implementation of algorithms proposed in [8] and [9] for mesh topology of size 4 × 4 in the presence of link faults.]
From Figure. 21a and 22a, it is evident that in the link fault case the proposed RL-FTR algorithm provides a shorter routing path than the one provided by the algorithms proposed in [8] and [9].
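The reported timings can be cross-checked with a few lines of arithmetic. The sketch below uses only the timestamps and hop counts quoted in the text; the per-hop figure it derives is an inference from those numbers, not a value stated in the paper, and the function name is hypothetical. `Decimal` is used so that the two-decimal timestamps subtract exactly.

```python
from decimal import Decimal

def elapsed_ns(sent_ns, received_ns):
    """Header-flit latency from two timestamps given as decimal strings."""
    return Decimal(received_ns) - Decimal(sent_ns)

rl_ftr_latency = elapsed_ns("150.10", "450.00")   # 5-hop path (RL-FTR and [7])
reference_latency = Decimal("399.90")             # 7-hop path of [8] and [9]

extra_ns = reference_latency - rl_ftr_latency     # cost of the longer detour
extra_hops = 7 - 5
print(rl_ftr_latency, extra_ns, extra_ns / extra_hops)
```

The two additional hops account for the 100 ns difference, which suggests roughly 50 ns per additional hop in this setup.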

c: CASE3: ROUTER FAULT
Similar to the link fault case, in this case study a router fault is injected in the routing path between the source and destination. As in the previous cases, the same source and destination pair is used for comparison. The obtained routing path is shown in Figure. 23. The routing path obtained is the same for the algorithms in [8] and [9]. The proposed RL-FTR algorithm requires five hops to reach the router R15 from the router R1 under the router fault condition, which is equal to the fault-free case and two hops fewer than the routing path obtained from the algorithms proposed in [8] and [9]. Figure. 24b and Figure. 25b show the input/output of the FPGA implemented NoC design in VIO.
From Figure. 24a and 25a, it is observed that the proposed RL-FTR algorithm is providing the shortest routing path in case of router faults.

d: CASE4: COMBINATION OF LINK AND ROUTER FAULTS
In this case, arbitrary router and link faults are injected in the routing path between the source and destination routers. For comparison, the same source and destination pair as in the previous cases is used. Figure. 26 depicts the obtained optimal routing path between the routers R1 and R15. In this case, the routing paths obtained from all the algorithms are different. Nevertheless, the proposed RL-FTR algorithm requires five hops to reach the destination, which is equal to the fault-free routing path and two hops fewer than the path obtained using the algorithms proposed in [8] and [9]. Figure. 27b and Figure. 28b show the inputs/outputs of the FPGA-implemented NoC design in VIO. From Figure. 27 and Figure. 28, it is evident that the proposed RL-FTR algorithm provides the shortest routing path in the presence of both link and router faults. Table 2 shows the summary report of all case studies.

FIGURE 25. FPGA implementation of algorithms proposed in [8] and [9] for mesh topology of size 4 × 4 in the presence of router faults.
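The five-hop figure reported in the case studies can be cross-checked with a breadth-first search over the mesh. The sketch below is only an illustration and is not the RL-FTR algorithm. It assumes routers are numbered R0 to R15 in row-major order (an assumption that is consistent with the five-hop fault-free distance between R1 and R15). The fault positions in the usage note are hypothetical and are not taken from the figures. The function name `mesh_hops` and its parameters are likewise illustrative.

```python
from collections import deque


def mesh_hops(src, dst, size=4, faulty_links=(), faulty_routers=()):
    """Return the minimum hop count from src to dst in a size x size mesh.

    Routers are assumed to be numbered row-major (router id = y * size + x).
    Faulty links are given as pairs of router ids and are treated as
    bidirectional. Returns None if dst cannot be reached.
    """
    blocked_links = {frozenset(link) for link in faulty_links}
    blocked_routers = set(faulty_routers)
    if src in blocked_routers or dst in blocked_routers:
        return None

    dist = {src: 0}
    queue = deque([src])
    while queue:
        cur = queue.popleft()
        if cur == dst:
            return dist[cur]
        x, y = cur % size, cur // size
        # Visit the four mesh neighbours (east, west, south, north).
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if not (0 <= nx < size and 0 <= ny < size):
                continue  # outside the mesh boundary
            nxt = ny * size + nx
            if nxt in blocked_routers or nxt in dist:
                continue  # faulty router or already visited
            if frozenset((cur, nxt)) in blocked_links:
                continue  # faulty link
            dist[nxt] = dist[cur] + 1
            queue.append(nxt)
    return None
```

Under these assumptions, `mesh_hops(1, 15)` returns 5 for the fault-free mesh, and it still returns 5 with a hypothetical faulty link between R1 and R5 or a hypothetical faulty router R5, because another minimal path remains available. A fault-tolerant algorithm that reports seven hops in such a situation is taking a non-minimal detour.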

3) HARDWARE RESOURCE UTILIZATION
Any design implemented on FPGA consumes the resources available on the FPGA. Table 3 shows the detailed hardware resource utilization report for mesh topology of size 4×4 with the proposed RL-FTR algorithm and the algorithms proposed in [7], [8] and [9]. The reports are generated using the Vivado tool.
From Table 3, it is observed that, except for global clock buffer (BUFG) utilization, the proposed algorithm uses fewer hardware resources than the algorithm proposed in [7] and more hardware resources than the algorithms proposed in [8] and [9]. Compared to the algorithms proposed in [8] and [9], the RL-FTR algorithm has an overall hardware utilization overhead of 2.27% and 8.02%, respectively. The proposed RL-FTR algorithm generates a Q-table with the information related to routing. Placing and accessing the Q-table inside the router requires additional hardware resources, so the proposed RL-FTR algorithm uses more hardware resources than the algorithms in [8] and [9]. However, the proposed RL-FTR algorithm uses 2.73% fewer resources than the algorithm proposed in [7]. This is because in the proposed RL-FTR algorithm every router has an individual Q-table, which requires fewer resources, whereas in [7] all the routers share a common Q-table, which holds a large amount of data and requires a high volume of resources.
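To illustrate why a per-router table is small, the following sketch models a single router's Q-table as one row per destination and one column per output port. It is a generic Q-routing-style table written for illustration only. It does not reproduce the two-level Q-table of RL-FTR or the table organization of [7]. The class name `RouterQTable`, the port labels, the learning rate `alpha`, and the cost-based update rule are all assumptions.

```python
PORTS = ("N", "E", "S", "W")  # assumed output-port labels of a mesh router


class RouterQTable:
    """Hypothetical per-router Q-table: one estimated cost per (destination, port)."""

    def __init__(self, num_routers, init=0.0):
        # q[dst][port_index] holds the estimated cost to reach dst via that port.
        self.q = [[init] * len(PORTS) for _ in range(num_routers)]

    def entries(self):
        """Number of stored Q-values in this router's table."""
        return len(self.q) * len(PORTS)

    def best_port(self, dst, faulty_ports=()):
        """Return the non-faulty port with the lowest estimated cost, or None."""
        usable = [i for i, name in enumerate(PORTS) if name not in faulty_ports]
        if not usable:
            return None
        return PORTS[min(usable, key=lambda i: self.q[dst][i])]

    def update(self, dst, port, hop_cost, neighbor_estimate, alpha=0.5):
        """Move the stored estimate toward hop_cost + the neighbour's estimate."""
        i = PORTS.index(port)
        target = hop_cost + neighbor_estimate
        self.q[dst][i] += alpha * (target - self.q[dst][i])
```

For a 4×4 mesh, `RouterQTable(16).entries()` gives 64 values per router under this layout. The actual storage cost on FPGA depends on the Q-value word length and on the table organization, which this sketch does not model.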

4) POWER ANALYSIS
A detailed power analysis for mesh topology of size 4×4 with the proposed RL-FTR algorithm and the algorithms proposed in [7], [8] and [9] is reported in Table 4.
The power analysis report is generated using the Vivado tool. According to Table 3, the proposed RL-FTR algorithm uses more hardware resources than the algorithms proposed in [8] and [9], and its power consumption is correspondingly higher. Similarly, the RL-FTR algorithm consumes less power than the algorithm proposed in [7]. Based on the power analysis reported in Table 4, the proposed RL-FTR algorithm requires only 0.006 W and 0.012 W more power than the algorithms proposed in [8] and [9], respectively, and 0.007 W less power than the algorithm proposed in [7]. Compared to the conventional algorithms proposed in [8] and [9], the proposed RL-FTR algorithm has a very small overhead in resource utilization and power consumption. Compared to the RL-based algorithm proposed in [7], the proposed RL-FTR algorithm uses fewer resources and consumes less power. Moreover, the proposed RL-FTR algorithm is more efficient than all the compared algorithms in packet delivery and average packet latency.
In this section, the functioning of the proposed RL-FTR algorithm on FPGA was observed through the case studies, and the resource utilization and power analysis of the FPGA implementation were reported.

C. DISCUSSION
The proposed RL-FTR algorithm delivered an average of 4.1%, 13.1% and 21% more packets in the link fault case, 4.6%, 12.5% and 17.3% more packets in the router fault case, and 7.1%, 15% and 20.1% more packets in the combined link and router fault case than the algorithms proposed in [7], [8] and [9], respectively. As the number of faults increases, the routing paths to a few nodes are blocked by faults, which results in packet loss even with the proposed RL-FTR algorithm.
Overall, in both the FPGA implementation and the simulator, the proposed RL-FTR algorithm always provides an optimal routing path between the source and destination routers in the presence of faults. It also shows a significant improvement over the algorithms proposed in [7], [8] and [9] against all the performance parameters.

VI. CONCLUSION
In this paper, a reinforcement learning based fault-tolerant routing algorithm is proposed and implemented on a System-C based NoC simulator and on FPGA. RL-FTR uses MARL with a two-level Q-table. The two-level Q-table not only provides more learning information, but also mitigates the outdated Q-value issue commonly experienced in large-scale systems. The experimental results show that the proposed algorithm performs well in the presence of both faulty links and faulty routers in the mesh topology. Performance parameters such as average network latency, packet delivery, power and hardware resource utilization are reported in this paper. In comparison with the conventional routing algorithms proposed in [8] and [9], the proposed RL-FTR algorithm has an overall improvement of 15% and 20.1% in packet delivery, with a hardware utilization overhead of 2.27% and 8.02%, and a power consumption increase of 0.78% and 1.7%, respectively. In comparison with the RL-based routing algorithm proposed in [7], the proposed RL-FTR algorithm has an overall improvement of 7.1% in packet delivery, with 2.7% less hardware utilization and 0.89% less power consumption. In future work, the algorithm will be extended to other topologies and implemented on FPGA.