From MPC-Based to End-to-End (E2E) Learning-Based Control Policy for Grid-Tied 3L-NPC Transformerless Inverter

This paper proposes an end-to-end (E2E) learning-based control policy to directly control a transformerless grid-tied three-level neutral-point-clamped (3L-NPC) inverter powered by a photovoltaic (PV) array. The E2E control policy is represented either by an artificial neural network (ANN) or by a time-delay neural network (TDNN), giving ANN- and TDNN-based control policies that estimate the optimal switching vector of the 3L-NPC. With such a learning-based control policy, there is no need to derive or deeply understand the complex mathematical model of the 3L-NPC, because the dynamics of both the system and the control scheme, as well as the cost function to be minimized, are learned end to end through a direct mapping from the raw observations to the optimal switching states. This removes two major barriers of model-based control strategies such as model predictive control (MPC): (i) the need for an accurate system model, and (ii) the exponential increase in computational complexity. To train the two control policies, the conventional MPC is employed as an expert, first to acquire a set of training data (i.e., input-output pairs) and thereafter to assess the proposed control schemes. The proposed E2E control strategies are validated using MATLAB/Simulink, where the impact of different input features and training data is studied. With the proposed control policies, especially the TDNN-based control policy that has only one time-delay window, a high-quality sinusoidal grid current with low total harmonic distortion (THD) is achieved, which enhances the power quality of the utility grid. In addition, the leakage current is reduced by more than 25% compared to the conventional MPC, while nearly the same dynamic behavior as the MPC strategy is obtained during irradiation changes.
Moreover, the proposed E2E control strategy is verified experimentally on a Hardware-in-the-Loop (HIL) real-time simulator using the C2000™ microcontroller LaunchPadXL TMS320F28379D kit, demonstrating the applicability and good performance of the proposed control strategy under realistic conditions.


I. INTRODUCTION
Recently, transformerless grid-tied photovoltaic (PV) systems have attracted enormous attention and have been adopted in a wide range of applications all over the world [1], [2]. These systems increase the share of renewable energy in electricity generation and thereby reduce the dependence on fossil fuels and the associated global warming problems. Nevertheless, the transformerless topology has a serious problem, the so-called earth leakage current [3]. It is an unfavorable phenomenon because it reduces efficiency, degrades safety, increases losses, and increases grid distortion. Variations in the inverter common-mode voltage are considered to be the source of the earth leakage current in a given system [4]. In general, there are two ways to reduce the earth leakage current: the inverter topology and the inverter modulation (i.e., the control phase) [5]. Much research has been conducted to mitigate the earth leakage current in three-phase systems [6]-[10].
Recently, multilevel transformerless inverters have been introduced. The most well-known topology of the multilevel inverters is the neutral-point-clamped (NPC) topology [11], [12]. It has the merit of low total harmonic distortion (THD) of the output current and voltage. Despite a large number of components, the NPC components have low ratings. Two PWM modulation strategies have been introduced in [13], to reduce the earth leakage current in the three-level NPC inverter (3L-NPC). Although these strategies give better performance, the THD of the output current is increased. Many control techniques are used, in this regard, such as fuzzy and model predictive control (MPC) [14]. In some cases, the controller acts as a new modulation technique. Hence, it represents a new attempt to reduce the leakage problem.
Model predictive control (MPC) has been in use since the 1970s in chemical processes, petrochemicals, and oil refineries [15], initially for applications with a low sampling-time resolution [16]. Later on, it became widely deployed in other fields such as power systems [17], [18], power electronics [19]-[21], and electrical drive systems [22] as a result of innovations in digital platform technology, which made extensive calculations feasible at high-resolution sampling times on the microsecond scale [19]. As the name implies, the key role of MPC is to explicitly use the mathematical model of the system to predict the future behavior of the controlled variables within a certain time horizon. Compared with existing control approaches, it offers advantages such as: (i) simple conception and high efficiency, (ii) fast dynamic response during system transients, (iii) small steady-state error [23], and (iv) easy inclusion of system constraints, non-linearities, and multiple controlled variables within the control law. However, in practice, big challenges remain concerning the real-time implementation of MPC due to its significant computational burden, especially when long prediction horizons are deployed [24], [25].
Generally speaking, in the field of power electronics, MPC can be classified into two types, namely, the continuous control set-MPC (CCS-MPC) and the finite control set-MPC (FCS-MPC) [26]. The latter is preferred and more widely utilized due to its design simplicity, as the switching state is applied directly to the converter switches rather than through a modulation stage as in the CCS-MPC. The optimal control action (i.e., the switching state) of the FCS-MPC is selected by solving an online optimization problem over a finite time horizon. The main objective of this optimization problem is to find the control action that minimizes the cost function, so that the controlled variables stay as close as possible to their setpoints while remaining within the permitted limits. When numerous variables are to be controlled within the cost function, a weighting factor is assigned to each term in the control law to set its importance in the control action, and the values of these factors are derived using a heuristic technique [19]. Additionally, the quality of the FCS-MPC depends on selecting suitable control variables and weighting factors inside the control law [27].
In the literature, FCS-MPC is employed to control the three-phase 3L-NPC that interfaces PV systems with the utility grid, as proposed in [2]. The simulation results have revealed that FCS-MPC yields a smaller leakage current, lower THD, and higher efficiency compared to the conventional proportional-integral (PI) controller-based sinusoidal pulse width modulation (SPWM). The FCS-MPC is also used to decrease the leakage current in the PV system without addressing the complexity of the grid-side filter, even if the THD of the current injected into the associated utility grid is required to be reduced [14]. Moreover, in [28], a Lyapunov function is used with the FCS-MPC in the grid-tied 3L-NPC, aiming to reduce the computational burden by excluding the switching vectors that will not ensure system stability. In [29], the FCS-MPC is designed to overcome the variable switching frequency issue for a 3L-NPC with a standalone RL load, without considering the redundant vectors in the switching patterns. Furthermore, a fixed switching frequency for the 3L-NPC based on the modulated MPC (M²PC), used to control the grid current and achieve a THD of 2% in the injected grid current, has been discussed in [30]. Despite all the foregoing, the FCS-MPC algorithm of the 3L-NPC still has a significant computational load. Reducing this computational burden would allow fast dynamic performance to be achieved and the steady-state features to be improved. Moreover, deriving the mathematical model has always been difficult, particularly when the parasitic elements of the components are taken into consideration during the design phase. Thus, it would be preferable to treat the 3L-NPC as a black box, without using sophisticated mathematical models.
Lately, the research community has started to integrate artificial neural networks (ANNs) into model predictive control, aiming to exploit the outstanding benefits of ANNs such as: (i) lower computational burden due to the capability of performing massive computations in parallel, which makes them very suitable for implementation in a DSP or dSPACE controller [31], [32], (ii) the ability to learn complex systems, and (iii) no need for the mathematical model of the system to be controlled. Furthermore, the ANN-based control technique is generally an end-to-end (E2E) learning-based strategy that seeks a direct mapping from the raw input data to the desired outputs. However, the relation between the input data and the outputs cannot be analyzed, since the entire system is treated as a black box. Another concern is that there is no specific rule for determining the structure of the ANN and its hidden layers, or its input features and training data. Consequently, the appropriate network structure and its setup are found through experience and trial and error.

[Figure: The proposed PV-powered 3L-NPC transformerless inverter connected to the grid and its controllers.]
In [31], the ANN-based FCS-MPC is engaged to control the three-phase output voltage of the conventional two-level voltage source inverter (2L-VSI), where the implemented ANN was a simple pattern recognition network. In that work, the proposed ANN-based controller shows better performance and lower THD of the output voltage compared to the conventional FCS-MPC scheme, considering different linear and nonlinear loading conditions. Soon afterward, the ANN learning technique was investigated with more complex systems such as modular multilevel converters (MMC) [33], [34], flying capacitor multi-level inverters (FCMLIs) [32], [35], DC microgrid applications [36], and drive systems [37], [38]. Furthermore, several popular neural network architectures, such as time-delay neural networks (TDNNs) and recurrent neural networks (RNNs), have been widely applied to various applications due to their capability to effectively learn the temporal dynamics of the signal even from short-term feature representations [39], [40].
The core aim of this article is to propose two classes of E2E learning-based control policies for a three-phase 3L-NPC grid-tied inverter powered by a PV system. These two policies are represented by ANNs and TDNNs. With such learning-based control policies, the computational load can be lowered and the leakage current can be minimized. Consequently, the switching pattern of the converter switches will be properly selected and, in turn, the grid current THD can be reduced. The training dataset can be collected using any of the control strategies in the literature; we have selected the conventional FCS-MPC as our expert system because this work extends the authors' earlier work in [2]. The testing phase is carried out online to assess the performance of the proposed control strategies after fine-tuning the proposed neural network, taking into account different operating conditions and uncertainties in the system parameters. The major contributions of the paper can be summarized as follows:
• To the best of our knowledge, this is the first attempt to directly control a 3L-NPC not only using an ANN-based control policy but also using a TDNN-based control policy, taking into account the impact of the neural network's topology and the choice of the input features and training dataset on the quality of the E2E control policy, as they play a critical role in the capability of learning the mathematical model of the system and its dynamics.
• There is no need to derive the mathematical model of the system to be controlled, as the dynamics of the system and the controller, as well as the cost function, are learned end to end; this removes the major barriers of the conventional FCS-MPC, which are (i) the need for accurate system models, and (ii) the exponential increase in computational complexity.
• The proposed E2E control policy improves the system performance during transient operation and achieves a small steady-state error in the current injected into the grid. Moreover, it minimizes the leakage current in the 3L-NPC and reduces the THD of the grid current, which enhances the power quality.
• The two proposed E2E learning-based control policies have been compared with the well-known FCS-MPC scheme presented in [2].
• Finally, we provide the dataset and simulation files as open source to support interested researchers, so that they can concentrate on proposing new E2E control policies instead of designing other control strategies for collecting the training data¹.

VOLUME 4, 2016

The rest of the paper is organized as follows. The system structure is outlined in Section II, whilst Section III describes the control schemes of the boost converter and NPC, namely, the PV MPPT, DC-link voltage, and FCS-MPC control schemes. The proposed E2E control schemes are discussed in detail in Section IV. A thorough discussion of the simulation results is presented in Section V. Finally, conclusions and future work are given in Section VII.

II. SYSTEM ARCHITECTURE
The proposed system is a grid-tied 3L-NPC transformerless inverter, as shown in Fig. 1. The energy source is a PV panel that converts solar insolation into electrical energy. To stabilize the PV operation and provide better performance, a capacitor must be attached to the PV output. The output of the PV is then connected to a DC/DC converter, here a boost converter, which provides a means of implementing maximum power point tracking (MPPT) for the PV panel and thus improves the utilization of the system. In addition, the converter matches the PV voltage level to the DC bus voltage. The DC/DC converter output provides the system DC bus, whose voltage must be regulated to achieve stable operation of the inverter and low THD values. A 3L-NPC transformerless inverter with an LC filter is then inserted between the DC bus and the utility grid. The function of the filter is to inhibit the grid current dynamics and prevent high-frequency oscillations [2]. The mathematical model of the system assumes ideal switching devices and neglects the snubber circuit; the internal impedance of the utility grid, however, is considered. All the effects neglected in the mathematical model are taken into consideration in the simulations. The models of the system components are discussed in the following three subsections.

A. PHOTOVOLTAIC ARRAY MODEL
The overall model of the photovoltaic (PV) array is presented in Fig. 2. The current source $I_{sc}$ represents the short-circuit current of the array, and its value is directly proportional to the solar insolation on the array. The resistances $R_s$ and $R_p$ represent the series and parallel losses of the array, respectively.

B. DC/DC CONVERTER MODEL
Figure 3 shows the power circuit of the boost converter, whose input is the PV panel output, while its output feeds the DC bus. Assuming that the DC-link capacitor is large enough, the dynamic model of the converter is specified by the following equations [41]:

where $u$ represents the action of the switch $Q_b$ under PWM, taking values from the set $\{0, 1\}$, $L_b$ is the boost converter input inductance, $r_b$ denotes the equivalent series resistance of $L_b$, $V_{pv}$ and $I_{pv}$ are the PV voltage and current, respectively, $C_{dc}$ is the output capacitance, and $V_{dc}$ is the DC-link voltage.

[Figure 3: The power circuit of the boost converter.]

This article has been accepted for publication in a future issue of this journal, but has not been fully edited. Content may change prior to final publication.

C. 3L-NPC TRANSFORMERLESS INVERTER MODEL
Figure 4 shows the power circuit of the 3L-NPC inverter. The circuit has twelve power electronic switches and six power diodes. It also contains two splitting capacitors at the DC bus terminals. Therefore, each phase has three switching states, $[-1, 0, 1]$, which gives a total of $3^3 = 27$ switching states for the three phases, as listed in Table 1. If the voltage across each capacitor is assumed to be half of the DC-link voltage, the generated output voltage has the following values

The analysis of the 3L-NPC inverter can be simplified using space vectors. Hence, the three-phase variables of the circuit are represented as space vectors using:

where $w$ is the space vector, $(w_a, w_b, w_c)$ are the three-phase variables, and $a = e^{j(2\pi/3)}$. If the inverter states are expressed as space vectors using (3), we get 19 voltage vectors, i.e., $\{V_0, V_1, \ldots, V_{18}\}$. However, in order to reduce the earth leakage current, the voltage vectors that give low common-mode voltage (CMV) must be used [2], [42]. Functionally, there are seven states that have zero CMV.

An LC filter is attached between the transformerless 3L-NPC inverter and the grid. The utility grid is assumed to have a constant frequency and voltage amplitude with an internal inductance $L_g$.
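The counts quoted in this subsection (27 switching states, 19 distinct voltage vectors, and seven zero-CMV states) can be checked with a short enumeration. The sketch below is illustrative only and rests on several assumptions that are not taken from the paper: the amplitude-invariant space-vector form $w = \frac{2}{3}(w_a + a\,w_b + a^2 w_c)$, a pole voltage of $S_x V_{dc}/2$ per phase, the CMV defined as the mean of the three pole voltages, and a hypothetical DC-link voltage `V_DC`.

```python
from itertools import product
from math import sqrt

V_DC = 700.0  # hypothetical DC-link voltage in volts (illustrative only)


def space_vector(state, v_dc=V_DC):
    """Return the (alpha, beta) voltage of a switching state (sa, sb, sc).

    Assumes w = (2/3) * (wa + a*wb + a^2*wc) with pole voltages s * v_dc / 2.
    """
    sa, sb, sc = state
    half = v_dc / 2.0
    alpha = half * (2 * sa - sb - sc) / 3.0
    beta = half * (sb - sc) / sqrt(3.0)
    return alpha, beta


def common_mode_voltage(state, v_dc=V_DC):
    """CMV taken (by assumption) as the mean of the three pole voltages."""
    return (v_dc / 2.0) * sum(state) / 3.0


# All combinations of the per-phase states -1, 0, 1.
states = list(product((-1, 0, 1), repeat=3))

# Redundant states share the same integer pair (2sa - sb - sc, sb - sc),
# so counting these pairs counts distinct voltage vectors without rounding.
distinct_vectors = {(2 * sa - sb - sc, sb - sc) for sa, sb, sc in states}

# Zero CMV requires sa + sb + sc == 0.
zero_cmv_states = [s for s in states if common_mode_voltage(s) == 0.0]

print(len(states), len(distinct_vectors), len(zero_cmv_states))
```

Under these assumptions the zero-CMV set is the zero state (0, 0, 0) plus the six permutations of (1, 0, -1); how these map onto the paper's vector indices is not recoverable from the text and is left open here.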

III. SYSTEM CONTROL
The proposed system includes three controllers, namely, the MPPT controller, the DC-link voltage controller, and the grid current controller. Firstly, the MPPT controller regulates the operating point of the PV panel to track the maximum power point (MPP). Secondly, the DC-link voltage controller maintains the capacitor voltage (i.e., $V_{dc}$) at a certain reference. Finally, the grid current controller controls the 3L-NPC transformerless inverter to produce a high-quality grid current. Although the design of these three control schemes is outside the scope of this study, they are briefly reviewed in the following subsections to give a complete picture of the proposed system.

A. PV MPPT CONTROLLER
The key objective of this controller is to operate the PV panel so that it is better exploited; that is, it forces the system to absorb maximum power from the PV panel. Hence, the MPPT controller is a basic component of PV systems. In general, the inputs to this controller are the measured PV current and voltage (i.e., $I_{pv}$ and $V_{pv}$), while its output is the duty cycle of the boost converter $D_{cycle}$, as illustrated in Fig. 5(a). In the literature, there exist many approaches that can be used to implement the MPPT controller. Herein, a simple approach called incremental conductance is utilized. This technique calculates and observes the slope of the power-voltage curve of the PV until it reaches zero [43]. The incremental conductance algorithm generates the setpoint current of the boost converter, and the input current of the boost converter is controlled to track that setpoint. The current setpoint $I_{pv}^*$ is compared to the actual PV current $I_{pv}$. Then, the resulting error signal is sent to an ON/OFF controller, namely, a hysteresis controller. In turn, the ON/OFF controller generates the necessary boost converter duty cycle $D_{cycle}$.
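The two stages described above, an incremental-conductance update of the current setpoint followed by an ON/OFF (hysteresis) decision, might be sketched as follows. This is a minimal illustration rather than the authors' implementation: the fixed step size, the hysteresis band, the rule used when the voltage does not change, and the function names are all assumptions, and the hysteresis stage returns the instantaneous switch state rather than an averaged duty cycle.

```python
def incremental_conductance_step(v, i, v_prev, i_prev, i_ref, step=0.05):
    """Return an updated PV current setpoint from two consecutive samples.

    The slope dP/dV = I + V * dI/dV is zero at the maximum power point.
    `step` (in amperes) is a hypothetical fixed perturbation size.
    """
    dv, di = v - v_prev, i - i_prev
    if dv == 0.0:
        if di == 0.0:
            return i_ref  # operating point unchanged: hold the setpoint
        # Voltage unchanged but current moved: apply the classic rule of
        # raising the voltage when di > 0 (i.e. drawing less current).
        return i_ref - step if di > 0.0 else i_ref + step
    slope = i + v * di / dv  # finite-difference estimate of dP/dV
    if slope == 0.0:
        return i_ref  # already at the maximum power point
    # dP/dV > 0 means the voltage is below the MPP voltage, so the voltage
    # must rise, which on a PV curve means drawing less current.
    return i_ref - step if slope > 0.0 else i_ref + step


def hysteresis_switch(i_ref, i_meas, prev_state, band=0.1):
    """ON/OFF decision for the boost switch; `band` is a hypothetical width."""
    error = i_ref - i_meas
    if error > band:
        return 1  # current too low: close the switch to charge the inductor
    if error < -band:
        return 0  # current too high: open the switch
    return prev_state  # inside the band: keep the previous switch state
```

With a fixed step the setpoint does not settle exactly; it oscillates within a few steps of the maximum power point, which is the usual trade-off of this family of methods.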

B. DC-LINK VOLTAGE CONTROLLER
The key goal of this controller is to regulate the DC-link voltage at a predefined reference voltage (i.e., $V_{dc}^*$). It plays a significant role in the stability of the power system. In addition, it regulates the power flow from the PV to the grid. Its input is the error in the DC-link voltage, i.e., $e_{dc} = V_{dc}^* - V_{dc}$, while its output is the grid current setpoint of the transformerless inverter $I_g^*$. This controller should respond somewhat more slowly than the transformerless inverter controller. For this reason, a simple PI controller is sufficient for this job. The parameters of the controller are tuned using the Ziegler-Nichols tuning algorithm. The DC-link voltage controller is depicted in Fig. 5(b).
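A discrete-time PI regulator acting on $e_{dc} = V_{dc}^* - V_{dc}$ could look like the sketch below. The gains, the sampling time, the output limits, and the conditional-integration anti-windup are placeholders chosen for illustration; none of them come from the paper, which states only that a PI controller tuned by Ziegler-Nichols is used.

```python
class DiscretePI:
    """Discrete PI controller with output clamping (illustrative sketch)."""

    def __init__(self, kp, ki, ts, out_min, out_max):
        self.kp = kp            # proportional gain (hypothetical value)
        self.ki = ki            # integral gain (hypothetical value)
        self.ts = ts            # controller sampling time in seconds
        self.out_min = out_min  # lower limit of the current setpoint
        self.out_max = out_max  # upper limit of the current setpoint
        self.integral = 0.0     # accumulated integral term

    def update(self, v_dc_ref, v_dc):
        """Return the grid current setpoint for the present sample."""
        error = v_dc_ref - v_dc  # e_dc as defined in the text
        candidate = self.integral + self.ki * self.ts * error
        output = self.kp * error + candidate
        if self.out_min <= output <= self.out_max:
            self.integral = candidate  # integrate only when not saturated
            return output
        # Saturated: clamp the output and freeze the integrator (anti-windup).
        return min(max(output, self.out_min), self.out_max)
```

Freezing the integrator while the output is clamped is one common way to keep the slow outer loop from overshooting after a large irradiation step; other anti-windup schemes would serve equally well.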

C. FCS-MPC FOR 3L-NPC INVERTER
Within this work, the key role of the FCS-MPC scheme is, first, to acquire the training datasets (i.e., input-output pairs) needed for training our proposed networks; thereafter, it is utilized as a baseline for assessing the robustness of our proposed control schemes. The initial phase of the FCS-MPC scheme is to derive the discrete model of the controlled variables. The prediction model is derived from the differential equations for (i) the voltage across the filter inductor $L_f$ and the grid inductor $L_g$, and (ii) the current passing through the filter capacitor $C_f$. Therefore, the continuous state-space model of the 3L-NPC can be expressed, in matrix form, as:

where $V_i$ is the voltage across the inverter terminals before the LC filter, $I_f$ is the inverter current passing through $L_f$, $V_c$ is the voltage across $C_f$, $I_g$ is the grid current flowing through $L_g$, and $V_g$ is the utility grid voltage. Then, the discrete state-space model (6) is obtained by applying the forward Euler discretization method given in (5), where $k$ is the sampling-interval index and $T_s$ is the sampling time.

[Algorithm 1: Pseudocode of FCS-MPC for the grid-tied 3L-NPC with LC filter.]
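For orientation, one plausible form of the model described in words above is written out below. It is a hedged reconstruction, not the paper's equations (4)-(6): it assumes lossless filter elements (no series resistances), treats every quantity as an $\alpha\beta$ space vector, and uses the matrix names $A$, $B$, and $E$ purely as illustrative labels.

```latex
% Continuous-time model (assumed lossless L_f, C_f, L_g)
\begin{aligned}
L_f \frac{dI_f}{dt} &= V_i - V_c, \\
C_f \frac{dV_c}{dt} &= I_f - I_g, \\
L_g \frac{dI_g}{dt} &= V_c - V_g,
\end{aligned}
\qquad\Longleftrightarrow\qquad
\frac{dx}{dt} = A\,x + B\,V_i + E\,V_g,
\quad x = \begin{bmatrix} I_f \\ V_c \\ I_g \end{bmatrix},

A = \begin{bmatrix}
0 & -\frac{1}{L_f} & 0 \\
\frac{1}{C_f} & 0 & -\frac{1}{C_f} \\
0 & \frac{1}{L_g} & 0
\end{bmatrix},
\quad
B = \begin{bmatrix} \frac{1}{L_f} \\ 0 \\ 0 \end{bmatrix},
\quad
E = \begin{bmatrix} 0 \\ 0 \\ -\frac{1}{L_g} \end{bmatrix}.

% Forward Euler approximation and the resulting discrete model
\frac{dx}{dt} \approx \frac{x(k+1) - x(k)}{T_s}
\quad\Longrightarrow\quad
x(k+1) = \left(I + T_s A\right) x(k) + T_s B\,V_i(k) + T_s E\,V_g(k).
```

If the original model includes parasitic resistances, they would appear as additional diagonal entries of $A$.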
The key objective of the predictive control scheme of the 3L-NPC inverter is to keep the grid current $I_g$ as close to its reference $I_g^*$ as possible. Accordingly, the controlled variables for the 3L-NPC comprise only the grid current in the stationary $\alpha\beta$ frame. The complete pseudocode of the FCS-MPC algorithm for the studied converter is given in Algorithm 1. The FCS-MPC instructions are executed sequentially to find the optimal switching state. The algorithm starts by measuring the grid voltage $V_g(k)$ and current $I_g(k)$ as well as the filter current $I_f(k)$ and voltage $V_c(k)$ at instant $k$, all expressed in $\alpha\beta$ coordinates (line 1). Then, the cost function is initially set to infinity (line 2). The algorithm sequentially applies only the seven possible voltage vectors of the 3L-NPC inverter that have zero common-mode voltage (CMV), as previously discussed in Section II-C (line 4). For each voltage vector $V_i$, the grid current $I_g(k+1)$ is estimated at instant $k+1$ (line 5). The cost function given in (8) is then evaluated at $k+1$. It represents the core of the FCS-MPC optimization process and is defined, in this work, as the squared error between the predicted grid current and the reference current (line 6). Then, the optimum voltage vector that minimizes the cost function is selected, and the corresponding switching state is applied at the next sampling instant $k+1$ (lines 7 to 11). If the sampling interval has not yet elapsed, the algorithm waits until the next sampling instant before the switching states are changed.
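The selection loop of Algorithm 1 can be sketched in a few lines using Python complex numbers for the $\alpha\beta$ quantities. To keep the example short, the LC-filter prediction model is replaced by a single lumped inductance `l_total` so that the one-step prediction depends directly on the candidate vector; this is a simplifying assumption and not the paper's model (6). The ordering of `ZERO_CMV_STATES`, the function names, and all parameter values are likewise hypothetical.

```python
import cmath

A_OP = cmath.exp(2j * cmath.pi / 3)  # rotation operator a = e^{j 2 pi / 3}

# States (sa, sb, sc) with sa + sb + sc == 0, i.e. zero common-mode voltage
# under the assumed CMV definition. The ordering is arbitrary.
ZERO_CMV_STATES = [
    (0, 0, 0), (1, 0, -1), (0, 1, -1), (-1, 1, 0),
    (-1, 0, 1), (0, -1, 1), (1, -1, 0),
]


def voltage_vector(state, v_dc):
    """Inverter voltage as a complex alpha-beta space vector."""
    sa, sb, sc = state
    return (2.0 / 3.0) * (v_dc / 2.0) * (sa + A_OP * sb + A_OP ** 2 * sc)


def fcs_mpc_step(i_g, v_g, i_ref, v_dc, l_total, ts):
    """Return (best_state, best_cost) for one sampling instant.

    Prediction (simplified): i_g(k+1) = i_g(k) + ts / l_total * (v_i - v_g).
    Cost: squared error between the reference and predicted grid current.
    """
    best_state, best_cost = None, float("inf")  # cost starts at infinity
    for state in ZERO_CMV_STATES:               # seven candidate vectors
        v_i = voltage_vector(state, v_dc)
        i_pred = i_g + (ts / l_total) * (v_i - v_g)
        cost = abs(i_ref - i_pred) ** 2
        if cost < best_cost:                    # keep the minimizer
            best_state, best_cost = state, cost
    return best_state, best_cost
```

A call such as `fcs_mpc_step(0j, 0j, 2.1 + 1.2j, 700.0, 10e-3, 60e-6)` returns the state whose predicted current lands closest to the reference; the expert's logged inputs and chosen state form one training pair for the learned policies.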

IV. PROPOSED END-TO-END (E2E) LEARNING-BASED CONTROL SCHEMES
This section provides a brief overview of the end-to-end (E2E) learning-based control policy. Thereafter, it explains in detail the two proposed E2E control policies, namely, the ANN-based and TDNN-based control policies, for the 3L-NPC grid-tied inverter system, as well as the training phase of the proposed neural networks. One of the prime objectives of our work is to investigate the influence of the neural network's topology as well as the choice of the input features and training dataset on the performance of the E2E control policy.

A. REVIEW OF E2E CONTROL POLICY
An end-to-end (E2E) learning-based control policy is a machine learning model that seeks a direct mapping from the raw input data to the desired outputs, eliminating the need for an explicit cost function and a mathematical model of the system to be controlled. As a consequence, such E2E control policies require less engineering effort than model-based control approaches and fewer user interventions to produce the desired outputs, without requiring knowledge of the controlled system; they require only expert demonstrations during the training and testing phases. Moreover, their performance is generally better when they are properly designed. Finally, they provide a simpler and more lightweight system with high generalization abilities. For instance, they have achieved excellent results in autonomous navigation [44] and self-driving [45] by coupling perception and control, which means that the control actions (e.g., steering angle and throttle commands) can be predicted directly from a single image. In such an example, a deep neural network control policy is trained to map directly from pixels to control actions. Inspired by this concept and our previous work [2], [31], we propose two classes of E2E control policies for a three-phase 3L-NPC grid-tied inverter powered by a PV system. These two policies are represented by artificial neural networks (ANNs) and time-delay neural networks (TDNNs).
Herein, the key goal of each E2E control policy is twofold: 1) replacing the conventional FCS-MPC controller with the aim of eliminating its major barriers, which are the need for accurate system models and the exponential increase in computational complexity; to this end, the cost and dynamics of the controller are learned end to end, and 2) predicting the optimal switching signals of the inverter in order to enhance system performance during transient and steady-state operation; to do so, the E2E learning-based control policy is trained offline to map directly from raw observations to switching signals based on a dataset collected directly from the classical FCS-MPC scheme.
Since the aim is to decrease the power loss of the entire conversion system, the main objective of the proposed control schemes for grid-connected 3L-NPC inverters is to generate a high-quality sinusoidal grid current with very low distortion (i.e., THD) and, further, to decrease the leakage current of the transformerless system.

B. ANN-BASED CONTROL POLICY
Artificial neural networks (ANNs), a machine-learning technique whose architecture was originally inspired by the operation of biological neural networks, are mathematical models that use learning algorithms to make intelligent decisions based on historical data, such as an input-to-output mapping. Similar to the human brain, ANNs are composed of a set of processing units, so-called neurons, which are capable of performing massively parallel computations; neurons are arranged into two or more layers and linked via weighted connections [46]. Nowadays, they are increasingly used in power electronics applications, with profound and successful contributions to the identification and control of dynamic systems [47]-[48]. The most commonly used neural network for both modeling highly nonlinear systems and implementing nonlinear controllers is the multi-layer perceptron (MLP) neural network (i.e., a feed-forward neural network), due to its universal approximation capabilities [49], [50]. An MLP generally consists of at least three layers, where each layer contains one or more neurons, as illustrated in Fig. 6(a). More precisely, it consists of: (i) an input layer, which is composed of the set of input features fed into the network, (ii) one or more intermediate layers of neurons, so-called hidden layers, that perform nonlinear transformations and compute sophisticated associations between input features, and (iii) an output layer, which consists of a set of output neurons (exactly as many as the number of desired outputs), representing the response of the network to the input features.
The classical two-layer ANN, given in Fig. 6(a), can be represented mathematically as follows:

where $f$ is the neuron's activation function (usually a non-linear function such as the logistic sigmoid or hyperbolic tangent, to ensure the universal approximation property [50]). On the other hand, the mathematical formulation of the $n$-th output can be described as:
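As a rough illustration of such a two-layer network, the forward pass can be written as one hidden layer followed by one output layer. The sketch below is not the paper's formulation: the hyperbolic tangent in the hidden layer is an assumption (the text names it only as one common choice), the sigmoid output matches the output activation mentioned later in the paper, and the weight layout (one row per neuron) and function names are illustrative.

```python
import math


def sigmoid(z):
    """Logistic sigmoid f(z) = 1 / (1 + e^(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def dense(weights, biases, inputs, activation):
    """One fully connected layer: activation(W @ inputs + b), row by row."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]


def forward(x, w_hidden, b_hidden, w_out, b_out):
    """Two-layer feed-forward pass.

    x        : list of M input features
    w_hidden : J rows of M weights, b_hidden : J biases (hidden layer, tanh)
    w_out    : N rows of J weights, b_out    : N biases (output layer, sigmoid)
    """
    hidden = dense(w_hidden, b_hidden, x, math.tanh)
    return dense(w_out, b_out, hidden, sigmoid)
```

Each output neuron therefore sees a weighted sum of the $J$ hidden activations, and the whole policy evaluation costs roughly $J(M + N)$ multiply-accumulate operations per sampling instant.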

C. TDNN-BASED CONTROL POLICY
Although the structure of the feed-forward network depicted in Fig. 6(a) does not have any internal loops and the input-output mapping is determined in the absence of memory characteristics (i.e., static mapping), the findings for small-scale (i.e., two-layer) ANNs are (i) promising in learning the dynamics of the system as well as representing the optimal control law, and (ii) less computationally expensive compared to other model-based control strategies such as MPC [32]. However, in many applications, the need arises for more flexibly structured networks (namely, dynamic neural networks) such as recurrent neural networks (RNNs) and time-delay neural networks (TDNNs), which have the capability of emulating a nonlinear dynamic system with temporal behavior characteristics [51], [52].
In this work, we mainly focus on the TDNN, which is constructed by embedding a local memory (namely, a tapped delay line memory) in both the input and hidden layers of the classical ANN described in Fig. 6(a). In this case, each element of the input vector $X$ is fed to the multi-input static ANN via a tapped delay line that stores the dynamics of the system to be controlled and generates a new sequence of input features with unit delay $z^{-1}$, which provides the dynamic ability of the model structure (see Fig. 6(b)). Consequently, the actual input vector $X$ to the TDNN, at time $t$, consists of $(M \times (d + 1))$ elements, where $d$ refers to the time-delay window for each element at instant $t$. It is noteworthy that the larger the time-delay window $d$ is, the better the performance is, as the network holds more information from the past. On the other hand, a larger window significantly increases the complexity of the model and its computational time compared to the classical ANN. To tackle these problems, a two-layer TDNN-based control policy with a maximum of two time-delay windows (i.e., $d = \{1, 2\}$) is proposed in this paper to ensure the real-time response of the controller. Similar to (10), such neural networks can be mathematically described at instant $t$ as follows:

where $j = \{1, \ldots, J\}$ and $n = \{1, \ldots, N\}$.
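The tapped delay line on the input side can be pictured as a short first-in-first-out buffer that concatenates the current sample with its $d$ predecessors. The sketch below is an illustration under stated assumptions: the class name is hypothetical, the buffer is zero-initialized before any history exists, and the newest sample is placed first in the concatenated vector.

```python
from collections import deque


class TappedDelayLine:
    """Builds an M * (d + 1) input vector from the last d + 1 samples."""

    def __init__(self, num_features, delay):
        self.num_features = num_features
        # Current sample plus `delay` past samples, newest first.
        # Zero-filled history at start-up is an assumption of this sketch.
        self.buffer = deque(
            [[0.0] * num_features for _ in range(delay + 1)],
            maxlen=delay + 1,
        )

    def push(self, sample):
        """Insert the newest sample and return the flattened input vector."""
        if len(sample) != self.num_features:
            raise ValueError("sample length must equal num_features")
        # appendleft on a full bounded deque drops the oldest sample.
        self.buffer.appendleft(list(sample))
        return [value for past in self.buffer for value in past]
```

For example, with six features and $d = 2$ every call to `push` returns 18 values, so the first layer of the TDNN grows threefold in width compared to the static ANN.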
More information about the input feature selection and training methodology of the ANN- and TDNN-based control policies is presented in the following section. We have considered two different scenarios for the input feature selection, while two strategies have been investigated for acquiring the training data.

D. INPUT FEATURES SELECTION AND TRAINING PROCESS
We have seen in the previous subsections that the choice of the neural network structure can effectively improve the performance of the E2E control policy. On the other hand, the choice of the input features and training dataset plays a critical role in enhancing the quality of the E2E control scheme and in the capability of learning the mathematical model of the system and its dynamics; as a result, they should be carefully chosen [35]. To this end, we have considered two different scenarios for the input feature selection and training data of the two proposed control schemes.
[Figure 6(c): Block diagram of our proposed E2E learning-based control policies for the grid-connected 3L-NPC inverter. Each control policy is trained to map directly from the measured variables, namely, $I_f$, $I_g$, $I_g^*$, and $V_c$, to the optimum voltage vector $V_i$. Subsequently, the corresponding switching states $S_i$ are directly fed to the power switches of the converter. For the TDNN-based control scheme, the input features are assisted with $d$ tapped delay lines.]

In the first scenario, the filter current $I_f$, grid current $I_g$, grid reference current $I_g^*$, and filter voltage $V_c$ are considered as the input features of the ANN-based control policy (namely, $X = [I_f, I_g, I_g^*, V_c]^T$, as depicted in Fig. 6(c)), where all features are expressed in the $\alpha\beta$ vectorial frame, which means that the actual input vector $X$ to the network consists of 8 features; i.e., $M = 8$ and $X = [I_{f\alpha}, I_{f\beta}, \ldots, V_{c\alpha}, V_{c\beta}]^T \in \mathbb{R}^8$. In the second scenario, only three input features are considered, namely, the grid current $I_g$, grid reference current $I_g^*$, and grid voltage $V_g$ (i.e., $X = [I_g, I_g^*, V_g]^T$), resulting in an actual input vector of 6 features, i.e., $M = 6$. Moreover, for the TDNN-based control policy, the input features are assisted with $d$ tapped delay lines in both scenarios, resulting in an input feature vector $X$ with $(M \times (d + 1))$ elements.
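How the feature vector of either scenario might be assembled from three-phase measurements is sketched below. The paper states only that the features are expressed in the $\alpha\beta$ frame; the amplitude-invariant Clarke transform used here, the function names, and the idea of starting from abc samples are assumptions made for illustration.

```python
from math import sqrt


def clarke(a, b, c):
    """Amplitude-invariant abc -> (alpha, beta) transform (assumed form)."""
    alpha = (2.0 * a - b - c) / 3.0
    beta = (b - c) / sqrt(3.0)
    return alpha, beta


def build_features(quantities_abc):
    """Concatenate the alpha-beta components of each three-phase quantity.

    quantities_abc: sequence of (a, b, c) triples, for example
        scenario 1: [I_f, I_g, I_g_ref, V_c] -> 8 features
        scenario 2: [I_g, I_g_ref, V_g]      -> 6 features
    """
    features = []
    for a, b, c in quantities_abc:
        features.extend(clarke(a, b, c))
    return features
```

Passing four triples yields the eight-element vector of the first scenario, and passing three triples yields the six-element vector of the second.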
In this study, we considered a two-layer TDNN-based control policy with a maximum of two time-delay windows (i.e., $d \in \{1, 2\}$), ensuring real-time performance of the controller. It should be pointed out that expressing the input features in the $\alpha\beta$ reference frame generally provides better learning for the E2E control policy than the rotating $dq$ frame, as the $dq$ signals are composed of constant values.
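The difference between the two frames can be checked numerically. The sketch below applies the amplitude-invariant Clarke transform and the Park transform to a balanced three-phase current; the amplitude, frequency, and ideal frame alignment are illustrative assumptions, not values taken from the paper. Under these assumptions the $\alpha\beta$ components remain sinusoidal, whereas the $dq$ components collapse to constants.

```python
import numpy as np

def clarke(a, b, c):
    """Amplitude-invariant Clarke transform (abc -> alpha-beta)."""
    alpha = (2.0 / 3.0) * (a - 0.5 * b - 0.5 * c)
    beta = (b - c) / np.sqrt(3.0)
    return alpha, beta

def park(alpha, beta, theta):
    """Park transform (alpha-beta -> dq) for a frame rotating at angle theta."""
    d = alpha * np.cos(theta) + beta * np.sin(theta)
    q = -alpha * np.sin(theta) + beta * np.cos(theta)
    return d, q

f, amp = 50.0, 10.0  # assumed grid frequency (Hz) and current amplitude (A)
t = np.linspace(0.0, 0.04, 400, endpoint=False)  # two fundamental periods
theta = 2.0 * np.pi * f * t
ia = amp * np.cos(theta)
ib = amp * np.cos(theta - 2.0 * np.pi / 3.0)
ic = amp * np.cos(theta + 2.0 * np.pi / 3.0)

i_alpha, i_beta = clarke(ia, ib, ic)
i_d, i_q = park(i_alpha, i_beta, theta)

swing_alpha = i_alpha.max() - i_alpha.min()  # large: alpha is sinusoidal
swing_d = i_d.max() - i_d.min()              # essentially zero: d is constant
print(swing_alpha, swing_d)
```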
Since the machine learning task considered in this work is a supervised learning (SL) task (i.e., a classification task), the output of the proposed network should be defined. In our case, for both the ANN- and TDNN-based control schemes, the output vector $Y$ is composed of seven elements (or classes) that represent the seven possible voltage vectors of the converter (i.e., $V_i\ \forall i \in \{1, \ldots, 7\}$). It is noteworthy that those seven voltage vectors are the vectors that have zero common-mode voltage (CMV), namely, $\{\ldots, V_{16}, V_{18}\}$, as previously discussed in Section II-C. The output classes are encoded using the popular one-hot encoding representation, which means that, at each sampling instant, only the optimal voltage vector among the seven possible vectors is indexed to one, while the other vectors are indexed to zero. This module can be implemented with a fully connected layer with seven output classes, where, in our case, the activation function of the output layer is a sigmoid function, i.e., $f(x) = \frac{1}{1 + e^{-x}}$. Consequently, the optimal switching pattern $S_{opt}$, which corresponds to the optimal voltage vector, will be automatically applied to the 3L-NPC at the next sampling instant $k + 1$, as illustrated in Fig. 6(c).
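A minimal sketch of this output stage is given here, assuming that the class with the largest sigmoid output is taken as the selected voltage vector; the paper does not state the selection rule explicitly. The pre-activation values and the 0-based class indexing are hypothetical.

```python
import numpy as np

NUM_CLASSES = 7  # the seven zero-CMV voltage vectors

def one_hot(index, num_classes=NUM_CLASSES):
    """Training target: 1 for the optimal vector, 0 for all others."""
    y = np.zeros(num_classes)
    y[index] = 1.0
    return y

def sigmoid(x):
    """Output-layer activation f(x) = 1 / (1 + exp(-x))."""
    return 1.0 / (1.0 + np.exp(-x))

def select_vector(pre_activations):
    """Return the 0-based class index with the largest sigmoid output
    (assumed selection rule)."""
    outputs = sigmoid(np.asarray(pre_activations, dtype=float))
    return int(np.argmax(outputs))

target = one_hot(2)
pre_activations = [-1.2, 0.3, 2.5, -0.7, 0.0, -3.1, 1.1]  # hypothetical values
chosen = select_vector(pre_activations)
print(target, chosen)
```

The switching pattern associated with the chosen class would then be applied at instant $k + 1$.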
In order to train our proposed networks, a training dataset (i.e., a set of input-output pairs) is needed, which can be easily collected via the classical FCS-MPC scheme. In this work, two main strategies have been explored for acquiring the training dataset. In the first strategy, the dataset is collected using FCS-MPC, considering the following: (i) the system parameters given in Table 4 are used, with $T_s = 60$ µs instead of $T_s = 30$ µs, and (ii) the simulation is run only once for 6 s, taking into account step variations in the solar irradiation, as shown in Fig. 8, which starts at 100% irradiation and ends at 10%. The acquired data is utilized for training and assessing the first scenario of the proposed control schemes, where a higher number of input features is used. In the second strategy, which is used for training the proposed control policies in the second scenario with fewer input features, different uncertainties are considered in the system parameters, as shown in Table 2. It can be seen that only seven samples are used to obtain the dataset, each simulated for 1 s. As a consequence, the total dataset consists of 276,911 instances, whereas it is composed of 100,001 instances in the first scenario. To sum up, the first scenario represents the case where a higher number of input features is employed while a lower number of data instances is generated, whereas the second scenario represents the opposite case. The collected datasets have been divided into 70% for the training phase, 15% for the validation phase, and 15% for the testing phase.
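The size of the first dataset is consistent with logging one instance every $T_s = 60$ µs over the 6 s run with both end points included. The sketch below reproduces this count and a 70/15/15 division; the random shuffling, the seed, and the rounding of the subset sizes are assumptions, since the paper does not detail how the division is performed.

```python
import numpy as np

def n_instances(duration_s, ts_s):
    """Samples logged over [0, duration] at period ts, end points included."""
    return int(round(duration_s / ts_s)) + 1

def split_indices(n, train_pct=70, val_pct=15, seed=0):
    """Randomly split range(n) into train/validation/test index arrays.

    Integer arithmetic is used for the subset sizes; the remainder goes
    to the test set.
    """
    idx = np.random.default_rng(seed).permutation(n)
    n_train = n * train_pct // 100
    n_val = n * val_pct // 100
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

n1 = n_instances(6.0, 60e-6)
train, val, test = split_indices(n1)
print(n1, len(train), len(val), len(test))  # 100001 70000 15000 15001
```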
Concerning the training phase, the learning process is based on an optimization strategy (i.e., an optimizer), where three popular training algorithms, namely, trainlm, trainbr, and trainscg, could be used for learning the best mapping from the raw input data to the desired outputs. They differ mainly in learning speed, complexity, memory requirements, and whether the input features need to be normalized. In this work, we chose the trainscg method as it provides the highest accuracy within a reasonable training time, whereas the other two algorithms have longer training times and lower accuracy. Moreover, properly adjusting the hyperparameters of the neural networks, such as the number of hidden layers and the data division, plays a critical role in enhancing the performance of the proposed E2E control schemes. The number of hidden layers of each E2E control policy, in both scenarios, and the overall accuracy of each are summarized in Table 3, where TDNN-1 and TDNN-2 refer to the TDNN-based control policy with one and two tapped delay lines, respectively. Moreover, Fig. 7 shows an example of the overall confusion matrix, which assesses the performance of the training, for the TDNN with one time-delay window, considering the second scenario. The training algorithm is terminated by one of two events. The first occurs when the maximum number of epochs, set to 1000 in our case, is exceeded. The second is the early stopping criterion, which occurs when the targeted performance is attained before the maximum number of epochs is reached. The latter is used during the training phase. The training phase was run on a personal computer with an Intel Core i5-8265U CPU @ 1.60 GHz and 16 GB of RAM. Once the proposed networks are well trained, the testing phase is carried out to assess the effectiveness of the networks on unknown (i.e., unseen) data before utilizing them online for controlling the 3L-NPC.
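The two termination events can be sketched as a simple loop. This is not the trainscg algorithm itself; the function names, the goal value, and the toy loss (which halves every epoch) are hypothetical and serve only to show the stopping logic.

```python
def train_until_done(run_epoch, max_epochs=1000, goal=1e-3):
    """Call run_epoch(epoch) until the loss reaches `goal` or the epoch
    budget is exhausted.

    Returns (epochs_run, last_loss, reason).
    """
    loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        loss = run_epoch(epoch)
        if loss <= goal:  # targeted performance attained: stop early
            return epoch, loss, "goal reached"
    return max_epochs, loss, "max epochs"

# Toy stand-in for one training epoch: the loss halves every epoch.
result = train_until_done(lambda epoch: 0.5 ** epoch)
print(result[0], result[2])  # 10 goal reached
```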
It should be noted that the output of the proposed control schemes is updated at each sampling instant in the Simulink implementation by utilizing a zero-order hold with a hold time equal to $T_s$.
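The effect of the zero-order hold can be illustrated as follows, assuming the 5 µs fixed solver step reported in the simulation setup and $T_s = 30$ µs, so that each controller output is held for six solver steps. The sequence of selected vector indices is hypothetical.

```python
def zero_order_hold(samples, hold_steps):
    """Repeat each controller output for `hold_steps` solver steps."""
    held = []
    for s in samples:
        held.extend([s] * hold_steps)
    return held

TS_US = 30          # controller sampling time in microseconds
SOLVER_STEP_US = 5  # fixed solver step in microseconds
hold_steps = TS_US // SOLVER_STEP_US

vector_indices = [3, 3, 5]  # hypothetical sequence of selected vectors
held = zero_order_hold(vector_indices, hold_steps)
print(hold_steps, len(held))  # 6 18
```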

V. SIMULATION RESULTS AND DISCUSSION
With the aim of demonstrating the prospective advantages of our proposed E2E control strategies compared to the classical FCS-MPC, extensive simulations are carried out and discussed in this section.

A. SIMULATION SETUP
The MATLAB/Simulink software has been used to simulate the PV-powered 3L-NPC system shown in Fig. 1, in order to verify the performance of our proposed E2E control policies in comparison to the classical MPC scheme. The parameters of the system depicted in Fig. 1 are summarized in Table 4. The solver of the MATLAB/Simulink software is set to run at a fixed time step of 5 µs in single-tasking mode. In the case of rooftop installations, which is the case study under consideration, three-phase systems typically have power outputs ranging from 10 to 15 kW. Furthermore, the PV array consists of 960 series cells × 6 parallel strings, forming a 10 kW PV system that achieves the desired voltage at the DC-link after the boosting stage. The voltage reference of the DC-link of the inverter is set to 650 V to be suitable for coupling the converter to the three-phase grid (230 V, 50 Hz). The investigated system is tested at various irradiation levels, each corresponding to a different loading power. The PV array always generates the maximum available power under the current climatic conditions, and the inverter injects this captured energy into the utility grid. The proposed system has been tested under step variations in the solar irradiation, as shown in Fig. 8. The irradiation steps have been selected to match the Californian efficiency levels [41].

B. SIMULATION-BASED RESULTS
In order to demonstrate the superiority of our proposed E2E control policies, a comparison between the proposed control policies (namely, the ANN- and TDNN-based control policies) and the conventional FCS-MPC scheme is carried out with respect to the THD of the grid current $I_g$, and the results are summarized in Table 5. The THD is measured at 100% irradiation and $t = 0.6$ s (i.e., at steady-state operation) for both scenarios discussed in Section IV-D, considering the system parameters given in Table 4, with $T_s$ set to 30, 45, and 60 µs, respectively. For the TDNN-based control policy, the time-delay window $d$ is set to 1 and 2. As anticipated, the proposed E2E control policies generally outperform FCS-MPC in terms of a lower THD of the grid current $I_g$ (as highlighted in green).
However, for the first scenario (where a higher number of input features and, conversely, a lower number of training data is utilized), we can clearly observe that in the third testing case, where $T_s = 60$ µs, FCS-MPC performs better than all proposed schemes. This reflects the fact that the selection of the training data has an observable influence on the performance of the E2E control policy, regardless of how long the input vector $X$ to the neural network is. This is an important motivation for considering the second scenario, in which the training data is properly chosen, resulting in enhanced performance of the proposed control schemes. Moreover, in both scenarios, it is noteworthy that the TDNN-based control policy with two tapped delay lines (namely, TDNN-2) offers only a slight improvement over that with one tapped delay line (namely, TDNN-1). For this reason, we considered the TDNN-based control policy with only one tapped delay line and $M = 6$ (i.e., an input vector $X$ with 12 elements) for comparison with the classical FCS-MPC scheme.
Figure 9 presents the corresponding results of the PV array with the FCS-MPC scheme. The PV current tracks its MPPT value very well, as shown in Fig. 9(a). Figure 9(b) shows the PV terminal voltage response, which varies according to the MPPT conditions. However, there are some spikes at the instants of step change. These spikes can be explained by the characteristics of the control loops. The system has several nested control loops, and the outer loop, which is the grid current control loop, has a slow response compared to the inner loop. Hence, the grid power responds slowly, as shown in Fig. 9(c). Therefore, the grid still absorbs high power even though the PV output has dropped, and the capacitors compensate for this drop from their stored energy. In Fig. 9(c), the grid power tracks the PV power with a small error due to the circuit losses.
Compared to the results of TDNN-1, shown in Fig. 10, the response of the PV array is nearly identical in the two cases. However, the output power of the grid and its transient response in the case of TDNN-1 are slightly better, especially in the time interval between $t = 1$ s and $t = 3$ s, as illustrated in Fig. 11. The DC-link voltage responses for the two control schemes are shown in Fig. 12. The voltage ripples in the TDNN-1 case are lower by nearly 7%. These ripples may be one source of distortion in the transformerless inverter. Figures 13(a)-13(b) show the instantaneous leakage current for the proposed system using the two controllers. The improvement in the value of the leakage current in the case of the TDNN-1 control policy is noticeable. The variation of the RMS of the leakage current with the irradiation level is presented in Fig. 13(c), where the TDNN-1 control scheme provides an improvement of about 25%.
The grid currents and voltages of the two control strategies are presented in Fig. 14. It is clear that the $V_g$ and $I_g$ components are sinusoidal and maintain unity power factor operation with the two controllers. Figure 15 presents the FFT spectrum of the grid current at the 100% insolation level. The figures indicate that the lower-order harmonics in the case of TDNN-1 are smaller than those in the case of FCS-MPC, and that the harmonic magnitudes are the lowest for the TDNN-1 case. The training and optimization processes of the TDNN-1 control policy minimize the THD of the grid current. Hence, the THD of the grid current in the case of TDNN-1 is lower than that in the case of FCS-MPC: it is 1.63% with the TDNN-1 control scheme, while it is 2.08% with the FCS-MPC scheme. Both algorithms exhibit low-order harmonics in the grid current spectrum. This validates the superiority of the proposed TDNN-1 learning-based control policy.
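For reference, the THD of a sampled current can be estimated from its FFT spectrum as the ratio of the root-sum-square of the harmonic amplitudes to the fundamental amplitude. The sketch below is a generic estimate and not the measurement procedure used in the paper; the sampling rate, the number of harmonics considered, and the synthetic test signal are all assumptions. The window must contain an integer number of fundamental periods to avoid spectral leakage.

```python
import numpy as np

def thd_percent(signal, fs, f1=50.0, n_harmonics=40):
    """THD (%) = sqrt(sum of squared harmonic amplitudes) / fundamental.

    Assumes `signal` spans an integer number of fundamental periods so that
    every harmonic falls exactly on an FFT bin.
    """
    signal = np.asarray(signal, dtype=float)
    n = len(signal)
    spectrum = np.abs(np.fft.rfft(signal)) * 2.0 / n  # single-sided amplitudes
    bin_width = fs / n

    def amplitude(freq):
        return spectrum[int(round(freq / bin_width))]

    fundamental = amplitude(f1)
    harmonics = [amplitude(h * f1)
                 for h in range(2, n_harmonics + 1) if h * f1 < fs / 2]
    return 100.0 * np.sqrt(np.sum(np.square(harmonics))) / fundamental

# Synthetic current: 50 Hz fundamental plus 2% fifth and 1% seventh harmonics.
fs = 10_000.0
t = np.arange(2000) / fs  # 0.2 s, i.e., ten fundamental periods
i_g = (np.sin(2 * np.pi * 50 * t)
       + 0.02 * np.sin(2 * np.pi * 250 * t)
       + 0.01 * np.sin(2 * np.pi * 350 * t))
thd = thd_percent(i_g, fs)
print(f"{thd:.2f}")  # 2.24
```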
In addition, the average execution time of the conventional FCS-MPC and the E2E learning-based control strategies, as a percentage of the entire simulation time, is presented in Table 6, where the simulation is run for only 1 s at 100% solar irradiation. The execution time is estimated using the Simulink Profiler tool, which measures how much time Simulink spends executing each function of the simulated model. It is noticed that roughly 26.28% of the simulation time is spent estimating the optimal switching states when FCS-MPC is employed, compared to a maximum of 1.52% when the E2E control schemes are utilized, regardless of whether (i) a high number of input features is used, as in Scenario #1, or (ii) a large number of hidden layers is included, as in Scenario #2 and listed in Table 3. Thanks to the natural parallel structure of neural networks, massive computations can be performed in a short time, resulting in a significantly lower computational burden than that of the conventional control methods.

VI. HARDWARE-IN-THE-LOOP (HIL) VALIDATION
In this section, we validate the applicability of our proposed E2E control strategy, as well as FCS-MPC, on a digital signal processor (DSP) controller using real-time HIL validation.

A. HIL ENVIRONMENT SETUP
To test the proposed system and validate the simulation results, a Hardware-in-the-Loop (HIL) emulator has been implemented using the C2000™ microcontroller LaunchPadXL TMS320F28379D kit. In the HIL emulator, a certain part of the system, usually the power part, is hosted on the personal computer as a MATLAB model. This model is adapted to run in real-time simulation mode, in which it can interface its input-output signals without any external devices. The control functions and algorithms, in contrast, are programmed on the external microcontroller kit indicated above [21], [53]. In the simulations, the system controllers have already been modelled in discrete form, which is suitable for the real-time HIL emulator. A block diagram of the proposed system implementation using the HIL technique is presented in Fig. 16. The power units of the proposed system, including the PV array, power converters, and filters, are simulated and hosted in MATLAB. On the other hand, the control algorithms, namely FCS-MPC and TDNN-1, are implemented on the microcontroller kit. The communication between the PC and the kit is achieved using a virtual serial COM port, which permits MATLAB to transmit the measured power circuit signals, such as the DC bus voltage, grid voltages, and grid currents, to the kit. The kit then processes the control algorithms and generates the switching signals of the 3L-NPC transformerless inverter.

[...] phase 3L-NPC grid-tied inverter powered by a PV system, with the aim of (i) generating a high-quality sinusoidal grid current, (ii) improving the power quality of the utility grid, and (iii) reducing the leakage current. With such a control strategy, the grid current of the 3L-NPC is directly controlled without the need for the mathematical model of the inverter, as the whole system is considered as a black box. Hence, the proposed E2E control policy can simplify the control design, and it could be extended to other systems with little effort. In this work, we have also studied the effect of having different input features and training data on the real system, i.e., the 3L-NPC, as they play a critical role in improving the accuracy of the learned control policy and in assessing the ability to learn the mathematical model of the system. The simulation results demonstrate that our proposed control schemes, particularly TDNN-1, have superior performance compared to the conventional FCS-MPC with respect to power quality, where a significant reduction in the THD is achieved. In addition, the leakage current in the PV transformerless system is minimized. Future work will consider the hardware implementation of the proposed system.

DECLARATIONS
Funding
The research leading to these results has received funding from the Deanship of Scientific Research, University of Tabuk, under Grant Agreement number S-1441-0055.