Real-Time Nonlinear Behavioral Electrothermal Device-Level Emulation of IGBT on Heterogeneous Adaptive Compute Acceleration Platform

Power converter design evaluation by means of real-time simulation techniques is prevalent, although it is mostly restricted to simple power semiconductor switch models that exclude device-level physical details. In this work, the nonlinear high-order electrothermal model of the IGBT is developed and then deployed onto heterogeneous digital hardware for real-time implementation. As the complexity of the nonlinear behavioral model (NBM) of the IGBT poses a significant computational burden on real-time hardware emulation, a machine learning (ML) methodology is utilized so that the trained model reproduces the characteristics of its original counterpart as closely as possible. The trained model is then implemented on the adaptive compute acceleration platform (ACAP), which is composed of the processing system (PS), programmable logic (PL), and AI engine (AIE). The vector multiplication feature of the AIE suits the mathematical operations of the ML-based model particularly well and consequently enables it to be executed in real time with remarkable speedup over the original model, for which matrix inversion is otherwise mandatory. Finally, the real-time device-level results and the system-level results of a multiconverter system are validated against SaberRD and MATLAB/Simulink.

where most of them are based on detailed modeling or average value modeling, which suffices for the testing and verification of system-level converter functions such as frequency regulation and voltage adjustment. When an in-depth study is required for a comprehensive electrothermal transient analysis, the device-level modeling is compulsory [7], as it reveals the transient performance of the power semiconductor switch, so that the transient voltage, current, and thermal stresses can be monitored accurately for real converter design evaluation [8].
Various device-level IGBT models, such as the analytical model and the NBM, have been developed and widely used in the past for power converter simulation [9], [10]. However, the modeling complexity due to the inclusion of device transients poses a significant challenge, accompanied by a high chance of numerical divergence. This often results in a simulation duration too short for the system to reach its steady state, especially in commercial simulation tools such as PSpice, Multisim™, and SaberRD. Therefore, hardware acceleration using FPGA has been adopted for medium-scale power converters, where a dramatic speedup over CPU was attained [11], [12]. In addition, [13] implements the device-level simulation of the IGBT model using a parallel algorithm on the graphics processing unit (GPU), which also significantly improves the simulation efficiency. Real-time simulation [14] is playing an increasingly vital role in the development and testing stages of power electronics and requires the model to be updated strictly within the corresponding simulation time-step. However, the nonlinearity of the device model means that this real-time constraint can hardly be met, because a high-order matrix equation must be solved by Newton-based iteration. As a result, both hardware acceleration and algorithm optimization are necessary to achieve that goal.
Machine learning (ML) has begun to be employed in power systems and power converters to reduce the computational burden of conventional models [15], [16], and various neural networks (NNs), including the gated recurrent unit (GRU) [17] and the recurrent neural network (RNN) [18], are utilized to train models that give accurate results and improve simulation efficiency. As a novel and time-saving approach, ML can also be applied to the study of circuit transients by learning a specific dataset and configuring the NN to create design-compliant models [19]. However, this approach has yet to be explored for power electronics device simulations. In this article, the ML methodology is adopted to avoid high-dimensional matrix equations that are challenging to solve by traditional methods.
Compared to the conventional FPGA, the Versal™ ACAP from Xilinx has an innovative hardware architecture, which combines adaptable engines, scalar engines, intelligent engines, and a network-on-chip (NoC) to provide powerful heterogeneous acceleration for a wide range of applications [20]. As the most critical and innovative part of the ACAP, the AIE is a highly optimized processor with many features, such as the single instruction, multiple data (SIMD) vector unit and the very long instruction word (VLIW) function, that can be used in real-time emulation to solve data-intensive computing problems. In this work, the IGBT electrothermal NBM has been implemented and evaluated on the Versal™ ACAP's PS, PL, and AIE separately. The ML-based model is proposed to exploit the SIMD vector processing feature of the ACAP; specifically, the adoption of the NN enables faster matrix calculations to replace the complex iterative matrix inversion in the transient simulation process. The ML model is realized through learning from the dataset of the IGBT NBM, and the AIE SIMD vector unit provides intrinsics [21] that make the model emulation more efficient when implemented on the ACAP. Finally, the simulation results of a multiconverter system are verified by MATLAB/Simulink.
The rest of this article is organized as follows. Section II introduces the IGBT device-level nonlinear behavioral electrothermal model. In Section III, the Versal™ ACAP architecture including PS, PL, and AIE is introduced, and the implementation and performance of the NBM in these three domains are also presented. The ML model, training methodology, and vectorized implementation are described in Section IV. Section V shows the validation of the ML model and the hardware simulation results. Finally, Section VI concludes this article.

A. IGBT NBM
The NBM [22] of an IGBT with its inherent antiparallel diode is shown in Fig. 1(a). According to its definition, a capacitor can be discretized by backward Euler with time-step $\Delta t$. The equivalent conductance is defined as $G_{Ceq} = C/\Delta t$, and it is accompanied by an equivalent current source. Consequently, for capacitor $C_{ge}$, the conductance $G_{Cge}$ and current source $i_{Cgeeq}$ are obtained in the same form. The discretized forms of the nonlinear capacitors $C_{cg}$ and $C_{ce}$ are identical to each other; in the expression for $C_{cg}$, $m$ is the Miller capacitance exponent coefficient, which is set to 0.5 by default, and $C_{cgo}$ is the fixed capacitance, given in Appendix A. Similar to $C_{ge}$, the conductance can be calculated as $G_{Ccg} = C_{cg}/\Delta t$, and the equivalent current source is expressed in terms of the charge $q_{Ccg}$.
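As a worked illustration of the backward Euler discretization described above, the companion model of a linear capacitor can be written out as follows. The symbols $i_C$, $v_C$, and $I_{Ceq}$, as well as the sign convention chosen for the history current source, are assumptions made here for illustration and may differ from the original equations.

```latex
\begin{align}
i_C(t) &= C\,\frac{\mathrm{d}v_C}{\mathrm{d}t}
        \approx \frac{C}{\Delta t}\,\bigl[v_C(t) - v_C(t-\Delta t)\bigr] \\
       &= G_{Ceq}\, v_C(t) + I_{Ceq}, \qquad
          G_{Ceq} = \frac{C}{\Delta t}, \qquad
          I_{Ceq} = -\,G_{Ceq}\, v_C(t-\Delta t)
\end{align}
```

With this form, each capacitor contributes a constant conductance and a history term that only depends on the previous time-step, which is what allows the device to be assembled into a nodal admittance equation.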
Since the IGBT has three operating states (the OFF state and the linear and saturation regions), the MOSFET is adopted for model description, and its equivalent current $i_{mos}$ can be formulated in three segments, where $a_1$, $a_2$, $b_1$, $b_2$, $x$, $y$, and $z$ are coefficients, $v_{Cge}$ and $v_d$ are the voltages over capacitor $C_{ge}$ and $i_{mos}$, respectively, $V_{th}$ is the IGBT channel threshold voltage, and $V_{Cge}$ is an auxiliary quantity used in these segments. Consequently, the conductance $G_{mosvd}$ and transconductance $G_{mosvcge}$ resulting from the discretization of the component can be derived by taking partial derivatives of $i_{mos}$ with respect to $v_d$ and $v_{Cge}$, respectively, and each operating state has a different form.

1) ON STATE
In the ON state, i.e., when $v_d$ is less than $(y \cdot v_{Cge})^{1/x}$, the conductance and transconductance are expressed by the following equations:

2) TRANSIENT STATE
In the transient state, the conductance $G_{mosvd}$ is zero, and the transconductance can be derived as

3) OFF STATE
When the IGBT is OFF, both $G_{mosvd}$ and $G_{mosvcge}$ are zero.
Taking the different forms of $G_{mosvd}$ into consideration, the companion current of $i_{mos}$ can be calculated for each state. The tail current $I_{tail}$ occurs when the IGBT is being turned OFF, and it is estimated from a formula involving the fixed current $i_{rat}$. Finally, all subunits are combined and expressed as
$$G_{IGBT} \cdot v_{IGBT} = I_{IGBTeq} \tag{15}$$
where $G_{IGBT}$ is the 5 × 5 admittance matrix, $v_{IGBT}$ is the vector of IGBT node voltages, and $I_{IGBTeq}$ is the vector of companion currents.

B. DIODE NBM
The nonlinear behavioral power diode model is shown in the right part of Fig. 1(a). The relationship between the diode static current $I_d$ and its junction voltage is expressed in terms of the leakage current $I_s$, the junction barrier potential $V_b$, and the static junction voltage $V_j$. The nonlinear diode conductance $G_j$ and the companion current $I_{jeq}$ are then derived from this relationship.

C. IGBT ELECTROTHERMAL MODEL
As given in Fig. 1(b), the process in which the power loss causes the semiconductor junction temperature to rise can be modeled by R-C pairs as an equivalent electrothermal network [23], which is generally expressed in terms of the constants $R_{th(i)}$ and $\tau_i$. The power loss of the IGBT, $P_{loss}$, is numerically equal to the input current of the transient thermal impedance equivalent circuit. Correspondingly, the terminal voltage of the current source can be taken as the semiconductor's junction temperature $T_j$, where $T_e$ stands for the ambient temperature, $G_{ci} = \Delta t/2C_{th(i)}$, and $I_{ci}$ is the capacitor history current.
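The R-C thermal network described above can be illustrated with a minimal sketch, assuming the common Foster-type form $Z_{th}(t) = \sum_i R_{th(i)}\,(1 - e^{-t/\tau_i})$ and a constant power-loss step. The parameter values, function names, and the choice of the Foster form are assumptions made for illustration only; they are not taken from the article or its Appendix A.

```python
import math

# Hypothetical Foster-network parameters (K/W and s); NOT taken from the article.
R_TH = [0.01, 0.03, 0.02]
TAU = [0.001, 0.02, 0.5]


def foster_zth(t, r_th=R_TH, tau=TAU):
    """Transient thermal impedance Z_th(t) = sum_i R_i * (1 - exp(-t / tau_i))."""
    return sum(r * (1.0 - math.exp(-t / tk)) for r, tk in zip(r_th, tau))


def junction_temp_step(p_loss, t, t_ambient):
    """Junction temperature for a constant power-loss step applied at t = 0.

    The power loss plays the role of the current source and the temperature
    rise plays the role of the voltage across the R-C pairs.
    """
    return t_ambient + p_loss * foster_zth(t)


if __name__ == "__main__":
    for t in (0.0, 0.01, 0.1, 1.0, 10.0):
        print(t, junction_temp_step(p_loss=1000.0, t=t, t_ambient=25.0))
```

In steady state the temperature rise approaches the power loss multiplied by the sum of the thermal resistances, which is why an undersized cooling path translates directly into a higher junction temperature.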

III. IGBT NBM IMPLEMENTATION ON ACAP
Versal™ devices from Xilinx are the first ACAPs and are based on the TSMC 7 nm FinFET process technology. Fig. 2(a) depicts the architecture of the ACAP, which consists of a scalar engine (PS), an adaptable engine (PL), and an intelligent engine (AIE). These are connected via the NoC, a series of high-speed, integrated horizontal and vertical paths, to achieve remarkable performance and meet design timing, speed, and logic utilization requirements.

A. IGBT DESIGNS ON ACAP
1) AI Engine: As shown in Fig. 2(b), the AIE array is the top-level hierarchy of the AIE architecture, which integrates a 2-D array of AIE tiles. The AIE array interface enables the AIE to communicate with the rest of the Versal™ device through the NoC or directly to the PL. The AIE tile architecture is shown in Fig. 2(c), where each tile includes one tile interconnect module, which handles AXI4 input/output, a memory module, and an engine that can access up to four memory modules in four directions. The AIE, shown in Fig. 2(d), is a highly optimized processor that supports both fixed-point and floating-point precision; the AIE array contains up to 400 tiles on the VC1902 device used in this work. The AIE programming flow is carried out in two phases with the Vitis integrated design environment: kernel programming and graph programming. A kernel describes a specific computing process running on a single AIE tile and is programmed in C/C++, while a C++ framework provided by Xilinx is used to create graphs from kernels, containing declarations for the graph nodes and connections. A graph instantiates and connects the kernels using buffers and streams, and also describes the data transfer between the AIE array and the rest of the ACAP device. Fig. 3 shows the dataflow graph and kernels of the NBM implementation, which is achieved by five AIE kernels (pre_cal, diode, igbt_on, igbt_off, and igbt_transient), connections, and different types of buffer. The data transfer between kernels is memory-to-memory, and the transmission of data between kernels and the PL is stream-to-memory or memory-to-stream.
First, the node voltage of the IGBT is sent as input to the first kernel, pre_cal, for parameter precalculation. The second kernel, diode, computes the parameters of the diode, and the third to fifth kernels, igbt_on, igbt_off, and igbt_transient, perform the IGBT nonlinear functions in the ON state, OFF state, and transient state, respectively. Finally, the outputs make up the admittance matrix in (15).
2) PS: As shown in the scalar engine part of Fig. 2(a), the application processing unit (APU) is based on the ARM Cortex-A72 processor core and provides general-purpose computing in a standard programming environment [24]. It is chosen for IGBT NBM computation since it offers higher computing capability and a clock frequency of up to 1700 MHz. OpenCL and the Xilinx Runtime (XRT) methodology are adopted for software programming, which enables multiple kernels to be executed concurrently with an initialized command queue and is thus highly efficient.
3) PL: The PL is an extensible structure that enables the creation of a wide range of functions. It consists of digital signal processor engines, configurable logic blocks, configuration RAM, and BRAM, which can be configured together to create numerous types of hardware functionality, including accelerators, processors, functional pipeline units, and peripherals [24]. As shown in the left part of Fig. 3, the PL establishes connections between the PS, NoC, AIE, high-density I/O buffers, and components instantiated within the PL. In the IGBT NBM design, the global memory input/output port is used to connect external memory mapped to or from the global memory, which accesses DDR memory directly with a throughput of 3200 MB/s. The connections and configuration of the PL elements are captured in the Vivado design suite and the Vitis unified software platform toolchain using a programmable device image. Fig. 4 shows the setup of the hardware platform, the Xilinx Versal™ VCK190 board with the ACAP device XCVC1902.
The IGBT NBM is implemented on the PS, PL, and AIE of the ACAP separately for a comprehensive evaluation of the different design schemes. When the simulation duration is 0.05 s, the actual execution time on the PS is 0.042 s. The real-time ratio is therefore 0.05 s / 0.042 s ≈ 1.19, which indicates that, for a single IGBT, the simulation is slightly faster than real time. However, the simulation of a power converter with many IGBTs slows down significantly due to the inadequate scalability of the PS. Table 1 lists the latency and resource utilization of the NBM implementation on the AIE and PL. While the PL has the advantages of numerous resources and customizability to support the simulation of systems with multiple IGBTs, the heavy data dependency of the NBM restricts parallelism and ultimately leads to high latency. The AIE has highly optimized processors and a data stream frequency of 1 GHz for efficient parallel processing. The AIE scalar processor performs excellently on fixed-point data but is not ideal for the floating-point data required by the NBM, as shown in Table 1. To accelerate the computation, the ML strategy and the AIE vector unit are adopted, as the vectorized data types and SIMD features enable the IGBT NN model to be processed in parallel.
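The real-time ratio quoted above for the PS can be reproduced with a few lines of arithmetic. The sketch below also shows how the ratio would fall below 1 for several devices under the simplifying assumption that they are evaluated one after another with identical cost; that serial-scaling model and the function names are hypothetical and are not measurements from the article.

```python
def real_time_ratio(simulated_s, execution_s):
    """Ratio of simulated duration to wall-clock execution time (> 1 is faster than real time)."""
    if execution_s <= 0:
        raise ValueError("execution time must be positive")
    return simulated_s / execution_s


def serial_ratio(single_device_ratio, n_devices):
    """Assumed worst case: n identical devices evaluated strictly one after another."""
    return single_device_ratio / n_devices


single = real_time_ratio(0.05, 0.042)   # figures reported for one IGBT on the PS
six = serial_ratio(single, 6)           # hypothetical six-switch converter

if __name__ == "__main__":
    print(f"single IGBT: {single:.2f}")
    print(f"six IGBTs (assumed serial): {six:.2f}")
```

Under this assumption, even a modest number of switches no longer meets the real-time constraint on the PS, which is consistent with the motivation for moving the computation to the AIE vector unit.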

IV. ML MODELING AND REALIZATION OF NBM
Based on the NBM performance evaluation in the previous section, it can be seen that the real-time performance is less than satisfactory. An ML-based cosimulation technique is proposed to streamline the computational procedure while maintaining simulation accuracy.

A. SELECTION OF NN TOPOLOGY
Different NNs such as CNN, RNN, and ANN are novel trends in the realm of ML, providing impetus for various applications. Similarly, the NN methodology can be valuable in the field of real-time simulation, as one of its benefits is that it can take advantage of the numerical prediction property to derive the corresponding output model by training on specific data, thus avoiding the extensive computations caused by iterations during transient states.
In Fig. 5(a), an elementary version of the NN is depicted, with a multilayer structure formed by neurons arranged in the input layer, the hidden layer, and the output layer. Each node in one layer is linked to all the nodes in the next layer. In the corresponding mathematical expression, X is the input, n is the number of neurons, Y is the output, W is the weight, and b is the bias. Fig. 5(b) represents the general mathematical model of the NN, where the input variables from $x_1$ to $x_i$ are multiplied with the weight matrix W and summed with the bias value b. Finally, the activation function serves as a nonlinear mapping, limiting the amplitude of the output to a specific range. Common activation functions include sigmoid, tanh, and ReLU [25], of which ReLU is the most popular in machine learning, since it is piecewise linear and therefore faster to compute than sigmoid and tanh, which require exponential operations.
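The layer operation described above (inputs multiplied by the weights, summed with the bias, then passed through an activation function) can be sketched in plain Python. The sketch assumes the conventional form Y = f(WX + b) with a ReLU hidden layer and a linear output layer; the tiny weight values and function names are hypothetical and serve only to make the arithmetic easy to follow.

```python
def relu(vec):
    """Piecewise-linear activation: max(0, x) element by element."""
    return [max(0.0, v) for v in vec]


def dense(W, x, b):
    """One fully connected layer: y_j = sum_k W[j][k] * x[k] + b[j]."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


def forward(x, W1, b1, W2, b2):
    """Single-hidden-layer network: ReLU hidden layer, linear output layer (assumed)."""
    hidden = relu(dense(W1, x, b1))
    return dense(W2, hidden, b2)


# Hypothetical 2-input, 2-neuron, 1-output example.
W1 = [[1.0, -1.0], [0.5, 0.5]]
b1 = [0.0, -1.0]
W2 = [[2.0, -2.0]]
b2 = [0.5]

y = forward([2.0, 1.0], W1, b1, W2, b2)

if __name__ == "__main__":
    print(y)
```

Because inference reduces to multiply-accumulate operations and a comparison against zero, no iteration is involved, which is the property exploited for real-time execution.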
In this work, ANN is chosen as the IGBT NBM transient state ML model because it has the feature of fitting the intermediate data curve by the first and last data only, which avoids the problem of computational iterations in traditional EMT models, and its high parallelism and low execution delay can match the criteria of transient simulation.

B. DATA COLLECTION AND TRAINING METHODOLOGY
One crucial part of ML training of devices is the selection of the dataset, since it influences the accuracy of the training results and the generality of the model. The IGBT used in this work is the Siemens BSM300GA160D, rated 1600 V, 300 A, whose parameters are provided in Appendix A. The dataset is extracted from the MATLAB simulation results of the IGBT NBM, and both the turn-ON and turn-OFF data during the transient state are taken into account.
The corresponding IGBT NBM ANN model has five input variables: the initial and final values of the transient-state voltage, $V_{start}$ and $V_{end}$, and of the current, $I_{start}$ and $I_{end}$, and the gate signal $V_g$. All these data are normalized to (−1, 1) using min-max normalization, which allows for easier data processing and better training performance.
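Min-max normalization to the range from −1 to 1, as mentioned above, can be sketched as follows. The helper name and the sample voltage values are hypothetical; the article does not give the exact scaling code or the dataset bounds.

```python
def minmax_scale(values, lo=-1.0, hi=1.0):
    """Linearly map values so that min(values) -> lo and max(values) -> hi."""
    v_min, v_max = min(values), max(values)
    if v_max == v_min:
        raise ValueError("a constant feature cannot be min-max scaled")
    span = v_max - v_min
    return [lo + (hi - lo) * (v - v_min) / span for v in values]


# Hypothetical voltage samples in volts, for illustration only.
scaled = minmax_scale([0.0, 800.0, 1600.0])

if __name__ == "__main__":
    print(scaled)
```

The same minimum and maximum used during training would have to be stored and reused at inference time so that the hardware model sees inputs on the same scale.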
The mean absolute error (MAE) is used to measure the accuracy of the training model, $\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n} \left| y_i - y_i^{pre} \right|$, where $n$ is the total number of outputs, $y_i$ is the $i$th original value from the dataset, and $y_i^{pre}$ is the corresponding output of the ANN model. The Adam optimization algorithm is adopted as the training methodology in this work to minimize the error [26]. Fig. 6 shows the MAE of the IGBT ANN model, which presents the error reduction during the training process. The number of training epochs is set to 1000 to reduce the error, and the hidden layer size is set to 32 to improve the efficiency of the AIE vector code, since the size of the accumulator is a multiple of 8 b. Since the MAE with one hidden layer is not significantly different from that with two hidden layers, a single hidden layer is used to achieve optimal performance.
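The MAE described above is the mean of the absolute differences between dataset values and model outputs. A minimal sketch follows; the function name and sample numbers are hypothetical and do not come from the article's dataset.

```python
def mean_absolute_error(y_true, y_pred):
    """MAE = (1/n) * sum(|y_i - y_pre_i|) over all n outputs."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical normalized samples, for illustration only.
mae = mean_absolute_error([1.0, 2.0, 3.0], [1.5, 2.0, 2.0])

if __name__ == "__main__":
    print(mae)
```

Tracking this value per epoch gives the kind of error-reduction curve that the text attributes to Fig. 6.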

C. MATRIX MULTIPLICATION IMPLEMENTATION WITH AIE
As described in the previous part of this section, the input variables need to be multiplied by the weights and summed with the bias, which can be seen as a matrix multiplication and addition for both the hidden layer and the output layer. The matrix sizes are adjusted, without affecting the outcome, to make the operations suitable for the AIE vectorized code; e.g., for the hidden layer, the size of the weight matrix W is 32 × 8, the input matrix X is 8 × 1, and the bias matrix b is 32 × 1.
The column-based matrix multiplication is implemented using vectorized AIE code, where the vector data types pack multiple scalar data elements into a wider vector. In this case, both the AIE API and intrinsics are employed to increase design productivity. The AIE API is a portable programming interface for accelerators, implemented as a C++ header-only library, which offers types and operations that are converted into efficient low-level intrinsics. In addition, the vector data types and the MAC intrinsics [21] are deployed for application-level programming in this work.
There are two solutions based on AIE floating-point intrinsics to implement the matrix multiplication. The first is to perform the multiplication with fpmul and then add the result and the bias matrix to the accumulator using fpmac. The second, more efficient way, presented in this article, is to apply the fpmac intrinsic only, as shown in Fig. 7(a). First, the bias matrix b is loaded into the accumulator; then the weight matrix W is stored in several accumulators by column, and each column of the weight matrix is multiplied by the corresponding row of the input matrix X, with the fpmac intrinsic performing both the multiplication and the addition. The full IGBT ANN AIE vectorized matrix calculation is shown in Fig. 7(b). Table 2 shows the latency and resource consumption of different parts of the ANN model implemented in the AIE. A comparison of matrix multiplication implementations on different hardware platforms is given in Table 3: for the same size of matrix multiplication, the AIE is 2.6 times faster than the CPU and more than 28 times faster than the FPGA.
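The MAC-only, column-based scheme described above can be mimicked in plain Python to make the data flow explicit: the accumulator is preloaded with the bias, and each weight column scaled by one input element is then accumulated into it. This is only a scalar sketch of the idea, not AIE code; the zero-padding of the five inputs to eight elements, the weight values, and all function names are assumptions for illustration.

```python
ROWS, COLS = 32, 8  # hidden-layer sizes stated in the text


def pad_input(x, width=COLS):
    """Zero-pad the input to the vector width (padding scheme is an assumption)."""
    return list(x) + [0.0] * (width - len(x))


def matvec_mac_only(W, x, b):
    """Column-based y = W x + b using multiply-accumulate steps only."""
    acc = list(b)                                   # preload bias into the accumulator
    for k in range(len(x)):                         # one step per weight column
        column = [W[j][k] for j in range(len(W))]
        acc = [a + c * x[k] for a, c in zip(acc, column)]  # acc += column * x[k]
    return acc


def matvec_reference(W, x, b):
    """Row-based reference: separate multiply, sum, then bias addition."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]


# Hypothetical weights, bias, and five normalized inputs.
W = [[0.125 * (j - 4 * k) for k in range(COLS)] for j in range(ROWS)]
b = [0.5 * j for j in range(ROWS)]
x = pad_input([0.25, -0.5, 1.0, 0.75, -1.0])

y_mac = matvec_mac_only(W, x, b)
y_ref = matvec_reference(W, x, b)

if __name__ == "__main__":
    print(max(abs(a - r) for a, r in zip(y_mac, y_ref)))
```

Starting from the bias removes the separate addition pass, so each of the eight columns costs one multiply-accumulate over the 32 lanes, and the zero-padded columns leave the result unchanged.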

B. REAL-TIME SYSTEM-LEVEL EMULATION RESULTS
The case study system is presented in Fig. 9, where Fig. 9(a) shows the two-level VSC converter. For the dc side, as shown in Fig. 9(b), there are four kinds of load circuits, namely the half-bridge load, buck load, boost load, and full-bridge load, and Fig. 9(c) presents the control diagram. The system parameters are given in Appendix B. The emulation of the system is implemented on the Xilinx Versal™ ACAP XCVC1902, where the time-step is 5 μs. Table 4 provides the hardware resource consumption and the latency of the different parts of the system. Fig. 10 shows the simulation results of the case study system with the ac-side fault F at 0.4 s, as shown in Fig. 9(a). In Fig. 10(a), before the ac-side fault, the grid power varies in the range of approximately 600 to 900 kW, and it quickly drops to about 50 kW when the fault occurs. Then, after 0.1 s, the grid power is gradually restored. Fig. 10(b) displays the power of the full-bridge and half-bridge loads, which both decrease from their original values at the fault, rise to a peak at 0.5 s, and then recover at 0.6 s. Fig. 10(c) shows the power of the buck load, which follows the same trend as the previous figures, except that the value drops to 0 when the fault happens. Fig. 10(d) shows the boost load power, which remains steady before the fault, changes from −124 kW to −110 kW between 0.4 s and 0.5 s, and recovers to the original value after 0.1 s. Fig. 10(e) and (f) show the voltages on the dc side and ac side, respectively. Fig. 11 gives the junction temperature of an IGBT in the simulation of the whole system. In Fig. 11(a), with cooling system 1, which has insufficient capacity as given in Appendix A, the junction temperature reaches about 220 °C at the steady state. Fig. 11(b) shows that with adequate capacity, such as that of cooling system 2, the temperature remains below 70 °C even when the fault occurs.
In Fig. 12, the simulation results of the system are presented for a fault in the dc-side half-bridge load circuit that starts at 0.5 s and lasts for 2 s. Fig. 12(a) shows the grid power between 0 and 3.0 s, and it can be seen that the power increases to about 95 kW at 0.5 s, and then returns to its original value at 2.5 s. Fig. 12(b) shows the power of the full-bridge and buck loads, neither of which changes considerably after the fault occurs. In Fig. 12(c), the power of the half-bridge load increases from its original value to 440 kW and becomes stable in the range of 390 kW to 420 kW, and is then restored after the fault ends at 2.5 s. Fig. 12(d) shows the dc-side voltage, which originally varies between approximately 950 V and 1040 V, and changes to between 940 V and 1050 V after the fault occurs.

VI. CONCLUSION
Real-time emulation of a device-level NBM of the IGBT is a challenging task due to the high computational burden arising from the need for an iterative solution of the device equations to obtain a convergent solution at every nanosecond-scale time-step. In this article, an ML strategy is proposed to tackle the IGBT nonlinear behavioral electrothermal model and is demonstrated in a multiconverter supply-load system case study. The model is implemented on the three main domains of a novel heterogeneous ACAP hardware: PS, PL, and AIE, which are introduced in detail in terms of functionality and features. The performance evaluation results, covering latency and hardware resource consumption, are provided separately. To make better use of the VCK190 hardware platform and the AIE characteristics and to meet the requirements of real-time simulation, the IGBT ML-based model and NN training methodology are proposed, where the ANN model is adopted to convert the complex iterative computation of the transient state into simpler matrix operations. In comparison with the conventional model in device-level emulation, the error of the IGBT ML model is within 1%, and the real-time requirement can be achieved with less resource consumption. The system-level simulation results are given for two different fault scenarios on both the ac and dc sides and validated by MATLAB/Simulink. The proposed modeling and implementation strategies can be applied in the future for real-time emulation of energy conversion systems in various practical applications.

APPENDIX B
The parameters of the case study system: the grid voltage $V_s$ = 490 V (L-L), 60 Hz; the transformer 1 MVA, 25 kV/490 V; $C_{dc}$ = 0.0333 F; the half-bridge load 400+j50 kVA; the buck load 250 kW, duty D = 0.55; the boost supply $V_{boost}$ = 500 V, duty D = 0.8; the full-bridge load 200+j50 kVA.