Denoising and Voltage Estimation in Modular Multilevel Converters Using Deep Neural Networks

Modular Multilevel Converters (MMCs) have become one of the most popular power converters for medium/high power applications, from transmission systems to motor drives. However, to operate properly, MMCs require a considerable number of sensors and the communication of sensitive data to a central controller, all under significant electromagnetic interference produced by the high-frequency switching of power semiconductors. This work explores the use of neural networks (NNs) to support the operation of MMCs by: i) denoising measurements, such as stack currents, using a blind autoencoder NN; and ii) estimating the sub-module capacitor voltages, using an encoder-decoder NN. Experimental results obtained with data from a three-phase MMC show that NNs can effectively clean sensor measurements and accurately estimate internal states of the converter, even during transients, drastically reducing sensing and communication requirements.


I. INTRODUCTION
Multilevel converters (MCs) are a cost-effective solution for medium to high-voltage power conversion. MCs have been under intense research and development during the last decades and have shown wide adoption in industrial applications such as high power AC drives, high-voltage direct-current transmission, integration of renewable energy, power filters and static synchronous compensators [1], [2].
Among the different topologies for MCs, Modular Multilevel Converters (MMCs) have become one of the most attractive options for medium/high power applications [3]. The MMC stands out from other topologies due to its high modularity, impressive scalability, fault tolerance and high output power quality, which means very low harmonic distortion with reduced passive filters [4]. MMCs operate at high voltages with standard power semiconductors, by means of a high-voltage DC-link and a floating capacitor in each sub-module (SM). Nowadays, the MMC is widely found in commercial solutions, such as the ABB HVDC Light, Alstom MaxSine, Siemens Sinamics SM120 and the Benshaw M2L series.
MMCs contain up to thousands of SMs depending on the application, which poses several challenges for their operation and control. Power converters are a natural source of electromagnetic interference (EMI) due to the high dv/dt and di/dt generated by power semiconductors. EMI can lead to high-frequency spikes in measurements, and it may also interfere with adjacent communication, monitoring and control systems. Hence, its proper management is imperative [5]. A second challenge comes from the fact that the modular structure of the MMC is not exploited in the control scheme, which is usually centralized and requires a large number of measurements and signals, including at least one voltage measurement per SM. This increases the complexity and cost of the converter, and reduces reliability and noise tolerance [6]. Several efforts have been made to reduce the number of voltage sensors in MMCs, in order to simplify the hardware requirements and increase reliability and noise immunity [6]. The Kalman filter has been proposed as an estimator for the capacitor voltages in [7], but the formulation uses an extra voltage sensor per stack, and its slow dynamic response may affect control performance during transients and faults. In [8], an adaptive linear neural network is proposed for SM voltage estimation, adding sensors to the stack inductors and to the output voltage of each phase. However, this method presents a considerable tracking error.
This work proposes the use of Deep Neural Networks (DNNs) to support the operation of MMCs. In particular, it is proposed to combine Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) in an encoder-decoder architecture, for: i) denoising measurements from the MMC, with the aim of providing cleaner signals, with negligible delay, to the control system; and ii) estimating the capacitor voltage of each SM of the MMC, with the aim of eliminating most of the sensors required to operate the MMC. The proposed denoiser is able to blind-denoise the signals, i.e., denoise without any prior information about the noise, while the proposed estimator is capable of estimating the capacitor voltages in different states of operation, even during transients. The effectiveness and performance of the proposed techniques are evaluated using real operational data from a three-phase MMC, with 4 sub-modules per stack, obtained under both steady-state and transient operation.

A. NOTATION AND BASIC DEFINITIONS
In this work, R denotes the real numbers, Z_{>0} the positive integers, R^n the Euclidean space of dimension n, and R^{n×m} the set of n × m matrices with real coefficients. For a, b ∈ Z_{≥0}, [a; b] denotes their closed interval in Z. For a vector v ∈ R^n, v_i denotes its ith component. For a matrix A ∈ R^{n×m}, A_i denotes its ith column and A^i its ith row. For an n-dimensional real-valued sequence α : Z_{>0} → R^n, α(t) denotes its tth element and α_{[a;b]} its restriction to the interval [a; b], i.e., a sub-sequence. For an n-dimensional sub-sequence α_{[a;b]}, M(α_{[a;b]}) ∈ R^{n×(b−a+1)} is a matrix whose ith column is equal to α(a+i−1), with i ∈ [1; b−a+1]. The same notation applies to an n-dimensional finite-length sequence α : [1; L] → R^n. Given an n-dimensional sequence α, its T-depth window is the matrix-valued sequence β : Z_{≥T} → R^{n×T} with β(t) = M(α_{[t−T+1; t]}).
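As an illustration, the T-depth window construction M(α_{[t−T+1; t]}) can be sketched in a few lines of numpy; the function name and the column-wise array layout are choices of this sketch, not from the text:

```python
import numpy as np

def depth_window(alpha: np.ndarray, t: int, T: int) -> np.ndarray:
    """Return M(alpha_[t-T+1; t]), the T-depth window of an n-dimensional
    sequence at time t (1-indexed, as in the notation section).

    `alpha` is stored as an (n, L) array whose columns are alpha(1)..alpha(L).
    """
    assert t >= T, "the window needs T past samples"
    return alpha[:, t - T : t]  # columns alpha(t-T+1) .. alpha(t)

# toy example: a 2-dimensional sequence of length 5
seq = np.arange(10).reshape(2, 5)
W = depth_window(seq, t=5, T=3)
assert W.shape == (2, 3)
```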

B. MODULAR MULTILEVEL CONVERTERS
The schematic representation of the MMC under study is shown in Figure 1. The three-phase MMC consists of three legs connected in parallel to a common DC bus V_DC. Each leg is divided into two stacks, upper (u) and lower (l), while each stack is formed by the series connection of N SMs with a stack inductor L_s and an equivalent resistance R_s. The midpoint of each leg is connected to an AC output terminal (a, b, c).
The number of SMs per stack depends on the voltage of the application. In a typical HVDC application, the number of SMs per stack can exceed 500 [6], with a voltage of ±400 kV. On the other hand, motor drive applications need between 5 and 20 SMs for operating voltages of 3.3–13.8 kV [9].
There are plenty of SM topologies currently proposed in the literature. Nevertheless, the basic and most commonly used one is the half-bridge (HB) [10], due to its high efficiency, simplicity and straightforward SM capacitor voltage balancing. The HB consists of two transistors operating in a complementary manner, and a single floating capacitor (C), generating two possible voltage levels (v_c and 0). The variation of the capacitor voltage is a function of the stack current and the switching state of the power semiconductors.
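The dependence of the capacitor voltage on the stack current and the switching state can be illustrated with an idealized (lossless) Euler-integrated model; the function name and the simple charge model are assumptions of this sketch, not the converter model used later:

```python
import numpy as np

def sm_capacitor_voltage(v0, s, i_stack, C, Ts):
    """Idealized half-bridge SM capacitor voltage over time.

    When the SM is inserted (s[k] = 1) the stack current charges or
    discharges C; when bypassed (s[k] = 0) the voltage holds.
    v0: initial voltage, s: switching states, i_stack: stack current samples,
    C: capacitance, Ts: sampling period.
    """
    v = np.empty(len(s) + 1)
    v[0] = v0
    for k in range(len(s)):
        v[k + 1] = v[k] + Ts * s[k] * i_stack[k] / C
    return v
```

For instance, a positive stack current raises the voltage of an inserted SM and leaves a bypassed SM untouched, which is the mechanism the balancing strategy relies on.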

C. ENCODER-DECODER NEURAL NETWORKS
RNNs have been extensively used in applications where time series have to be processed, such as modeling of dynamical systems or fault detection, mainly because of their recurrent structure, similar to a dynamical system in state-space form. However, recent studies have brought to light problems of the classical RNN architecture, such as difficulties in retaining long-term information and learning complex patterns.
To address these problems, an architecture consisting of two cascaded RNNs was proposed in [11], where the first network (the encoder) maps a variable-length input sequence to a fixed-length vector representation in an intermediate space, and then the second network (the decoder) maps this vector to the target sequence space. This architecture is known as encoder-decoder.
The encoder-decoder architecture has gained popularity, and variations of the initial formulation have been proposed lately, such as attention mechanisms [12] or the use of non-recurrent networks. In particular, encoder-decoder CNNs have shown similar performance to the original RNN formulation in sequence-related tasks [13], but with a fraction of the training and processing time required by RNNs. Recently, the benefits of combining both types of networks in an encoder-decoder structure have been exposed [14], where CNNs process longer sequences in less time and extract local spatial features, while RNNs find temporal relationships among input signals.
A particular instance of encoder-decoder networks is the Autoencoder (AE) [15], which is an unsupervised NN capable of learning efficient representations of the inputs by trying to reconstruct them. Formally, the AE maps an input vector y ∈ R^n to its latent representation z ∈ R^m, where typically m < n. This mapping is done by an encoder f_θe, parameterized by θ_e, as z = f_θe(y). The latent vector z can be mapped back to the input space by the decoder f_θd, parameterized by θ_d, as ŷ = f_θd(z). Both the encoder and the decoder can be any type of NN depending on the application. Nonetheless, in all cases the same optimization problem is solved, which amounts to minimizing the difference between input and decoded vectors.
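The AE mapping and its reconstruction objective can be sketched as follows; the tanh encoder, the linear decoder and all sizes here are illustrative assumptions of this sketch, not the networks used later in the paper:

```python
import numpy as np

rng = np.random.default_rng(0)

# hypothetical sizes: n = 6 inputs, m = 2 latent dimensions (m < n)
n, m = 6, 2
W_e = rng.normal(scale=0.1, size=(m, n))  # encoder parameters (theta_e)
W_d = rng.normal(scale=0.1, size=(n, m))  # decoder parameters (theta_d)

def f_enc(y):
    """z = f_theta_e(y): map inputs to the latent space."""
    return np.tanh(W_e @ y)

def f_dec(z):
    """y_hat = f_theta_d(z): map latent vectors back to the input space."""
    return W_d @ z

def reconstruction_loss(Y):
    """Mean squared error between inputs and their reconstructions,
    the quantity every AE minimizes over (theta_e, theta_d)."""
    Y_hat = f_dec(f_enc(Y))
    return float(np.mean((Y - Y_hat) ** 2))

Y = rng.normal(size=(n, 100))  # 100 sample vectors as columns
loss = reconstruction_loss(Y)
assert loss >= 0.0
```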

III. DENOISING AND VOLTAGE ESTIMATION USING ENCODER-DECODER NEURAL NETWORKS

A. DENOISING PROBLEM FORMULATION
Denoising based on autoencoders started with the appearance of the Denoising Autoencoder (DAE) [16], which is fed with inputs corrupted by noise and is forced to reconstruct the clean version of the inputs. DAEs have been successfully used in many applications, particularly image processing [17]. However, DAEs work under the assumption that the clean version of the inputs is available and the nature of the noise is known, which in general is not true. Blind denoising autoencoders (BDAEs) [18] address the aforementioned issue. BDAEs are able to denoise inputs by restricting the latent space, without knowledge about the noise nature nor access to the clean inputs. In [19], a BDAE with a dual encoder structure (DBDAE), one RNN encoder to exploit temporal relationships among inputs and one CNN encoder to exploit spatial correlations, was designed to blind-denoise time series data from dynamical systems. In this work, we propose to use this DBDAE structure to filter sensory data in MMCs. Figure 2 illustrates the DBDAE, and its building blocks are explained in the following. Consider a dynamical system and a sampling process generating a sequence of noisy measurements ỹ. A T-depth window Ỹ can be constructed, so that Ỹ(n) = M(ỹ_{[n−T+1; n]}) ∈ R^{N×T}. In this setting, Ỹ_k(n) ∈ R^N is a column vector containing the N measurements at time instant n − T + k, and Ỹ^k(n) ∈ R^T is a row vector containing the last T samples from sensor k.

1) RECURRENT ENCODER
It generates its latent representation as

h_R(n) = f_RE(Ỹ(n)),
where f_RE represents the recurrent network, with L_R layers and Q_l neurons in each layer l. Considering Gated Recurrent Units (GRU) [11], the operations at each layer l are given by

r^l(j) = σ(W_ir^l p^l(j) + b_ir^l + W_hr^l h_R^l(j−1) + b_hr^l)
z^l(j) = σ(W_iz^l p^l(j) + b_iz^l + W_hz^l h_R^l(j−1) + b_hz^l)
n^l(j) = tanh(W_in^l p^l(j) + b_in^l + r^l(j) ∘ (W_hn^l h_R^l(j−1) + b_hn^l))
h_R^l(j) = (1 − z^l(j)) ∘ n^l(j) + z^l(j) ∘ h_R^l(j−1)

where h_R^l(j) ∈ R^{Q_l} is the hidden state of layer l at iteration j; r^l, z^l and n^l are Q_l-dimensional finite-length sequences representing the values of the reset, update and new gates of layer l, respectively; σ and tanh are the sigmoid and hyperbolic tangent activation functions; ∘ is the Hadamard product; and p^l(j) is the input of layer l at iteration j. For l = 1, the input p^1(j) is given by Ỹ_j(n), while for l > 1 it is given by h_R^{l−1}(j). The latent representation of this encoder is h_R^{L_R}(T) ∈ R^{Q_{L_R}}, which is the last hidden state of the last layer. The final output of the encoder, h_R(n) ∈ R^Q, is obtained by projecting h_R^{L_R}(T) with a linear layer to match the dimension Q of the latent space. Note that h_R is an infinite-length sequence with one value per data set Ỹ(n); hence it is indexed by n.
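One GRU update with reset, update and new gates can be sketched directly in numpy; the weight names and the parameter-dictionary layout are assumptions of this sketch (the input and hidden biases of each gate are folded into a single bias for brevity):

```python
import numpy as np

def gru_step(p, h_prev, params):
    """One GRU update h(j) = GRU(p(j), h(j-1)) for a single layer.

    params holds W_ir, W_iz, W_in (input weights), W_hr, W_hz, W_hn
    (hidden weights) and combined biases b_r, b_z, b_n.
    """
    sigma = lambda x: 1.0 / (1.0 + np.exp(-x))
    # reset and update gates
    r = sigma(params["W_ir"] @ p + params["W_hr"] @ h_prev + params["b_r"])
    z = sigma(params["W_iz"] @ p + params["W_hz"] @ h_prev + params["b_z"])
    # new gate, with the reset gate modulating the hidden contribution
    n = np.tanh(params["W_in"] @ p + r * (params["W_hn"] @ h_prev) + params["b_n"])
    # convex combination of new candidate and previous hidden state
    return (1.0 - z) * n + z * h_prev
```

Rolling this step over the T columns of a window and keeping the last hidden state yields the encoder's latent representation.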

2) CONVOLUTIONAL ENCODER
It generates a latent representation in parallel as

h_C^{L_C}(n) = f_C(Ỹ(n)),

where f_C represents the convolutional encoder network with L_C layers. The output of each layer l is composed of K_l concatenated output channels C_out,k^l(n) ∈ R^{z_l}, with k ∈ [1; K_l], given I input channels. Each of the resulting output channels for layer l at time step n is given by

C_out,k^l(n) = b_k^l + Σ_{i=1}^{I} W_k^l(i) ⋆ p_i^l(n),

where b_k^l is a bias vector for channel k at layer l, W_k^l(i) is the ith channel of the I-channel kernel corresponding to output channel k, p_i^l(n) is the ith channel of the input of layer l, and ⋆ is the cross-correlation operator. For l = 1, p_i^1(n) is given by Ỹ^i(n), while for l > 1, p_i^l(n) is given by C_out,i^{l−1}(n). The convolutional encoder produces its latent representation h_C^{L_C}(n) ∈ R^{z_{L_C}×K_{L_C}} by concatenating all the output channels of layer L_C. The final output of the encoder, h_C(n) ∈ R^Q, is obtained by projecting h_C^{L_C}(n) with two linear layers to match the dimension Q of the latent space.
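The per-channel cross-correlation can be sketched as a plain (unoptimized) double loop, assuming "valid" correlation with no padding; the function name and array shapes are illustrative:

```python
import numpy as np

def conv_channel(P, W_k, b_k):
    """Output channel k of one convolutional layer:
    C_out_k = b_k + sum_i W_k(i) * p_i, with '*' the cross-correlation.

    P:   (I, L) array of I input channels of length L
    W_k: (I, K) kernel, one row per input channel
    b_k: scalar bias for this output channel
    """
    I, L = P.shape
    _, K = W_k.shape
    out = np.full(L - K + 1, b_k, dtype=float)
    for i in range(I):
        for t in range(L - K + 1):
            # slide the kernel over channel i without flipping it
            out[t] += np.dot(P[i, t : t + K], W_k[i])
    return out
```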
The final latent representation of the input at time step n is the sum of the latent representations of both encoders, h_E(n) = h_R(n) + h_C(n).

3) RECURRENT DECODER
It is an RNN, f_RD, with L_D layers and Q̂_l neurons in each layer l. Its hidden state, h_D^{L_D}, is a Q̂_{L_D}-dimensional finite-length sequence of length T, given by

h_D^{L_D}(j) = f_RD(p(j), h_D^{L_D}(j−1)), with h_D^{L_D}(0) = h_E(n),

where p(j) ∈ R^N is the input for each decoding step of the network, which is either the delayed target Ỹ_{j−1}(n) or the previous network estimate Ŷ_{j−1}(n), following the scheduled sampling technique [20]. The network estimates Ŷ_j(n) are calculated by projecting h_D^{L_D}(j) with a linear layer. The final output of the DBDAE is Ŷ_T(n) ∈ R^N, which contains the denoised measurements at time step n.
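The scheduled-sampling decoding loop can be sketched as follows, with `step` standing in for one decoder iteration plus the output projection (a hypothetical placeholder, not the trained network); `eps` is the probability of feeding the delayed target rather than the previous estimate:

```python
import numpy as np

def decode_with_scheduled_sampling(step, y_teacher, T, h0, eps, rng):
    """Run T decoding iterations with scheduled sampling.

    step: callable (p, h) -> (y_hat, h), one decoder iteration
    y_teacher: (N, T) window of delayed targets
    h0: initial hidden state (the encoder latent in the DBDAE)
    eps: probability of using the delayed target as input
    """
    h = h0
    y_prev = y_teacher[:, 0]  # the first step has only the delayed target
    outputs = []
    for j in range(T):
        # coin flip: teacher input vs. the network's own previous estimate
        p = y_teacher[:, j] if rng.random() < eps else y_prev
        y_prev, h = step(p, h)
        outputs.append(y_prev)
    return np.stack(outputs, axis=1)  # (N, T); last column plays the role of Y_hat_T(n)
```

During training, `eps` is typically decayed from 1 toward 0 so the decoder gradually learns to rely on its own estimates.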
It should be noted that the decoder starts once h_E(n) is available, which involves T previous iterations of the encoder.

4) CONSTRAINING THE LATENT SPACE
To stop learning, the simple heuristic proposed in [19] is used. The heuristic consists of using principal component analysis (PCA) to transform the latent space of the DBDAE during training, and of stopping the learning process when the derivative of the reconstruction error is maximal. This heuristic was successfully applied in [19] to a variety of systems and is also used in this work.
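Assuming the post-PCA reconstruction error is logged once per epoch, the stopping rule could be operationalized as below; this is a sketch of the criterion only, not of the full procedure in [19]:

```python
import numpy as np

def stopping_epoch(errors):
    """Return the (1-indexed) epoch at which the discrete derivative of the
    per-epoch reconstruction-error curve is maximal -- the proposed stopping
    point of the heuristic. `errors` is assumed to be logged once per epoch
    after the PCA transform of the latent space.
    """
    d = np.diff(np.asarray(errors, dtype=float))  # error[k+1] - error[k]
    return int(np.argmax(d)) + 1
```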

B. VOLTAGE ESTIMATION PROBLEM FORMULATION
Modeling of dynamical systems using encoder-decoder RNNs as predictors in input-output form has recently gained popularity [21], [22]. However, when systems exhibit a combination of slow and fast dynamics, models increase in complexity as the input sequences become longer. In this setting, we propose the use of a CNN as encoder to process long sequences, aiming to capture the slow dynamics, and an RNN as decoder taking a short sequence as input, aiming to capture the fast dynamics. Figure 3 illustrates the structure of the encoder-decoder network proposed as voltage estimator; its building blocks are explained in the following. Given an N-dimensional measurement sequence X sampled from a dynamical system, a T_c-depth window X_c and a T_r-depth window X_r are constructed, with T_c ≫ T_r.
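Building the two windows can be sketched as follows; the function name and the column-wise array layout are illustrative choices of this sketch:

```python
import numpy as np

def dual_windows(X, t, Tc, Tr):
    """Build the long window X_c (slow dynamics, for the CNN encoder) and
    the short window X_r (fast dynamics, for the RNN decoder), Tc >> Tr.

    X: (N, L) array of measurements, columns X(1)..X(L); t is 1-indexed.
    """
    assert Tc > Tr and t >= Tc
    Xc = X[:, t - Tc : t]  # (N, Tc): long history
    Xr = X[:, t - Tr : t]  # (N, Tr): most recent samples
    return Xc, Xr
```

Note that X_r is simply the tail of X_c, so both windows can be maintained with a single circular buffer in a real-time implementation.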

1) CONVOLUTIONAL ENCODER
It generates at each time step n a latent representation of X_c(n) using the same structure as the convolutional encoder used in the DBDAE, with h_C(n) ∈ R^Q as the latent representation of its input X_c(n) ∈ R^{N×T_c}.

2) RECURRENT DECODER
It uses the latent representation h_C(n) as its initial hidden state, and the data matrix X_r(n) ∈ R^{N×T_r} as input. The decoder is a GRU-based network with L_R layers and Q_l neurons in layer l, which generates its latent representation h_R^{L_R}, a finite-length sequence of length T_r, as

h_R^{L_R}(j) = f_RD(X_r,j(n), h_R^{L_R}(j−1)), with h_R^{L_R}(0) = h_C(n),

where X_r,j(n) is the jth column of X_r(n). After T_r iterations, h_R^{L_R}(T_r) ∈ R^{Q_{L_R}} is available, which contains information about all the input sequences of the encoder and the decoder, and is projected by a linear layer to produce the final output ŷ(n) ∈ R^D.

IV. EXPERIMENTAL RESULTS

A. EXPERIMENTAL SETUP
As a case study, the three-phase MMC shown in Figure 4 is used. The experimental setup includes an OPAL-RT OP4510 system acting as controller (Intel Xeon E3 v5 CPU and two Kintex-7 FPGAs) and a passive RL load with L_o = 1 mH and R_o = 28 Ω. To obtain experimental data for training and testing of the denoiser and estimator, the MMC was configured according to the operating conditions summarized in Table 1.
The MMC was operated in open-loop, with no control for balancing the voltage of the capacitors, nor for the output voltages and currents. Open-loop operation was chosen since it exhibits a richer dynamic response, which favors the generalization of the DNNs. In such operating conditions, capacitor voltages may diverge over time, compromising the operation of the MMC [3]. To overcome this situation, phase-shifted pulse-width-modulation (PS-PWM) was used, since it distributes the SM power evenly, keeping the capacitor voltages balanced [23].
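The phase-shifted carriers underlying PS-PWM can be sketched as follows; the unit-amplitude triangular waveform, normalization and function name are idealized choices of this sketch, not the modulator implemented on the FPGA:

```python
import numpy as np

def ps_pwm_carriers(N, f_carrier, t):
    """N triangular carriers in [0, 1], each shifted by 1/N of a carrier
    period with respect to the previous one, as used in PS-PWM.

    N: number of SMs per stack, f_carrier: carrier frequency [Hz],
    t: array of time instants [s]. Returns an (N, len(t)) array.
    """
    carriers = []
    for k in range(N):
        phase = t * f_carrier + k / N                     # shift carrier k by k/N of a period
        carriers.append(np.abs(2 * (phase % 1.0) - 1.0))  # triangle wave in [0, 1]
    return np.stack(carriers)
```

With N = 4 SMs per stack, the interleaving of the carriers is what pushes the dominant switching ripple to four times the carrier frequency.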
Several experimental datasets were obtained both under steady-state operation and during transients due to step load variations. The datasets contain output voltages, capacitor voltages, stack currents and switching signals, all recorded synchronously. Figure 5 illustrates a portion of a dataset in steady-state operation. A considerable amount of noise can be observed in the measured variables.

B. DENOISING IN AN MMC
The noise in the measurement data-buses of the MMC is mainly due to the high EMI generated by the huge number of power semiconductors switching at high voltage and high speed. Reducing noise in MMC signals is important for a proper operation of the equipment.

1) TRAINING
To train the DBDAE to denoise the MMC signals, 40 seconds of experimental data sampled at T_s = 75 × 10⁻⁶ s were used. The objective of the network is to denoise all the important signals in each phase, that is, all the capacitor voltages, both stack currents and the output voltage. Therefore, three DBDAEs (one per phase), with 11 inputs each, are trained.
A latent space dimension of 44 and a window length of 20 were selected for each network. As for the loss function, the mean squared error (MSE) between the reconstructed signals and the original inputs was minimized until the stopping criterion, introduced in Section III, was reached. The Adam optimizer with a learning rate of 1 × 10⁻⁴ was used. Implementation details of the DBDAE are given in Table 2.
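For concreteness, the MSE loss and a single Adam update with the stated learning rate can be sketched in numpy; this is a generic textbook Adam step on scalar parameters, not the authors' implementation:

```python
import numpy as np

def mse(y_hat, y):
    """Mean squared error between reconstructions and original inputs."""
    return float(np.mean((np.asarray(y_hat) - np.asarray(y)) ** 2))

def adam_step(theta, grad, state, lr=1e-4, b1=0.9, b2=0.999, eps=1e-8):
    """One Adam update with the learning rate used for the DBDAE (1e-4).

    state is a dict {"t": 0, "m": 0.0, "v": 0.0} mutated in place.
    """
    state["t"] += 1
    state["m"] = b1 * state["m"] + (1 - b1) * grad        # first-moment estimate
    state["v"] = b2 * state["v"] + (1 - b2) * grad ** 2   # second-moment estimate
    m_hat = state["m"] / (1 - b1 ** state["t"])           # bias correction
    v_hat = state["v"] / (1 - b2 ** state["t"])
    return theta - lr * m_hat / (np.sqrt(v_hat) + eps)
```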

2) RESULTS
Since we are dealing with real data, the clean versions of the signals are not available; hence, it is not possible to compute any metric against a ground truth. However, the results can be inspected visually. Figure 6 shows the denoising results for some of the MMC signals. It can be seen that the DBDAE removes noise and switching effects from the on-off operation of the power semiconductors, which are not required for control purposes and can deteriorate closed-loop performance. The denoised signals show negligible error and lag, and follow the fast dynamics of the different signals, unlike a low-pass filter.

C. CAPACITOR VOLTAGE ESTIMATION IN AN MMC

1) TRAINING
The encoder-decoder network was trained to estimate the 4 capacitor voltages of the same stack. The data used to train the network involves 9.6 seconds of 33 signals: the input signals of each SM (24 signals), the current of each stack (6 signals) and the output voltage of each phase (3 signals). In this case, no capacitor voltage signals were used as inputs (only as targets during training), as the objective is to completely remove the voltage sensors in the SMs. The dataset was acquired running experiments that capture both slow and fast dynamics, at a sampling time of 25 × 10⁻⁶ s. The Adam optimizer was used with a learning rate of 1 × 10⁻³, and the MSE was used as loss function. Details of the encoder-decoder network are given in Table 3.

2) RESULTS
The results obtained are presented in Figure 7 and Figure 8. Figure 7 shows the estimation during a step change in the load at time 1.57 s. There is no delay or appreciable mismatch in the estimates, even during changes in the load. Figure 8 shows a zoomed view, where it can be appreciated that the encoder-decoder network also estimates the switching effects in the capacitor voltages. These switching components appear at four times the carrier frequency (4 kHz) due to the PS-PWM. It should be noted that although this switching effect in the voltage signal can be harmful for the controller, as mentioned in the motivation of the DBDAE-based application, it is useful to validate the correct internal operation of the converter. The estimator is capable of reproducing the switching effect, which can be filtered if not desired. Table 4 summarizes the numerical errors between the estimated voltages and the real targets, in terms of the root mean square error (RMSE). All the estimated signals present an RMSE below 0.54 V, which, given the operating point around 100 V, is negligible.
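The RMSE metric reported in Table 4 can be computed as below; the example values are illustrative, not taken from the experiments:

```python
import numpy as np

def rmse(v_hat, v):
    """Root mean square error between estimated and measured capacitor
    voltages, the metric summarized in Table 4."""
    return float(np.sqrt(np.mean((np.asarray(v_hat) - np.asarray(v)) ** 2)))

# e.g., a constant 0.5 V offset on a 100 V signal gives an RMSE of 0.5 V
assert abs(rmse([100.5] * 4, [100.0] * 4) - 0.5) < 1e-12
```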
The proposed encoder-decoder network has several advantages. First, no pre-processing of the data is required to estimate the output, which becomes important when running in real time, given the high frequency at which the MMC operates. Second, the convolutional encoder is an advantage over a recurrent encoder for capturing the slow dynamics, as it avoids long sequence recursions in an RNN. Finally, although the proposed network has many parameters, this architecture allows parallel computation, facilitating its implementation.

V. CONCLUDING REMARKS
The use of neural networks is proposed to support the operation of modular multilevel converters (MMCs). In particular, a dual blind denoising autoencoder (DBDAE) is proposed to filter sensor measurements, and an encoder-decoder network is proposed to estimate the voltages of each sub-module. A three-phase MMC with 4 sub-modules per stack was used as a case study to obtain datasets of measurements corrupted by electromagnetic interference (EMI) during both steady-state and transient operation.
Experimental results show that the DBDAE removed practically all the noise and power switching artifacts in the signals with a negligible lag. Regarding the estimator, experimental results show high performance and accuracy under steady-state and transient operation, estimating the wide spectrum of the capacitor voltage behavior, from the fundamental oscillation of 50 Hz to the switching ripple introduced by the commutations of the power semiconductors. The proposed methods potentially increase the EMI tolerance and reliability of the MMC, and allow the elimination of the voltage sensors in each sub-module, thus reducing the overall cost and communication requirements.
As future work, the filter and estimator should be implemented in the FPGA of the MMC controller to test their performance when the MMC operates in closed-loop under aggressive dynamic operation and failures.