Multi-Leak Deep-Learning Side-Channel Analysis

Deep Learning Side-Channel Attacks (DLSCAs) have become a realistic threat to implementations of cryptographic algorithms such as the Advanced Encryption Standard (AES). By using deep-learning models to analyze side-channel measurements, an attacker is able to derive the secret key of the cryptographic algorithm. However, when traces contain multiple leakage intervals for a specific attack point, the majority of existing works train neural networks on these traces directly, without an appropriate preprocessing step for each leakage interval. This degrades the quality of the profiling traces because of noise and non-primary components. In this paper, we first divide the multi-leakage traces into leakage intervals and train models on the different intervals separately. Afterwards, we concatenate these neural networks to build the final network, which we call a multi-input model. We test the proposed multi-input model on traces captured from STM32F3 microcontroller implementations of AES-128 and show a 2-fold improvement over previous single-input attacks.


I. INTRODUCTION
Side Channel Attacks (SCAs) [1] were proposed more than 20 years ago and have recently become a realistic concern with the help of deep-learning techniques. By analysing the unintentional physical leakage during the execution of cryptographic algorithms, SCAs are able to break ciphers that are assumed to be mathematically secure. Once the secret key is extracted, ciphertexts can be decrypted and signatures can be forged, which is particularly threatening. Since Kocher introduced the first attack, based on timing measurements, in 1996 [1], many other types of leakage have been used. For example, leakages via acoustic channels [2], power consumption [3], electromagnetic (EM) emissions [4], [5], and photon emissions [6] are now widely studied.
In most cases, a well-trained deep-learning [7] classifier is able to recover the secret key from an implementation of AES with fewer side-channel measurements (traces) than traditional signal processing approaches. Since deep-learning models are good at extracting features from raw data, they can help attackers find correlations between physical measurements and the internal state of the processed algorithm. Deep-learning techniques began to assist power analysis in 2013 [8], where a three-layer MLP network was trained to break a smart card implementation of AES-128 running on an 8-bit PIC16F84 microcontroller [9]. Subsequently, many software [5], [10]-[12] and hardware [13]-[16] implementations of AES have been broken by DLSCAs. In [17], Cagli et al. evaluated the performance of CNNs on datasets protected by a jitter-based countermeasure. In [10], Wang et al. studied how the diversity of target chips affects side-channel attacks. In [18], the influence of the depth of the neural network on DLSCAs was studied by visualising heatmaps. These papers provide strong evidence for the effectiveness of deep-learning techniques in the context of side-channel attacks.
The associate editor coordinating the review of this manuscript and approving it for publication was Vivek Kumar Sehgal.
In most existing DLSCAs, neural networks are trained to predict the value at the attack point. Once the value at the attack point is recovered, the key can be derived. At the profiling stage of DLSCAs, models are trained to learn a leakage profile between side-channel traces and the value at the attack point (an attack point is an intermediate value which can be used to describe the power consumed by the victim device during the execution of a cryptographic algorithm). In software implementations of AES, the attack point is usually set to the output of the SubBytes operation of the first or last round of AES, in which a lookup table called the SBox is used. Most existing deep-learning side-channel attacks either train models directly on traces which contain all leakage points or focus mainly on the main leakage point. They ignore the fact that, in some cases, leakage points appear in groups within small trace segments. A dedicated model which makes use of all these leakage intervals may further increase the attack efficiency. Therefore, we propose a multi-input model to explore the benefit of using multiple leakage intervals collaboratively.

A. OUR CONTRIBUTIONS
In this paper, our main contributions are summarized below:
• We find that in software implementations of AES-128, a chosen attack point can lead to multiple leakage intervals in the traces.
• We propose a multi-input deep-learning model in which multiple leakage intervals could be used collaboratively to perform the attack. For the proposed model, we investigated the effect of different fusion techniques on the multi-input model in terms of classification accuracy.
• We experimentally show that the proposed multi-input model is capable of outperforming the conventional single-input approach. The results on traces captured from STM32F3 microcontroller implementations of AES-128 show a 2-fold improvement over the previous attack. Three datasets are used for validation.

B. PAPER ORGANIZATION
The rest of the paper is organised as follows. Section II discusses how a single attack point causes multiple leakage intervals in traces for software implementations of AES and introduces the multi-input model. Section III describes the three datasets and the evaluation metrics used in our experiments. Section IV presents the experimental results. Section V concludes this paper.

II. MULTI-LEAKAGE AND MULTI-INPUT MODEL
In this section, we first explain why a specific attack point can have multiple leakage intervals, using different attack points as examples. Afterwards, we show the network structure of the proposed multi-input model.

A. LEAKAGE ANALYSIS
Power consumption of software implementations of AES is mainly derived from the bit transitions in the CMOS cells. Thus the data processed in the device dominate its power dissipation. As in previous research on SCAs [19], [20], attackers have commonly chosen the output of the SBox as the attack point on software implementations of AES, based on the fact that the non-linear output of the SBox has a higher level of confusion. However, in AES the output of the SBox is not the only possible attack point; other key-related intermediate values can also be used. Fig.1 shows the flow of the AES-128 algorithm and some potential attack points which can be used for key recovery. AES-128 requires a total of 10 rounds of encryption; each round consists of four basic steps, which are SubBytes, ShiftRows, MixColumns and AddRoundKey. The last round does not have the MixColumns step. In Fig.1, F_leak represents an intermediate value which is related to the plaintext and the initial key. L_leak denotes a point which is related to the ciphertext and the 10th round key.
To locate the leakage intervals for a specific attack point in traces, we utilize Correlation Power Analysis (CPA). In general, CPA calculates the Pearson Correlation Coefficient [21] between real traces and modeled power consumption. Afterwards, the attacker finds the key value whose modeled consumption correlates best with the measured traces. Currently, there are three commonly used power models: Identity (ID), Hamming weight (HW) and Hamming distance (HD). For example, the ID model assumes the power consumption is proportional to the value at the attack point.
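The CPA procedure described above can be illustrated with a minimal sketch on simulated data. Everything in it is an assumption made for illustration: the traces are synthetic, the key value and the leaking sample index are arbitrary, and the HW model is used in place of the ID model because the Pearson coefficient measures a linear relationship.

```python
import random

def hamming_weight(x):
    """Number of set bits in an integer."""
    return bin(x).count("1")

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

# Simulated acquisition (all values are hypothetical, not measured data).
rng = random.Random(0)
TRUE_KEY, LEAK_SAMPLE, N_TRACES, N_SAMPLES = 0x3C, 3, 500, 8
plaintexts = [rng.randrange(256) for _ in range(N_TRACES)]
traces = []
for pt in plaintexts:
    trace = [rng.gauss(0.0, 0.5) for _ in range(N_SAMPLES)]
    trace[LEAK_SAMPLE] += hamming_weight(pt ^ TRUE_KEY)  # HW leakage of Pt xor Key
    traces.append(trace)

# Step 1: with the key known (profiling device), locate the leaking sample.
model = [hamming_weight(pt ^ TRUE_KEY) for pt in plaintexts]
corr_per_sample = [
    pearson([t[s] for t in traces], model) for s in range(N_SAMPLES)
]
best_sample = max(range(N_SAMPLES), key=lambda s: abs(corr_per_sample[s]))

# Step 2: with the key unknown, score every key guess at that sample.
# The signed correlation is used: for an XOR target the bitwise complement
# of the key would otherwise tie with the correct key in absolute value.
column = [t[best_sample] for t in traces]
scores = [
    pearson(column, [hamming_weight(pt ^ k) for pt in plaintexts])
    for k in range(256)
]
best_key = max(range(256), key=lambda k: scores[k])
print(best_sample, hex(best_key))
```

In a real measurement the correlation in step 1 is computed for every sample of the trace, and contiguous regions of high correlation form the leakage intervals.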
In Fig.1, we denote seven potential attack points for breaking software implementations of AES. F_leak 1 represents the AddRoundKey's output before the first round of AES-128.
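Next, we introduce the leakage function. A leakage function is used to obtain the value related to an attack point based on a specific power model to describe the leakage. In our case, the power model is set to the ID model. When F_leak 1 is used as the attack point, the leakage function $V_{F\_leak_1}$ is denoted as:

$$V_{F\_leak_1} = Pt \oplus Key_0 \tag{1}$$

where $Pt$ represents the plaintext and $Key_0$ represents the original key. We use $Key_i$ ($i \in [1, 10]$) to denote the ith round key, which is derived from $Key_0$ by the Key Expansion algorithm [22]. In Fig.1, F_leak 2 represents the output of SubBytes in the first round of AES (the output of the SBox), and the leakage function $V_{F\_leak_2}$ for F_leak 2 is expressed as:

$$V_{F\_leak_2} = SBox[Pt \oplus Key_0] \tag{2}$$

F_leak 3 represents the output of ShiftRows in the first round of AES. ShiftRows is a cyclic shift performed on the rows of the state: the first row is unchanged, the second row shifts one byte to the left, the third row shifts two bytes to the left, and the fourth row shifts three bytes to the left. Thus the leakage function $V_{F\_leak_3}$ for F_leak 3 as a point of attack is expressed as:

$$V_{F\_leak_3}(i) = SBox[Pt_{i+n} \oplus Key_{0,\,i+n}] \tag{3}$$

where i denotes the ith byte, n = 0, 4, 8, 12 for the first, second, third and fourth row respectively, and i + n is replaced by i + n − 16 when i + n > 16.
F_leak 4 represents the output of MixColumns in the first round of AES. MixColumns multiplies each column of the state by a fixed matrix:

$$\begin{bmatrix}\beta_{1,r}\\ \beta_{2,r}\\ \beta_{3,r}\\ \beta_{4,r}\end{bmatrix} = \begin{bmatrix}02 & 03 & 01 & 01\\ 01 & 02 & 03 & 01\\ 01 & 01 & 02 & 03\\ 03 & 01 & 01 & 02\end{bmatrix} \begin{bmatrix}\alpha_{1,r}\\ \alpha_{2,r}\\ \alpha_{3,r}\\ \alpha_{4,r}\end{bmatrix} \tag{4}$$

In Equation (4), the first subscripts of α and β index the rows of the 4 × 4 state matrix and the second subscript (r ∈ {1, 2, 3, 4}) indexes its columns (the matrix that left-multiplies α is the fixed MixColumns matrix). Therefore, when the 5th, 6th, 7th and 8th subkeys are used as target subkeys, the leakage function $V_{F\_leak_4}$ is expressed as:

$$\begin{aligned} V_{F\_leak_4}(5) &= (02 \times a_5) \oplus (03 \times a_6) \oplus a_7 \oplus a_8\\ V_{F\_leak_4}(6) &= a_5 \oplus (02 \times a_6) \oplus (03 \times a_7) \oplus a_8\\ V_{F\_leak_4}(7) &= a_5 \oplus a_6 \oplus (02 \times a_7) \oplus (03 \times a_8)\\ V_{F\_leak_4}(8) &= (03 \times a_5) \oplus a_6 \oplus a_7 \oplus (02 \times a_8) \end{aligned} \tag{5}$$

In Equation (5), $a = V_{F\_leak_3}$ and $a_i$ is its ith byte. '×' denotes multiplication in the finite field GF(2^8) and '⊕' denotes the XOR operation.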
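As an illustration of the first-round leakage functions, the following sketch computes the four intermediate values for a full 16-byte state. It is not the authors' code: the function names `f_leak_1` to `f_leak_4` are hypothetical, the SBox is generated from its algebraic definition rather than taken from a lookup table, and bytes are 0-indexed in column-major order, so the paper's 5th to 8th bytes are indices 4 to 7.

```python
def gmul(a, b):
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x + 1."""
    p = 0
    for _ in range(8):
        if b & 1:
            p ^= a
        carry = a & 0x80
        a = (a << 1) & 0xFF
        if carry:
            a ^= 0x1B
        b >>= 1
    return p

def build_sbox():
    """Generate the AES SBox: field inverse followed by the affine map."""
    sbox = []
    for x in range(256):
        inv = next((y for y in range(1, 256) if gmul(x, y) == 1), 0)
        s, r = inv, inv
        for _ in range(4):
            r = ((r << 1) | (r >> 7)) & 0xFF  # rotate left by one bit
            s ^= r
        sbox.append(s ^ 0x63)
    return sbox

SBOX = build_sbox()

def f_leak_1(pt, key0):
    """Output of the initial AddRoundKey: Pt xor Key_0."""
    return [p ^ k for p, k in zip(pt, key0)]

def f_leak_2(pt, key0):
    """Output of the first-round SubBytes."""
    return [SBOX[v] for v in f_leak_1(pt, key0)]

def f_leak_3(pt, key0):
    """Output of the first-round ShiftRows (row r rotates left by r bytes)."""
    s = f_leak_2(pt, key0)
    # Byte 4*c + r comes from column (c + r) mod 4 of the same row r.
    return [s[4 * ((c + r) % 4) + r] for c in range(4) for r in range(4)]

def f_leak_4(pt, key0):
    """Output of the first-round MixColumns."""
    a = f_leak_3(pt, key0)
    out = []
    for c in range(4):
        col = a[4 * c: 4 * c + 4]
        for r in range(4):
            out.append(gmul(2, col[r]) ^ gmul(3, col[(r + 1) % 4])
                       ^ col[(r + 2) % 4] ^ col[(r + 3) % 4])
    return out

# Example vectors from FIPS-197 Appendix B.
pt = list(bytes.fromhex("3243f6a8885a308d313198a2e0370734"))
key0 = list(bytes.fromhex("2b7e151628aed2a6abf7158809cf4f3c"))
print([hex(v) for v in f_leak_4(pt, key0)[4:8]])  # values tied to subkeys 5 to 8
```

With the FIPS-197 example plaintext and key, the functions should reproduce the round-1 intermediate states listed in Appendix B of that standard, which is a convenient sanity check before generating labels.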
In the last round of AES-128, the leakage function $V_{L\_leak}$ is calculated in a similar way to the leakage function $V_{F\_leak}$ in the first round. The leakage functions for the last round are given by Equations (6)-(10).
In Equations (6)-(10), $Ct$ is the ciphertext, $Key_{10}$ and $Key_9$ denote the 10th and 9th round keys respectively, $SBox^{-1}$ denotes the inverse of the SBox, and i denotes the ith byte. In Equation (10), γ is used instead of $V_{L\_leak_4}$. The values 9, B, D and E are the coefficients of the inverse MixColumns operation in AES-128.

B. LABEL FOR MULTI-INPUT MODEL
SCAs are usually divided into non-profiled and profiled analysis, and profiled analysis consists of two stages. The first stage is called profiling, in which a deep-learning model is trained to learn a leakage profile between traces and the secret. The second stage is called the attack stage, in which the attacker uses the trained model to classify traces from the victim device. To obtain a well-trained model, profiling traces must be labeled properly. As mentioned before, there are three commonly used power models: HW, HD and ID. The HW and HD models are reasonable estimations but suffer from class imbalance in practice, because the number of set bits in a uniformly distributed byte follows a binomial distribution (a sum of Bernoulli trials) [23]. The value model (identity (ID) model) assumes that the power consumed by the device is proportional to the data processed at the attack point. We use the ID model in our experiments to distinguish different power traces.
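The class imbalance mentioned above is easy to check for an 8-bit intermediate value. The sketch below is only an illustration: it counts how many of the 256 possible byte values fall into each class under the ID and HW labelings, assuming the intermediate value is uniformly distributed.

```python
from collections import Counter

values = range(256)

# ID model: the label is the byte value itself, so there are 256 balanced classes.
id_counts = Counter(values)

# HW model: the label is the number of set bits, so there are 9 unbalanced classes.
hw_counts = Counter(bin(v).count("1") for v in values)

print(len(id_counts), "ID classes,", len(hw_counts), "HW classes")
for weight, count in sorted(hw_counts.items()):
    print(f"HW={weight}: {count} of 256 values")
```

Under the HW labeling the middle class (weight 4) collects 70 of the 256 values while the extreme classes (weights 0 and 8) collect one value each, whereas every ID class contains exactly one value.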
There is a correspondence between the attack point and the leakage function (in deep-learning terms, the value of the leakage function is the label), and we can build a network model for recovering the key for each attack point. However, such an attack does not take into account the connection between different attack points in the AES algorithm, nor the fact that every attack point is correlated with the same key. In order to combine information from multiple attack points, we propose a multi-input model. Because each attack point corresponds to a different leakage function (e.g. the leakage function when the output of AddRoundKey is used as the attack point is different from the leakage function of the SBox output), and because the multi-input model has only one output, the leakage functions $V_{F\_leak}$ of the multiple attack points need to be unified. The following describes how to unify the leakage functions of different attack points and why multiple leakages can exist for one attack point.
The first round of AES is used as an example to investigate the relationship between leakage functions at different attack points. We use the leakage function $V_{F\_leak_2}$ instead of the other leakage functions as the label for the multi-input model. When $V_{F\_leak_1}$ is replaced with $V_{F\_leak_2}$, the one-to-one nonlinear transformation of the SBox does not affect the classification of the network model (e.g. after replacing all the labels of cats with dogs and all the labels of dogs with pigs in image classification, the accuracy of the trained network model does not change). The ShiftRows leakage function simply moves the bytes of $V_{F\_leak_2}$ without changing their values, so $V_{F\_leak_2}$ can be used instead of $V_{F\_leak_3}$. The leakage function $V_{F\_leak_4}$ for one subkey in MixColumns is obtained from four different bytes of $V_{F\_leak_2}$ by the XOR operation, so $V_{F\_leak_2}$ is part of the MixColumns leakage function and can be used instead of $V_{F\_leak_4}$ (e.g. when AddRoundKey is used as a specific attack point, the Key can be used as the model's label). The leakage function for multiple attack points is thus unified as $V_{F\_leak_2}$. This is the first step in building a multi-input model, which is described below.
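The relabeling argument for the SBox can be checked with a small sketch. It is an illustration under stated assumptions: a seeded random permutation of the byte values stands in for the SBox (any one-to-one mapping behaves the same way), and the subkey and plaintexts are hypothetical.

```python
import random
from collections import defaultdict

rng = random.Random(1)
bijection = list(range(256))
rng.shuffle(bijection)  # stand-in for the SBox: any one-to-one byte mapping

KEY = 0xA7  # hypothetical subkey
plaintexts = [rng.randrange(256) for _ in range(2000)]

def partition(labels):
    """Group trace indices by label and forget the label names."""
    groups = defaultdict(list)
    for idx, lab in enumerate(labels):
        groups[lab].append(idx)
    return {tuple(g) for g in groups.values()}

labels_1 = [pt ^ KEY for pt in plaintexts]             # AddRoundKey output as label
labels_2 = [bijection[pt ^ KEY] for pt in plaintexts]  # relabeled through the bijection

# Both labelings split the traces into exactly the same classes;
# only the names of the classes differ.
same_partition = partition(labels_1) == partition(labels_2)
print(same_partition)
```

Because the grouping of traces is identical, a classifier trained with either labeling faces the same classification problem, which is why the label of the SBox output can be reused for the AddRoundKey interval.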

C. CONSTRUCTION OF MULTI-INPUT MODEL
The basic architecture used in this work is a CNN with multiple input layers, as shown in Fig.3. The multiple inputs are merged and connected to a Convolutional layer consisting of 32 neurons. The extracted features then pass through a Max-Pooling layer with a stride of three and are flattened by a Flatten layer. Afterwards, two Dense layers are connected to the Flatten layer, each containing 128 neurons.
The Output layer is also a Dense layer, with 256 neurons for prediction and a Softmax activation function. The Convolutional layer and the Dense layers use the Rectified Linear Unit (ReLU) activation function. The single-input model does not contain a Merge layer and consists of a single input layer connected to a Convolutional layer. The rest of its structure is the same as that of the multi-input model, as shown in Fig.2.
The Merge layer is an important component of multi-input models. It can be used to combine two different neural networks which are trained for the same task but on different datasets. Two fusion techniques are commonly used in existing works: early fusion and late fusion [24], [25]. Early fusion merges the layers of different neural networks at an early stage, while late fusion merges them at a later stage. In early fusion, the multiple inputs are combined and then connected to the first layer of the DNN. In the late-fusion architecture, features are first extracted from the input data of the individual channels. The channel-specific information is then merged and processed in further layers of the network, which are responsible for the classification based on the extracted features.
We find that the late-fusion model is less accurate than the early-fusion model, as shown in V. Therefore, in this paper we only conduct experiments for the early fusion network model.
There are different methods to merge layers within DNN architectures, which are listed below:
• Add: returns the element-wise sum of two inputs
• Subtract: returns the element-wise difference of two inputs
• Multiply: returns the element-wise product of the inputs
The following steps are required to apply the multi-input model to multiple leakage intervals. In the first step, a suitable attack point is selected and the intermediate value function for that attack point is derived using a power model. In the second step, the indices of the leakage intervals in the traces for that intermediate value function are found with a leakage detection method. In the third step, the leakage intervals are sliced out and a traditional DL model is trained for each leakage interval. In the fourth step, the two leakage intervals whose models achieve the highest classification accuracy on the testing sets are selected and used to explore which fusion method is optimal for improving the classification accuracy of the multi-input models. In the fifth step, all the leakage intervals are used to train the multi-input network model with this fusion method to obtain the optimal multi-input network model.
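The element-wise merge operations listed earlier (Add, Subtract, Multiply) can be sketched on two toy feature vectors. This is a framework-free illustration with hypothetical function names and values; in a deep-learning framework these operations are provided as merge layers and act on tensors of identical shape.

```python
def merge_add(a, b):
    """Element-wise sum of two equal-length feature vectors."""
    return [x + y for x, y in zip(a, b)]

def merge_subtract(a, b):
    """Element-wise difference (first input minus second input)."""
    return [x - y for x, y in zip(a, b)]

def merge_multiply(a, b):
    """Element-wise product of two equal-length feature vectors."""
    return [x * y for x, y in zip(a, b)]

# Hypothetical features from two leakage intervals of the same trace.
feat_a = [0.5, 1.0, -2.0]
feat_b = [1.5, 0.0, 4.0]

print(merge_add(feat_a, feat_b))
print(merge_subtract(feat_a, feat_b))
print(merge_multiply(feat_a, feat_b))
```

All three operations keep the output the same size as each input, so the information from the two intervals is mixed into shared positions rather than kept side by side.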

III. EXPERIMENTAL SETUP
In this section, we first introduce the datasets and the evaluation metrics we used for the experiments. Afterwards, we test the proposed multi-input model on traces captured from a CW308T-STM32F3 board. Next, we further validate the performance of our model on the STM32 implementation of the 32bit AES-128 dataset and the AES_GPU [26] public dataset.

A. DATASETS
In our experiments, we use three datasets in total. Power traces in the first dataset are captured from a CW308T-STM32F3 board running an implementation of TinyAES-128. The board contains an Arm Cortex-M4 microcontroller. The mode of operation is set to Electronic CodeBook (ECB) mode. The training set contains 50K traces representing the first round of AES-128 and 50K traces for the last round. The testing set contains 10K traces with random plaintexts and fixed keys for each of the first and last rounds. Each power trace contains 4,000 sampling points, as shown in Fig.4(a), (b). The second dataset was captured with the same equipment as the first dataset, from an implementation of AES-128 that processes 32 bits in parallel. The training set contains 50K traces and the testing set contains 10K traces, generated from random plaintexts and fixed keys. Each power trace contains 1,000 sampling points, as shown in Fig.4(c).
The third dataset was captured from an NVIDIA GeForce GT620 graphics card (GPU) connected to the host via a PCIe bus. The parallel AES implementation (32 threads in a warp) and the trace acquisition details are described in [26]. There are 34,511 traces for profiling and 5,000 traces for the attack. We call this dataset AES_GPU for short. Each power trace contains 15,001 sampling points, as shown in Fig.4(d).

B. EVALUATION METRICS
The first metric used in our experiments to evaluate how the trained model performs on the testing set is the classification accuracy, sometimes called the attack accuracy. The attack accuracy is defined as the fraction of correct predictions when using the trained model to classify traces from the testing set:

$$Acc = \frac{|X_{correct}|}{|X_{attack}|} \tag{11}$$

In Equation (11), $X_{attack}$ denotes the testing dataset and $X_{correct}$ is the subset of power traces for which the guessed key equals the correct key.
However, when traces are noisy [27], it might be difficult for the model to predict the key from a single trace. In that case, partial guessing entropy (PGE) becomes a more suitable evaluation criterion. PGE indicates the mean rank of the real subkey when all possible subkeys are sorted by their predicted probabilities. During the attack stage, we use the trained model to classify traces from the testing set and obtain the probabilities of the different keys for each trace. For a trace $x_i \in X_{attack}$, the obtained probability vector is denoted as $P_i = [p_{i,0}, p_{i,1}, \ldots, p_{i,255}]$, where $p_{i,j}$ is the predicted probability of k = j for trace $x_i$. The rank of the correct key within $P_i$ is the Key Rank, which is usually used as an evaluation criterion for datasets with better signal-to-noise ratios: for such datasets the number of traces needed to recover the correct key is usually in the single digits, and the Key Rank provides a more intuitive evaluation of the results. The fewer traces needed for the correct key to reach the top rank, the better the model. Afterwards, we apply an element-wise multiplication over all $P_i$ to obtain a cumulative probability:

$$P = \prod_{i=1}^{m} P_i$$

where m is the number of traces used for classification. Then, PGE can be represented as the averaged rank of the real key $k^*$ sorted by P.
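The metrics above can be illustrated on a toy example. The sketch below is an assumption-laden illustration: it uses 4 key candidates instead of 256, made-up probabilities, and hypothetical helper names. The element-wise product is computed as a sum of logarithms, a common way to avoid numerical underflow when many traces are combined.

```python
import math

def key_rank(scores, true_key):
    """Rank of the true key (0 = best) when keys are sorted by descending score."""
    return sum(1 for s in scores if s > scores[true_key])

def accumulate(prob_rows):
    """Element-wise product of per-trace probabilities, as a sum of logs."""
    n_keys = len(prob_rows[0])
    return [sum(math.log(row[k] + 1e-36) for row in prob_rows)
            for k in range(n_keys)]

TRUE_KEY = 2
# One row of predicted probabilities per attack trace (made-up values).
P = [
    [0.10, 0.40, 0.30, 0.20],
    [0.20, 0.10, 0.50, 0.20],
    [0.30, 0.20, 0.40, 0.10],
]

# Attack accuracy: fraction of traces whose top prediction is the true key.
accuracy = sum(1 for row in P if key_rank(row, TRUE_KEY) == 0) / len(P)

# Rank of the true key after combining the first m traces, m = 1..len(P).
ranks = [key_rank(accumulate(P[:m]), TRUE_KEY) for m in range(1, len(P) + 1)]
print(accuracy, ranks)
```

PGE would then be obtained by repeating the rank computation over many randomly chosen groups of m attack traces and averaging the resulting ranks.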

IV. EXPERIMENTAL RESULT

A. TinyAES-128 IMPLEMENTATION ON A STM32F3
In the first experiment, the target is an STM32F3 implementation of TinyAES-128. During network training, we used the Adam optimizer with a learning rate of 0.0005. The mini-batch size is 256 and the maximum number of training epochs is 500.

1) FIRST ROUND OF AES
We use the ρ-test [28] as the leakage detection method to find the Point of Interest (POI) of each subkey for the attack point. The POI of the first round of AES is shown in Fig.5.
In our experiments, we randomly choose the 5th subkey as an example for illustration; the other subkeys behave in the same way. In Fig.5, the red lines divide the trace into intervals A, B, C, D and E, representing the AddRoundKey operation before the first round of AES, and the SubBytes, ShiftRows, MixColumns and AddRoundKey operations of the first round. From Fig.5, we can see that interval D still leaks information about the attack point. However, in interval E the traces do not contain any leakage. Since traces in interval E represent the AddRoundKey operation of the first round of AES-128, and the attack-point-related input to this operation is the output of the MixColumns operation, this verifies the statement that the MixColumns operation is side-channel resistant [22].
Next, we first train 4 conventional single-input Convolutional Neural Network (CNN) models (the model structure is shown in Fig.2) on traces with different POI intervals separately. The ρ-test result for the 5th subkey, illustrated in Fig.5, is divided by the red lines into 4 leakage intervals. Each CNN model is trained on traces restricted to one specific leakage interval. The four leakage intervals for these 4 models are listed in Table.1. Afterwards, we use these four models to classify testing traces separately, and the classification results are also shown in Table.1.
From Table.1, we can see that all the leakage intervals can contribute to building a leakage profile between traces and the selected attack point. We consider a model to be effective for the task when its classification accuracy on the testing sets is higher than 1/256 ≈ 0.39%, i.e. better than a random guess among 256 classes.
Afterwards, we train and test the proposed multi-input CNN models for different combinations of leakage intervals. In Table.1, leakage interval B achieves the best classification result, so our 1-input models are trained on traces with leakage interval B. For the 2-input model, we use intervals A and B as the two inputs, since interval A has the second-best classification accuracy. Following this rule, our 3-input model uses intervals A, B and C as the inputs. The 4-input model makes use of all listed leakage intervals.
Since there are multiple fusion methods and not all of them improve the classification accuracy, we used leakage intervals A and B of the 5th subkey to explore which fusion method is most effective in improving the classification accuracy of the multi-input network model. Fig.6 shows the classification accuracy of the 1-input model and the 2-input model (model structure shown in Fig.3) on the validation set. Table.2 shows the classification accuracy of the 2-input model on the test set and the Key Rank for the different fusion methods. As can be seen from Table.2, the Concatenate layer is the optimal fusion method, so in the later experiments we focus on using the Concatenate layer to merge neural networks trained on different leakage intervals.
For the Concatenate Layer, there are two common settings for fusing the inputs: axis = 1 for row-wise concatenation and axis = 2 for column-wise concatenation. Note that axis = 0 is the batch axis.
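The difference between the two concatenation settings is easiest to see from the resulting shapes. The sketch below is a framework-free illustration using nested lists in a (batch, samples, channels) layout; the helper names and the toy sizes are assumptions, not the configuration used in the experiments.

```python
def shape(x):
    """Shape of a regular nested list, e.g. (batch, samples, channels)."""
    dims = []
    while isinstance(x, list):
        dims.append(len(x))
        x = x[0]
    return tuple(dims)

def concat_axis1(a, b):
    """Join along the sample axis: (N, La, C) and (N, Lb, C) -> (N, La + Lb, C)."""
    return [ta + tb for ta, tb in zip(a, b)]

def concat_axis2(a, b):
    """Join along the channel axis: (N, L, Ca) and (N, L, Cb) -> (N, L, Ca + Cb)."""
    return [[sa + sb for sa, sb in zip(ta, tb)] for ta, tb in zip(a, b)]

def make_input(batch, length, fill):
    """Toy single-channel input filled with a constant value."""
    return [[[fill] for _ in range(length)] for _ in range(batch)]

interval_a = make_input(batch=2, length=5, fill=0.1)
interval_b = make_input(batch=2, length=5, fill=0.2)

print(shape(concat_axis1(interval_a, interval_b)))  # one longer single-channel trace
print(shape(concat_axis2(interval_a, interval_b)))  # same length, one channel per interval
```

With axis = 1 the intervals are placed end to end in a single channel, while with axis = 2 each interval occupies its own channel, which requires all intervals to have the same number of samples.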
Afterwards, we test the trained 1-input and multi-input models on traces in the testing set. Table.4 shows the classification accuracies of these models when recovering the 5th, 6th, 7th and 8th subkeys of AES.
From Table.4, we can see that the multi-input models always achieve higher classification accuracies than the single-input models. The last row of the 1-input results in Table.4 shows the classification accuracy on the testing sets for models trained on the full leakage interval containing the target subkeys. This indicates that utilizing multiple leakage intervals of traces can further improve the attack efficiency of deep-learning models in the context of side-channel attacks. However, we can also see that leakage intervals A and B, which represent the AddRoundKey and SubBytes operations respectively, contribute the most to the multi-input model. A well-trained model can also use leakage intervals C and D to fine-tune the classification accuracy. For the setting of the Concatenate layer, the column-wise approach (axis = 2) appears to be better, as we can see from Table.4. Next, we show experiments on the last round of AES for the proposed multi-input models.

2) LAST ROUND OF AES
The POI for the last round of AES for the STM32F3 implementation of TinyAES-128 is shown in Fig.7. In this section, the experiments are the same as the first-round experiments, so we keep using the 5th, 6th, 7th and 8th subkeys as the targets to show the classification results.
In this experiment, we also use the ρ-test as the leakage detection approach. We plot the ρ-test results of all subkeys for the last round of TinyAES-128 in Fig.7. The attack point for the ρ-test of the traces is set to the SBox input of the last round of AES. The traces can be divided into leakage intervals as shown in Fig.7. Notice that, compared to the ρ-test results of the first-round traces, the last-round traces contain two more leakage intervals, denoted as intervals F and G, which belong to the 9th round of AES. This indicates that these two operations in the 9th round of AES also contain information related to the attack point. For the part of the traces representing the last round of AES, there are only three leakage intervals: H, I and J. This is because the last round does not contain MixColumns.
TABLE 2. Classification accuracy of the 2-input (A&B) model on the testing sets and the number of traces with key rank < 5 on the testing sets (a total of 10K traces were used as the testing set).

TABLE 3. Leakage intervals for the 5th, 6th, 7th and 8th subkeys for the first round traces of a STM32F3 implementation of TinyAES-128.

TABLE 4. Results of classification accuracies on testing sets for single-input and multi-input models trained using 5th, 6th, 7th and 8th subkeys for the first round traces of a STM32F3 implementation of TinyAES-128.

TABLE 5. Leakage intervals for the 5th, 6th, 7th and 8th subkeys for the last round traces of a STM32F3 implementation of TinyAES-128.

TABLE 6. Results of classification accuracies on testing sets for single-input and multi-input models trained using 5th, 6th, 7th and 8th subkeys for the last round traces of a STM32F3 implementation of TinyAES-128.

Next, we train 5 conventional single-input CNN models (the model structure is shown in Fig.2) on the last-round traces with different leakage intervals separately. Afterwards, we use these 5 models to classify testing traces separately and the classification results are shown in Table.6.
Because the model trained on leakage interval F cannot achieve a classification accuracy larger than 0.39% on the testing set, this interval is not involved in the training of the multi-input models. The conclusion that the multi-input models yield a more efficient attack when traces contain multiple leaks is also verified for the last round of AES.
From Table.4 and Table.6, the following conclusions can be drawn. First, when a model trained on a single leakage interval has a classification accuracy below 0.39%, adding that interval to the multi-input model does not improve its classification accuracy. Second, for the setting of the Concatenate layer, the column-wise approach (axis = 2) appears to be better. Third, the classification accuracy of the multi-input model in the last round of AES-128 is higher than in the first round, because leakage intervals exist in both the 9th and the last round of AES-128 when the input to the SBox in the last round is the attack point.
From Table.4 and Table.6, it can be seen that the classification accuracy of the 4-input model is just lower than that of the 2-input model when using the concatenation axis = 1 fusion method; according to our experiments, increasing the number of trainable network parameters is not very effective in improving the classification accuracy. It can also be noted that the classification accuracy of the 4-input model in Tables 4 and 6 decreases compared to the 3-input model when using the concatenation axis = 1 fusion method, but does not decrease when using the concatenation axis = 2 fusion method. The explanation is that with axis = 1, multiple leakage intervals are joined to form a longer trace, but each leakage interval contributes differently to key recovery (i.e. the classification accuracy of the model trained on that interval differs), and this joining leads to a decrease in classification accuracy when some of the intervals contribute too little. With axis = 2, each leakage interval is placed on its own channel, so each interval belongs to a different dimension, which does not degrade the classification accuracy when passed into the neural network for training. As long as a leakage interval contributes to key recovery, adding it with the concatenation axis = 2 fusion method improves the final classification accuracy of the multi-input model. The detailed parameters of the models are shown in V.
Next, we show the results of the proposed multi-input model on the 32bit AES-128 implementation on the STM32.

B. 32bit AES-128 IMPLEMENTATION ON A STM32F3 (AES_32bit)
We target the register write of the 5th byte in the last round: $v_5 = SBox^{-1}[c_5 \oplus k^*]$, where $c_5$ is the 5th ciphertext byte. We call this homemade dataset AES_32bit for short. During network training, we used the Adam optimizer with a learning rate of 0.0005. The mini-batch size is 256 and the maximum number of training epochs is 500.
We use the ρ-test as a leakage detection method to find the leakage intervals corresponding to $v_5$ in the traces, as shown in Fig.8. In Fig.8, the green lines divide the leakage into intervals A, B and C. Leakage interval A is indexed as [320:390], leakage interval B as [470:540] and leakage interval C as [760:830]. Since it has been shown in IV-A that the Concatenate layer is the Merge layer that most effectively improves the classification accuracy of the multi-input model, in the next experiments only the single-input model with the highest classification accuracy and the multi-input model using the Concatenate layer are shown. Fig.9 shows the classification accuracy on the validation set for the single-input model and the multi-input model during training. Fig.10 shows the Key Rank comparison between the single-input and multi-input deep learning (DL) models and the single-input and multi-input template attacks (TA) on the 10K test set traces.
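Slicing a trace into the three intervals given above and arranging them for a column-wise multi-input model can be sketched as follows. The interval indices are taken from the text; the helper names and the placeholder trace are assumptions for illustration only.

```python
# Leakage intervals of the AES_32bit dataset as [start:stop) sample indices.
INTERVALS = {"A": (320, 390), "B": (470, 540), "C": (760, 830)}

def slice_intervals(trace, intervals=INTERVALS):
    """Cut one trace into its named leakage intervals."""
    return {name: trace[lo:hi] for name, (lo, hi) in intervals.items()}

def stack_channels(segments):
    """Column-wise stack of equal-length segments: one value per interval at each step."""
    return [list(vals) for vals in zip(*segments)]

# Placeholder trace with 1,000 samples; real traces would be measured values.
trace = [float(i) for i in range(1000)]

parts = slice_intervals(trace)
multi_input = stack_channels([parts[name] for name in ("A", "B", "C")])
print(len(multi_input), len(multi_input[0]))  # samples per interval, number of channels
```

All three intervals are 70 samples long, so they can be stacked as three channels of one 70-sample input, which corresponds to the axis = 2 setting of the Concatenate layer.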
Because the SubBytes and ShiftRows operations do not leak in the traces of the STM32 implementation of 32bit AES-128, the multi-input model only improves the classification accuracy by 19.65% over the single-input model on the test set of the AES_32bit dataset.
Finally, the results for the single-input and multi-input DL models and the single-input and multi-input TA are shown in Table.7. It can be noted that the classification accuracy of the multi-input model on the validation set is 1.75% higher than that of the single-input model, because a model combining multiple inputs can learn more information related to the intermediate values in the data. Introducing multiple inputs into the template attack also effectively increases its efficiency. However, in Table.7, the multi-input deep learning model has 1399 more traces with Key Rank < 5 on the testing set than the multi-input template attack. This indicates that the multi-input deep learning model is currently a more reasonable solution to the problem of multiple leakage intervals.
Next, we show experiments on the AES_GPU dataset for the proposed multi-input models.

C. AES_GPU
We target the leakage of the register write of the 16th byte in the last round, whose intermediate value $v_{16}$ is computed from $c_{16} \oplus k^{*}$, where $c_{16}$ is the 16th ciphertext byte.

[Caption: Comparison of classification accuracies between early- and late-fusion models for the 2-input (leakage intervals A&B) models on the testing sets of the first round of the STM32 implementation of the AES-128 dataset.]
We use the ρ-test as a leakage detection method to find the leakage intervals corresponding to $v_{16}$ in the traces, as shown in Fig. 11. For the intermediate value $v_{16}$, we show only the single-input and multi-input models with the highest classification accuracy on the validation set. The classification accuracy of the single-input and multi-input models on the validation set is shown in Fig. 12. To further investigate the effects of single and multiple inputs on DLSCA and TA, we counted the traces with Key Rank < 5 on the testing set of the AES_GPU dataset for the single-input and multi-input DL models and for the single-input and multi-input TA; the results are shown in Fig. 13.
Finally, Table 8 compares the number of traces required to recover the target subkey (PGE) on the AES_GPU dataset between this work and prior work. The multi-input model requires only 14 traces to recover the target subkey on the AES_GPU dataset, 31 fewer than the current state-of-the-art result for this dataset (CDAE, proposed by Yang et al.).
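A trace count of this kind can be obtained, for a single attack run, by accumulating log-probabilities over traces and finding the point after which the correct subkey stays ranked first. The sketch below is a simplified illustration under stated assumptions: `leak` is a hypothetical stand-in for the real intermediate-value function, the classifier output is synthetic, and no averaging over repeated attacks (as a guessing-entropy estimate would normally use) is performed.

```python
import numpy as np

def leak(d, k):
    # Hypothetical stand-in for the real intermediate-value function.
    return d ^ k

def traces_to_recover(probs, data, true_key, eps=1e-40):
    """Smallest N such that the true key has rank 0 for every n >= N.

    Returns None if the true key is not ranked first after the last trace.
    """
    guesses = np.arange(256)
    labels = leak(data[:, None], guesses[None, :])
    log_scores = np.log(np.take_along_axis(probs, labels, axis=1) + eps)
    cum = np.cumsum(log_scores, axis=0)          # cum[i] uses i + 1 traces
    ranks = (cum > cum[:, [true_key]]).sum(axis=1)
    bad = np.nonzero(ranks)[0]
    if bad.size == 0:
        return 1
    if bad[-1] == len(ranks) - 1:
        return None
    return int(bad[-1]) + 2  # rank is 0 from index bad[-1] + 1 onwards

# Synthetic classifier output (assumption): informative on every trace
# except the first one, which is made misleading on purpose.
rng = np.random.default_rng(2)
n, true_key = 300, 0x3C
data = rng.integers(0, 256, size=n)
probs = rng.random((n, 256))
true_labels = leak(data, true_key)
probs[np.arange(n), true_labels] += 1.0
probs[0, true_labels[0]] = 0.01
probs /= probs.sum(axis=1, keepdims=True)

n_needed = traces_to_recover(probs, data, true_key)
print(n_needed)
```

Requiring the rank to stay at zero for all later traces, rather than merely reaching zero once, avoids reporting a count that is due to a momentary lucky ordering.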

V. CONCLUSION AND FUTURE WORK
In this paper, we propose a multi-input deep-learning model for side-channel attacks, designed for the case where multiple leakage intervals exist in the traces. By using these leakage intervals as separate inputs instead of profiling on the entire trace, the trained model can focus more on the leakages. One well-known publicly available dataset and traces captured from an STM32F3 implementation of AES are used in our experiments. We show that the proposed multi-input model achieves a 2-fold improvement over the previous single-input attacks. We also compare different fusion layers for connecting leakage intervals. The results show that concatenating leakage intervals in parallel outperforms the other approaches.
Future work includes testing the proposed multi-input model on implementations of other cryptographic algorithms and mounting similar attacks on devices supporting AES with other countermeasures. We also plan to further investigate the multi-leakage phenomenon by training new models on other attack points. Finally, the most important future work is designing countermeasures against deep-learning-based side-channel attacks.

APPENDIX A COMPARISON OF THE RESULTS OF THE EARLY- AND LATE-FUSION MODELS
See Table 9 and Fig. 14.

APPENDIX B MODEL STRUCTURE AND MODEL PARAMETERS (STM32F3 IMPLEMENTATION OF AES-128 DATASET)
See Tables 10-13.

JUNNIAN WANG received the bachelor's degree from the Department of Modern Physics, Lanzhou University, in 1991, the master's degree in radio physics from the School of Information Science and Engineering, Lanzhou University, in 2000, and the Ph.D. degree in control theory and control engineering from the School of Information Science and Engineering, Central South University, in 2006. He has undertaken four projects of the National Natural Science Foundation of China and more than ten other provincial and ministerial level research projects. He has published more than 50 scientific papers, including more than 20 SCI/EI papers. His research interests include deep learning, intelligent information processing, and fault diagnosis. He received the Second Prize of Hunan Provincial Science and Technology Progress Award.

VOLUME 10, 2022