MagNet-AI: Neural Network as Datasheet for Magnetics Modeling and Material Recommendation

Abstract-This article presents MagNet-AI, an online platform that demonstrates the "neural network as datasheet" concept for B-H loop modeling and material recommendation of power magnetics across a wide operating range. Instead of directly presenting the measured characteristics of magnetic core materials as time sequences, we employ a neural network to capture the B-H loop mapping relationships of magnetic materials under different excitation waveforms at different temperatures and dc bias. Long short-term memory and transformer-based neural network models are developed, verified, and compared. The neural network can be used to rapidly predict hysteresis loops and core losses under different operating conditions, compare materials, and recommend materials for design. The neural network model also proves effective in reconstructing the raw measurements while accurately maintaining the magnetic characteristics, enabling rapid material evaluation and comparison.
Index Terms-Core loss, hysteresis loop, machine learning, neural network, power magnetics.

I. INTRODUCTION
POWER electronics systems depend heavily on magnetic components. Because of their large volume and significant power loss, power magnetics are typically the bottleneck in optimizing the power density and conversion efficiency of a system. Despite significant advances in power semiconductor devices and circuit topologies, the development of corresponding approaches for designing and modeling power magnetic components and materials is still lacking [3], [4], [5], [6], [7]. Modeling magnetic materials, especially their hysteresis loops, poses a significant challenge due to the complicated excitation-response mechanisms inherent in these materials, the numerous factors involved (such as frequency, temperature, dc bias, and memory effects), and the lack of a fully satisfactory first-principle model. The ability to model and predict the hysteresis loop of power magnetics across a wide operating range has a profound impact on improving the power density and efficiency of power electronics systems.
Currently, the design of power magnetic components relies extensively on the classical datasheets provided by manufacturers, together with conventional modeling tools, such as analytical models or interpolated loss maps. Classical datasheets for magnetic materials are typically only valid for sinusoidal waveforms and usually cannot provide sufficient information for all the scenarios that designers may encounter. The limited hysteresis loop information and sparse core loss curves prevent designers from accurately predicting the variations of magnetic permeability and core loss under different operating conditions, such as different amplitudes, frequencies, temperatures, and levels of dc bias. Conventional models for power magnetics, e.g., the Steinmetz equation, the iGSE [8], and the Jiles-Atherton model [9], are established based on empirical simplifications or physical approximations, which limit their modeling accuracy. The behavior of magnetic core materials is highly complex [10], and the limited complexity of these models restricts their ability to capture sophisticated waveform-, temperature-, and dc-bias-dependent effects.
Recent advancements in data-driven methods, specifically machine learning techniques, such as neural networks, have proven to be highly effective in resolving complex nonlinear multivariable regression problems [11], [12], [13], [14]. The primary advantage of utilizing neural networks is the ability to unify many intertwined influencing factors, such as temperature and dc bias, into a cohesive framework, which makes the neural network a good candidate for the data-driven modeling of power magnetics.
This article proposes the concept of "neural network as datasheet" for magnetic materials modeling and demonstrates neural-network-aided material recommendation for rapid design. The effectiveness of this approach is validated on an open-source online research platform, MagNet-AI. The contributions of this article are as follows.
1) Provided a more comprehensive explanation of the design considerations of the neural network architecture.
2) Introduced a systematic data-processing and data-augmentation technique for neural network training.
3) Compared different neural network architectures in terms of model accuracy and training cost.
4) Demonstrated the effectiveness of the "neural network as datasheet" concept, including the hysteresis loop and core loss prediction, the neural network-aided material recommendation, and the online MagNet-AI platform.
Fig. 1. Concept of neural network as datasheet. A neural network (NN) is better equipped to store the shape of B-H loops for different operating conditions than traditional datasheets or datasets of B-H loops, which can only contain limited information, and thus provides effective and efficient guidance for the design of power magnetic components.
Fig. 1 shows the concept of "neural network as datasheet." Traditional datasheets, as stated above, typically provide too little information, which makes it difficult for designers to achieve an optimal design. Some manufacturers also offer online datasets as a supplement to the datasheet, in which large numbers of measured data points are provided for users to search. However, because of the various influencing factors, the data size grows rapidly with the number of variables, typically reaching tens or hundreds of gigabytes. Meanwhile, it is still inconvenient to use these datasets to parameterize design models, which usually requires extensive or complicated data extraction and interpolation. The behavioral modeling methods proposed in [15] and [16] provide a feasible approach to overcome this challenge, in which analytical models are developed to map the excitation and operating conditions directly to the component characteristics.
A neural network-aided datasheet, on the other hand, offers advantages over both the traditional datasheet and the dataset used as a datasheet. Enabled by the fully automated data acquisition system developed in [17] and [18], the targeted magnetic materials can be automatically characterized and measured, which minimizes human error. Based on the measurement data, a neural network model is trained that encapsulates and compresses all the information of the gigabyte-scale time domain dataset into a small model file with a size in the kB or MB range, while maintaining high accuracy. With the well-trained neural network, users only need to perform a quick inference of the model to predict the magnetics performance under specific operating conditions, or a series of inferences to track the variations of the hysteresis loop and core loss, providing an effective, efficient, and convenient reference for magnetic component design.
Neural networks have been applied to modeling the core loss or the hysteresis loop of power magnetics [19], [20], [21], [22], [23], [24], [25], [26], [27]. However, existing neural network models have mostly been constructed using outdated or simplistic network structures, such as feedforward neural networks (FNNs), which limits the accuracy of hysteresis loop or core loss prediction. There was no large-scale open-source database available before MagNet [17], which limited the size of the neural network that could be trained effectively. In this work, an encoder-projector-decoder architecture is proposed to develop a sequence-to-sequence model for hysteresis loop prediction, which takes the flux density sequence B(t) as an input and outputs the field strength sequence H(t) (or vice versa), incorporating other input variables, such as the frequency f, the temperature T, and the dc bias H_dc. More specifically, we implement the proposed architecture using two of the most successful sequence-to-sequence neural network structures: the long short-term memory (LSTM) [28] and the transformer [29]. After proper training, they can accurately and rapidly predict the B-H loop of power magnetics under a wide range of operating conditions while significantly reducing the data size by storing trained parameters rather than raw time domain data.
The rest of this article is organized as follows. Section II provides an overview of the proposed encoder-projector-decoder architecture and its data flow. Sections III and IV describe the details of the LSTM-based and the transformer-based network implementations, respectively. Section V presents the data processing and augmentation techniques used during network training. Section VI evaluates and compares the testing results of each implementation. Sections VII, VIII, and IX demonstrate the applications of the neural network-aided smart datasheet, including the hysteresis loop and core loss prediction, magnetic materials comparison, and the online platform. Finally, Section X concludes this article.

II. ENCODER-PROJECTOR-DECODER ARCHITECTURE
Hysteresis loops of power magnetics are determined by various influencing factors. Besides the shape of the excitation waveform, other operating conditions, such as the frequency, temperature, and level of dc bias, all result in different shapes of B-H loops. Fig. 2 compares multiple measured B-H loops for the N87 ferrite material as an example, where the material characteristics differ significantly under different conditions. These factors are quantified in [10], and in real-world applications, they often coexist and change concurrently, which renders the modeling of magnetic materials extremely difficult. To develop a neural network model that is capable of predicting the hysteresis loop under different operating conditions, an encoder-projector-decoder architecture is proposed in this article.
The structure and data flow of the proposed encoder-projector-decoder neural network architecture are shown in Fig. 3. The general concept of this architecture is to map a time series into another time series while incorporating other information about the operating conditions. In this work, the input sequence is B(t), and the output sequence is H(t), which together define the basic shape of hysteresis loops. Scalar inputs, such as frequency f, temperature T, and dc bias H_dc, also significantly affect the B-H loops. Therefore, an additional projector is implemented between the encoder and the decoder to take these scalar inputs into consideration and accurately predict the B-H loop under different operating conditions.
The encoder receives the B(t) sequence as input and transforms it into a fixed-dimensional vector by capturing the sequential information and temporal correlations, including the shape, pattern sequence, amplitude, and relative change rate of the excitation waveform. The encoder outputs the hidden state vectors, which contain all the relevant information extracted from the input sequence, mapped into a hidden state domain. The hidden state vectors are then passed through the projector and adjusted according to the scalar inputs (frequency f, temperature T, and dc bias H_dc). The projector is necessary because the shape of the B-H loop is determined not only by the B(t) sequence but also by many other factors. Finally, the modified hidden state vectors are processed by the decoder to predict the output sequence H(t). During model inference, the expected response sequence is produced in an autoregressive manner. At each time step, the prediction is generated based not only on the current hidden state vectors but also on all the previously generated predictions. With the autoregression, the temporal information of the sequence is retained and reconstructed sequentially, preserving the causal order in time.
The geometry of a specific magnetic component can significantly affect the component-level behaviors [30], [31], [32]. The model proposed in this work is limited to material-level modeling. The architecture introduced in this article can be further extended to cover component-level geometry impact, which is beyond the scope of this article.
The encoder and decoder modules can be implemented using different neural network architectures, such as recurrent neural networks (RNNs) [33], attention-based networks (transformer), or convolutional neural networks (CNNs) [34], [35], [36], all of which have shown success in modeling sequences with complex temporal dependencies. A wavelet-CNN-based neural network has been applied to model core loss in [37]. In this work, we specifically investigate and provide guidance on the use of both LSTM-based and transformer-based implementations of the encoder-projector-decoder neural network, which is designed to map a time series input to another time series output while incorporating information about external factors.

III. LSTM NEURAL NETWORK MODELS
LSTM is a specialized type of RNN that is well suited for capturing the temporal relationships within time series data [38]. The effectiveness of LSTM networks in solving sequence-to-sequence tasks has been demonstrated, with the LSTM encoder-decoder architecture being one of the most widely adopted implementations [39]. The LSTM-based encoder-decoder architecture has a well-established ecosystem in popular deep learning frameworks, such as PyTorch and TensorFlow.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.
In the LSTM-based encoder-projector-decoder architecture shown in Fig. 4, both the encoder and the decoder are implemented as LSTM neural networks. At a given time step t = t_i, the corresponding sample of the input sequence B(t) is fed to the LSTM network and processed through the input gate, forget gate, and output gate on the encoder side. The temporal information is stored in the cell states and the hidden states, which are fed back through the recurrent connections for processing the next input at t = t_{i+1}. Unrolling the recurrent connections across the timeline is equivalent to passing the entire input sequence through a series of LSTM networks. Mathematically, the operation of the LSTM cell at time t follows the standard LSTM update equations, where x_t is the sequential input at time t. The intermediate variables f_t, i_t, and o_t represent the values of the forget gate, input gate, and output gate, respectively, and c_t and h_t refer to the cell states and the hidden states, which are the recurrent variables fed back to the LSTM cell, thus providing the memory capability. The function σ(x) is the sigmoid function, which serves as the activation function to provide the nonlinear learning capability. As in an FNN, W and b are the weights and biases, respectively, and their subscripts refer to the source and target variables that they are applied to. The operator ⊙ stands for the Hadamard product, which performs an element-wise product of two matrices. Then, the cell states and the hidden states at the last time step are concatenated with the additional inputs and fed into the projector, where these state vectors are modified using an FNN. The LSTM network in the decoder uses the modified states to predict the output H(t) at t = t_0, which is then fed back as the input for the next prediction at t = t_1. The prediction continues until the entire output sequence is generated.
Algorithm 1 (fragment). Output: magnetic field strength H(t). 1: Initialize hidden states h_0 and cell states c_0; 2: […] {…, y_L};
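For reference, the standard LSTM cell update can be written as below. This is the common formulation, in the notation of the PyTorch documentation, and not necessarily the exact form used by the authors. The subscript convention (source first, then target, so that W_{if} maps the input x_t to the forget gate) and the candidate state g_t are assumptions of this sketch.

```latex
\begin{aligned}
i_t &= \sigma\left(W_{ii} x_t + b_{ii} + W_{hi} h_{t-1} + b_{hi}\right) \\
f_t &= \sigma\left(W_{if} x_t + b_{if} + W_{hf} h_{t-1} + b_{hf}\right) \\
g_t &= \tanh\left(W_{ig} x_t + b_{ig} + W_{hg} h_{t-1} + b_{hg}\right) \\
o_t &= \sigma\left(W_{io} x_t + b_{io} + W_{ho} h_{t-1} + b_{ho}\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot g_t \\
h_t &= o_t \odot \tanh\left(c_t\right)
\end{aligned}
```

The forget gate f_t scales the previous cell state, the input gate i_t scales the new candidate g_t, and the output gate o_t controls how much of the cell state is exposed as the hidden state h_t.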
More details of the data flow in the LSTM model are described by the pseudocode in Algorithm 1. Models and example codes are available on the MagNet GitHub repository.

IV. TRANSFORMER NEURAL NETWORK MODELS
The transformer with the attention mechanism is another very successful network architecture that excels at modeling sequence-to-sequence problems, as exemplified by large language models such as ChatGPT. Unlike RNNs, the transformer eschews recurrent connections and instead relies entirely on attention mechanisms to capture temporal dependencies between the input and output sequences. Modified from the original structure in [29], we implement an encoder-projector-decoder architecture, as shown in Fig. 5.
The data point at each time step in the input sequence B(t) is first passed through a shallow FNN and transformed to a d-dimensional vector, where d sets the representation dimension of the model. Given that the attention mechanism used in the transformer model is essentially a dot product of matrices, the attention operation by itself is insensitive to the order of the time steps in the sequence. To ensure that the model effectively captures temporal dependency, the input vector is combined with a positional encoding vector, which provides information about the position of each time step in the sequence. The resulting vector is then fed into the self-attention module, which analyzes and captures the temporal dependency within the input sequence itself. After further processing by an FNN, a set of hidden vectors encapsulating the information of the input sequence is generated and passed to the projector.
Next, the hidden vectors obtained from the encoder are similarly concatenated with the additional inputs, such as frequency f, temperature T, and dc bias H_dc, and the resulting vectors are passed through an FNN-based projector. The projector modifies the hidden vectors by considering the influence of these additional inputs. The modified hidden vectors are then passed to the decoder for reconstructing the output sequence.
Besides the hidden vectors, the input of the decoder includes a reference sequence. During network training, it is the target output sequence; during network testing, it is the sequence predicted by the model itself (initialized with zero), shown as the dashed line in Fig. 5. The reference sequence is similarly mapped to a d-dimensional vector with a shallow FNN, summed with a positional encoding vector, and fed into the self-attention module to generate another set of hidden vectors. Both sets of hidden vectors, from the projector and from the self-attention module, are further processed by the input-output attention module. Finally, the resulting output vectors are processed by an FNN to generate the desired output sequence H(t).
More details of the data flow in the transformer model are described by the pseudocode in Algorithm 2. Models and codes are available on the MagNet GitHub repository as well.

V. DATA PROCESSING AND AUGMENTATION
The prediction accuracy of a neural network model is fundamentally determined by the quality of the training data. In this work, the training data is constructed based on the massive measured dataset in the MagNet database [17], [18]. The database currently includes B-H loop measurements for ten different ferrite materials across a wide range of excitation and operating conditions, collected by an automated data acquisition system. All the measurements are captured in periodic steady-state operation.
The MagNet dataset comprises five data fields: the flux density waveform B(t), the field strength waveform H(t), the fundamental frequency f, the temperature T, and the dc bias H_dc. The fundamental frequency is determined from the measured voltage waveform using Welch's frequency-domain method [40], while the remaining four data fields are obtained directly from the measurements. In the following sections of this article, we present example results based on the dataset of the N87 ferrite material, which contains 142 871 measured data points (B-H loops) covering the range of flux density amplitude in […]. The flux density B(t) here only contains the ac part, which is extracted from the voltage measurement, while H(t) contains both the ac and dc parts, which are directly extracted from the current measurement. All the flux densities in the hysteresis loops shown below also consider only the ac part B_ac.
To ensure better convergence of the model, all five data fields {B(t), H(t), f, T, and H_dc} are normalized before being fed into the neural network. This is accomplished by subtracting the average and rescaling with the standard deviation for each data field. The standardization parameters are saved and reused during model testing and inference.
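A minimal sketch of this per-field standardization follows, using only the Python standard library. The function names, the use of the population standard deviation, and the temperature values are illustrative assumptions, not details taken from the MagNet code.

```python
from statistics import mean, pstdev


def fit_standardizer(values):
    """Return (mean, std) of one data field, to be saved and reused later."""
    mu = mean(values)
    sigma = pstdev(values)  # population std; an assumption of this sketch
    if sigma == 0.0:
        raise ValueError("cannot standardize a constant data field")
    return mu, sigma


def standardize(values, mu, sigma):
    """Subtract the saved mean and rescale by the saved standard deviation."""
    return [(v - mu) / sigma for v in values]


def destandardize(values, mu, sigma):
    """Map network outputs back to physical units."""
    return [v * sigma + mu for v in values]


# Illustrative temperature field in degrees Celsius (hypothetical values).
temps_train = [25.0, 50.0, 70.0, 90.0]
mu_T, sigma_T = fit_standardizer(temps_train)
temps_norm = standardize(temps_train, mu_T, sigma_T)

# At test or inference time, reuse the saved parameters instead of refitting.
temp_query_norm = standardize([60.0], mu_T, sigma_T)
```

Fitting the parameters once on the training data and reusing them at inference keeps the network inputs on the same scale it saw during training.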
The two sequence inputs {B(t) and H(t)} require more complex processing to ensure that the network achieves good prediction accuracy and sufficient generalization capability. Adding a reasonable amount of noise to the data to avoid overfitting is a commonly used preprocessing technique. In this study, the B(t) and H(t) waveforms are superposed with uniformly distributed white noise within the ranges of ±0.1 mT and ±0.05 A/m, respectively. Moreover, the sequence inputs are further processed and augmented in three ways to improve the network performance.
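The noise injection described above can be sketched as follows. The helper name is hypothetical, and applying the noise in physical units (tesla and A/m) before standardization is an assumption, since the text does not state the order of the two steps.

```python
import math
import random


def add_uniform_noise(sequence, half_width, rng):
    """Superpose uniformly distributed white noise within +/- half_width."""
    return [x + rng.uniform(-half_width, half_width) for x in sequence]


rng = random.Random(0)  # fixed seed only to make the example repeatable

# Illustrative 128-point sinusoidal B(t) in tesla and H(t) in A/m.
b_wave = [0.1 * math.sin(2 * math.pi * k / 128) for k in range(128)]
h_wave = [20.0 * math.sin(2 * math.pi * k / 128 + 0.2) for k in range(128)]

b_noisy = add_uniform_noise(b_wave, 0.1e-3, rng)  # +/- 0.1 mT
h_noisy = add_uniform_noise(h_wave, 0.05, rng)    # +/- 0.05 A/m
```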

A. Single-Cycle Interpolation
In the original MagNet database, the B(t) and H(t) waveforms are directly calculated from the raw measurements of voltage and current signals. Each waveform is a 1×10 000 time sequence with a sampling rate of f_s = 125 MHz and comprises multiple cycles that are captured in the periodic steady state. Training the network with these multicycle waveforms, however, can be problematic. On the one hand, the large number of data points significantly increases the computational cost of network training and inference. Because the waveform is captured in the periodic steady state, the repeating cycles do not provide much valuable information to the network. On the other hand, waveforms with different frequencies have different numbers of samples in each cycle, leading to different numbers of points on the B-H plane. Networks trained with these waveforms are prone to magnifying noise in the low-frequency waveforms while overlooking sharp transitions in the high-frequency waveforms.
To address these issues, a single-cycle interpolation algorithm is applied to all the B(t) and H(t) waveforms. Given the sampling rate f_s and the fundamental frequency f, the total number of cycles contained in each waveform can be calculated as N = 10 000 × (f/f_s). The 10 000-sample waveform is first interpolated into N × 128 samples using the spline algorithm. Then, the interpolated waveform is evenly sliced into multiple sections, where each section contains a full cycle of the waveform with 128 sample points in total. Finally, all the sections are averaged into a single-cycle waveform. Fig. 7 shows the corresponding single-cycle waveform for each of the original waveforms shown in Fig. 6. By applying the single-cycle algorithm, the time stamp of the waveform is normalized by the period ΔT = 1/f into [0,1]. The single-cycle waveforms of B(t) and H(t) describe the shape of the hysteresis loop in the periodic steady state well, while maintaining a constant sequence length across all frequencies. The single-cycle interpolation essentially removes the time stamp information from the original waveform, which is another reason why the fundamental frequency f is included as one of the network inputs.
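The single-cycle procedure can be sketched in plain Python as below. Linear interpolation is used as a standard-library stand-in for the spline interpolation named in the text, and the function name and the 100 kHz test waveform are illustrative assumptions.

```python
import math


def single_cycle(waveform, f, fs, points_per_cycle=128):
    """Average all whole cycles of a steady-state record into one cycle.

    The record is evaluated at N * points_per_cycle evenly spaced positions
    (linear interpolation here, not the spline used in the article), sliced
    into N sections of one cycle each, and the sections are averaged.
    """
    n_samples = len(waveform)
    samples_per_cycle = fs / f
    # Number of whole cycles in the record: N = n_samples * f / fs.
    n_cycles = int(n_samples * f / fs + 1e-9)
    if n_cycles < 1:
        raise ValueError("record is shorter than one cycle")
    averaged = [0.0] * points_per_cycle
    for c in range(n_cycles):
        for k in range(points_per_cycle):
            pos = (c + k / points_per_cycle) * samples_per_cycle
            i = min(int(pos), n_samples - 1)
            j = min(i + 1, n_samples - 1)
            frac = pos - i
            averaged[k] += (1.0 - frac) * waveform[i] + frac * waveform[j]
    return [v / n_cycles for v in averaged]


# Illustrative record: 10 000 samples of a 100 kHz sine sampled at 125 MHz,
# which spans N = 10 000 * (100e3 / 125e6) = 8 cycles.
fs = 125e6
f = 100e3
record = [math.sin(2 * math.pi * f * n / fs) for n in range(10_000)]
cycle = single_cycle(record, f, fs)
```

The returned sequence always has 128 points regardless of frequency, which is why f has to be supplied to the network as a separate scalar input.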

B. Phase-Shifting Augmentation
Through the single-cycle interpolation, it is assumed that the original waveform can be well reconstructed from the single-cycle waveform, since all the waveforms are captured during the periodic steady state. Furthermore, we hypothesize that the predicted magnetic behaviors, such as the B-H loop and core loss, remain the same regardless of where the waveform is sliced into sections, since all the sections reconstruct to the same original waveform. This hypothesis also indicates that the predicted results should not be influenced by the starting phase of the single-cycle waveform.
To prevent the neural network from attaching meaning to the starting phase, phase-shifting data augmentation is applied to the single-cycle waveforms. Each pair of B(t) and H(t) waveforms is circularly shifted by the same random phase, which changes the starting phase of the waveforms while preserving the original phase difference between B(t) and H(t). Fig. 8 shows a set of example B(t) waveforms of N87 ferrite before and after the phase-shifting augmentation.
Phase-shifting augmentation is also applied to balance the dataset distribution. Because different waveform shapes have different degrees of freedom (e.g., amplitude, frequency, and duty ratio), the amount of data provided for each waveform shape in the original MagNet database differs significantly. For example, the N87 ferrite material dataset contains 142 871 pairs of B(t) and H(t) waveforms measured under different frequency f, temperature T, and dc bias H_dc conditions. Among them, the sinusoidal wave, triangular wave, and trapezoidal wave contribute 3 495 (2.45%), 46 973 (32.87%), and 92 403 (64.68%) pairs, respectively. The sinusoidal wave, in particular, has far fewer samples than the other two waveform shapes. As a result, training with this unbalanced dataset leads to biased accuracy for sinusoidal excitations. Through phase-shifting augmentation, one can assign multiple phase values to the sinusoidal waves to augment the data while keeping the augmented waveforms distinguishable from each other.
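A minimal sketch of the paired circular shift is given below. The function names are hypothetical, and drawing distinct shifts to oversample an under-represented waveform shape is one possible reading of how the balancing is done.

```python
import random


def phase_shift(b_cycle, h_cycle, shift):
    """Circularly shift B(t) and H(t) by the same number of samples."""
    if len(b_cycle) != len(h_cycle):
        raise ValueError("B(t) and H(t) must have the same length")
    s = shift % len(b_cycle)
    return b_cycle[s:] + b_cycle[:s], h_cycle[s:] + h_cycle[:s]


def augment_with_phases(b_cycle, h_cycle, n_copies, rng):
    """Return n_copies shifted pairs, each with a distinct starting phase."""
    shifts = rng.sample(range(len(b_cycle)), n_copies)
    return [phase_shift(b_cycle, h_cycle, s) for s in shifts]


rng = random.Random(0)
b = [float(k) for k in range(8)]   # toy 8-point cycle for readability
h = [10.0 * x for x in b]
b_shifted, h_shifted = phase_shift(b, h, 3)
copies = augment_with_phases(b, h, 4, rng)
```

Because both sequences are rotated by the same amount, every (B, H) sample pair is preserved and only the starting point on the loop changes.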

C. Multicycle Augmentation
In addition to the initial phase, the frequency is a crucial factor affecting the division of the full-length waveform into single-cycle sections. In Section V-A, the waveform sequence is sliced based on the fundamental frequency such that each section contains a complete cycle of the waveform. Alternatively, the sequence can be divided into sections based on 1/N of the fundamental frequency, so that each section contains N cycles of the waveform. Theoretically, for any integer N, the sliced sections are always able to reconstruct the same full-length sequence, and thus the same B-H loop, except that the resolution within each cycle is reduced due to the fixed-length interpolation.
Based on this hypothesis, the dataset is further augmented by incorporating multicycle waveforms. Fig. 9 shows an example of the two-cycle data augmentation, where the augmented sequence contains two cycles of the waveform and the frequency is halved correspondingly. With the support of multicycle augmentation, the neural network model is expected to predict approximately equivalent B-H loops and core losses, regardless of whether single-cycle or multicycle input sequences are provided. This augmentation further enhances the model's generalization capability for certain types of waveforms that are not covered by the training dataset.
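The multicycle augmentation can be sketched as follows, assuming the augmented sequence is built from an existing single-cycle waveform rather than re-sliced from the raw record. With an integer number of cycles and an unchanged sequence length, the resampling reduces to picking every N-th sample with wraparound; the function name is hypothetical.

```python
def multicycle(single_cycle_wave, f, n_cycles):
    """Build a fixed-length sequence holding n_cycles cycles at frequency f / n_cycles.

    Output sample k sits at phase (k * n_cycles / L) mod 1 of the original
    cycle, which is exactly index (k * n_cycles) mod L for integer n_cycles.
    """
    if n_cycles < 1:
        raise ValueError("n_cycles must be a positive integer")
    length = len(single_cycle_wave)
    wave = [single_cycle_wave[(k * n_cycles) % length] for k in range(length)]
    return wave, f / n_cycles


# Two-cycle example: 128 points now span two periods, and f is halved.
cycle = [float(k) for k in range(128)]
two_cycle, f_aug = multicycle(cycle, 100e3, 2)
```

Each cycle in the two-cycle sequence is described by 64 points instead of 128, which is the resolution loss mentioned in the text.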

VI. TRAINING AND TESTING RESULTS
The LSTM-based and transformer-based models are implemented using the PyTorch framework. Hyperparameters of the network are determined and optimized based on experimental training results. In the LSTM-based model, both the encoder and the decoder are implemented with a one-layer 32-D LSTM network. In the transformer-based model, the model dimension is set to 24 and the number of attention heads is set to four. In both models, the projector is implemented as a three-layer FNN with 40 hidden neurons in each layer. These hyperparameters […]. The proposed neural network model is trained for 5000 epochs on standard Google Colab Pro GPU devices with the MagNet dataset. After applying the data augmentation, the size of the N87 ferrite dataset is expanded to 269 940. These data points are randomly split into training, validation, and test sets of 70%, 20%, and 10%, respectively. During training, the mean-squared error between the predicted sequence H_pred(t) and the measured sequence H_meas(t) is selected as the loss function for backpropagation. The test set, which is never used for training the model, is used to evaluate the model performance. The Adam optimizer is used for model training. An exponentially decayed learning rate strategy is implemented to yield better model convergence, where the initial learning rate is 0.004 and the decay rate is 90% per 150 epochs. The typical elapsed time for network training is approximately 20 h for each material (using Google Colab Pro), which can be further reduced by adopting parallel computing.
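The training schedule and data split can be illustrated with a short sketch. Reading "90% per 150 epochs" as a stepwise multiplication of the learning rate by 0.9 every 150 epochs is an assumption (a continuous exponential decay is an equally plausible reading), and the function names are hypothetical.

```python
def learning_rate(epoch, initial_lr=0.004, decay=0.9, step=150):
    """Stepwise decay: multiply the rate by `decay` once every `step` epochs."""
    return initial_lr * decay ** (epoch // step)


def split_sizes(n_total, train_tenths=7, val_tenths=2):
    """Integer 70/20/10 split; the remainder goes to the test set."""
    n_train = n_total * train_tenths // 10
    n_val = n_total * val_tenths // 10
    return n_train, n_val, n_total - n_train - n_val


lr_start = learning_rate(0)      # initial rate of 0.004
lr_end = learning_rate(4999)     # rate during the last of 5000 epochs
n_train, n_val, n_test = split_sizes(269_940)
```

Under this reading the rate falls by roughly a factor of 30 over the 5000 epochs, and the augmented N87 dataset splits into 188 958, 53 988, and 26 994 samples.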

A. Hysteresis B-H Loop Prediction
We evaluate the performance of the two trained models on the test set to validate their ability to predict the B-H hysteresis loop. Fig. 10 shows a series of prediction results generated by the transformer-based model for an example testing point (trapezoidal, 140 kHz, 90 °C, 30 A/m dc bias) at different stages of the training. As the training proceeds, the model gradually converges and the discrepancy between the predicted and the measured hysteresis loops shrinks, eventually achieving a good match.
To quantitatively evaluate the prediction accuracy of the models, the relative error between the predicted sequence H_pred(t) and the measured sequence H_meas(t) is used […]. […] 52% and the transformer model 2.99%, while the 95th percentiles are 10.92% and 6.48%, respectively, as listed in Table I. The test set covers data points with all three types of waveform shapes and across the same ranges of frequency, temperature, and dc bias as the training set. These statistics validate that both proposed models are capable of making accurate predictions of the hysteresis loops under various operating conditions. Under the given hyperparameter settings, the transformer-based model outperforms the LSTM-based model, demonstrating lower overall relative error in hysteresis loop prediction.
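One plausible way to compute such a sequence-level relative error and its summary statistics is sketched below. The RMS-normalized definition is an assumption, since the defining equation is not reproduced here, and the function names and the nearest-rank percentile are illustrative choices.

```python
import math


def rms(sequence):
    """Root-mean-square value of a sequence."""
    return math.sqrt(sum(x * x for x in sequence) / len(sequence))


def sequence_relative_error(h_pred, h_meas):
    """RMS of the prediction error divided by RMS of the measurement (assumed definition)."""
    if len(h_pred) != len(h_meas):
        raise ValueError("sequences must have the same length")
    return rms([p - m for p, m in zip(h_pred, h_meas)]) / rms(h_meas)


def percentile_nearest_rank(values, percent):
    """Nearest-rank percentile for an integer `percent` between 1 and 100."""
    ordered = sorted(values)
    rank = (percent * len(ordered) + 99) // 100  # ceil(percent * n / 100)
    return ordered[rank - 1]


# Illustrative use with a toy measured loop and a 10% amplitude error.
h_meas = [20.0 * math.sin(2 * math.pi * k / 128) for k in range(128)]
h_pred = [1.1 * h for h in h_meas]
err = sequence_relative_error(h_pred, h_meas)
```

Collecting `err` over the whole test set and taking its mean and 95th percentile gives statistics of the kind reported in Table I.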

B. Core Loss Prediction
We assess the performance of the two trained models to validate their ability to predict the core loss. Based on the predicted B-H loop, one can directly calculate the predicted core loss P_V from the enclosed loop area, P_V = f ∮ H dB. Then, the relative error between the predicted core loss P_V,pred and the measured core loss P_V,meas can be calculated, which is used as another figure of merit for evaluating the model performance. The overall relative error of core loss prediction is higher than that of hysteresis loop prediction. With the loss function selected for training, the model is optimized to minimize the shape discrepancy between the predicted sequence H_pred(t) and the measured sequence H_meas(t), while no information about the core loss is directly available to the network for reference. However, core loss calculation is highly sensitive to phase mismatch between H(t) and B(t), and an approximately matched sequence does not necessarily result in a good match for core loss. To further improve the accuracy of core loss prediction while maintaining high accuracy for hysteresis loop prediction, one can introduce the core loss information or the phase information into the training loss function, which, however, may increase the computational cost.
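The core loss calculation and its sensitivity to phase can be illustrated with a short sketch. The volumetric loss is taken as the frequency times the area enclosed by the B-H loop, evaluated with a trapezoidal sum over one closed cycle; the function names and the sinusoidal test loop are illustrative assumptions.

```python
import math


def core_loss(b_cycle, h_cycle, f):
    """Volumetric core loss in W/m^3: f times the loop area, the closed integral of H dB."""
    n = len(b_cycle)
    area = 0.0
    for k in range(n):
        k_next = (k + 1) % n  # wrap around to close the loop
        area += 0.5 * (h_cycle[k] + h_cycle[k_next]) * (b_cycle[k_next] - b_cycle[k])
    return f * area


def relative_error(predicted, measured):
    """Absolute relative error between a predicted and a measured scalar."""
    return abs(predicted - measured) / abs(measured)


# Toy elliptical loop: B lags H by phi, so the loop area is pi * Bm * Hm * sin(phi).
n, f = 128, 100e3
b_m, h_m, phi = 0.1, 20.0, 0.2
b = [b_m * math.sin(2 * math.pi * k / n) for k in range(n)]
h_meas = [h_m * math.sin(2 * math.pi * k / n + phi) for k in range(n)]
h_pred = [h_m * math.sin(2 * math.pi * k / n + phi + 0.02) for k in range(n)]

loss_meas = core_loss(b, h_meas, f)
loss_pred = core_loss(b, h_pred, f)
loss_err = relative_error(loss_pred, loss_meas)
```

In this toy case, a 0.02 rad phase error changes the H(t) sequence by only about 2% in RMS terms but shifts the computed core loss by roughly 10%, which is consistent with the sensitivity described above.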

C. Comparison and Discussion
Table I presents a comparative analysis between the LSTM and transformer implementations in terms of their hysteresis loop prediction accuracy, model size, and approximate computational cost. Both models are trained and tested with the same training set and test set for the same number of epochs.
With the given hyperparameter settings, the overall prediction accuracy of the transformer-based model is better than that of the LSTM-based model. Fig. 13 shows a few examples of the predicted B-H loops generated by each model against the measured ones under various frequency, temperature, and dc bias conditions, with multiple waveform shapes. Both models accurately predict the shape and location of the major part of each B-H loop, while the sharp corners are better captured by the transformer-based model, benefiting from the attention mechanism. The LSTM-based model requires less elapsed time for model inference. Table II provides a theoretical comparison of the computational complexity of both models [29], where $n$ represents the sequence length and $d$ is the dimension of the model. In our test cases, both models have $n = 128$, while the LSTM model and the transformer model have $d = 32$ and $d = 24$, respectively. Therefore, although the transformer-based model reduces the number of sequential operations and the maximum path length by avoiding recurrent operations, it suffers from a much higher complexity per layer, resulting in longer training and execution times.
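The per-layer comparison can be made concrete with a small calculation. This sketch assumes the commonly quoted asymptotic estimates of $n^2 d$ operations for a self-attention layer and $n d^2$ for a recurrent layer; the contents of Table II are not reproduced here, and constant factors are ignored, so the result indicates a trend rather than a measured runtime.

```python
def per_layer_cost(n, d):
    """Leading-order operation counts per layer (constant factors ignored)."""
    return {
        "self_attention": n * n * d,  # every position attends to every position
        "recurrent": n * d * d,       # one d-by-d state update per time step
    }

# Settings quoted in the text: n = 128 for both models, d = 32 (LSTM), d = 24 (transformer)
lstm_cost = per_layer_cost(n=128, d=32)["recurrent"]
transformer_cost = per_layer_cost(n=128, d=24)["self_attention"]
ratio = transformer_cost / lstm_cost
print(lstm_cost, transformer_cost, ratio)
```

Because the sequence length exceeds the model dimension here, the quadratic dependence on $n$ dominates for the transformer.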
Considering the tradeoff between the prediction accuracy and the execution time, the transformer-based model is selected to establish the neural network-aided smart datasheet. All the results in the following sections are generated by the transformer-based model.

VII. MAGNET-AI: A NN-AIDED SMART DATASHEET
Evaluation results validate the neural network model's effectiveness in predicting the hysteresis loop and core loss under various operating conditions and excitation waveforms. To establish a neural network-aided smart datasheet, the neural network model is packaged into a function for rapid inference, where the inputs are the waveform of flux density $B(t)$, the frequency $f$, the temperature $T$, and the dc bias field strength $H_{\mathrm{dc}}$, while the output is the waveform of the field strength $H(t)$. The flowchart of the neural network-aided smart datasheet is shown in Fig. 14. Users can specify the excitation waveform and the operating conditions through the user interface as the inputs of the neural network model. The model inference is executed to predict the response waveform. After postprocessing, the prediction results, e.g., hysteresis loop, core loss, and permeability, are visualized and provided to users.
Fig. 14. Flowchart of the neural network-aided smart datasheet.
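The inference-plus-postprocessing flow can be sketched as a single function. The sketch below is an assumption about how such a wrapper might look, not the platform's actual code: `smart_datasheet` and `toy_model` are hypothetical names, and `toy_model` is a lossless linear placeholder standing in for the trained network so that the example runs on its own.

```python
import numpy as np

MU_0 = 4e-7 * np.pi  # vacuum permeability, H/m

def smart_datasheet(model, b_wave, freq, temp, h_dc):
    """Run one inference, then derive the core loss and amplitude permeability."""
    b = np.asarray(b_wave, dtype=float)
    h = np.asarray(model(b, freq, temp, h_dc), dtype=float)
    # Core loss: frequency times the enclosed B-H loop area (closed by wrap-around)
    db = np.diff(np.append(b, b[0]))
    h_mid = 0.5 * (h + np.roll(h, -1))
    core_loss = freq * np.sum(h_mid * db)
    # Relative amplitude permeability from the peak-to-peak swings of B and H
    mu_r = (b.max() - b.min()) / ((h.max() - h.min()) * MU_0)
    return {"H": h, "core_loss": core_loss, "mu_r": mu_r}

def toy_model(b, freq, temp, h_dc):
    """Placeholder for the trained network: linear, lossless, mu_r = 2000."""
    return b / (MU_0 * 2000.0) + h_dc

b_wave = 0.1 * np.sin(2 * np.pi * np.arange(128) / 128)
result = smart_datasheet(toy_model, b_wave, freq=100e3, temp=25.0, h_dc=0.0)
print(result["mu_r"], result["core_loss"])
```

Passing the model as a callable keeps the postprocessing independent of the network architecture, so the same wrapper could serve either the LSTM-based or the transformer-based model.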
Here are several prediction examples to demonstrate different ways of using the NN-aided smart datasheet. In each example, a manually generated dataset is fed into the neural network model as the input, where the excitation waveforms have ideal shapes and the operating conditions are swept. Note that the waveforms in the manually generated datasets are pure waves without any nonideal effects, such as switching transitions, which naturally leads to slightly different prediction results compared with the measurements, despite the close resemblance between the two.
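A manually generated dataset of this kind can be produced with a few lines of code. The sketch below is illustrative only: the function names are hypothetical, the sequence length of 128 follows the value of $n$ quoted earlier, and the sweep shown is an amplitude sweep from 30 to 240 mT with an assumed step of 30 mT.

```python
import numpy as np

def triangular_wave(amplitude, duty, n=128):
    """One period of an ideal triangular B(t): rises for `duty`, falls for the rest."""
    phase = np.arange(n) / n
    rising = -amplitude + 2 * amplitude * phase / duty
    falling = amplitude - 2 * amplitude * (phase - duty) / (1 - duty)
    return np.where(phase < duty, rising, falling)

def sinusoidal_wave(amplitude, n=128):
    """One period of an ideal sinusoidal B(t)."""
    phase = np.arange(n) / n
    return amplitude * np.sin(2 * np.pi * phase)

# Amplitude sweep of 50% duty ratio triangular waves, in tesla
amplitudes = np.linspace(30e-3, 240e-3, 8)
tri_sweep = np.stack([triangular_wave(a, duty=0.5) for a in amplitudes])

# A single 45 mT sinusoid, reused while another operating condition is swept
sine = sinusoidal_wave(45e-3)
print(tri_sweep.shape, sine.shape)
```

Each row of `tri_sweep` would be paired with fixed scalar conditions (frequency, temperature, dc bias) before being passed to the model.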
1) Example-1: Predicting the hysteresis loop at different flux density amplitudes. In this example, the excitation waveforms are a set of 50% duty ratio pure triangular waves, where the amplitude is swept from 30 to 240 mT. The frequency, the temperature, and the dc bias are fixed at 100 kHz, 25 °C, and 0 A/m, respectively. Fig. 15 shows the predicted B-H loops with this manually generated dataset as model inputs. It is observed that the impact of the flux density amplitude on the hysteresis loop is well captured and predicted by the neural network model, and a good match is achieved with respect to the adjacent measured hysteresis loops. At small amplitude, the B-H loop approximately aligns with the straight line $B = \mu_i H$, where $\mu_i$ is the initial permeability of the material. As the amplitude increases, the B-H loop is enlarged and gradually saturated, resulting in a much larger core loss and a very different permeability.
2) Example-2: Predicting the hysteresis loop at different frequencies. In this example, the excitation waveforms are a set of pure sinusoidal waves with an amplitude of 45 mT. The temperature and the dc bias are fixed at 25 °C and 0 A/m, respectively, while the frequency is swept from 100 to 400 kHz. Fig. 16 shows the predicted B-H loops with this manually generated dataset as model inputs. It is observed that the impact of the fundamental frequency on the hysteresis loop is well captured and predicted by the neural network model, and a good match is achieved with respect to the adjacent measured hysteresis loops. As the frequency increases, the B-H loop is enlarged, resulting in a larger core loss energy per cycle.
3) Example-3: Predicting the hysteresis loop at different levels of dc bias. In this example, the excitation waveforms are a set of pure sinusoidal waves with an amplitude of 30 mT. The frequency and the temperature are fixed at 200 kHz and 25 °C, respectively, while the dc bias is swept from 0 to 30 A/m. Fig. 17 shows the predicted B-H loops with this manually generated dataset as model inputs. It is observed that the impact of the dc bias on the hysteresis loop is well captured and predicted by the neural network model, and a good match is achieved with respect to the adjacent measured hysteresis loops. As the dc bias increases, the B-H loop is enlarged and tilted.
4) Example-4: Predicting the core loss under triangular waves with different duty ratios. In this example, the excitation waveforms are a set of pure triangular waves with an amplitude of 43.5 mT, while the duty ratio is swept from 10% to 90%. The frequency, the temperature, and the dc bias are fixed at 315 kHz, 25 °C, and 0 A/m, respectively. Fig. 18 shows the predicted core loss curves with this manually generated dataset as model inputs. It is observed that the basic relationship between the duty ratio and the core loss is well captured and predicted by the neural network model, and a good match is achieved with respect to the adjacent measured core loss. For triangular waves, the core loss reaches a minimum when the duty ratio $D = 0.5$, and increases as $D$ approaches 0 or 1. The core losses for duty ratios of $D$ and $1 - D$ are approximately the same, resulting in a symmetric curve of core loss versus duty ratio.
5) Example-5: Predicting the core loss at different temperatures. In this example, the excitation waveforms are a set of pure trapezoidal waves, whose duty ratios for rising and falling are both 20%. The amplitude of the flux density is fixed at 35, 70, and 140 mT, separately. The frequency and the dc bias are fixed at 100 kHz and 0 A/m, respectively, while the temperature is swept from 25 °C to 90 °C. Fig. 19 shows the predicted core loss curves with this manually generated dataset as model inputs. It is observed that the basic relationship between the temperature and the core loss is well captured and predicted by the neural network model, and a good match is achieved with respect to the adjacent measured core loss. As the temperature rises, the core loss is reduced.
With the capability of predicting the hysteresis loop and core loss under various operating conditions, as demonstrated in the above examples, the proposed neural network model can potentially be used as an alternative to the conventional datasheet or measurement dataset. Notably, the neural network models can greatly reduce the size of the dataset with negligible loss of accuracy. For the N87 material in this work, the size of the postprocessed dataset for model training is 3.8 GB, while the size of the transformer model is only 204 kB. The model describes the behavior of the magnetic material almost equivalently, and is much more comprehensive than a conventional datasheet of similar file size. Users of the neural network-aided datasheet can rapidly predict the behavior of the magnetic material, such as the hysteresis loop, the permeability, and the core loss, by specifying the excitation waveforms and the operating conditions, without the time-consuming data extraction and complicated interpolation required when working with a conventional datasheet. Distinguished from the conventional datasheet, the neural network-aided smart datasheet is packaged into a function, which makes it feasible to integrate it into other iterative calculations, such as multiobjective optimization algorithms or AI-mag [41].

VIII. NN-AIDED MATERIAL RECOMMENDATION
Benefiting from the fast model inference capabilities of neural networks, the proposed model can also be used to assist material comparison and selection for specific excitations and operating conditions. It can rapidly rank magnetic materials across a wide operation range. The transformer-based neural network model has been trained on all ten materials from the MagNet database, including TDK{N27, N30, N49, N87}, Ferroxcube{3C90, 3C94, 3E6, 3F4}, and Fair-Rite{77, 78}. Each model was trained on a similarly large measurement dataset as described in Section V, with the same network hyperparameters and data augmentation techniques. When a specific operating condition is provided, these neural network models can be executed sequentially to sweep across all the materials and calculate the corresponding core loss for each material. By sorting the core loss values among all the candidate materials, MagNet-AI can recommend the best-performing material candidates for the given operating condition. Two examples of material ranking maps are provided here to illustrate the effectiveness of the neural network-aided material comparison.
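The sweep-and-sort step can be sketched as follows. The per-material models here are hypothetical power-law placeholders with invented coefficients, named `material_A` and `material_B` to make clear they do not represent any real material; in practice each entry would wrap a trained network followed by the core loss calculation.

```python
import numpy as np

def make_toy_loss_model(k, alpha, beta):
    """Placeholder core loss model k * f^alpha * B_peak^beta (not fitted to any data)."""
    def model(b_wave, freq, temp, h_dc):
        b_peak = 0.5 * (np.max(b_wave) - np.min(b_wave))
        return k * freq ** alpha * b_peak ** beta
    return model

def rank_materials(loss_models, b_wave, freq, temp, h_dc):
    """Evaluate every candidate once and sort by predicted core loss, lowest first."""
    losses = {name: model(b_wave, freq, temp, h_dc) for name, model in loss_models.items()}
    return sorted(losses.items(), key=lambda item: item[1])

loss_models = {
    "material_A": make_toy_loss_model(k=2.0, alpha=1.0, beta=2.0),
    "material_B": make_toy_loss_model(k=1e-5, alpha=2.0, beta=2.0),
}
b_wave = 0.1 * np.sin(2 * np.pi * np.arange(128) / 128)

ranking_low_f = rank_materials(loss_models, b_wave, freq=100e3, temp=25.0, h_dc=0.0)
ranking_high_f = rank_materials(loss_models, b_wave, freq=1e6, temp=25.0, h_dc=0.0)
print(ranking_low_f[0][0], ranking_high_f[0][0])
```

The recommended material changes between the two frequencies in this toy setup, which is the kind of crossover a ranking map is meant to expose.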
1) Example-1: Selecting the optimal material at different levels of dc bias across a wide range of flux density and frequency. In this example, the excitation waveforms consist of a set of 50% duty ratio pure triangular waves. The amplitude and frequency of the waveforms are swept from 30 to 200 mT and from 50 to 500 kHz, respectively. The dc bias is selected from three different levels, namely 0, 10, and 20 A/m, while the temperature remains fixed at 25 °C. Fig. 20 shows the material ranking maps for each level of dc bias, where each color denotes the material that achieves the lowest core loss under the corresponding operating condition. Each material has its optimal operation range in terms of frequency, flux density, and dc bias. N30 ferrite demonstrates lower core loss at low frequency, whereas N49 ferrite dominates at higher frequency. The boundary shifts as the dc bias changes.
2) Example-2: Selecting the optimal material at different temperatures across a wide range of flux density. In this example, the excitation waveforms also consist of a set of 50% duty ratio pure triangular waves. The amplitude of the waveforms and the temperature are swept from 30 to 200 mT and from 25 °C to 90 °C, respectively. The frequency remains fixed at 150 kHz and the dc bias is zero. Fig. 21 shows the corresponding material ranking map.
As depicted in the map, different materials also have their optimal operation ranges in terms of temperature. At low temperature, the material map is dominated by N49 and ferrite. As the temperature increases, 3C90 ferrite begins to show its superiority. More specifically, Fig. 22 shows predicted core loss curves for the three aforementioned materials across different temperatures. It can be observed that N49 ferrite achieves its minimum core loss at a relatively low temperature, while the other two materials are more suitable for high-temperature applications. As demonstrated, given a targeted operation range, the neural network model can effectively assist the designer in determining which material offers the most desirable performance for the particular operating conditions. As the catalog of materials keeps expanding, the neural network model can provide design recommendations among various materials with a computational cost that increases only linearly with the number of materials.
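A material ranking map amounts to taking, at every grid point, the index of the material with the lowest predicted loss. The following sketch assumes hypothetical per-material loss functions of flux density amplitude and frequency (toy power laws with invented coefficients, not real material data). It also makes the cost argument visible: the number of model evaluations is the grid size times the number of materials.

```python
import numpy as np

def ranking_map(loss_fns, amplitudes, freqs):
    """Return material names and, per grid point, the index of the lowest-loss material."""
    names = list(loss_fns)
    losses = np.array([
        [[loss_fns[name](a, f) for f in freqs] for a in amplitudes]
        for name in names
    ])  # shape: (materials, amplitudes, frequencies)
    return names, np.argmin(losses, axis=0)

# Toy placeholder loss functions; the names and coefficients are invented
loss_fns = {
    "material_A": lambda b_peak, freq: 2.0 * freq * b_peak ** 2,
    "material_B": lambda b_peak, freq: 1e-5 * freq ** 2 * b_peak ** 2,
}
amplitudes = np.linspace(30e-3, 200e-3, 5)
freqs = np.linspace(50e3, 500e3, 10)

names, best = ranking_map(loss_fns, amplitudes, freqs)
n_evaluations = len(names) * amplitudes.size * freqs.size
print(best.shape, n_evaluations)
```

Adding a material adds one more slice to the loss array, so the evaluation count grows in proportion to the number of candidates.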

IX. ONLINE SMARTSHEET PLATFORM
To enable interactive datasheet inference based on the proposed neural network model, an open-source webpage-based platform with a graphical user interface (GUI) is designed and developed. It is powered by Streamlit (an open-source Python app framework for website deployment) and shared on GitHub, offering a variety of data-visualization tools with a GUI for database access, magnetic core loss estimation, hysteresis loop prediction, and circuit simulation, as well as access to download all the measured data points. The neural network model and the circuit simulation engine are deployed on the website, which allows users to predict the hysteresis loop under any user-defined conditions or simulation conditions. The website architecture and information flow of the website platform are shown in Fig. 23.
Fig. 24(a) shows an example screenshot of the smartsheet session of the neural network model. With the GUI, users may specify the type of magnetic material, operating conditions (temperature, frequency, and dc bias), and the excitation waveform. The webpage is also connected to a circuit simulation server hosted by Plexim. The webpage feeds information to the server, and the server returns inputs to the machine learning algorithms in combination with power converter operations, as shown in Fig. 24(b). Users can choose from a pool of common topologies (buck, boost, flyback, dual-active bridge) and specify the circuit parameters, magnetic component specifications, and operating conditions; the simulation engine then simulates and outputs the excitation waveform of the magnetic component. The MagNet server collects the waveform and predicts the core loss using the proposed neural network models. Iterations between the neural network model and the simulation engine will be implemented in the future to achieve more accurate simulation results by capturing the nonlinear effects. Note that the flux density is calculated based on the specified geometrical parameters assuming a uniform flux distribution. The impact of geometry is not considered in this work. The neural network models can be integrated with a circuit simulator to enable magnetic-in-circuit simulations.
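One plausible way to obtain the flux density from a simulated winding voltage under the uniform-flux assumption is to integrate the voltage and divide by the number of turns and the core cross-sectional area. The sketch below illustrates that relation and is not the platform's implementation; the function name, the parameter values, and the removal of the mean (to keep only the ac component) are assumptions.

```python
import numpy as np

def flux_density_from_voltage(v, dt, turns, area):
    """Uniform-flux estimate: B(t) = (1 / (N * A)) * integral of v(t) dt, ac part only."""
    v = np.asarray(v, dtype=float)
    flux_linkage = np.cumsum(v) * dt      # rectangular-rule integral of v(t), in V*s
    b = flux_linkage / (turns * area)     # divide by turns and cross-sectional area
    return b - b.mean()                   # drop the constant offset

# Illustrative values: 10 V square wave at 100 kHz, 5 turns, 50 mm^2 core area
n, freq = 128, 100e3
dt = 1.0 / (freq * n)
v = np.where(np.arange(n) < n // 2, 10.0, -10.0)
b = flux_density_from_voltage(v, dt, turns=5, area=50e-6)
b_pk_pk = b.max() - b.min()
print(f"peak-to-peak flux density: {b_pk_pk:.3f} T")
```

A 50% duty ratio square-wave voltage yields a triangular flux density, matching the triangular excitations used in the prediction examples.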
Besides the neural network-aided smart datasheet, the website also provides a database session, which allows the raw measurement dataset to be visualized in many ways and enables rapid comparison of the core loss and B-H loop data of different materials. The user may specify the type of magnetic material, together with the excitation waveforms and the operating conditions. The website backend searches for the requested data in the database and visualizes it in the way that the user selects. The website also provides download access to the raw data collected from the equipment before any postprocessing, with the test conditions documented, and to the postprocessed dataset files for data-driven modeling applications.
The MagNet platform is constantly maintained and updated with new data and neural network models. Extended details are included on the website to enable trustworthy repeated measurements and cross validation of the dataset.

X. CONCLUSION
This article introduces the concept of neural network as datasheet for modeling magnetics across a wide operation range. We propose an encoder-projector-decoder neural network architecture for B-H loop modeling of power magnetics. The proposed architecture is implemented based on both the LSTM network and the transformer network, effectively combining sequence inputs (excitation waveforms) and scalar inputs (operating conditions) for hysteresis loop modeling. Experimental results show that the neural network is capable of accurately predicting the B-H loop and the corresponding core loss for ferrite materials. Several applications of the neural network-aided datasheet are demonstrated, including B-H loop and core loss prediction, and material recommendation across a wide operation range, with a fully functioning webpage-based online smartsheet platform. The neural network-aided datasheet can offer much more comprehensive information and more convenient accessibility than the conventional datasheet, while maintaining a comparably small file size.

Fig. 2. Examples of B-H loops measured with N87 ferrite material under 50% duty ratio triangular excitations. The reference loop (blue) is measured at 200 kHz, 25 °C, and 0 A/m DC bias. Each of the three figures shows the variation of the B-H loop at different frequencies, temperatures, and DC biases, respectively. The B (only AC) waveform is extracted from voltage measurement, and the H (both AC and DC) waveform is extracted from current measurement.

Fig. 3. Architecture and data flow of the encoder-projector-decoder neural network architecture.

Fig. 4. Neural network structure of the LSTM-based encoder-projector-decoder architecture. Temperature ($T$), frequency ($f$), and DC bias ($H_{\mathrm{dc}}$) information are mixed with the waveform information in the FNN projector after the encoder and before the decoder.

Fig. 5. Network structure of the transformer-based encoder-projector-decoder architecture. The $B(t)$ waveform is the sequence input of the encoder. $T$, $f$, and $H_{\mathrm{dc}}$ are the scalar inputs of the projector. During model training, the target $H(t)$ is directly fed to the decoder as a reference input. During model inference, the predicted sequence is fed back to the decoder, generating the entire output sequence in an autoregressive manner.

Fig. 8. Set of example $B(t)$ waveforms of N87 ferrite before and after the phase shifting augmentation. Waveforms are measured under sinusoidal excitations at 100 kHz, 25 °C, and zero DC bias.

Fig. 9. Examples of the multicycle data augmentation. (a) Original single-cycle waveform at 125 kHz. (b) Augmented two-cycle waveform at the effective frequency of 62.5 kHz. A well-designed and well-trained neural network should be able to predict similar results for both cases.

Fig. 10. Prediction results of the $H(t)$ waveform and the B-H loop of an example testing point (trapezoidal, 140 kHz, 90 °C, 30 A/m DC bias) at different stages of the training. The mismatch is minimized as the training proceeds until a good match is achieved between the predicted and the measured waveforms.

$$\text{Relative error of sequence} = \frac{\sqrt{\frac{1}{n}\sum_{t=t_1}^{t_n}\left(H_{\mathrm{pred}}(t)-H_{\mathrm{meas}}(t)\right)^2}}{\sqrt{\frac{1}{n}\sum_{t=t_1}^{t_n}\left(H_{\mathrm{meas}}(t)\right)^2}}. \tag{2}$$
Fig. 11 shows the distribution of relative errors in the $H(t)$ predictions generated by the LSTM-based model and the transformer-based model. As shown in the figure, both models accurately predict the $H(t)$ sequences. The average relative error for the LSTM-based model is 4.52% and that for the transformer-based model is 2.99%, while the 95th percentiles are 10.92% and 6.48%, respectively, as listed in Table I.

Fig. 11. Relative error distributions of the predicted $H(t)$ sequence generated by the LSTM-based and the transformer-based neural network models.

Fig. 12. Relative error distributions of the predicted core loss generated by the LSTM-based and the transformer-based neural network models.

Fig. 13. Examples of the predicted B-H loops under different frequency, temperature, and DC bias conditions, with multiple waveform shapes. Both the LSTM-based and the transformer-based models accurately predict the major part of each B-H loop, while the sharp corners are better captured by the transformer-based model.

Fig. 15. Predicted B-H loops with the manually generated model inputs with 50% duty ratio pure triangular waves, where the amplitude of flux density is swept from 30 to 240 mT. The frequency, the temperature, and the DC bias are fixed at 100 kHz, 25 °C, and 0 A/m, respectively.

Fig. 17. Predicted B-H loops with the manually generated model inputs with pure sinusoidal waves, where the DC bias is swept from 0 to 30 A/m. The amplitude, the frequency, and the temperature are fixed at 30 mT, 200 kHz, and 25 °C, respectively.

Fig. 18. Predicted core loss curves with the manually generated model inputs, where the duty ratio of the triangular wave is swept from 10% to 90%. The amplitude, the frequency, the temperature, and the DC bias are fixed at 43.5 mT, 315 kHz, 25 °C, and 0 A/m, respectively.

Fig. 20. Material ranking map at different levels of DC bias across a wide range of flux density amplitude and frequency.

Fig. 21. Material ranking map across a wide range of flux density amplitude and temperature.

Fig. 23. Website architecture and information flow of the MagNet webpage platform, which provides users with access to download and visualize the measured data in the MagNet core loss database, as well as analyze and simulate the magnetic behaviors with the deployed neural network models and the PLECS simulation engine.
Input: Flux Density $B(t)$, Frequency $f$, Temperature $T$, DC bias $H_{\mathrm{dc}}$, Field Strength $H(t)$ (only available in training); Output:

TABLE II
COMPARISON OF THE THEORETICAL COMPUTATIONAL COST BETWEEN THE LSTM AND THE SELF-ATTENTION (TRANSFORMER)