Early Fault Classification in Rotating Machinery With Limited Data Using TabPFN

Intelligent fault detection and classification is a cornerstone of prognostic and health management of rotating machinery (RM) research. Correctly classifying and predicting RM faults not only increases productivity in industrial plants but also reduces maintenance costs. The datasets from real facilities needed to train fault classifiers often have few samples due to the expense of provoking faults in real scenarios to obtain data. This article proposes the use of the tabular prior-data fit network (TabPFN) model for the classification of faults in RM. TabPFN is a model which has been pretrained with a large amount of synthetic data with many causal relationships. This allows the model to perform Bayesian inference on the data used for training. The advantages of this model are its ability to be trained with limited data without overfitting and its high speed (if a graphics processing unit (GPU) is available). To compare its performance with traditional algorithms for tabular classification such as XGBoost and random forest, three public datasets were used. Results show that TabPFN is more accurate than these algorithms when data are limited, so it is suitable for deployment in real scenarios where the amount of data available from the monitored RM is limited.

Maintenance accounts for 15%-60% of the total production costs [30]. Predictive maintenance is the best alternative in terms of both cost and performance, carrying out maintenance tasks only when a fault has been detected, preferably at an early stage [13]. Predictive maintenance can increase productivity by 20%, and the life of RM by up to 50%. Maintenance costs can be reduced by 50% [6]. Detecting when an RM is about to fail requires continuous monitoring, and it depends on many operational factors, such as the rotating speed, the load it supports, the type of bearings, and all the nearby machines [14]. Early detection of faults is vital, leading to a considerable increase in the availability and productivity of machinery.
Fault classification enables proactive maintenance strategies, such as condition monitoring and predictive maintenance, by providing insights into the health status of RM [24]. Accurate fault classification advances the field of machinery design and manufacturing, facilitating the development of more robust and reliable RM with improved fault tolerance and enhanced performance [12].
Data-driven models have emerged as a promising approach for fault classification in RM, analyzing large amounts of sensor data and identifying patterns for different fault types. Data-driven models need a large amount of data for training. In RM fault classification, this must contain RM faults, which is difficult to collect in real environments. A further complication is that RM behavior is not always consistent as RM can work under different loads or rotating speed, and faults may manifest themselves differently under different operating conditions. The limited amount of data available is especially problematic with recently installed RM. Overfitting is another problem with data-driven models, especially when dealing with limited data, as models tend to memorize noise rather than learn true patterns [29]. Overcoming these challenges is crucial to ensure the reliability and effectiveness of data-driven fault classification models in real-world applications.
Although there are different data-driven approaches to fault classification in RM, one of the most widely used is deep learning [10], [11]. Gradient-flow-based meta generative adversarial network (GAN) for data augmentation in fault diagnosis by Wang et al. [25] addresses the scarcity of labeled data. The authors propose using a flow-based meta generative adversarial network (GFMGAN) and support this model with single natural image (SinGAN), which deploys generative models with a single natural image [23]. The distribution of data is captured and characterized as vibrations by concatenating training samples and converting them into an image. The generative model obtained is used to generate new samples during the data augmentation process. To improve the generative ability of the model, they introduce a gradient-flow-based metalearning technique. Finally, they use a convolutional neural network (CNN) to perform the classification. The authors validate their proposal with two types of bearing faults: inner race and outer race. A minor limitation of the proposal is that if a new type of fault is included, it is necessary to repeat the whole training process, including the data augmentation process, with the new fault. The characterization of vibrations as images may limit the deployment of this system in real time.
Another work that uses CNN for the classification of faults is CFCNN: A novel convolutional fusion framework for collaborative fault identification of RM by Xu et al. [28]. The authors extract multilevel features from the different vibrations and integrate a module to merge these features using the correlations between them. Then, a smoothing mechanism is used to reduce possible overfitting when training the model. Finally, the authors validate their model with a cylindrical rolling bearing dataset and a planetary gearbox dataset. This work integrates vibrations from different sources, although it does not solve the problem of the limited datasets in this field. The inclusion of a smoothing mechanism to avoid overfitting is a step in the right direction, but may be insufficient when classifying more types of faults.
An intelligent fault diagnosis method of small sample bearing based on an improved auxiliary classification GAN by Meng et al. [16] uses two different deep learning architectures to solve the problem of data scarcity. The authors use a GAN to generate data and perform the data augmentation process. This neural network introduces the Wasserstein distance in the cost function in a way that alleviates the vanishing gradient problem. In addition, an attention mechanism is introduced that focuses on the blocks obtained with the convolutions. The features obtained by the attention mechanism are merged with the convolutions to classify faults. The method is tested with the public bearing dataset of Case Western Reserve University and the bearing simulation dataset of the Yanshan University Laboratory. Although the proposal is very interesting, it has certain drawbacks. First, GANs are very prone to overfitting, especially with limited data, which can cause them to generate very similar elements. They are also very sensitive to hyperparameter selection, because a bad balance between the generator and the discriminator can result in learning nothing or always generating similar samples. Finally, the authors use accuracy as a metric in imbalanced datasets, although other metrics are usually better suited to this type of dataset.
There are also proposals that focus on the methods to characterize data, such as a visual vibration characterization method for intelligent fault diagnosis of RM by Peng et al. [20], which measures vibrations using images. These vibrations are obtained through the phase difference in the image. This means that the differences between images are obtained with their maximum frequencies. A video of the machinery while working is recorded and this technique is performed between different frames. This makes it possible to characterize the vibrations as an image directly, without transforming the vibrations into images, as in previous works. Once the vibrations are characterized as image differences, a CNN is trained to classify and detect faults. The novelty of this work lies in the elimination of the signal-to-image transformation process for the use of a subsequent CNN, but the authors do not propose a mechanism to solve the problem of limited training data.
It is common to use CNNs and characterize vibrations as images. However, there are other approaches to data characterization, such as the graphs used in fault diagnosis of rolling bearing based on knowledge graph with data accumulation strategy by Xiao et al. [27]. In this work, the nodes contain information about the different extracted features, and the edges contain the feature-fault correlation. This representation constantly updates the model with new data using weighted random forest, which classifies the faults and also updates the network information when training with more samples. The strength of this proposal is that it generates a network structure for the characterization, which enables incremental training. However, it does not offer solutions for datasets with few samples: the recurring problem in this field.
Different data-driven approaches to characterization have been tested, and different data augmentation methods have been developed to address the problem of the scarcity of labeled data and consequently the overfitting of most of the data-driven approaches. However, to the best of the authors' knowledge, there is no work that proposes using a tabular prior-data fit network (TabPFN) for fault classification. This article proposes an early fault classification method using TabPFN as a data-driven model to classify different RM operating conditions when very little data are available, such as in new installations or after changes in the RM operating configuration. The performance of the model has been analyzed using three different datasets and its response tested as the size of the training set decreases and the size of the test set increases. The results show that the proposed model performs better than the traditional machine learning (ML) algorithms when the amount of available data is limited. Furthermore, the model makes better predictions without overfitting and without adjusting hyperparameters, training in less than a second using a graphics processing unit (GPU). This makes the proposed classification method suitable for implementation with newly installed or reconfigured RM.
The rest of this article is organized as follows. The key concepts of the research are outlined in Section II. Section III describes the environment used for testing. A description of the tests carried out and the results obtained by the proposed solution are shown and discussed in Section IV. Finally, the concluding remarks and future work are outlined in Section V.

II. BACKGROUND
In this section, the key concepts for the development of this work are described. In particular, the different RM faults to be classified are analyzed, and TabPFN and its advantages for the classification of these faults are described.

A. RM Faults
RM refers to any device that rotates, converting electrical energy into rotational kinetic energy. It has two main components: the stator, which is stationary, and the rotor, which moves. The bearing, which enables relative motion between the rotating and stationary parts, is essential to the operation of RM. To ensure consistent productivity, RM must operate continuously, with high reliability and no breakdowns. This work uses datasets in which the major contributors to RM faults are present: unbalance, misalignment, and bearing faults [5]. Misalignment can be vertical or horizontal and there are several types of bearing faults depending on where they occur. Each of these faults is explained in more detail below.
1) Unbalance occurs when there is an uneven distribution of mass within the rotating component of the machine. Unbalance is usually caused by defects in manufacturing, wear, or improper assembly. This leads to excessive centrifugal forces during operation, generating vibrations that affect performance and can be a risk for the entire system [4]. If these vibrations are not properly corrected, they can produce significant damage, reducing the remaining useful lifetime of RM and in the worst cases causing catastrophic damage. Vibrations can also have detrimental effects on the surrounding equipment and structures.
2) Horizontal misalignment happens when the centers of rotation of the driving and driven components are not perfectly aligned along the horizontal axis. It commonly appears due to installation errors, thermal expansion, or incorrect placement of the machine on its base [21]. If not properly addressed, it can cause excessive friction and accelerated wear on horizontal shafts and bearings, leading to reduced energy efficiency and increased power consumption. To rectify it, precise alignment techniques such as laser alignment or dial indicator measurements are used.
3) Vertical misalignment is similar to horizontal misalignment. It occurs when the centers of rotation of the driving and driven components are not aligned, but in this case along the vertical axis. Both types of misalignment lead to similar issues: increased forces, vibrations, and strains [21]. However, vertical misalignment affects vertical components and can cause increased stress on thrust bearings. Like horizontal misalignment, vertical misalignment can be rectified using precise alignment techniques.
4) Outer race bearing faults refer to the breakdown or malfunction of the outer race of bearings. The outer race is the outer ring of the bearing, which encloses the rotating elements that support and guide the rotating shaft [26]. These faults usually appear due to misalignment, improper lubrication, or excessive loads. They can lead to severe consequences in RM, causing excessive vibrations and noise, and reducing productivity and efficiency. When outer race bearing faults are not properly addressed, they can cause damage to other components such as the shaft or the housing of the RM. They can also lead to damage in external components or structures that are connected to the bearing. Regular maintenance, proper lubrication, and condition monitoring are usually used to prevent them.
5) Inner race bearing faults are similar to outer race bearing faults, but they appear in the inner ring of bearings [26]. The inner ring interacts directly with the rotating elements, facilitating smooth rotation. These faults are caused by excessive loads, misalignment, improper lubrication, or fatigue over time. They can cause excessive vibrations and noise, as well as damage to the shaft or housing of the RM, reducing RM performance.
6) Ball bearing faults are the breakdown or malfunction of the rotating elements of the bearing, which are metal balls or cylinders that roll between the inner and outer races of bearings [26]. They are used to reduce friction and enable smooth rotation of shafts. When these faults occur, they can lead to increased friction, excessive heat generation and vibrations, reducing the RM efficiency, and increasing energy consumption. They can also affect other parts such as the shaft.

B. Tabular Prior-Data Fit Network
TabPFN is a prior-data fit network (PFN) [17] designed to perform supervised classification tasks on tabular data [8]. This model proposes a radical paradigm shift with respect to other current models in the state of the art. TabPFN provides a transformer that has been trained to perform Bayesian inference. For this purpose, structural causal models are generated and Bayesian neural networks are used to produce the synthetic pretraining data. The objective is to train the PFN model to approximate the posterior predictive distribution of the data to be predicted as closely as possible. The goal is for the transformer to perform Bayesian inference for a wide range of causal relationships. When given a new training dataset, such a model is able to approximate the posterior predictive distribution of new data in a single forward pass.
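For reference, the quantity that a PFN is trained to approximate can be written out explicitly. The notation below follows the PFN literature [17] rather than this article and should be read as an assumption: $\phi$ denotes one hypothesis (for example, one sampled causal model), $D$ the training set, $(x, y)$ a test sample with its label, and $q_{\theta}$ the transformer with parameters $\theta$. The first expression is the posterior predictive distribution; the second is the pretraining loss over synthetic datasets drawn from the prior.

```latex
p(y \mid x, D) \;\propto\; \int_{\Phi} p(y \mid x, \phi)\, p(D \mid \phi)\, p(\phi)\, d\phi

\mathcal{L}_{\theta} \;=\; \mathbb{E}_{D \cup \{(x, y)\} \sim p(\mathcal{D})}
\left[ -\log q_{\theta}(y \mid x, D) \right]
```

Minimizing the second expression over many synthetic datasets is what allows the pretrained network to output an approximation of the first one for an unseen dataset without further optimization.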
While classical approaches, such as neural networks, focus on finding the distribution of the data based on each of the samples, TabPFN focuses on finding the best final distribution given the initial distribution of the training data. Since TabPFN has been pretrained with synthetic datasets that hold simple causal relationships, TabPFN searches for approximations to such causal relationships [18].

Authorized licensed use limited to the terms of the applicable license agreement with IEEE. Restrictions apply.

The experiments carried out by the authors of TabPFN suggest that a key difference with respect to other state-of-the-art models is that TabPFN does not incur overfitting when the dataset is small. This is because it is pretrained with a large number of datasets that collect different types of causal relationships. Therefore, the model is able to achieve an approximation of the final distribution of the new training data, as long as the initial distribution of the new training data follows the patterns it extracts from the pretraining data. It is also capable of training without hyperparameter optimization, obtaining very good training times, especially if a GPU is used.
However, TabPFN has a number of limitations. It can deal with 1024 training samples, ten different classes, and 100 features at most. These limitations are due to the fact that the execution times of PFNs scale quadratically with the number of training samples and to the need to keep model training times short. A softer limitation is that the performance of TabPFN degrades with missing values or categorical features. One example of a categorical feature is the lubrication level, which can be low, normal, or excessive. Another limitation of TabPFN is that it may fail to generalize when the data contain very complex causal relationships, since it has not been pretrained with data with such relationships.
These limitations mean that TabPFN has very specific requirements. When these requirements are met, it is a viable alternative that does not require hyperparameter optimization and allows training in less than 1 s.
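The hard limits stated above can be checked before a training set is passed to the model. The following is a minimal sketch in plain Python, not part of the TabPFN package: the helper `check_tabpfn_limits` and the `LimitReport` container are hypothetical names introduced here for illustration, and the "ten" limit is interpreted as ten target categories.

```python
from dataclasses import dataclass, field

# Limits quoted in the text for TabPFN.
MAX_TRAIN_SAMPLES = 1024
MAX_FEATURES = 100
MAX_CLASSES = 10


@dataclass
class LimitReport:
    """Summary of a training set and the limits it exceeds (hypothetical helper)."""
    n_samples: int
    n_features: int
    n_classes: int
    violations: list = field(default_factory=list)


def check_tabpfn_limits(X, y):
    """Report which of the quoted limits a training set exceeds.

    X is a list of equal-length feature rows; y is the list of class labels.
    """
    n_samples = len(X)
    n_features = len(X[0]) if X else 0
    n_classes = len(set(y))
    report = LimitReport(n_samples, n_features, n_classes)
    if n_samples > MAX_TRAIN_SAMPLES:
        report.violations.append("too many training samples")
    if n_features > MAX_FEATURES:
        report.violations.append("too many features")
    if n_classes > MAX_CLASSES:
        report.violations.append("too many classes")
    return report


# Example: 976 samples, 42 features, and 7 classes, which are the dimensions
# of the largest MaFaulDa training set used in this work.
X = [[0.0] * 42 for _ in range(976)]
y = [i % 7 for i in range(976)]
report = check_tabpfn_limits(X, y)
print(report.violations)
```

A check of this kind only covers the hard limits; the softer limitations (missing values, categorical features, complex causal relationships) cannot be detected from the array shapes alone.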

III. TEST ENVIRONMENT
The test environment was configured with an Intel Core i7-9750H CPU (2.6 GHz), 16-GiB RAM, and an NVIDIA GeForce GTX 1650 (4 GiB) graphics card. The programming language used was Python 3.10 with the PyCharm IDE. The proposed model was tested using three different datasets to evaluate its performance under different RM conditions and fault types. These tests were repeated with different traditional ML classification models: random forest (RF), support vector machine (SVM), gradient boosting (GB), multilayer perceptron (MLP), eXtreme gradient boosting (XGB), and label propagation (LP). The results obtained by TabPFN were compared with those of these traditional ML classification models.

TABLE I EXTRACTED TIME-DOMAIN FEATURES

The datasets were divided into the training and testing sets, varying the size of the training set. For each training set size, each ML model was trained with the same training set and tested with the same testing set to compare their performance. If the models are trained with only one training set and tested with only one testing set, the results obtained may be biased by the training set used. For this reason, cross-validation was carried out, repeating each test 50 times, varying the training and testing sets and computing the mean of each of the evaluation metrics. In addition, different experiments were performed where the number of samples of the training set was increased progressively, from 2% to 50% of the whole dataset, using the remaining data in the dataset as the corresponding testing set. The objective was to test how the different algorithms behave with small training datasets.
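The repeated evaluation described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the authors' code: the splits are taken to be plain random shuffles (the text does not say whether they were stratified), the training set size is rounded up, and `repeated_splits`, `mean_score`, and `dummy_evaluate` are hypothetical names.

```python
import random


def repeated_splits(n_samples, train_percent, n_repeats=50, seed=0):
    """Yield (train_idx, test_idx) for repeated random hold-out splits.

    Assumption: the training size is ceil(n_samples * train_percent / 100)
    and the remaining samples form the testing set.
    """
    rng = random.Random(seed)
    n_train = -(-n_samples * train_percent // 100)  # integer ceiling
    indices = list(range(n_samples))
    for _ in range(n_repeats):
        rng.shuffle(indices)
        # Slicing copies the lists, so later shuffles do not alter them.
        yield indices[:n_train], indices[n_train:]


def mean_score(evaluate, n_samples, train_percent, n_repeats=50):
    """Average a metric over all repetitions for one training set size.

    `evaluate(train_idx, test_idx)` must train a model on the training
    indices and return one metric value measured on the testing indices.
    """
    scores = [
        evaluate(train_idx, test_idx)
        for train_idx, test_idx in repeated_splits(n_samples, train_percent, n_repeats)
    ]
    return sum(scores) / len(scores)


def dummy_evaluate(train_idx, test_idx):
    """Placeholder metric: the fraction of samples left for testing."""
    return len(test_idx) / (len(train_idx) + len(test_idx))


splits = list(repeated_splits(2208, 2))
print(len(splits), len(splits[0][0]), len(splits[0][1]))
print(mean_score(dummy_evaluate, 2208, 10))
```

To compare several classifiers fairly, the same sequence of splits (same seed) would be passed to each of them, so that every model sees identical training and testing sets in each repetition.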
The first step is to extract a total of seven time-domain features from the raw vibration data of each vibration axis. These time-domain features and the formulas used to extract them are shown in Table I. Next, each of the datasets is divided into several training and testing sets, using different sizes of the training set. All the traditional classification methods were previously configured by manually adjusting their hyperparameters to obtain the best possible results. In contrast, TabPFN does not need hyperparameter optimization.
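The per-axis feature extraction can be illustrated with a short sketch. The contents of Table I are not reproduced in this text, so the seven features below (mean, standard deviation, root mean square, peak, skewness, kurtosis, and crest factor) are an assumption based on features commonly used for vibration signals and may differ from those in Table I. The function names are hypothetical.

```python
import math


def time_domain_features(signal):
    """Compute seven common time-domain features of one vibration axis.

    Assumption: this feature set is illustrative and is not taken from Table I.
    """
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    std = math.sqrt(sum(c * c for c in centered) / n)
    rms = math.sqrt(sum(x * x for x in signal) / n)
    peak = max(abs(x) for x in signal)
    # Standardized third and fourth moments; zero for a constant signal.
    skewness = (sum(c ** 3 for c in centered) / n) / std ** 3 if std > 0 else 0.0
    kurtosis = (sum(c ** 4 for c in centered) / n) / std ** 4 if std > 0 else 0.0
    crest_factor = peak / rms if rms > 0 else 0.0
    return {
        "mean": mean,
        "std": std,
        "rms": rms,
        "peak": peak,
        "skewness": skewness,
        "kurtosis": kurtosis,
        "crest_factor": crest_factor,
    }


def feature_vector(axes):
    """Concatenate the features of every axis into one tabular row."""
    row = []
    for signal in axes:
        row.extend(time_domain_features(signal).values())
    return row


# Ten full periods of a unit sine wave as a stand-in for one vibration axis.
sine = [math.sin(2 * math.pi * k / 100) for k in range(1000)]
features = time_domain_features(sine)
print(features["rms"], features["crest_factor"])

# Six axes (two three-axis accelerometers) give 6 x 7 = 42 features per sample.
print(len(feature_vector([sine] * 6)))
```

Each raw recording is thereby reduced to one fixed-length row, which is the tabular form that TabPFN and the traditional ML models take as input.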
To measure the performance achieved by TabPFN and compare it with that of the traditional classification methods, each of them was analyzed as a multiclass classifier. True positives (TP) are obtained when the classifier correctly identifies an instance as belonging to a specific class, making the prediction correctly. False positives (FP) occur when the classifier identifies an instance as belonging to a particular class when it actually belongs to another, making an incorrect prediction. True negatives (TN) refer to instances that are correctly classified as not belonging to a specific class. Finally, false negatives (FN) appear when the classifier identifies an instance as not belonging to a specific class, but the actual class label indicates that it does belong to that class. The following metrics were used to evaluate the performance of the tests.
1) Accuracy: This metric measures the proportion of instances that are classified correctly. Accuracy is computed with the following equation:
$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{1}$$
2) Precision: This metric measures the ability of the classification model to avoid FP. Precision is computed with the following equation:
$$\text{Precision} = \frac{TP}{TP + FP} \tag{2}$$
3) Recall: This metric measures the ability of the classification model to avoid FN. It is a very important metric as it reflects the risk of faults appearing in the RM without being detected, which could cause a reduction in productivity and even the complete breakdown of the RM. Recall is computed using the following equation:
$$\text{Recall} = \frac{TP}{TP + FN} \tag{3}$$
4) F1-Score: This is used to provide a more global view of the performance of the classification model. It is the most balanced metric as it takes into account both precision and recall, working well in cases of imbalanced datasets [22]. The F1-score is computed with the following equation:
$$\text{F1-score} = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}} \tag{4}$$
A brief description of each of the datasets is given below. These datasets have been selected because they are imbalanced and exhibit different types of failures in RM.
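The four metrics can be computed for a multiclass problem by counting TP, FP, and FN per class in a one-vs-rest fashion. The sketch below makes two assumptions that the text does not state: precision, recall, and F1-score are macro-averaged over the classes, and accuracy is taken as the overall fraction of correctly classified samples. The function name `multiclass_metrics` is hypothetical.

```python
from collections import Counter


def multiclass_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1-score."""
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for actual, predicted in zip(y_true, y_pred):
        if actual == predicted:
            tp[actual] += 1
        else:
            fp[predicted] += 1  # predicted class gains a false positive
            fn[actual] += 1     # actual class gains a false negative

    def safe_div(num, den):
        return num / den if den else 0.0

    precisions, recalls, f1s = [], [], []
    for label in labels:
        precision = safe_div(tp[label], tp[label] + fp[label])
        recall = safe_div(tp[label], tp[label] + fn[label])
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(safe_div(2 * precision * recall, precision + recall))

    n_classes = len(labels)
    return {
        "accuracy": safe_div(sum(tp.values()), len(y_true)),
        "precision": sum(precisions) / n_classes,
        "recall": sum(recalls) / n_classes,
        "f1": sum(f1s) / n_classes,
    }


# Toy example with healthy (H), ball (B), and inner race (IR) labels.
y_true = ["H", "H", "B", "B", "IR", "IR"]
y_pred = ["H", "B", "B", "B", "IR", "H"]
metrics = multiclass_metrics(y_true, y_pred)
print(metrics)
```

Macro-averaging gives every class the same weight regardless of its size, which is one reason it is often preferred for imbalanced datasets such as those used here; a weighted average would be an equally plausible reading of the text.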

A. CWRU Dataset [1]
The CWRU dataset is one of the most commonly used datasets in bearing fault classification. The test rig used for this dataset consists of a motor, a torque transducer, an encoder, and a dynamometer. Three vibration sensors are placed at different locations (drive end, base, and fan end) to collect vibration time-series data. CWRU simulates the system working under different operating states: healthy, inner race bearing fault, ball bearing fault, and outer race bearing fault. Data were collected at different rotating speeds, and each of the faulty operating conditions was simulated with different levels of severity. Drive end bearing faults were collected at frequencies of 12 and 48 kHz, while fan end and base bearing faults were collected only at 12 kHz. For healthy operating conditions, data were collected at 48 kHz.
During the tests, only healthy operating conditions and drive end bearing faults were analyzed. Data collected from the sensor were segmented into 10 248 data points to generate samples from which to extract the time-domain features shown in Table I. After preprocessing this dataset, a total of 2208 samples composed of seven time-domain features were extracted, corresponding to healthy operating conditions, inner race bearing faults, outer race bearing faults, and ball bearing faults.
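The segmentation step can be sketched as follows. The text does not state whether consecutive segments overlap or how a trailing partial segment is handled, so this sketch assumes non-overlapping windows and discards the remainder; `segment_signal` is a hypothetical name and the lengths in the example are illustrative only.

```python
def segment_signal(signal, segment_length):
    """Split a recording into consecutive, non-overlapping segments.

    Assumption: samples left over after the last full segment are dropped.
    """
    n_segments = len(signal) // segment_length
    return [
        signal[i * segment_length:(i + 1) * segment_length]
        for i in range(n_segments)
    ]


# A 25-point recording cut into 10-point segments yields two samples.
recording = list(range(25))
segments = segment_signal(recording, 10)
print(len(segments), segments[1][0], segments[1][-1])
```

Each segment then becomes one row of the tabular dataset once its time-domain features have been extracted.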

B. MaFaulDa Dataset [2]
The MaFaulDa dataset is another commonly used dataset in fault classification in RM. It comprises various types of faults at different rotating speeds and levels of severity. It consists of multivariate time-series data collected from a machinery fault simulator test rig. The machinery fault simulator simulates the system working under different states: healthy, unbalanced, horizontal misalignment, vertical misalignment, inner race bearing fault, ball bearing fault, and outer race bearing fault.
The dataset is composed of 1951 files including vibration data collected from two three-axis accelerometers (axial, radial, and tangential), the rotating speed, and the noise around the system. Each data sample was collected at a frequency of 50 kHz for 5 s.
The dataset was preprocessed to extract the time-domain features shown in Table I for each vibration axis. After preprocessing the dataset, a total of 1951 samples were extracted, each composed of 42 time-domain features (seven time-domain features per axis per sensor).

C. HUST Dataset [9]
This recently published dataset consists of vibration data gathered at 51.2 kHz from different bearings under different operating conditions. There are a total of 90 raw vibration data sample files, including four types of bearing states (healthy, inner race, outer race, and ball bearing faults) on five different bearings, under three different working conditions.
During the tests, all the bearings were taken into account. Data collected from the sensor were segmented into 18 000 data points to generate samples that were used to extract the time-domain features indicated in Table I. Once this dataset was preprocessed, a total of 2036 samples composed of seven time-domain features were extracted, corresponding to healthy operating conditions, inner race bearing faults, outer race bearing faults, ball bearing faults, and both inner race and outer race bearing faults of five different types of bearings.
All these datasets after feature extraction are summarized in Table II. First, the CWRU dataset was selected to test the performance of TabPFN while classifying bearing faults only. Then, the MaFaulDa dataset was used for the tests with misalignment and unbalance faults. Both datasets exhibit a clear imbalance between healthy and fault states. Finally, the HUST bearing dataset was used for the tests with bearing faults for different bearing types while classifying states where inner race and outer race bearing faults occur simultaneously.

IV. RESULTS AND DISCUSSION
The performance of the proposed model using TabPFN and the performance obtained with traditional classification models were compared. As explained in Section III, three public RM fault diagnosis datasets were used.
In the next subsections, the results obtained for each of the datasets are presented and discussed, comparing the TabPFN model with the rest of the classification models.

A. Results Obtained With CWRU Dataset
Once the CWRU dataset was preprocessed, a total of 2208 samples, each of them with seven time-domain features, were extracted. Only 216 of the samples correspond to a healthy state of the RM. The rest of them contain bearing faults: 649 ball bearing faults, 717 inner race bearing faults, and 626 outer race bearing faults. The dataset was divided into training and testing using 2%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, and 45% for training, which means using between 45 and 994 samples for training. The number of training samples was always lower than the 1024 training sample limitation of TabPFN.

TABLE II DESCRIPTION OF USED DATASETS
Table III includes all the performance results (accuracy, precision, recall, and F1-score) obtained by TabPFN and traditional ML models used for fault classification in the CWRU dataset while the training set size increases from 2% (45) to 45% (994) of the samples. The average training time for each of the tests is also included. As can be seen in Table III, TabPFN offers the best results in terms of accuracy, precision, recall, and F1-score in all the cases. As for the other ML models, XGB, GB, and RF offer good results, but always worse than TabPFN. TabPFN is always trained in less than 0.35 s. LP is the fastest trained in all the cases, but its performance metrics are significantly worse than those obtained with other models. The results of all the models improve as the size of the training set grows. A detailed comparison of the performance of TabPFN, XGB, GB, and RF is shown in Fig. 1(a) and (b), where accuracy and F1-score are analyzed, respectively.

B. Results Obtained With MaFaulDa Dataset
After preprocessing the MaFaulDa dataset, a total of 1951 samples were obtained, each of them with 42 time-domain features: 49 of the samples correspond to the healthy state and the rest to fault states. Of the fault state samples, 197 have horizontal misalignment, 301 vertical misalignment, 333 unbalance, 376 inner race bearing faults, 372 outer race bearing faults, and 323 ball bearing faults. The dataset was divided into training and testing using 2%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, and 50% for training, which means using between 40 and 976 samples for training. The number of training samples was always lower than the 1024 training sample limitation of TabPFN.
The results obtained with TabPFN and the traditional ML models are shown in Table IV. TabPFN offers the best results in terms of accuracy, precision, recall, and F1-score, except when 20% of the dataset is used for the training set, when MLP has slightly higher recall. As for the other ML models, MLP, XGB, and RF have good results, but, apart from that single exception, always worse than TabPFN. TabPFN is always trained in less than 0.60 s. In these tests, SVM and LP are trained faster than the rest of the ML models, but their performance metrics are worse than those of the rest of the ML models. Again, the results of all the models improve as the size of the training set grows. To facilitate the comparison, Fig. 2(a) and (b) show the accuracy and F1-score of each of these models, respectively, as the training set size increases.

C. Results Obtained With HUST Dataset
The HUST bearing dataset was preprocessed, and a total of 2036 samples, each of them with seven time-domain features, were extracted. From these samples, 435 correspond to healthy state while the rest correspond to bearing faults: 296 samples have ball bearing faults, 435 inner race bearing faults, 435 outer race bearing faults, and the remaining 435 samples correspond to RM where there are both inner race and outer race bearing faults. This dataset was divided into training and testing using 2%, 5%, 10%, 15%, 20%, 25%, 30%, 35%, 40%, 45%, and 50% for training, which means using between 41 and 1018 samples for training. The number of training samples was always lower than the 1024 training sample limitation of TabPFN.
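The training set sizes quoted for the three datasets can be reproduced with a short calculation. The sketch below assumes that the number of training samples is the percentage of the dataset rounded up to the next integer, which matches every figure reported in this section; the rounding rule itself is not stated in the text, and `train_size` is a hypothetical helper.

```python
TABPFN_MAX_TRAIN = 1024  # training sample limit quoted for TabPFN

# Total number of samples per dataset after feature extraction.
DATASETS = {"CWRU": 2208, "MaFaulDa": 1951, "HUST": 2036}


def train_size(n_samples, percent):
    """Training set size, assuming the percentage is rounded up."""
    return -(-n_samples * percent // 100)  # integer ceiling


for name, n_samples in DATASETS.items():
    smallest = train_size(n_samples, 2)
    largest = train_size(n_samples, 50)
    fits = largest < TABPFN_MAX_TRAIN
    print(f"{name}: 2% -> {smallest}, 50% -> {largest}, 50% within limit: {fits}")

# CWRU is the only dataset whose 50% split would exceed the limit.
print(train_size(2208, 45), train_size(2208, 50))
```

Under this assumption, a 50% split of the CWRU dataset would contain 1104 samples, which is above the 1024-sample limit and is consistent with the CWRU experiments stopping at 45% while the other two datasets go up to 50%.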
The results for each test are shown in Table V. TabPFN provides the best results until the training set size exceeds 30% of the total. After this, the results obtained by TabPFN are still good, but those of XGB are better. RF results are also better when the size of the training set is 50%. The training times of XGB and GB increase as the number of training samples increases, although they always remain below 1.20 s. The training time of TabPFN increases less than that of these two ML models and always remains below 0.65 s. LP is again the fastest ML model to be trained, but its results are significantly worse than those of the rest of the ML models. Fig. 3

D. Discussion
As can be seen in the results shown in Sections IV-A-IV-C, TabPFN provides significant improvements in the classification of RM faults.With the CWRU bearing dataset, the results obtained with TabPFN are significantly better than those obtained with the traditional ML models, especially when the number of samples used for training is very low.As the size of the training set increases, the performance obtained by the traditional ML models becomes more similar to that obtained by TabPFN.The results with the CWRU dataset show TabPFN to be a valuable tool when classifying bearing faults and healthy operation, in spite of the imbalance of the dataset.
In the second test, the MaFaulDa dataset was used, combining bearing faults, vertical and horizontal misalignment, and unbalance faults.In this case, TabPFN has better results for all the sizes of the training set.Thus, not only is TabPFN valid for bearing faults' classification but also it provides better results when the dataset used is clearly imbalanced, containing only 49 samples of healthy behavior and approximately 300 samples for each type of fault.This confirms the initial assumption that TabPFN tolerates training with few samples without overfitting, even when the datasets are highly imbalanced.These results make the proposed solution suitable for real scenarios where the number of fault samples is low, such as new RM installations, changes in RM operating configuration, or when the installation is already in production but the number of faulty samples is limited.
Finally, TabPFN was tested with the HUST dataset, which includes faults in bearings of different types. It also includes samples with simultaneous inner race and outer race faults. In this case, the results obtained are better than those of the rest of the ML models until reaching 35% of the data as the training set. From this point on, the results obtained with TabPFN are still competitive but those of XGB are better. These results may be due to the bias that was arbitrarily introduced in TabPFN, with respect to causal relationships, based on the Occam's razor principle [3]. TabPFN gives more importance to simple causal relationships, because of the large amount of synthetic data with this type of relationship. In this dataset, there are more complex failures, so the result is slightly worse, although it is still viable for classification in this dataset.
TabPFN's training times are consistently below 1 s, although it is not the fastest model. Training is this fast because it is performed with a single pass to adjust the weights of the neural network. This is possible because the initial weights of the network were already defined in the pretraining with the synthetic data used to generalize a large number of causal relationships.
It is very important to predict healthy states correctly to prevent unnecessary downtime of RM, which would reduce productivity and incur unnecessary maintenance costs. Therefore, each dataset was divided into training and testing sets, using 10% for training and the remainder for testing, to obtain the confusion matrices. It must be noted that this test was only done once, so the results shown in Fig. 4 correspond to the confusion matrices of a single test. The vertical labels correspond to the actual values and the horizontal labels to the values predicted by TabPFN, showing how the model classifies healthy states (H), ball bearing faults (B), inner race bearing faults (IR), outer race bearing faults (OR), horizontal misalignment (HM), vertical misalignment (VM), unbalance (U), and simultaneous IR-OR faults.
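The layout of such a confusion matrix can be sketched as follows, with rows for actual classes and columns for predicted classes. This is a minimal illustration under stated assumptions: the label vectors are invented toy data (not the results in Fig. 4), and the helper `confusion_matrix` and the variable names are hypothetical.

```python
def confusion_matrix(actual, predicted, classes):
    """Build a matrix with rows = actual classes and columns = predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


# Toy data only: H = healthy, B = ball, IR = inner race, OR = outer race.
classes = ["H", "B", "IR", "OR"]
actual = ["H", "H", "B", "B", "IR", "IR", "OR", "OR"]
predicted = ["H", "H", "B", "H", "IR", "IR", "OR", "IR"]

cm = confusion_matrix(actual, predicted, classes)
h = classes.index("H")

# Faulty states predicted as healthy: column H, excluding the H row.
missed_faults = sum(cm[r][h] for r in range(len(classes)) if r != h)
# Healthy states predicted as faulty: row H, excluding the H column.
false_alarms = sum(cm[h][c] for c in range(len(classes)) if c != h)
```

Reading the H column and the H row separately distinguishes the two error types discussed in the text: faults that go undetected and healthy machines flagged as faulty.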
As can be seen in Fig. 4(a), the healthy states are all correctly predicted with the CWRU dataset. However, due to the imbalance of the CWRU dataset, eight faulty states are predicted as healthy. Something similar happens with the MaFaulDa dataset [see Fig. 4(b)], where only two healthy states are predicted as faulty, but eight faulty states are predicted as healthy. Finally, as shown in Fig. 4(c), all the healthy and faulty states are correctly predicted using the HUST dataset.
Based on the results obtained, TabPFN classifies healthy states correctly, although it sometimes classifies faulty states as healthy. This problem is due to the imbalance between healthy and faulty states in the datasets, where the number of faulty states is much higher than the number of healthy states. In real scenarios, the number of healthy states is much higher than the number of faulty states, which makes the proposed solution feasible for industry and able to achieve better performance than the traditional ML models.

V. CONCLUSION AND FUTURE WORK
A novel early fault classification method for RM using TabPFN with limited data has been proposed. The method was tested using three different RM public datasets. Seven time-domain features from each vibration axis were extracted and then used to classify RM faults.
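As a rough sketch of per-axis time-domain feature extraction, the snippet below computes seven common statistics for one vibration axis and concatenates them across axes. The particular set chosen here (mean, standard deviation, RMS, peak, peak-to-peak, skewness, kurtosis) is an assumption for illustration and may differ from the seven features actually used; the function names and the three-axis example are likewise hypothetical.

```python
import math


def time_domain_features(signal):
    """Return seven illustrative time-domain features for one vibration axis."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    m2 = sum(d ** 2 for d in centered) / n  # population variance
    m3 = sum(d ** 3 for d in centered) / n
    m4 = sum(d ** 4 for d in centered) / n
    std = math.sqrt(m2)
    return {
        "mean": mean,
        "std": std,
        "rms": math.sqrt(sum(x * x for x in signal) / n),
        "peak": max(abs(x) for x in signal),
        "peak_to_peak": max(signal) - min(signal),
        # Guard against constant signals, where the variance is zero.
        "skewness": m3 / std ** 3 if std > 0 else 0.0,
        "kurtosis": m4 / m2 ** 2 if m2 > 0 else 0.0,
    }


def feature_vector(axes):
    """Concatenate the per-axis features into one flat row for a tabular classifier."""
    row = []
    for signal in axes:
        row.extend(time_domain_features(signal).values())
    return row


# Toy square-wave signal, repeated for three assumed axes.
toy_axis = [1.0, -1.0, 1.0, -1.0]
features = time_domain_features(toy_axis)
row = feature_vector([toy_axis, toy_axis, toy_axis])
```

Each recording then becomes a single tabular row (seven values per axis), which is the input form that tabular classifiers such as TabPFN, XGB, or RF expect.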
The results show that TabPFN works correctly with limited data for training, achieving better performance than the traditional classification algorithms without hyperparameter optimization, avoiding overfitting, and completing the training process in less than 1 s on a GPU. Furthermore, its ability to correctly classify healthy and faulty states has been demonstrated with both the balanced and imbalanced datasets. Thus, TabPFN is suitable for use in industrial plants where the monitored machinery has been recently installed or reconfigured and the amount of data for training is limited. To summarize, our achievements are as follows.
1) TabPFN is a reliable model for fault detection in RM.
2) The inherent characteristics of the datasets that generate RM faults are beneficial for TabPFN. Good results are achieved even with imbalanced datasets.
3) The TabPFN-based fault detector performs well in the three public datasets tested because the model can generalize even with different types of faults.
4) In this context, TabPFN achieves competitive results with other state-of-the-art algorithms, especially when there are very few samples to train on.
Future work will be geared toward applying TabPFN to current signature features in rotating machines that are not accessible for vibration measurement, such as submerged pumps. Moreover, its utility in other scenarios, such as gear faults or even combustion motor faults, will be explored.

1) Accuracy: This metric measures the overall correctness of the classification model by calculating the proportion of correctly classified instances out of the total number of instances in the dataset. It provides an assessment of the ability of the model to correctly classify both the positive and negative instances. Accuracy can be computed using the following equation:

$$\text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}.$$
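The accuracy formula can be checked numerically with a short sketch. The counts used below are made-up illustrative values, not results from the experiments, and the function name `accuracy` is simply a convenient label.

```python
def accuracy(tp, tn, fp, fn):
    """Proportion of correctly classified instances: (TP + TN) / (TP + TN + FP + FN)."""
    total = tp + tn + fp + fn
    if total == 0:
        raise ValueError("at least one instance is required to compute accuracy")
    return (tp + tn) / total


# Illustrative counts only: 90 of 100 instances are classified correctly.
example = accuracy(tp=40, tn=50, fp=4, fn=6)
```

Because accuracy pools all classes, it can look high on an imbalanced dataset even when a minority class is often misclassified, which is one reason a second metric such as the F1-score is commonly reported alongside it.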
(a) and (b) show the accuracy and F1-score of the different ML models and TabPFN as the training set size increases.