A Novel Methodology for Unsupervised Anomaly Detection in Industrial Electrical Systems

The recent development of highly automated machinery and intelligent industrial plants has increasingly enabled the continuous monitoring of their efficiency and condition, with the aim of maintaining high production efficiency and minimizing malfunctions. Condition monitoring and fault detection are typically achieved using acoustic and vibrational techniques, but the availability of distributed electrical measurements opens new opportunities for industrial fault detection with minimal impact on electrical systems. Although several artificial intelligence (AI)-based approaches can model industrial equipment from measurements made on the electrical systems to which it is connected, machine learning algorithms have proven particularly suitable for this purpose because of the huge amount of data produced by interconnected sensors and devices. In this context, this work proposes a new unsupervised analysis methodology for detecting anomalies in industrial machinery using electrical current values and other parameters measured on the power grid. The proposed framework combines the advantages of machine learning algorithms with those of traditional analysis, optimizing their operation to improve performance and execution time; it also incorporates an analysis of the temporal dynamics of the anomaly based on the short-time Fourier transform (STFT) to strengthen detection performance. The results showed excellent performance, both against the evaluations of a technical expert and against other methodologies used in the literature, with zero false positives (FPs) in all datasets tested and a negligible number of undetected outlier events, less than 4% of the total in the datasets.

The prospect of raising the productivity and efficiency of manufacturing systems [1] through the pervasive introduction of measurement and decision-making devices in facilities and plants [2], [3] is leading to extremely connected smart factories.
Owing to advances in measurement and real-time data collection, monitoring industrial equipment such as milling and turning machinery, air compressors, power transformers, and electrical distribution systems [4] has become a more common and easier task.
Artificial intelligence (AI) has become an essential tool for the efficient exploitation of the increasing amount of data produced by these enhanced technologies. In particular, neural networks allow for extremely efficient nonlinear feature extraction [5], [6], [7] from this vast amount of data. The family of machine learning algorithms also allows for a great deal of control over the visualization and processing of data, so as to facilitate the extraction and identification of features that are important for the phenomenon under investigation [8], [9], [10]. Artificial neural networks are thus among the most useful mathematical and algorithmic tools in machine learning. In particular, deep neural networks, such as convolutional neural networks (CNNs), recurrent neural networks (RNNs), and deep autoencoders, are part of deep learning, a subset of machine learning focused not only on classification and prediction tasks but also on automatic feature extraction, without the support of additional algorithms and tools [11].
Moreover, in the context of Industry 4.0 and Smart Industry, the automation of monitoring and production processes is becoming an essential requirement. While Industry 4.0 and sensor-enabled manufacturing equipment are strategic and essential for equipment data collection, predictive maintenance (PdM) [12] consists of the data mining process and the machine learning models able to predict the equipment state of health [13] and remaining useful lifetime (RUL).
Predictive maintenance has become a promising approach to cost savings, longer equipment life span, and convenient and sustainable operational management of machinery. Its applicability is vast, ranging from industrial equipment, bearings [14], and engines [15] to batteries [16].
The benefits of artificial neural networks in the field of predictive maintenance have already been discussed by many authors. A review of developments in machine fault diagnosis based on AI techniques can be found in [17]. Recent trends move toward the reduction of training data and the simplification of neural network structures with the introduction of physics-informed neural networks [18]. These have proven effective and faster than other conventional methods in describing power system dynamics [19] and in predictive maintenance for semiconductor applications [20], where they simplify multiphysics finite element simulations.

A. Research Area
The research area investigated in this article relates to the predictive maintenance of industrial equipment. In more detail, the work deals with the research and development of techniques to predict failures of machinery and functional parts of a production line.
The monitoring of industrial components, such as motors, turbines, oil flow in pipelines, and other mechanical components, for fault detection can be performed with anomaly detection techniques [21]. Recently, artificial neural networks have been applied to anomaly detection to capture complex anomalies in dynamic and time-variant systems, where common approaches based on static models struggle [22]. Because the boundary between normal and anomalous behaviors is not precisely defined and the volume of data available for anomaly detection keeps growing, deep learning anomaly detection techniques have proven effective thanks to their automatic feature learning capability [23].
In this field, knowing in advance which machinery is about to break down, and thus enabling timely repair, can greatly reduce costs compared with managing downtime, even of a few hours. Achieving this goal therefore requires using advanced analytics and maximizing not only the amount of data acquired but also the number of parameters monitored. Parameters that can provide indications of possible breakdowns can be captured with infrared thermography [24], acoustic monitoring [25], vibration analysis [26], oil analysis [27], and electrical analysis [28].
Among the various techniques used to achieve this goal is anomaly detection. The concept of anomaly detection is extremely general in that it refers to detecting patterns in the data that do not conform to the definition of "normal behavior." Because this definition is highly subjective, the concept is interpreted very differently in the different fields in which it is applied. It is, therefore, necessary to have a very detailed knowledge of the phenomenon under consideration to provide as comprehensive and objective a definition of "normal behavior" as possible.
This research focuses on the analysis of electrical anomalies for predictive maintenance on large ohmic-inductive loads. The type of fault investigated occurs mostly as a rapid and very short-lived deviation from nominal values on the three-phase power grid to which faulty systems are connected, causing problems for other healthy equipment as well. This type of maintenance is very common in large industrial plants, as rotary machines, including refrigeration compressors, can give indications of malfunctions not only in terms of vibration but also by introducing "abnormal" behaviors on the electrical network.
Compared with acoustic or vibrational monitoring, this type of monitoring has the advantage of not requiring each machine to be monitored individually, which significantly reduces the complexity of the apparatus and its costs.

B. Literature Review
The monitoring of large ohmic-inductive loads in industry has been addressed many times in the literature, because this type of load is representative of an electric motor, which can also be used in compressors for refrigeration machinery. For this reason, most approaches in the literature are based on vibration analysis [29], [30], [31], [32], [33].
Moreover, this approach has also been automated through the introduction of deep learning and deep clustering, as realized in [34].
However, as described earlier, the approach of interest in this article instead concerns the study of anomalies introduced into the power grid by the faulty item. For example, Hashmi et al. [35] monitored the medium-voltage (MV) overhead distribution network to detect and locate faults in place. Doing so makes it possible to signal the problem in advance, according to the predictive maintenance paradigm.
Ardito et al. [36] designed interpretable spectrogram-based CNN modeling for fault signal identification; in particular, the authors used a weighted class activation mapping to visualize the regions of the input spectrogram that are most relevant for prediction. In this way, the authors sought to make the neural network's predictions more explainable. An electrical analysis was also carried out in [37], where the authors analyzed the voltage imbalance between the three phases of an induction motor. A type of analysis closer to the approach of interest in this article is presented in [38], which uses RNNs, bidirectional long short-term memory (Bi-LSTM) networks, and ensemble learning approaches; the authors' goal was to characterize partial discharge by analyzing power signals, manually extracting features of interest and providing them to a machine learning-based classifier for the final decision.
Another analysis carried out numerous times in the literature, and particularly close to the topic of this article, concerns the detection of anomalies in the power consumption of a power line. The approaches are numerous and are based on time series analysis. Malki et al. [39] used data produced by a series of IoT sensors to analyze and predict anomalies in household consumption using time series forecasting.
In contrast, in [40], the authors use an unsupervised approach, providing a series of unlabeled data instances as input to an isolation forest algorithm. This overcomes the problem of having to label a large amount of data by hand, reducing work and training time and allowing the algorithm to continue learning during the field deployment phase.
The concept of unsupervised training of machine learning algorithms is also discussed extensively in [41], where the authors review the frameworks, methods, applications, and challenges of anomaly detection for energy consumption. In particular, the review highlights the importance of these methods, such as clustering, for applications with a large amount of data but an unbalanced dataset, composed of few anomalies compared with a large amount of nominal data. However, there has been no research on the combined use of these clustering techniques and algorithms for analyzing features in the frequency domain.
Recently, the unsupervised approach has also been practiced with deep learning, which has the great advantage of not requiring a preliminary manual analysis of the features to be extracted; in particular, Kardi et al. [42] applied this methodology to anomaly detection in electrical consumption using a deep autoencoder. These tools have the merit of being robust and adaptable to all types of scenarios, needing only a few portions of the time series to self-train before actual deployment. However, autoencoder decisions are not explainable, as they follow a black-box approach [43]; therefore, in the case of incorrect detection, the cause of the error cannot be easily found.

C. Main Contribution
The study of the state of the art has shown that the analysis and detection of anomalies requires an in-depth study of the physical and electrical characteristics of the fault under investigation, which makes a classical approach necessarily tailor-made and increases development time and cost. Deep learning algorithms have also been used in the literature; in contrast, they are extremely robust and adapt automatically to a wide range of faults. Their disadvantage is the total absence of analytical explanations of their decisions, due to the black-box approach.
The type of learning most suitable for the problem under consideration is unsupervised learning, which minimizes human intervention and ensures greater continuity of analysis, even at night. The goal of the work was therefore to build from scratch an anomaly detection methodology that is also aware of the types of anomalies that affect these electrical systems. The proposed methodology relies on a chosen set of time- and frequency-domain features, in contrast with the black-box automatic feature extraction of deep learning. In fact, time-domain data clustering alone is not sufficient to robustly identify the type of anomaly under investigation, which is related not only to instantaneous values (detectable with clustering such as K-means) but also to temporal dynamics (detectable with frequency-domain analysis).
To summarize, the novelty is the proposal of a new algorithm for detecting anomalies in time series based on two variables; the algorithm relies on both time- and frequency-domain analysis of the time series. The focal points of the work are as follows.
1) Identification of anomaly points using unsupervised data clustering from the machine learning field.
2) Optimization of the unsupervised clustering algorithm to lighten the computational load and to make both the choice of the optimal data partitioning for the specific type of anomaly under examination and the centroid initialization more efficient.
3) Fusion with the short-time Fourier transform (STFT) algorithm for a frequency-domain validation of the anomaly identified by clustering.

D. Manuscript Structure
This article is structured as follows. Section II presents theoretical concepts regarding anomaly detection and machine learning techniques with unsupervised training, particularly clustering, and describes the operation of the proposed methodology. Section III describes the data acquisition system and the actual application context of the work. Section IV describes the deployment of the proposed algorithm, the results, and comparisons with the benchmark method, adaptive thresholding, and with the other most widely used methodology in the literature, the deep autoencoder. Conclusions are presented in Section V.

II. PROPOSED METHODOLOGY
As described in Section I, the objective of this work is to propose a methodology for the automatic detection of malfunctions in industrial electrical systems; specifically, the aim is to detect the malfunction of ohmic-inductive loads, which are typical of rotary machines, including refrigeration compressors, installed in large industrial plants. Therefore, to automate the decision and make the training process unsupervised, it was decided to employ a clustering algorithm, K-means, together with an analysis of temporal features based on a spectrogram.

A. Data Clustering
Clustering is one of the most widely used approaches for descriptive modeling of big data; it allows a dataset to be analyzed and explored in order to group objects into clusters, i.e., groups that share common characteristics, as shown in Fig. 1. Given a dataset of N time series D = {F_1, F_2, ..., F_N}, time series clustering is the process of unsupervised partitioning of D into C = {C_1, C_2, ..., C_k} so as to form homogeneous groups of time series based on a certain measure of similarity.
Clustering algorithms are divided into hierarchical clustering and partitioning clustering. The former can be agglomerative or divisive, depending on whether the data are progressively merged into clusters or divided into subclusters. For the problem under consideration, partitioning clustering was chosen, which is in turn divided into center-based, density-based, and spectral-based methods. In particular, it was decided to use K-means, an unsupervised algorithm that divides the dataset into k clusters and is based on the concept of a centroid, a point in the feature space obtained by averaging the positions of all the data belonging to the cluster associated with it. The choice of K-means as the main algorithm for separating anomalous and normal points was motivated by its remarkable power, result explainability, and adaptability in data mining [44]. In addition, to address its shortcomings, mainly due to centroid initialization, some choices were made for its optimization.
The difficulty of the task is related to the fact that the type of anomaly to be detected is contextual, i.e., it involves a multiplicity of points, as opposed to a point anomaly, where a single point represents the entire anomaly.
The algorithm operates as follows. First, the number of clusters k is specified, and the centroids are chosen randomly in the feature space with the only condition that they are not coincident; then, the distance of each point in the dataset from each centroid is calculated. Next, each point in the dataset is associated with the cluster of the nearest centroid, the position of each centroid is recalculated by averaging the positions of all the points in the associated cluster, and the process is iterated until no point changes cluster. This methodology makes it possible to fully automate the detection of outliers by avoiding a supervised training process, which can be time-consuming and costly because of the dataset labeling operation.
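The iteration described above can be illustrated with a minimal, dependency-free Python sketch. It is not the authors' implementation: the function name `kmeans`, its `init`, `max_iter`, and `seed` parameters, and the use of the Euclidean distance are assumptions made for illustration.

```python
import math
import random


def kmeans(points, k, init=None, max_iter=100, seed=0):
    """Plain K-means iteration on tuples of floats; returns (centroids, labels)."""
    rng = random.Random(seed)
    # Random, non-coincident starting centroids unless explicit ones are given
    # (the `init` argument is a hypothetical convenience, not from the paper).
    if init is not None:
        centroids = [tuple(c) for c in init]
    else:
        centroids = rng.sample(sorted(set(points)), k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return centroids, labels


if __name__ == "__main__":
    data = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0),
            (10.0, 10.0), (10.0, 11.0), (11.0, 10.0)]
    print(kmeans(data, 2, init=[(0.0, 0.0), (10.0, 10.0)]))
```

Passing explicit starting centroids through the hypothetical `init` argument replaces the random choice, which is one way to control the sensitivity to initialization mentioned above.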

B. Time Series Clustering
The clustering operation can also be carried out on time series, as required by the problem at hand: in fact, the type of anomaly intended to be detected can be seen only on the time trend of some variables of the three-phase power system in the factory.
Although it is possible to use only one time variable, i.e., a single feature, for clustering, this operation is not formally correct. In fact, some works, including [45], point out that clustering a single temporal sequence by using subsequences leads to systematically wrong results, comparable to those obtained from totally random sequences. This is because any time series must follow certain patterns that mislead the clustering algorithm.
For these reasons, it was decided to use two features measured in the three-phase electrical system: the rms value of the current on one of the three phases of the system and the total harmonic distortion (THD) parameter.Although this choice was made as a good fit for the specific application scenario, the methodology proposed in this article can be employed with any pair of time series chosen appropriately to describe the type of anomaly to be detected.

C. Spectrogram Analysis
Spectral analysis with the STFT was used to make the methodology more robust by identifying features suited to the temporal dynamics of the anomalies to be reported. Specifically, these features were found in the power spectral density. The design choices mainly concern the width of the observation window, which must ensure good temporal resolution. Excellent frequency resolution is not necessary, since the type of anomaly of interest has a high energy content across the spectrum.
The spectrogram analysis, based on local thresholding of the power spectral density, acts as a verification step for the clustering algorithm and for the identification of the individual anomalous points. In fact, the STFT alone does not allow the detection of individual anomalous points because of the design tradeoff between temporal and frequency resolutions. In addition, the STFT alone cannot recognize time series with no anomalies because at least one window, the one with the maximum power spectral density, is always flagged as anomalous by the algorithm. For this reason, K-means was employed to enhance the methodology based on spectrogram analysis.

D. Operating Principle
The working principle of the proposed methodology is summarized in the flowchart in Fig. 2. As can be seen from the flowchart, the proposed methodology is based on the use of K-means clustering, which is used to provide an initial estimate on outlier samples by identifying timestamps of the samples and providing them to the final decider.Instead, the spectrogram is used to identify points in the time series with anomalous power spectral density compared to the rest of the time series, detecting the temporal features of the anomaly.
In this way, it is possible to identify the samples containing the anomaly with only the limitation of the STFT's temporal resolution, determined at the design stage.
The final fusion is done by comparing the points reported by the K-means and the windows reported by the spectrogram, finally reporting the anomaly if the two indications agree.
Authorized licensed use limited to the terms of the applicable license agreement with IEEE.Restrictions apply.
The simplified operation scheme includes within it the optimization procedure of both spectral analysis and K-means, where a modification and improvement work of the algorithm itself was carried out both for the definition of the number of clusters k and for the initialization of centroids at the beginning of the training, with the aim of reducing the convergence time and computational load by adapting the algorithm to the problem under consideration.

III. ALGORITHM DEPLOYMENT
The algorithm was therefore developed to operate on two features affected by the type of anomaly to be addressed: the root mean square (rms) of the current of one of the three phases of the electrical system and the current THD, calculated according to the following equation:

$$\mathrm{THD} = \frac{\sqrt{\sum_{h \geq 2} I_h^{2}}}{I_1}$$

where I_1 is the fundamental harmonic and I_h is the generic hth harmonic of the rms current.
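As a worked illustration of the THD feature, the sketch below computes it from a list of rms harmonic amplitudes. It assumes that the distortion is referred to the fundamental and that the list starts with I_1; the function name `current_thd` is hypothetical.

```python
import math


def current_thd(harmonic_rms):
    """THD from rms harmonic amplitudes ordered as [I_1, I_2, ..., I_H].

    Assumption: distortion is referred to the fundamental I_1.
    """
    fundamental, *higher = harmonic_rms
    if fundamental <= 0:
        raise ValueError("fundamental rms amplitude must be positive")
    # Root of the summed squared higher harmonics, divided by the fundamental.
    return math.sqrt(sum(i * i for i in higher)) / fundamental


if __name__ == "__main__":
    # 100 A fundamental with 3 A and 4 A higher harmonics.
    print(current_thd([100.0, 3.0, 4.0]))
```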
The first stage of the algorithm involved clustering the two features. In particular, to reduce the computational load of the algorithm, it was decided to employ only the peaks of the time series. These peaks were identified from the zero crossings of the first derivative of the time series. Peak detection was not subjected to any threshold or user decision, so as to maintain full automation and unsupervised operation of the procedure. Preprocessing also included the time-step alignment of the I_rms and THD time sequences.
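A minimal sketch of threshold-free peak picking is given below. It approximates the first derivative with the first difference and keeps only local maxima, which is an assumption, since the text does not state whether minima are also retained; the name `find_peaks` is hypothetical.

```python
def find_peaks(series):
    """Indices where the first difference changes from positive to non-positive."""
    peaks = []
    for i in range(1, len(series) - 1):
        rising = series[i] - series[i - 1] > 0
        falling_or_flat = series[i + 1] - series[i] <= 0
        if rising and falling_or_flat:  # derivative crosses zero downward
            peaks.append(i)
    return peaks


if __name__ == "__main__":
    print(find_peaks([0, 1, 0, 2, 3, 1, 1, 4, 2]))
```

Once the two sequences are aligned on the same time steps, the feature pairs passed to clustering could be built as `[(irms[i], thd[i]) for i in find_peaks(irms)]`, where `irms` and `thd` are illustrative variable names.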

A. Data Clustering
After preprocessing, the peak values of the two time series were placed in a feature plane, with THD and I_rms plotted on the two axes. Using only the peak values of the time series reduced processing time by more than 90% compared with using the entire time sequence. The K-means algorithm was then applied to the dataset represented on this feature plane.
The first issue associated with the deployment of this algorithm is the choice of k, the number of clusters targeted by the clustering algorithm. This problem has been addressed many times in the literature, for example in [46]. Two major methodologies are used: the Elbow method and the Silhouette method.
The Elbow method is more computationally onerous, as it performs multiple runs of the K-means algorithm and, for each of them, computes the within-cluster sum of squares (WSS), i.e., the sum of the squared distances between each point and the centroid of its cluster. From the resulting curve of WSS against k, one can choose the value of k that gives the best tradeoff between clustering error and the number of clusters, and hence the computational load.
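The quantity plotted by the Elbow method can be sketched as follows. The code uses the common definition of WSS as the sum of squared Euclidean distances to each cluster mean, which is an assumption about the variant used here; the function name `wss` is hypothetical.

```python
def wss(points, labels):
    """Within-cluster sum of squared distances to each cluster mean."""
    total = 0.0
    for cluster in set(labels):
        members = [p for p, lab in zip(points, labels) if lab == cluster]
        center = [sum(x) / len(members) for x in zip(*members)]
        total += sum(
            sum((a - b) ** 2 for a, b in zip(p, center)) for p in members
        )
    return total


if __name__ == "__main__":
    pts = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
    print(wss(pts, [0, 0, 0, 0]), wss(pts, [0, 0, 1, 1]))
```

Evaluating `wss` for the labelings produced with increasing k gives the curve whose bend suggests the number of clusters.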
On the other hand, the Silhouette method is more reliable because it accounts for both the aggregation and the separation of the individual points in the clustering. The principle of operation is described in Algorithm 1. This algorithm is part of the K-means clustering block of the flowchart in Fig. 2.

Algorithm 1 Silhouette Method
Input: i, points projected on the feature space
Input: C_i, cluster of the i-th point
Output: s_i, Silhouette coefficient for the i-th point

The Silhouette coefficient resulting from the algorithm described takes values in the range [−1, 1]. A value closer to −1 indicates that the point probably does not belong to the indicated cluster, while a value closer to +1 indicates a higher probability of belonging. To optimize computation time, the Silhouette method employed in the proposed methodology limits the search for k to between 1 and 4. To reduce convergence time, it was also decided to place the starting centroids, i.e., those with which the algorithm is initialized, at the maximum and minimum values of I_rms present in the feature plane, so as to adapt the algorithm to the type of anomaly under consideration.
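The steps of Algorithm 1 are not reproduced in the text above. The sketch below therefore uses the standard definition of the Silhouette coefficient, s_i = (b_i − a_i) / max(a_i, b_i), which is assumed rather than confirmed to match it. The function name `silhouette` and the convention of returning 0 for singleton clusters or a single cluster are illustrative choices.

```python
import math


def silhouette(points, labels):
    """Per-point Silhouette coefficients s_i = (b_i - a_i) / max(a_i, b_i)."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    scores = []
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1 or len(clusters) == 1:
            scores.append(0.0)  # assumed convention; undefined in these cases
            continue
        # a_i: mean distance to the other members of the point's own cluster.
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        # b_i: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(p, q) for q in other) / len(other)
            for other_lab, other in clusters.items()
            if other_lab != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return scores


if __name__ == "__main__":
    pts = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
    print(silhouette(pts, [0, 0, 1, 1]))
```

Averaging the coefficients for each candidate k and keeping the k with the highest mean is the usual selection rule. Because the coefficient is not defined for a single cluster, the k = 1 case needs a separate criterion, which is not specified here.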
After performing the clustering operation, the algorithm stores in memory the anomalous I_rms threshold, which is equal to the value of the point with the lowest I_rms in the anomalous cluster, plus a tolerance band of 1%.

B. Spectrogram Analysis
Spectrogram analysis requires initial tuning in relation to the portion of the time series to be analyzed. For the type of anomaly under consideration, it was decided to use the detrended time series as input. The STFT used a rectangular window, with the number of fast Fourier transform (FFT) points not exceeding 5% of the length of the analyzed series. No overlap was employed. The sampling rate, and thus the bandwidth of the STFT, is not determined by the bandwidth of the acquisition system but by the sampling time chosen for data acquisition in the measurement system.
The algorithm detects the maximum value of the power spectral density averaged over the entire band under consideration, after which it sets an alert threshold equal to 85% of the maximum power. All windows exceeding this threshold are flagged as possible anomalies and sent to the detection algorithm.
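The windowed thresholding described above can be sketched with a direct discrete Fourier transform on non-overlapping rectangular windows. The mean-removal detrend, the unnormalized power (its scaling cancels in the relative threshold), and the names `window_powers` and `flag_windows` are assumptions made for illustration; a production version would normally rely on an FFT library.

```python
import cmath
import math


def window_powers(series, nfft):
    """Band-averaged spectral power of each non-overlapping rectangular window."""
    mean = sum(series) / len(series)
    detrended = [v - mean for v in series]  # constant detrend (assumption)
    powers = []
    for start in range(0, len(detrended) - nfft + 1, nfft):
        segment = detrended[start:start + nfft]
        # One-sided DFT of the segment (rectangular window, no overlap).
        spectrum = [
            sum(s * cmath.exp(-2j * math.pi * f * n / nfft)
                for n, s in enumerate(segment))
            for f in range(nfft // 2 + 1)
        ]
        powers.append(sum(abs(c) ** 2 for c in spectrum) / len(spectrum))
    return powers


def flag_windows(series, nfft, ratio=0.85):
    """Indices of windows whose power exceeds `ratio` times the maximum power."""
    powers = window_powers(series, nfft)
    threshold = ratio * max(powers)
    return [i for i, p in enumerate(powers) if p > threshold]


if __name__ == "__main__":
    quiet = [0.0, 1.0, 0.0, 1.0]
    signal = quiet * 2 + [0.0, 40.0, -40.0, 1.0] + quiet * 2
    print(flag_windows(signal, 4))
```

With this rule, the window of maximum power is always flagged whenever the power is nonzero, which is why a second, independent check is needed to recognize series without anomalies. The toy example uses a window of 20% of the series for brevity, whereas the text limits it to 5%.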

C. Anomaly Identification
The actual identification of the anomaly occurs in the last step of the proposed algorithm, where the alert threshold calculated by K-means and the timestamps of the anomalous windows reported by the STFT are combined. The decision-maker checks whether the alert threshold is exceeded only within the anomalous windows identified by the STFT. This methodology, although highly restrictive, increases confidence in anomaly reporting by reducing the number of false positives (FPs).
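The decision step can be sketched as a simple intersection of the two indications. The function name `confirm_anomalies`, the representation of STFT windows as consecutive blocks of `nfft` samples, and the strict comparison with the threshold are assumptions made for illustration.

```python
def confirm_anomalies(irms, flagged_windows, nfft, threshold):
    """Sample indices above `threshold` that lie inside STFT-flagged windows."""
    confirmed = []
    for w in flagged_windows:
        start = w * nfft
        stop = min(start + nfft, len(irms))
        for i in range(start, stop):
            # Both indications must agree: spectral flag and clustering threshold.
            if irms[i] > threshold:
                confirmed.append(i)
    return confirmed


if __name__ == "__main__":
    irms = [100.0, 101.0, 100.0, 500.0, 100.0, 102.0, 480.0, 100.0]
    print(confirm_anomalies(irms, [0], nfft=4, threshold=450.0))
```

A sample is reported only when it lies in a flagged window and exceeds the clustering threshold, so a window flagged merely because it holds the maximum power of an otherwise normal series produces no report.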
Algorithms 2 and 3 detail the different portions of the proposed methodology. These algorithms show that the optimizations proposed in this article concern the choice of k, the placement of the initial K-means centroids, and the fusion of the clustering and STFT algorithms. Next, the results obtained by applying the proposed methodology to a real case will be presented. In particular, the results of processing for different observation windows will be presented in order to understand the optimal combination of operations. Finally, the results will be validated by comparison with a traditional adaptive thresholding method and a deep autoencoder.

A. Data Acquisition System
The electrical system under consideration is that of an industrial plant. The power grid is a three-phase one with a nominal voltage of 380 V and an rms current ranging from a minimum of a few hundred amperes to a maximum of more than 1 kA. The acquisition system is a self-made power quality (PQ) meter that uses a passive attenuated voltage probe, while the current probes are active Rogowski coil sensors characterized by a bandwidth of 20 kHz, a maximum measurable current of 6 kA rms, and a resolution of 1% of full scale. The signal was then converted from analog to digital with an ADS131M08 from Texas Instruments, a 24-bit delta-sigma ADC with a maximum data rate of 32 kSPS. The sampling rate of the acquisition system was 16 kHz. Data were acquired and stored at a rate of four samples every 10 s. In particular, the following were acquired: rms current and voltage, and active, reactive, and apparent power values. Acquired data also included several harmonics at multiples of 50 Hz. The dataset was acquired on three different three-phase transformers over a period of one month for a total of 1 047 080 samples. A picture of the installation of the probes on the transformer of the industrial plant is shown in Fig. 3.
It is important to point out that operating conditions (specifically temperature) could significantly affect the performance of the electronic hardware used to acquire the current and voltage signals. However, specific experiments have already been carried out, and the results showed good stability and only a minor temperature sensitivity of the PQ meter used in this work [47]. As a consequence, the harsh operating conditions of the industrial application could be ignored in this work, since minor effects on the meter characteristics are negligible from the algorithm's point of view when the anomalies amount to hundreds of amperes.

B. Setup Configuration
In terms of the setup of the proposed methodology, it was necessary to analyze the data acquired with the system described in Section IV.
As for the STFT, the ratio of the number of points on which the STFT performs the FFT to the total number of points observed for the spectrogram (time resolution) was set to no more than 5% of the entire window of observations. In more detail, all tests in Sections IV-C and IV-D used an STFT time resolution of 1 min. The overlap between windows for the STFT was set to zero for all tests. The window used for signal analysis was the rectangular window.
For the K-means clustering part of the proposed methodology, several tests were conducted on the automatic k-search using the Silhouette method to determine the tradeoff between the range of k to be searched and the processing time. The comparison of k-search computation times, shown in Table I, was performed on the entire dataset after peak detection, for a total of 155 000 samples. The results confirm, as expected, that expanding the k search results in significantly longer computation times; however, the nature of the anomaly under analysis makes it possible to limit the search to k between 1 and 3, reducing the search time to under 3 min. The algorithm was then run on datasets with different lengths and different time locations in order to evaluate the ideal deployment timing.

C. Deployment Results
The algorithm was run on the data acquired in the industrial plant, as outlined in Section III. Expert assessments were used to evaluate the results in order to report all anomaly events detected in the different time series. Examples of K-means clustering and STFT results are shown in Figs. 4 and 5.
The automatic k-search made it possible to distinguish cases where no anomalies were present, while the initialization of centroids at the maximum and minimum values of I_rms made it possible to identify clusters quickly and accurately even in the case of an unbalanced dataset, as shown in Fig. 6.
The deployment results were then compared with the anomalies identified by the experts. The results are reported in Section IV-D, along with a thorough comparison with the state of the art. The proposed methodology is more prone to errors for analyses over very long time windows, i.e., longer than 24 h. In these cases, many false negatives (FNs) occurred, while FPs were significantly fewer and appeared only for windows longer than 48 h. The proposed methodology also performed well in analyzing time series with anomalies that were difficult to detect, i.e., cases where a simple threshold, even adjusted for the window under consideration, could not have provided a correct result. Two examples are shown in Figs. 7 and 8, which illustrate two cases of anomaly reporting errors made by a threshold equal to the average of the observation window values: in Fig. 7, the anomaly is not detected at all, while in both Figs. 7 and 8, there are numerous FPs. Increasing the threshold may decrease the FPs, but the anomaly in Fig. 7 cannot be identified without committing more errors. In more detail, Fig. 7 shows an anomaly that occurs just before a sharp, perfectly normal increase in the baseload of current absorption: using classical techniques, this anomaly could not have been detected. In contrast, Fig. 8 shows a rough but perfectly normal I_rms pattern, with no anomalies spotted by the experts; again, thresholding or simple peak detection would not have provided the same performance as the proposed methodology. Good performance for short observation periods also suits the need to assess promptly the presence of possible anomalies in the industrial plant, allowing near real-time analysis and detection of possible failures.
Section IV-D then compares the results of the proposed methodology with those of two other techniques: a classical one, adaptive thresholding, and one from the deep learning field, the deep autoencoder.

D. Results Comparison
To evaluate the proposed methodology, its performance was compared with that of a classical adaptive thresholding system and of a deep autoencoder, a technique from the deep learning field.
The adaptive threshold was chosen to demonstrate the inapplicability of very simple techniques to the type of problem under consideration, while the autoencoder was chosen because it is the benchmark in the literature for anomaly detection techniques based on unsupervised learning [48], the primary goal of the study, despite the nonexplainability of its results, a problem that affects all deep learning techniques [49].
The deep autoencoder was structured with the architecture shown in Table II, with seven layers, five of which are convolutional and one of which is a dropout layer. This architecture was chosen to optimize the performance with respect to the dataset under consideration, and the network was trained using the mean absolute error (MAE) loss on a dataset of 50 000 samples.
The features used for training the network were the same as those used for the proposed methodology, namely I rms and current THD, as described earlier. The training was performed on a dataset containing no anomalous samples, using 70% of the data for training and 30% for validation. The alert threshold for the deep autoencoder was set equal to the maximum mean absolute error obtained in training. As for adaptive thresholding, the threshold was set to twice the mean of I rms in the observation window.
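The adaptive thresholding baseline described above can be sketched in a few lines. This is only an illustration: the function name and the toy windows are hypothetical, and the use of a strict comparison (a sample must exceed the threshold to raise an alert) is an assumption.

```python
def adaptive_threshold_alerts(i_rms_window, factor=2.0):
    """Flag samples above `factor` times the mean I_rms of the window.

    Assumption: an alert is raised when a sample strictly exceeds the threshold.
    """
    threshold = factor * sum(i_rms_window) / len(i_rms_window)
    alerts = [k for k, value in enumerate(i_rms_window) if value > threshold]
    return threshold, alerts

# An isolated spike on a flat baseline is detected...
thr_spike, alerts_spike = adaptive_threshold_alerts([10, 10, 10, 10, 60])

# ...but a small deviation just before a baseload increase is not, because
# the increase raises the window mean and therefore the threshold.
thr_step, alerts_step = adaptive_threshold_alerts([10, 10, 12, 10, 30, 30, 30, 30])
```

The second toy window mirrors the kind of situation discussed for Fig. 7, where a change in baseload within the observation window makes a mean-based threshold unreliable.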
The results of the comparison and the details of the performance achieved by the proposed methodology are shown in Table III. In particular, the comparison between the adaptive threshold method, the deep autoencoder neural network, and the proposed methodology is presented. The proposed methodology always performed better than the other two. In more detail, the deep autoencoder obtained unsatisfactory results due to the behavior of the neural network in the presence of significant anomalies characterized by high rms current spikes. In this case study, as can be seen in Fig. 9, for every single high current spike, the network reported multiple anomalies instead of one, thus inflating the false-positive count.

TABLE III RESULTS COMPARISON
As for the comparison of the three techniques, the three methodologies have equal performance in identifying anomalies, resulting in a low number of FNs; however, the adaptive threshold and the deep autoencoder perform significantly worse with regard to FPs, especially for long analysis windows. To better highlight the superiority of the proposed approach, Fig. 10 summarizes the results achieved in terms of FPs (left plot) and FNs (right plot). The different colors of the bars represent the three analyzed methods, i.e., adaptive threshold (blue bars), deep autoencoder (red bars), and proposed method (yellow bars). As the figure shows, the proposed approach guarantees an almost perfect result, with zero FPs for every dataset under analysis, while only two datasets out of 11 include a few FNs.
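As an illustration of how FP and FN counts of this kind can be obtained from a list of detections and a list of expert-labeled anomalies, a minimal sketch follows. The function name and the matching rule (events are matched by an exact identifier, such as a window index) are assumptions, not the evaluation procedure used in this work.

```python
def count_fp_fn(detected, expert):
    """Count false positives and false negatives against expert labels.

    Assumption: events are matched by exact identifier (e.g., window index).
    """
    detected, expert = set(detected), set(expert)
    fp = len(detected - expert)   # reported but not confirmed by the experts
    fn = len(expert - detected)   # confirmed by the experts but not reported
    fn_rate = fn / len(expert) if expert else 0.0
    return fp, fn, fn_rate

# Hypothetical example: four expert-labeled anomalies, two of them detected.
fp, fn, fn_rate = count_fp_fn(detected=[1, 4], expert=[1, 4, 7, 9])
```

Here the FN rate is expressed relative to the total number of expert-identified anomalies, which is the same normalization used when the FN percentage is quoted for the proposed methodology.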
On the other hand, regarding the efficiency of the three algorithms, compared from the point of view of computational load and processing time, the best was the adaptive threshold.This is because the operations to be performed to determine the threshold for each signal observation window are few and require simple mathematical operations.
Conversely, the deep autoencoder had the highest computational load and, therefore, the longest execution time. In fact, to obtain results comparable with those of the proposed methodology, which was optimized to run in a few seconds, it was necessary to deploy the network on a PC equipped with a dedicated graphics card (NVIDIA RTX 8000). In contrast, the proposed methodology achieved the same results on a low-end PC. This issue, peculiar to deep learning techniques, is well known and widely discussed in the literature [50].
To summarize, the adaptive threshold turned out to be the worst of the three, both in terms of results and execution autonomy, as it requires human supervision in setting the relative threshold. The deep autoencoder is completely unsupervised; however, it turns out to be overly sensitive, which can result in several false alarms. The proposed methodology requires setting only the temporal resolution of the STFT based on the length of the window to be analyzed: therefore, it was found to be the technique with the best results and was practically unsupervised in training, achieving zero FPs and less than 4% of FNs with respect to the total number of anomalies identified by experts.
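The role of the STFT temporal resolution can be made concrete with a minimal sketch of the windowed spectral step: the signal is split into windows, the mean power spectral density of each window is computed, and windows reaching 0.85 times the maximum are flagged. The function names, the Hann window, the periodogram scaling, and the toy signal are assumptions made for illustration; the input is assumed to be already detrended.

```python
import cmath
import math

def window_mean_psd(signal, win_len, hop):
    """Mean power over the one-sided spectrum of each STFT window.

    Assumptions: Hann window and a simple |X|^2 / win_len periodogram scaling.
    `win_len` sets the temporal resolution of the analysis.
    """
    hann = [0.5 - 0.5 * math.cos(2 * math.pi * n / (win_len - 1))
            for n in range(win_len)]
    means = []
    for start in range(0, len(signal) - win_len + 1, hop):
        seg = [signal[start + n] * hann[n] for n in range(win_len)]
        powers = []
        for k in range(win_len // 2 + 1):  # one-sided DFT bins
            x_k = sum(seg[n] * cmath.exp(-2j * math.pi * k * n / win_len)
                      for n in range(win_len))
            powers.append(abs(x_k) ** 2 / win_len)
        means.append(sum(powers) / len(powers))
    return means

def anomalous_windows(signal, win_len, hop, ratio=0.85):
    """Indices of windows whose mean PSD reaches ratio * max over all windows."""
    psd = window_mean_psd(signal, win_len, hop)
    alert_threshold = ratio * max(psd)
    return [i for i, p in enumerate(psd) if p >= alert_threshold]

# Toy detrended signal: low-amplitude ripple with a short burst in window 2.
signal = [0.1 * (-1) ** n for n in range(64)]
signal[36:40] = [5.0, -5.0, 5.0, -5.0]
flagged = anomalous_windows(signal, win_len=16, hop=16)
```

In this sketch, `win_len` is the single parameter that would need to be adapted to the length of the observation window; a shorter window localizes the burst more precisely in time at the cost of coarser frequency resolution.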

V. CONCLUSION
This article presented a new technique for fault detection of rotary machines characterized by ohmic-inductive electrical load in large industrial plants.
The aim of the work has been to propose a novel unsupervised methodology based on the fusion and optimization of a machine learning technique, K-means clustering, and a classical frequency analysis technique, STFT. Proposals also included optimizing the choice of k and initialization of centroids in the K-means clustering algorithm.
The proposal has then been tested on real data collected in an industrial plant, using fault reports made by experienced engineers as reference. In addition, the proposal was compared with two state-of-the-art techniques: adaptive thresholding and the deep autoencoder, the former a classical technique and the latter a deep learning technique.
Although the proposal is not as unsupervised as the deep autoencoder, since it requires choosing a temporal resolution of the STFT proportionate to the time window to be analyzed, it obtained significantly better results than the deep autoencoder. This was mainly due to the combined use of unsupervised clustering and frequency analysis, which simultaneously detects features relating to the individual sample and others relating to the temporal dynamics of the anomaly.
Future developments of the work will involve online implementation of the proposed methodology in industrial plants.

1 a i ← average distance of the i-th point from the other points of C i
2 b i ← average distance of the i-th point from the other points of C

Algorithm 2 Clustering
Input: I rms peak time series
Input: THD peak time series
Output: I rms anomalous threshold
C(n): n-th cluster centroid, where n ∈ N
1 Feature plane creation ← I rms and THD time series
2 K parameter choice ← Silhouette method, k ∈ [1, 4]
3 Initialize: starting centroids max[I rms] and min[I rms]
4 C(N) ← K-Means algorithm output
5 C(anomaly) ← C(n) such that I rms of C(n) = max[C(N)]
6 Anomalous cluster ← C(anomaly)
7 I rms anomalous threshold = (min[I rms] of Anomalous cluster) − 1%
8 end

Algorithm 3 STFT
Input: I rms detrended time series
Output: Anomalous time windows
W(n): n-th time window, where n ∈ N
Th: Alert threshold
|S(I RMS)|: mean of Power Spectral Density on entire signal band for each window
1 Spectrogram ← STFT on I rms time series
2 Initialize: Th = 0.85 × max[|S(I RMS)|]
3 Find: W(n) such that |S(I RMS)| ≥ Th; W(anomaly) ← W(n)
4 end

IV. EXPERIMENTAL DEPLOYMENT
This section will first present the acquisition system used, its technical characteristics, and the acquisition of the dataset.

Fig. 3. Pictures of the installation in the industrial plant. (a) Current probes connected to the transformer. (b) Voltage probes.

Fig. 7. Example of anomaly detection with adaptive threshold, resulting in an FN and different FPs.

Fig. 8. Example of anomaly detection with adaptive threshold, resulting in different FPs.

Fig. 10. Summary of the results comparison between the adaptive threshold (blue bars), deep autoencoder (red bars), and proposed method (yellow bars). The left plot refers to FPs, while FNs are shown in the right plot.