Estimation of Rotational Speeds Based on Gearbox Vibrations via Artificial Neural Networks

This work investigates the estimation of rotational speeds based on neural networks and highly sampled gearbox vibration data. A total of 18 different network architectures consisting of fully connected network layers, long short-term memory cells, and different activation functions are compared. In addition, the impact of the sampling rate of the vibration data, the signal length, and the input data type is investigated. Vibration data in the time domain, frequency domain, and a combination of both are examined as input data. Finally, the number of neurons is varied to obtain the best possible rotational speed estimation. In order to generate an appropriate dataset for the training of the neural networks, an experimental setup is presented, which is used to record the vibration data of the gearboxes. It becomes apparent that vibration data in the frequency domain are especially suitable for the estimation of the rotational speed. Furthermore, it could be shown that low sampling rates result in an inaccurate speed estimation. Likewise, signal lengths that are too short result in inaccurate estimates. Overall, the speed can be estimated with an RMSE of 7.88 rpm at an average speed of the test dataset of 202 rpm.


I. INTRODUCTION
IN MANY areas of condition monitoring of gears and bearings, the rotational speeds that enter the digital signal processing are known. However, this does not apply to all areas. An example of this is the mobile machinery sector, which is also the origin of the research in this article. In this sector, no rotational speed sensors are installed as standard due to the increased costs, the wiring, and the mechanical protection that is sometimes necessary. The estimation of rotational speeds or vehicle speeds is necessary in almost all applications of rotating components. Thus, in the field of condition monitoring of gearboxes, the rotational speed is a crucial parameter for analyzing gearbox vibrations successfully. In vibration-based analyses of bearings or gears, conclusions about damage are drawn as a function of the rotational speed. In [1], depending on the geometry and the rotational speed, the damage frequencies of different types of damage are first determined. From the frequency bands occurring in the amplitude spectrum and the theoretically calculated damage frequencies, the damage to the bearing or the gear is inferred [2]. In [3], envelope analysis is used to investigate the damaged tooth flanks of a high-speed gear. In [4], a novel method based on a gated recurrent unit and complex wavelet packet energy moment entropy is presented for the detection of damage to rotating components. Simulated and experimental data are examined. The rotational speed is also used in common preprocessing steps for gearbox vibration signals. In [5], order analysis is used to detect tooth fractures in planetary gears. Here, a transformation into a rotational plane is performed using the measured rotational speed. A comparable method with respect to the relationship with the rotational speed is angular resampling, which was first presented in [6].
Here, the vibration signal is resampled as a function of rotational speed to produce a signal that is independent of the rotational speed. In [7], this method is investigated on wind turbines. In addition to the examples mentioned above, there are many other analyses that underline the importance of a measured rotational speed. It becomes apparent that measuring the rotational speed is a crucial factor for digital signal processing of rotating components. In most applications, direct measurement methods, such as inductive speed sensors [8], are installed directly in the gear housing and detect the passing teeth of a gear. Furthermore, torque measuring shafts [9] are often installed in the system in order to record the rotational speed. Indirect measurement methods usually use recorded vibrations or accelerations to derive the rotational speed. However, the estimation is still a difficult problem, because inaccurate estimates are not sufficient for subsequent digital signal processing. According to [10], the existing problems in estimation include, above all, signal-to-noise ratios that are too small, harmonic disturbances, signal overlaps due to resonances, nonexistent harmonics, or the presence of too many harmonics. The authors of [10] also present an overview of digital signal processing approaches for estimating the rotational speed. In [11], an overview of rotational speed estimation for machine fault diagnosis is given. In addition to vibrations, electrical currents or video data from cameras are mentioned as physical quantities for tacholess rotational speed estimation. For vibrations, different frequency-based approaches with order tracking are presented. For electric currents, the rotational speed is estimated via the current ripple. A rotational speed estimation from motor currents based on ripple is given in [12]. The estimation is based on an adaptive signal decomposition and selection method.
The video-based methods examine the displacements of pixels to detect rotational speeds. An example of a video-based speed estimation is [13]. Here, the displacement of pixels is detected via a high-speed camera, and a rotational speed is determined. Many vibration-based approaches to rotational speed estimation search for harmonics in the signals. This includes tracking maxima in spectrograms in combination with a phase-based demodulation method in [14]. The spectrograms, which form the data basis of this approach, are determined via the short-time Fourier transform (STFT). Similarly, in [15], an order tracking method based on STFT is presented to determine gearbox speeds from vibration signals as a chirplet-based approach. Frequently, a wavelet transform is used as an alternative to provide more flexibility in terms of frequency and time resolution. In [16], complex shifted Morlet wavelets are used to find the optimal shift and bandwidth for estimating the current rotational speed.
Approaches for the estimation of rotational speeds via artificial neural networks often use current or voltage measurements. This is possible for many industrial applications that are supplied via electric motors. The application case, here, is usually three-phase motors [17], [18], [19]. In these approaches, either directly measured motor currents or a voltage converted from the current was investigated. These currents or voltages were used as input data by neural networks to determine the respective motor rotational speeds. Currently, there are only few approaches for estimating the engine speed from vibration data via machine learning or even deep learning. In [20], vehicle speeds of a car are determined via a convolutional neural network (CNN) based on low-frequency acceleration values with a sampling rate of 10 and 100 Hz. Here, a root mean square error (RMSE) of about 2 m/s is obtained with a considered velocity range of about 0-35 m/s. In [21], the speed of the next 15 and 30 s is predicted via a long short-term memory (LSTM) network. The relative speed and the distance to the vehicle in front are used as input data.
Neural networks for rotational speed estimation are, thus, mainly used in the field of three-phase motors or are investigated using very low-frequency accelerometers. Since there are hardly any studies on estimating rotational speeds from higher sampled vibration data using neural networks, this article addresses this issue. The objective of this article is to set up and compare different neural network architectures in order to obtain the most accurate estimation of rotational speed from vibration data. For this purpose, in addition to network architectures, various sampling rates, input data types, input lengths of the vibration data, and numbers of neurons are investigated and compared.
This contribution is structured as follows. Section II describes the network architectures used and the theoretical background. Fully connected (FC) networks, LSTM network layers, and various activation functions are used. Section III describes the experimental setup. First, the test rig for operating the gearbox is presented. This is followed by the load profile of the gearbox used. Subsequently, the datasets and parameters used are described. Section IV deals with the training and the test of the networks. At the beginning, a comparison of different input data types and sampling rates is presented. Subsequently, the length of the vibration data used is examined. Based on these investigations, first, the suitable network architectures are selected, and finally, the network is optimized with regard to the number of neurons used. Finally, in Section V, a conclusion of the contribution is given.

II. NETWORK ARCHITECTURES
A total of 18 different neural networks are investigated, which form different combinations of FC network layers, LSTM cells [22], and different activation functions. The output value $o$ of the individual neurons of an FC network layer is obtained via an activation function $h(\cdot)$ according to

$$o = h(\mathbf{w}^{\mathsf{T}} \mathbf{v}) \tag{1}$$

where $\mathbf{v}$ and $\mathbf{w}$ are the vector representations of the network inputs and the weighting factors [23]. The chosen activation function is crucial for the behavior of the net. Usually, continuously differentiable monotone activation functions $h(\cdot)$ are used. The activation functions used are the logistic sigmoid function [23], the hyperbolic tangent function, the rectified linear unit (ReLU) function [24], and the leaky ReLU function [25].

The structure of an LSTM block and its inputs and outputs are shown in Fig. 1. The inputs of an LSTM block are the current input vector $\mathbf{v}_t$ of the current time segment $t$, the previous output vector $\mathbf{o}_{t-1}$, and the previous cell state $\mathbf{c}_{t-1}$. The outputs are the current output vector $\mathbf{o}_t$ and the current cell state $\mathbf{c}_t$. Central components of the LSTM are the input gate $\mathbf{i}_t$, the output gate $\mathbf{a}_t$, the forget gate $\mathbf{f}_t$, the cell candidate $\mathbf{g}_t$, and the cell state $\mathbf{c}_t$. In the following, we consider the mathematical relationships of the LSTM block according to [26]. The weighting factors of an LSTM block are composed of the weights of the input values $\mathbf{W}_v$ and the weights of the recurrent branch of the previous output values $\mathbf{W}_r$. Furthermore, the bias values $\mathbf{B}$ are also used in an LSTM. The matrices $\mathbf{W}_v$ and $\mathbf{W}_r$ represent the concatenations of the individual weights according to the following:

$$\mathbf{W}_v = \begin{bmatrix} \mathbf{W}_{v,i} \\ \mathbf{W}_{v,f} \\ \mathbf{W}_{v,g} \\ \mathbf{W}_{v,a} \end{bmatrix} \tag{2}$$

$$\mathbf{W}_r = \begin{bmatrix} \mathbf{W}_{r,i} \\ \mathbf{W}_{r,f} \\ \mathbf{W}_{r,g} \\ \mathbf{W}_{r,a} \end{bmatrix} \tag{3}$$

An analogous concatenation represents the matrix of bias values $\mathbf{B}$ according to

$$\mathbf{B} = \begin{bmatrix} \mathbf{b}_i \\ \mathbf{b}_f \\ \mathbf{b}_g \\ \mathbf{b}_a \end{bmatrix} \tag{4}$$

The indices $f$, $g$, $i$, and $a$ denote the respective variables of the forget gate, the cell candidate, the input gate, and the output gate.

The forget gate $\mathbf{f}_t$ is used to decide in an LSTM block whether the previous values of the cell $\mathbf{c}_{t-1}$ are taken or not. The forget gate is defined as

$$\mathbf{f}_t = \sigma(\mathbf{W}_{v,f} \mathbf{v}_t + \mathbf{W}_{r,f} \mathbf{o}_{t-1} + \mathbf{b}_f) \tag{5}$$

where $\sigma(\cdot)$ denotes the logistic sigmoid function. The result feeds directly into the Hadamard product with the values of the previous cell $\mathbf{c}_{t-1}$. In the second step of the LSTM, the cell candidate and the input gate are used to calculate the cell state update. For this purpose, the cell candidate $\mathbf{g}_t$ is defined as

$$\mathbf{g}_t = \tanh(\mathbf{W}_{v,g} \mathbf{v}_t + \mathbf{W}_{r,g} \mathbf{o}_{t-1} + \mathbf{b}_g) \tag{6}$$

The input gate $\mathbf{i}_t$ is further defined by

$$\mathbf{i}_t = \sigma(\mathbf{W}_{v,i} \mathbf{v}_t + \mathbf{W}_{r,i} \mathbf{o}_{t-1} + \mathbf{b}_i) \tag{7}$$

Using (5)-(7), the new cell state $\mathbf{c}_t$ of the current time segment is given by

$$\mathbf{c}_t = \mathbf{f}_t \odot \mathbf{c}_{t-1} + \mathbf{i}_t \odot \mathbf{g}_t \tag{8}$$

where $\odot$ denotes the Hadamard product. The output gate $\mathbf{a}_t$ of the LSTM block is further defined by

$$\mathbf{a}_t = \sigma(\mathbf{W}_{v,a} \mathbf{v}_t + \mathbf{W}_{r,a} \mathbf{o}_{t-1} + \mathbf{b}_a) \tag{9}$$

From this, in combination with the current cell state $\mathbf{c}_t$, the output vector $\mathbf{o}_t$ becomes

$$\mathbf{o}_t = \mathbf{a}_t \odot \tanh(\mathbf{c}_t) \tag{10}$$

The investigated networks can be divided into networks with and without feedback. Table I shows a list of the examined neural networks with feedback. IL denotes the input layer, OL the output layer, DO the dropout layer (30%), and FC the FC layer. The indices for the FC layer indicate the activation function used. The number of input neurons of the networks is calculated according to (14)-(16) based on the input data. The output layer has only one output neuron for all networks considered, since only the rotational speed is returned as a scalar value. As can be seen from the listing, these are LSTM-based networks. A simple LSTM network, a network with two LSTM layers, and combinations of LSTM networks with FC layers and different activation functions are investigated. Table II lists the investigated FC neural networks.
The FC neural networks are investigated for different activation functions. Furthermore, cascaded network layers of the same activation function are examined in order to be able to reproduce more complex relationships in the network. The number of neurons of the neural networks is further determined by parameter optimization.
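The LSTM relations described in this section can be illustrated with a minimal NumPy sketch of a single LSTM block. It assumes logistic sigmoid gates, a hyperbolic tangent state activation, and a stacking order of input gate, forget gate, cell candidate, and output gate inside the concatenated matrices; the function names `lstm_step` and `lstm_forward` and the random weights are hypothetical and serve only as an illustration, not as the implementation used in this work.

```python
import numpy as np


def sigmoid(x):
    """Logistic sigmoid, used here as the (assumed) gate activation."""
    return 1.0 / (1.0 + np.exp(-x))


def lstm_step(v_t, o_prev, c_prev, W_v, W_r, B):
    """One time step of an LSTM block.

    W_v: (4H, D) input weights, W_r: (4H, H) recurrent weights, B: (4H,) biases.
    Assumed stacking order of the four blocks: input gate i, forget gate f,
    cell candidate g, output gate a.
    """
    H = o_prev.shape[0]
    z = W_v @ v_t + W_r @ o_prev + B
    i_t = sigmoid(z[0:H])             # input gate
    f_t = sigmoid(z[H:2 * H])         # forget gate
    g_t = np.tanh(z[2 * H:3 * H])     # cell candidate
    a_t = sigmoid(z[3 * H:4 * H])     # output gate
    c_t = f_t * c_prev + i_t * g_t    # Hadamard products update the cell state
    o_t = a_t * np.tanh(c_t)          # output vector
    return o_t, c_t


def lstm_forward(V, W_v, W_r, B):
    """Run the block over a sequence V of shape (T, D), starting from zero state."""
    H = W_r.shape[1]
    o_t, c_t = np.zeros(H), np.zeros(H)
    for v_t in V:
        o_t, c_t = lstm_step(v_t, o_t, c_t, W_v, W_r, B)
    return o_t, c_t


# Illustrative dimensions and random weights (not values from this work).
rng = np.random.default_rng(0)
D, H, T = 8, 4, 5
W_v = rng.normal(scale=0.1, size=(4 * H, D))
W_r = rng.normal(scale=0.1, size=(4 * H, H))
B = np.zeros(4 * H)
o_T, c_T = lstm_forward(rng.normal(size=(T, D)), W_v, W_r, B)
```

Because the output is the product of a sigmoid gate and a hyperbolic tangent, every element of `o_T` lies strictly between -1 and 1, which is why an output layer is needed to map the result to a rotational speed.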

III. EXPERIMENTAL SETUP
This section describes the test rig, the load profile used, and the datasets. First, the test rig, which is used for the vibration measurements on the gearboxes, is presented. Afterward, the load profile used for the measurements is described. Finally, the datasets and their preprocessing are described.

A. Test Rig
Fig. 2 shows a schematic drawing of the test rig for laboratory testing of gearbox vibrations as a replica of the loader wagon. On the right-hand side of the figure, there is an oil engine, which drives the gearboxes connected to it at the input shaft with an adjustable rotational speed. A shaft is coupled to the output of the gearbox, driving a second gearbox of the same type. The second gearbox is used, together with the following pump, to apply a load torque to the drive for the gearbox under investigation. An adjustable pressure relief valve limits the flow of oil through the circuit and allows the load torque to be varied. An oil tank is used to minimize the temperature influences on the oil, which can result in a change in torque. A torque measuring shaft is inserted between the input shaft of the second gearbox to measure both the torque and the rotational speed of the input shaft of the gearbox. Since both gearboxes have the same gear ratio, the measured rotational speed and torque also correspond to those of the gearbox under investigation. To measure the occurring gearbox vibrations, a piezoelectric vibration transducer is attached to the gearbox housing (material: cast iron) via a screw connection in order to ensure good material coupling of the sensor. An iCS80 [27] sensor is used. It has a linear frequency range from 0.13 Hz to 22 kHz (3-dB cutoff frequency), a measuring range of ±55 g, and a voltage sensitivity of 100 mV/g. The signal of the piezoelectric vibration sensor is sampled at a sampling rate of f_s = 51.2 kHz and a resolution of 24 bit. The signal is then resampled to a frequency of 44 kHz.

B. Load Profile
In a field study in a loader wagon, rotational speeds and torques of the gearbox under investigation were recorded over the period of one harvesting season. From that study, an average load profile of the gearbox has been determined. Fig. 3 shows the average rotational speed and torque at the input shaft of the gearbox during the active time of the gearbox in the field measurement.
The illustration can be divided into three segments separated by black dotted lines. These include the loading and unloading phases.

C. Dataset and Preprocessing
According to the derived load profile, the selected rotational speed range for the investigation is mostly between 100 and 550 rpm. To obtain a more representative sample of training data, different gearboxes of the same type are used to generate the datasets. A total of 10 471 s of vibration and rotational speed data are recorded. This dataset is split into 8000-s training data, 1999-s validation data, and 472-s test data. Thus, there is a split of 76.4%-19.09%-4.51%. Here, the test dataset is composed of two individual measurements, which are used only for the test to ensure an independent dataset. Fig. 4 shows a breakdown of the three datasets as histograms with relative frequencies.
It can be seen from the figure that the ranges between 120 and 170, 310 and 340, and 430 and 430 rpm represent the majority of the dataset. This is a result of the load profile of the loader wagon, which drives the gearbox mostly in these ranges. Furthermore, the test dataset shows a less flat distribution than the other two datasets. A resampling filter is used to generate datasets with the sampling rates of 2.5, 5, 7.5, 10, 12.5, 15, 17.5, and 20 kHz. Using these, the need for high sampling rates of the vibration signal for rotational speed estimation via neural networks will be investigated in the remainder of this contribution. Low sampling rates could significantly reduce the required computational effort of the neural network. Since the datasets have been recorded as continuous measurement series, a segmentation of the datasets is necessary. For this purpose, datasets with different lengths of the time windows of the vibration data are generated. Time windows of the vibration data are examined starting with a length of 0.1 s. In total, 20 different datasets with the window lengths of 0.1-2 s are generated in 0.1-s steps. Depending on the selected length, the number of samples ranges from 5235 for the largest window of 2 s to 104 710 samples for the smallest window of 0.1 s. The rotational speed associated with each sample is determined from the average of the rotational speeds over the respective window. Since the rotational speed usually hardly changes within seconds in the investigated application, the rotational speed within a window has a small standard deviation around this mean value.
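The segmentation described above, together with the Blackman windowing and one-sided FFT that this section applies as preprocessing, can be sketched as follows. This is a minimal illustration under several assumptions: the windows do not overlap, the rotational speed is available at the same sampling rate as the vibration signal, and NumPy's `rfft` convention (N/2 + 1 bins for even N) is used for the one-sided spectrum, which may differ from the input size used in this work. The names `n_windows`, `segment`, and `features` are hypothetical.

```python
import numpy as np


def n_windows(total_s, window_tenths):
    """Number of non-overlapping windows, with the window length given in tenths of a second."""
    return (total_s * 10) // window_tenths


def segment(signal, speed, fs, window_s):
    """Cut a continuous recording into windows and label each with its mean speed."""
    n = int(round(fs * window_s))                 # samples per window
    k = len(signal) // n                          # number of complete windows
    x = np.asarray(signal[:k * n], dtype=float).reshape(k, n)
    y = np.asarray(speed[:k * n], dtype=float).reshape(k, n).mean(axis=1)
    return x, y


def features(x, mode="freq"):
    """Blackman-windowed input data in the time domain, frequency domain, or both."""
    xw = x * np.blackman(x.shape[1])              # windowed time-domain vectors
    if mode == "time":
        return xw
    spec = np.abs(np.fft.rfft(xw, axis=1))        # magnitude of the one-sided FFT
    if mode == "freq":
        return spec
    if mode == "time_freq":
        return np.concatenate([xw, spec], axis=1)
    raise ValueError(f"unknown mode: {mode}")


# Synthetic 3-s recording at 10 kHz with a constant speed of 200 rpm (illustrative only).
fs = 10_000
x, y = segment(np.arange(3 * fs, dtype=float), np.full(3 * fs, 200.0), fs, window_s=0.5)
X_freq = features(x, "freq")
X_both = features(x, "time_freq")
```

With the 10 471 s of recorded data, `n_windows(10471, 20)` and `n_windows(10471, 1)` reproduce the sample counts of 5235 and 104 710 stated for the 2-s and 0.1-s windows.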
As a preprocessing of the data, the segmented vibration data $x[n]$ are multiplied by a Blackman window function [28] to create uniform vectors

$$x_{\gamma_B}[n] = x[n] \, \gamma_B[n] \tag{11}$$

The Blackman window $\gamma_B[n]$ for a length $N$ is given by

$$\gamma_B[n] = 0.42 - 0.5 \cos\left(\frac{2 \pi n}{N - 1}\right) + 0.08 \cos\left(\frac{4 \pi n}{N - 1}\right) \tag{12}$$

with $0 \le n \le M - 1$, where $M$ is $N/2$ when $N$ is even and $(N + 1)/2$ when $N$ is odd. The remaining values follow from the symmetry of the window.

Three variants are examined as input data to the neural networks in this contribution. The first variant represents the windowed vibration data $x_{\gamma_B}$ in the time domain. The second variant is the windowed vibration data in the frequency domain $X_{\gamma_B}[k]$, which is defined according to [29] as follows:

$$X_{\gamma_B}[k] = \sum_{n=0}^{N-1} x_{\gamma_B}[n] \, e^{-j 2 \pi k n / N}, \quad k = 0, \ldots, N - 1 \tag{13}$$

For the investigation of the frequency data, the absolute values of the one-sided fast Fourier transform are considered. Accordingly, the input size $n_{I,\mathrm{freq}}$ of the neural networks follows from the length of the one-sided spectrum. The third variant combines the two vectors, so that the input size $n_{I,\mathrm{time,freq}}$ of these networks is the sum of the time-domain and frequency-domain input sizes.

IV. TRAINING AND TEST
The structure and training of the previously defined neural networks depend mainly on the type of input data (time, frequency, or time and frequency), the selected sampling rate, the length of the vibration data used, and the number of neurons. Altogether, this results in a variety of combinations, which cannot be fully investigated due to the long training times. In this contribution, therefore, all networks are first trained for all sampling rates and all input data types for a constant length of 0.5-s vibration data. This results in 432 different neural networks. Each network layer has the same number of neurons. Furthermore, three different numbers of neurons (100, 200, and 300) are trained for these 432 networks. This is intended to avoid choosing an unfavorable number of neurons that cannot represent the problem at hand. This results in a total of 1296 different neural networks in the first analysis. Subsequently, the length of the input data and the number of neurons are examined. The RMSE is used as the quality criterion.
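The counts of 432 and 1296 networks follow directly from the combination of 18 architectures, eight sampling rates, three input data types, and three numbers of neurons. A short sketch of this enumeration, with hypothetical variable names, is:

```python
from itertools import product

architectures = range(1, 19)                           # network architectures 1-18
sampling_rates_khz = [2.5, 5, 7.5, 10, 12.5, 15, 17.5, 20]
input_types = ["time", "freq", "time_freq"]
neuron_counts = [100, 200, 300]

# One network per (architecture, sampling rate, input type) combination.
configs = list(product(architectures, sampling_rates_khz, input_types))

# Each of these is trained for three different numbers of neurons.
runs = list(product(configs, neuron_counts))

print(len(configs), len(runs))  # 432 1296
```

Enumerating the grid explicitly in this way also makes it easy to restrict later phases to a subset, for example a single sampling rate and input data type.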
The networks in this contribution were trained with the Adam optimizer. Apart from a learning rate of 0.1, the default settings from [30] were used ($\beta_1 = 0.9$, $\beta_2 = 0.999$, and $\epsilon = 10^{-8}$). All networks are trained for a maximum of 500 epochs. One epoch describes a complete run through the training dataset. During training, the dataset is divided into mini-batches, since training with all samples at the same time would exceed the memory of the graphics card used. A mini-batch size of 1024 samples has been used. Furthermore, a termination criterion has been used, which terminates the training prematurely if no improvement of the RMSE value has been achieved over 100 mini-batches. This can reduce overfitting of the network if no further improvement is achieved. The following investigation for selecting a suitable neural network is divided into three phases. In the first phase, different sampling rates of the vibration signal and three different input data types are investigated. Subsequently, the length of the input data is varied to investigate which length provides the best rotational speed estimation. Finally, the remaining networks are varied with respect to their number of neurons.
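The training settings above can be illustrated with a minimal NumPy sketch of the Adam update rule and a patience-based termination criterion. The classes `Adam` and `EarlyStopping` are hypothetical helpers written for this illustration, and the quadratic toy problem merely stands in for a network loss; the sketch is not the training code used in this work.

```python
import numpy as np


class Adam:
    """Adam update with the stated settings (learning rate 0.1, default betas and epsilon)."""

    def __init__(self, lr=0.1, beta1=0.9, beta2=0.999, eps=1e-8):
        self.lr, self.beta1, self.beta2, self.eps = lr, beta1, beta2, eps
        self.m = None   # first-moment estimate
        self.v = None   # second-moment estimate
        self.t = 0      # step counter

    def step(self, params, grads):
        if self.m is None:
            self.m = np.zeros_like(params)
            self.v = np.zeros_like(params)
        self.t += 1
        self.m = self.beta1 * self.m + (1.0 - self.beta1) * grads
        self.v = self.beta2 * self.v + (1.0 - self.beta2) * grads ** 2
        m_hat = self.m / (1.0 - self.beta1 ** self.t)   # bias correction
        v_hat = self.v / (1.0 - self.beta2 ** self.t)
        return params - self.lr * m_hat / (np.sqrt(v_hat) + self.eps)


class EarlyStopping:
    """Signal a stop once the RMSE has not improved for `patience` consecutive checks."""

    def __init__(self, patience=100):
        self.patience = patience
        self.best = float("inf")
        self.bad = 0

    def update(self, rmse):
        if rmse < self.best:
            self.best, self.bad = rmse, 0
        else:
            self.bad += 1
        return self.bad >= self.patience


# Toy problem: minimize (p - 3)^2, whose gradient is 2 (p - 3).
adam = Adam()
p = np.array([0.0])
p_first = adam.step(p, 2.0 * (p - 3.0))   # first update moves by roughly the learning rate
p = p_first
for _ in range(499):
    p = adam.step(p, 2.0 * (p - 3.0))
```

In a training loop, `EarlyStopping.update` would be called with the current RMSE after each mini-batch, and training would end as soon as it returns `True` or the maximum number of epochs is reached.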

A. Sampling Rate and Input Data Comparison
In the first step, for a constant input length of 0.5-s vibration data, the sampling rates from 2.5 to 20 kHz in 2.5-kHz steps are examined. The datasets for the different sampling rates are generated by a resampling filter from the initially created dataset with a sampling rate of 44 kHz. Furthermore, for these eight variants, a distinction is made between the input data in the time domain, the frequency domain, and the combination of time and frequency domain. To avoid a completely unsuitable number of neurons for the networks, the neuron numbers 100, 200, and 300 are trained. The result of the 1296 neural networks sorted by the sampling rates is shown in Fig. 5. The RMSE values of the test data after training are shown. Furthermore, the minimum values (red circles) of all networks for each of the three numbers of neurons are shown.
It can be seen that comparable results can be obtained for the different sampling rates. Overall, the sampling rate of 10 kHz achieves the most accurate rotational speed estimation. Since the average rotational speed of the test data is ≈202 rpm, most networks with relatively high RMSE values can only provide a very poor estimation. Since the networks with very high RMSE values are not relevant for further consideration, the representation was limited to values up to 40 rpm. The minimum values show that, for any chosen sampling rate, networks with a fairly good estimation and RMSE values of less than 20 rpm can be found. The best estimation achieves an RMSE value of about 16 rpm. In the next step, the individual network architectures are compared. For this purpose, the RMSE values of the test data are illustrated for each network architecture in Fig. 6. Furthermore, the input data types are highlighted by colors.
The color coding clearly shows that the data in the frequency domain achieve the best results. It is noticeable that some network architectures achieve better results with the time input data than with frequency data. Examples of this are networks 5, 13, and 17. Overall, however, these networks perform significantly worse. With the current parameters, networks 3, 4, 6, 12, and 16 show the best estimation accuracies with RMSE values below 17 rpm. Networks 2, 5, 13, 17, and 18 show the worst estimation accuracy. Based on the investigation of the first parameter selection of the networks, it can be said that the lowest RMSE values and, thus, the most precise estimation can be achieved at a sampling rate of 10 kHz. For this reason, this sampling rate is used for further investigations. Furthermore, it has been shown that with regard to the input data type, the vibration data in the time domain provide very inaccurate estimations. The combination of the data in the time and frequency domain achieves somewhat better values. Overall, the best results are obtained with the vibration data in the frequency domain. Consequently, this input data type is chosen for further investigations. The different network architectures are mostly comparable with respect to the RMSE values. Only networks 2, 5, 13, 17, and 18 achieve very inaccurate estimations. Therefore, they are excluded from the further investigation.
It has been shown that the input data in the frequency domain achieve the best results. As an example, two frequency spectra in dB up to 1 kHz have been investigated. In these, it is investigated whether the rotational speed can be determined directly from the vibration signal.
The measurement (a) has a rotational speed of 103 rpm (1.72 Hz) on the input shaft. Using the gear ratio, the resulting rotational speed is 28.8 rpm (0.48 Hz) for the second shaft and 6.74 rpm (0.11 Hz) for the third shaft. The measurement (b) has a rotational speed of 420 rpm (7 Hz) on the input shaft. It can be seen that the rotational speed and harmonics are not directly identifiable, and therefore, common order tracking methods are not applicable. A possible explanation for this could be coupled vibrations of other surrounding mechanical components as well as reflections of the vibrations. As expected, however, it can be seen that the distinctive frequency ranges are higher at faster rotational speeds.
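The conversion between the shaft speeds in rpm and the rotation frequencies in Hz quoted above is a division by 60. The following sketch reproduces the stated values for measurement (a); the stage ratios are not given explicitly in the text and are inferred here from the three stated shaft speeds, so they should be read as an assumption. The function names are hypothetical.

```python
def rpm_to_hz(rpm):
    """Rotation frequency in Hz for a shaft speed in rpm."""
    return rpm / 60.0


# Shaft speeds of measurement (a) as stated in the text.
input_rpm, second_rpm, third_rpm = 103.0, 28.8, 6.74

# Stage ratios inferred from these speeds (assumption, not stated explicitly).
ratio_stage_1 = input_rpm / second_rpm
ratio_stage_2 = second_rpm / third_rpm


def shaft_speeds(rpm_in):
    """Speeds of the input, second, and third shaft for a given input speed."""
    second = rpm_in / ratio_stage_1
    third = second / ratio_stage_2
    return rpm_in, second, third


for name, rpm in zip(("input", "second", "third"), shaft_speeds(input_rpm)):
    print(f"{name} shaft: {rpm:.2f} rpm = {rpm_to_hz(rpm):.2f} Hz")
```

All of these rotation frequencies lie far below 1 kHz, so any directly visible shaft-order lines would have to appear at the very low end of the spectra under discussion.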

B. Signal Length Comparison
In the next step, the input length of the vibration signal for training the neural networks is investigated. The length is varied from 0.1 to 2 s in 0.1-s steps. This variation is used to investigate which input length achieves the best estimation accuracy. For this purpose, Fig. 7 shows the RMSE values of the test data of the remaining 13 networks for each of the investigated input lengths. All networks have been trained for three numbers of neurons (100, 200, and 300) with 10-kHz sampling rate. In addition, the RMSE minimum values of the networks are highlighted in the figure.
From the figure, it becomes apparent that very short input lengths (≤0.7 s) achieve worse results with the RMSE minimum values between 16 and 22 rpm. Better results are obtained by the networks with longer input lengths with the RMSE minimum values smaller than 16 rpm. The best estimation accuracy is obtained with a length of 1.6 s with an RMSE minimum value of 11.2 rpm. For this training series, the individual networks are again compared. For this purpose, the individual RMSE values of the test data sorted by the respective network architecture are illustrated together with the RMSE minimum values in Fig. 8.
It can be seen that network architectures 1, 3, 4, 6, 10, 12, and 14 achieve the best estimation accuracy with the RMSE minimum values of ≤13 rpm. Network architectures 7, 9, 11, and 15, however, do not represent the present context well and achieve the worst speed estimates.
Based on the results of the trained networks, it can be stated that the best result can be achieved with a minimum RMSE value of 11.2 rpm for an input data length of 1.6 s (LSTM-based network architecture 6). Architectures 1, 3, 4, 6, 10, 12, and 14 are chosen for the following optimization of the number of neurons, since these have achieved the best estimation accuracies.

C. Neuron Number Comparison
The number of neurons is crucial for the training of the neural network. Too few neurons in a network layer cannot appropriately represent the complexity of the signal, resulting in poor results. Too many neurons increase the computation time of the network enormously. Thus, the time in training and test is increased. The training time of the networks in this article was approximately 5 min per network, while the calculation of the test values by the trained networks was in the millisecond range. Since only the time for calculating the test values is important for real implementation, it can be assumed that the application is sufficiently fast. Furthermore, too many neurons can increase unwanted overfitting of the network [31]. As a result, the ability of the network to generalize can be poor. The selection of the number of neurons aims to choose the smallest possible number of neurons with which an existing relationship can be represented [32]. In the following, the same number of neurons is chosen for each network layer. The number of neurons is varied from 5 to 1000 in steps of 5. The RMSE values of all seven network variants for the respective neurons are shown in Fig. 9. In addition, a moving average over ten elements with the averaged RMSE value of the respective number of neurons is shown. The black filled circles visualize the networks with the best estimation accuracies with the RMSE values less than or equal to 9.5 rpm.
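The neuron sweep and the moving average over ten elements can be sketched as follows. The RMSE values in this example are synthetic placeholders generated only to make the sketch runnable; they are not results of this work. How the averaged RMSE per number of neurons is formed across the seven networks is not specified in detail, so the sketch simply smooths one RMSE value per number of neurons.

```python
import numpy as np


def moving_average(values, width=10):
    """Moving average over `width` consecutive elements (fully overlapping positions only)."""
    values = np.asarray(values, dtype=float)
    kernel = np.ones(width) / width
    return np.convolve(values, kernel, mode="valid")


# Numbers of neurons from 5 to 1000 in steps of 5.
neurons = np.arange(5, 1001, 5)

# Synthetic placeholder RMSE curve with noise (illustrative only, not measured data).
rng = np.random.default_rng(0)
rmse_demo = 12.0 + 2e-5 * (neurons - 350.0) ** 2 + rng.normal(0.0, 1.0, neurons.size)

smoothed = moving_average(rmse_demo, width=10)

# Mark networks at or below the 9.5-rpm threshold used for the highlighted points.
best_mask = rmse_demo <= 9.5
```

The sweep contains 200 different numbers of neurons, and a ten-element moving average without padding yields 191 smoothed values.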
Based on the determined RMSE values and the moving average of the trained networks, a trend in the number of neurons can be identified. Too few as well as too many neurons result in higher RMSE values. The lowest mean values are achieved in the range from 200 to about 450 neurons. Above 500 neurons, the mean estimation accuracy drops again, i.e., the mean RMSE rises. Fewer than 200 or more than 550 neurons result in the worst RMSE values on average. Looking at the networks with the RMSE values less than or equal to 9.5 rpm, it can be seen that most of these occur in the region of the lowest mean value. In the second step, the individual network architectures are compared in the range of RMSE values of 5-30 rpm. Fig. 10 shows the RMSE values for the individual network architectures.
Based on the illustration, the individual network architectures as well as the best rotational speed estimations can be identified. It can be seen that network architecture 6 (LSTM network with subsequent hyperbolic tangent activation function) with 505 neurons again performs best with an RMSE minimum of 7.88 rpm. This corresponds to a mean percent error of 2.40% as well as a mean absolute error of 4.09 rpm. Furthermore, it can be seen that the LSTM-based networks (1, 3, 4, and 6) tend to perform best. The only exception to this is network 14, which is composed of two FC network layers with the hyperbolic tangent activation function. Consequently, based on the results, it can be concluded that both LSTM cells and the hyperbolic tangent activation function are well suited for determining rotational speed from vibration data for this gearbox type. In order to compare the accuracy, selected networks of the different described training phases are exemplarily shown in Fig. 11.
The figure shows the measured rotational speed, the results of the best networks and, as an example, a network with a relatively high RMSE of more than 50 rpm. The network with an RMSE of over 50 rpm reflects the real rotational speed very poorly in some areas. It can be seen that this network is not able to properly represent high rotational speeds. The other networks represent the measured rotational speed better, in line with their RMSE values. Overall, the best result hardly shows any deviations from the measured rotational speed. Very low rotational speeds below 100 rpm are poorly detected by all networks. A possible explanation for this could be the small number of samples with low rotational speeds in the dataset used. The use of a more balanced dataset (see Fig. 4) for training should improve the rotational speed estimations for this range. Within the examined application of the gearbox in a loader wagon, the rotational speeds change very slowly for the most part. In other applications, this may not be the case, and gearbox rotational speeds can change very quickly. The rotational speed changes in the test dataset used also indicate a good rotational speed estimation for changing speeds. For gearboxes with frequently changing speeds, a new training could be useful. Depending on the rate of change, it would be reasonable to investigate shorter signal lengths to achieve a faster response time.
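The error measures reported in this section (RMSE, mean absolute error, and mean percent error) can be computed as in the following sketch. The exact definition of the mean percent error is not given in the text; the sketch assumes the mean of the absolute errors relative to the measured speed, and the example values are invented for illustration.

```python
import numpy as np


def rmse(measured, estimated):
    """Root mean square error in the unit of the inputs (here rpm)."""
    e = np.asarray(estimated, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))


def mean_absolute_error(measured, estimated):
    """Mean absolute error in the unit of the inputs (here rpm)."""
    e = np.asarray(estimated, dtype=float) - np.asarray(measured, dtype=float)
    return float(np.mean(np.abs(e)))


def mean_percent_error(measured, estimated):
    """Mean absolute error relative to the measured value, in percent (assumed definition)."""
    m = np.asarray(measured, dtype=float)
    e = np.asarray(estimated, dtype=float) - m
    return float(100.0 * np.mean(np.abs(e) / np.abs(m)))


# Invented example values in rpm (not data from this work).
measured = [100.0, 200.0, 400.0]
estimated = [110.0, 190.0, 400.0]

print(rmse(measured, estimated))
print(mean_absolute_error(measured, estimated))
print(mean_percent_error(measured, estimated))
```

Because the RMSE squares the deviations before averaging, it weights large individual errors more strongly than the mean absolute error, which is consistent with the reported RMSE of 7.88 rpm being larger than the mean absolute error of 4.09 rpm.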

V. CONCLUSION
In this contribution, we presented an investigation of the estimation of gearbox rotational speeds based on vibration data and artificial neural networks. The rotational speed estimation was examined for different input data in the time domain, frequency domain, and a combination of both for a total of 18 different network architectures. It was shown that the input data in the frequency domain achieve the best results, while the other two variants achieved very inaccurate estimations. Furthermore, different sampling rates of the vibration data were investigated. It was shown that, in general, good estimation could be achieved for all sampling rates. The most accurate estimation could be achieved with a 10-kHz sampling rate. The variation of the signal length showed very inaccurate results for signal lengths shorter than 0.8 s. The most accurate estimation was achieved for a signal length of 1.6 s. Finally, the number of neurons in the individual network layers was compared. It could be shown that significantly worse rotational speed estimates are obtained for fewer than 200 and more than 550 neurons. The RMSE minimum value of 7.88 rpm was obtained with 505 neurons. Overall, the rotational speed can, thus, already be estimated quite accurately using the approaches investigated. When comparing the network architectures across all studies, it was found that the LSTM-based networks, in particular, achieved the best results. For future work, additional investigation of CNNs and retraining with a larger database are planned. This could further reduce the deviation of the estimated rotational speed. In addition, the transferability to the real vehicle will be investigated. For this purpose, a new training of the network based on further data from the vehicle is planned. In addition, the method presented for selecting the networks and parameters is to be transferred to other gearbox types.