This paper considers the use of feature selection within the state detection module of an ocean turbine condition monitoring system. The goal is to reduce the quantity of data to be processed while maintaining or improving state detection performance. Five feature selection techniques (Chi-squared, Information Gain, Signal-To-Noise, AUC and PRC) are evaluated by their effects on four widely used machine learning algorithms, namely Naive Bayes, k-Nearest Neighbors, Decision Tree and Logistic Regression, when each learner is trained on the top n features selected by each technique. Six values of n (2, 4, 6, 8, 10 and 15) were considered. Features were extracted from the raw vibration signals using a Short Time Wavelet Transform with Baselining (STWTB) technique, designed to allow reliable state detection regardless of the turbine's operating conditions, which are often reflected in its vibration readings. The condition-independent features extracted by the STWTB are then fused to combine the data observed by all sensor sources. Models were built on data gathered at one operating condition and tested against data from a different operating condition, simulating the problem of building models that work regardless of operating condition. Results show that k-Nearest Neighbors, Naive Bayes and Logistic Regression achieve improved classification performance when using fewer than 11% of the 78 available features, with Logistic Regression needing just 2 features selected by the Signal-To-Noise technique to generate a perfect classification model. The Decision Tree performed best without feature selection.
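To make the "top n features" ranking step concrete, the following is a minimal sketch of one of the five rankers, Signal-To-Noise, written in plain Python. It assumes the common definition S2N = (mean_pos - mean_neg) / (std_pos + std_neg), computed per feature over a binary-labelled training set and ranked by absolute value; the abstract does not state the exact formulation used, and the names `signal_to_noise` and `top_n_features`, as well as the 0/1 label encoding, are hypothetical choices for illustration only.

```python
import math
from statistics import mean, pstdev
from typing import List, Sequence


def signal_to_noise(values: Sequence[float], labels: Sequence[int]) -> float:
    """Signal-to-noise score of one feature, assuming labels are 0 (negative) or 1 (positive)."""
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present to compute S2N")
    diff = mean(pos) - mean(neg)
    spread = pstdev(pos) + pstdev(neg)  # population standard deviations
    if spread == 0.0:
        # Zero spread in both classes: the feature separates them perfectly
        # if the means differ, and carries no signal if they do not.
        return math.copysign(math.inf, diff) if diff != 0.0 else 0.0
    return diff / spread


def top_n_features(rows: Sequence[Sequence[float]], labels: Sequence[int], n: int) -> List[int]:
    """Return the column indices of the n features with the largest |S2N| score."""
    n_features = len(rows[0])
    scores = [signal_to_noise([r[j] for r in rows], labels) for j in range(n_features)]
    # sorted() is stable, so tied features keep their original column order.
    ranked = sorted(range(n_features), key=lambda j: abs(scores[j]), reverse=True)
    return ranked[:n]


if __name__ == "__main__":
    # Toy data: 4 instances, 3 features; column 2 separates the classes perfectly.
    X = [[1.0, 5.0, 0.0], [2.0, 5.1, 0.0], [9.0, 4.9, 1.0], [10.0, 5.0, 1.0]]
    y = [0, 0, 1, 1]
    print(top_n_features(X, y, 2))  # [2, 0]
```

In the setting the abstract describes, each row would hold the 78 fused STWTB features of one instance, the ranking would be computed on the training operating condition only, and a learner would then be trained on the returned columns for each n in {2, 4, 6, 8, 10, 15}. Ranking by absolute value treats features that are strongly higher or strongly lower in the positive class as equally useful.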