Abstract:
In modern process industries, it is important to build data-driven soft sensors that predict key performance indicators (KPIs) that are difficult to measure directly. However, industrial data are usually characterized by uncertain time series, varying degrees of outliers, multiple redundant variables, and abundant unlabeled data, all of which complicate data-driven modeling. To address these difficulties, a semi-supervised and robust data-driven modeling algorithm is proposed. First, t-distributed stochastic neighbor embedding (t-SNE) is applied to reduce the dimensionality of unlabeled samples. Second, a bidirectional long short-term memory (BiLSTM) network based on a capped Huber loss is developed to deal with outliers, and the least absolute shrinkage and selection operator (LASSO) is introduced to remove redundant variables. Experiments on an artificial dataset and an industrial dataset demonstrated that the developed algorithm achieved higher prediction accuracy than other state-of-the-art methods. Furthermore, ablation studies were conducted to evaluate the contributions of the individual techniques to model performance.
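The abstract does not define the capped Huber loss or say how the LASSO term enters the objective. As a rough illustration only, the sketch below assumes one common construction: a standard Huber loss whose value is held constant once the residual magnitude exceeds a cap, plus an L1 penalty on the model weights. The function names and the parameters `delta`, `cap`, and `lam` are hypothetical and are not taken from the paper.

```python
def huber(r, delta=1.0):
    """Standard Huber loss: quadratic near zero, linear in the tails."""
    a = abs(r)
    if a <= delta:
        return 0.5 * r * r
    return delta * (a - 0.5 * delta)


def capped_huber(r, delta=1.0, cap=3.0):
    """Assumed capped variant: the loss stops growing once |r| exceeds `cap`,
    so a gross outlier contributes a bounded amount to the objective."""
    return min(huber(r, delta), huber(cap, delta))


def objective(residuals, weights, delta=1.0, cap=3.0, lam=0.01):
    """Mean capped Huber loss plus a LASSO-style L1 penalty on the weights.
    `lam` (hypothetical) trades prediction error against sparsity."""
    data_term = sum(capped_huber(r, delta, cap) for r in residuals) / len(residuals)
    l1_term = lam * sum(abs(w) for w in weights)
    return data_term + l1_term


if __name__ == "__main__":
    # A residual of 100 is penalized exactly as much as a residual of 3.
    print(capped_huber(3.0), capped_huber(100.0))
    print(objective([0.5, 100.0], [1.0, -2.0], lam=0.1))
```

Under this assumption, residuals beyond the cap no longer increase the loss, which is one way an outlier's influence on training can be limited, while the L1 term pushes the weights of redundant inputs toward zero.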
Published in: IEEE Transactions on Instrumentation and Measurement (Volume: 74)