Sensor-Driven Learning of Time-Dependent Parameters for Prescriptive Analytics

Big data analytics is rapidly emerging as a key Internet of Things (IoT) initiative aiming at providing meaningful insights and supporting optimal decision making under time constraints. In this direction, prescriptive analytics has just started to emerge. Prescriptive analytics moves beyond descriptive and predictive analytics, aiming at providing adaptive, automated, constrained, time-dependent and optimal decisions. The use of time-dependent parameters in prescriptive analytics models provides a more reliable and realistic representation of the complex and dynamic environment and the associated decision making process; however, their estimation poses significant challenges due to the uncertainty derived from inaccurate user input, noisy data, and non-stationarity of real-world data streams. Since feedback and learning mechanisms for tracking the outcomes of prescriptive analytics are crucial enablers for self-configuration and self-optimization, this paper proposes an approach for sensor-driven learning of time-dependent parameters for prescriptive analytics models deployed in streaming computational environments. The proposed approach was validated in an Industry 4.0 use case and further evaluated through extensive simulation experiments. It overcomes challenges related to uncertainty derived from the user's input, non-stationary data and sensor noise, and provides estimates of time-dependent parameters that lead to more reliable prescriptions.


I. INTRODUCTION
Big data analytics is rapidly emerging as a key Internet of Things (IoT) initiative aiming at providing meaningful insights and supporting optimal decision making under time constraints [1]. Data analytics is categorized into three main types characterized by different levels of value and intelligence [2], [3]: (i) descriptive analytics, answering the questions ''What has happened?'', ''Why did it happen?'', ''What is happening now?''; (ii) predictive analytics, answering the questions ''What will happen?'' and ''Why will it happen?'' in the future; (iii) prescriptive analytics, answering the questions ''What should I do?'' and ''Why should I do it?''.
Currently, the vast majority of data analytics efforts are spent on descriptive and predictive analytics, while prescriptive analytics is less mature [4], [5]. Recently, however, prescriptive analytics has been increasingly gathering research interest since it is considered as the next step towards increasing data analytics maturity and leading to optimized decision making, ahead of time, for business performance improvement [5]. Prescriptive analytics aims at suggesting (prescribing) the best decision options to take advantage of the predicted future utilizing large amounts of data [6]. To do this, it incorporates prediction events about what might occur and utilizes artificial intelligence, optimization algorithms and expert systems in a probabilistic context to provide adaptive, automated, constrained, time-dependent and optimal decisions [3], [5].
To this end, time-dependent parameters can contribute to prescriptive analytics models that provide a more reliable and realistic representation of the complex and dynamic environment and the associated decision making process [7], [8]. This is due to the fact that different actions may be optimal or more appropriate at different implementation times. Time-dependent parameters can be found in functions and analytics models in various domains, such as econometrics, inventory management, demand management, reliability analysis, and social media. Examples include decisions about economic order quantity models, predictive maintenance actions, online advertisements for offers, etc. However, their estimation poses significant challenges due to the uncertainty derived from inaccurate user input, noisy data, and non-stationarity of real-world data streams [9], [10]. Feedback and learning mechanisms for tracking the results generated by the actions taken are crucial enablers for self-configuration, self-optimization and proactive event processing [3]. The advancements in IoT technology have the potential to facilitate sensor-driven learning of time-dependent parameters for prescriptive analytics models. Tracking inventory replenishment through RFID, cost functions of mitigating maintenance actions, and preferences of social media users within time constraints are some indicative applications. To the best of our knowledge, sensor-driven learning for the self-configuration of prescriptive analytics models in terms of their time-dependent parameters is an unexplored area. This aspect obtains higher importance taking into account that research on prescriptive analytics on streaming data has just started to emerge.
In this paper, we propose an approach for sensor-driven learning of time-dependent parameters for prescriptive analytics models deployed in streaming computational environments. The proposed approach overcomes challenges related to uncertainty derived from the user's input, non-stationary data and sensor noise, and provides estimates based on effective change detection. These challenges, present in data analytics algorithms in general, are even greater when estimating time-dependent parameters from non-stationary sensor data, due to their dynamic nature. At the same time, the accuracy of such estimations is crucial for the reliability of the prescriptive analytics models.
The rest of the paper is organized as follows: Section II provides a literature review on the processing of non-stationary sensor data and on learning of time-dependent parameters. Section III describes the proposed approach for sensor-driven learning of time-dependent parameters for prescriptive analytics. Section IV presents a case study in an Industry 4.0 context, while Section V presents further experimental results for the validation of the proposed approach. Section VI concludes the paper and discusses our plans for future work.

II. LITERATURE REVIEW

A. PROCESSING OF NON-STATIONARY SENSOR DATA
From the data distribution point of view, there are two types of data streams: stationary (stable) data streams, where the probability distribution of instances is fixed, and non-stationary (evolving) data streams, where the probability distribution of incoming data evolves or target concepts (labelling mechanisms) change over time [11], [12]. In machine learning, data mining and predictive analytics, the latter is referred to as concept drift [13]. In pattern recognition, the phenomenon is known as covariate shift or dataset shift [13], while in signal processing it is known as non-stationarity [14].
Non-stationary monitoring signals may be generated by different sources, which can result in many different characteristics, such as multiple sampling frequencies and different types of sensors reflecting the operational state of the same device or subsystem [15]-[20]. The advancements in IoT technology lead to significant challenges in terms of adaptability and computational efficiency [14].
Real-world data streams are usually non-stationary in which the underlying distribution evolves and must be analyzed and adapted accordingly in (near) real-time [9], [10], [14]. This aspect is of increasing importance as more and more data is organized in the form of data streams rather than static, and it is unrealistic to expect that data distributions stay stable over a long period [14]. Such non-stationary data sets can be observed in many real-world cases, e.g. climate, transportation, manufacturing, energy, fraud management, social media, software engineering data streams, etc.
Concept drifts may manifest in different forms over time [13], [14]. These forms can be divided into four general types: abrupt (sudden), gradual, incremental, and recurrent (recurring). In abrupt or sudden concept drifts, the data distribution at time t is replaced suddenly with a new distribution at time t+1. Incremental concept drifts occur when the data distribution changes and settles into a new distribution after going through a series of new, unstable, intermediate distributions. In gradual concept drifts, the proportion of incoming data drawn from the new probability distribution increases, while the proportion belonging to the former probability distribution decreases over time. Recurring concepts occur when the same old probability distribution of data reappears after some period of a different distribution.
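The four drift types can be illustrated with synthetic univariate streams. The following is a minimal sketch in which the distributions, change points and transition lengths are arbitrary choices for illustration, not taken from the paper:

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000  # stream length

# Abrupt: the distribution is replaced suddenly at t = 500.
abrupt = np.concatenate([rng.normal(0.0, 1.0, 500),
                         rng.normal(3.0, 1.0, 500)])

# Incremental: the mean passes through intermediate values
# before settling on the new distribution.
means = np.concatenate([np.zeros(400),
                        np.linspace(0.0, 3.0, 200),
                        np.full(400, 3.0)])
incremental = rng.normal(means, 1.0)

# Gradual: instances from the new distribution appear with growing
# probability while the old distribution fades out.
p_new = np.clip((np.arange(T) - 300) / 400, 0.0, 1.0)
gradual = np.where(rng.random(T) < p_new,
                   rng.normal(3.0, 1.0, T),
                   rng.normal(0.0, 1.0, T))

# Recurring: the old distribution reappears after a period of change.
recurring = np.concatenate([rng.normal(0.0, 1.0, 400),
                            rng.normal(3.0, 1.0, 300),
                            rng.normal(0.0, 1.0, 300)])
```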

B. CHANGE DETECTION IN STREAMING DATA
In streaming computational environments, an important task in analysing high-dimensional time series data generated from sensors is to detect changes in the statistical properties of the time series [21]. A restrictive assumption in change point analysis is ''stationarity under the null hypothesis of no changepoint'', which is crucial for asymptotic theory but not very realistic from a practical point of view [22]. Non-stationary and noisy data pose significant challenges in accurate and efficient (near) real-time change detection and parameter estimations [23]- [26].
To overcome these challenges, [27] considered change detection in non-stationary data streams as an optimization problem, solved by a bio-inspired algorithm that validates the heterogeneity of drifts and achieves high diversity through a self-learning optimization technique. Reference [28] modified a weighted one-class SVM and improved it for non-stationary streaming data analysis; the one-class classifier can adapt its decision boundary to new data streams, while a forgetting mechanism helps the model re-learn its parameters. Similarly, [29] presented an efficient ensemble learning technique for recognizing activities in real time. The system iteratively modifies the weights of a Naïve Bayes classifier, making it smoothly adaptable to the current state of the stream even without an external drift detector. Recently, deep learning has gathered research attention since it can extract and learn features directly from raw time-series or frequency data, avoiding manual feature selection [30], [31]. However, its training time is usually a barrier to its widespread application in this context [24]. Reference [32] proposed a Time-variant Local Autocorrelated Polynomial model with a Kalman filter to model the underlying dynamics of a time series (or signal) and mine its deep patterns, beyond estimating the instantaneous mean function, including: (1) identifying and predicting the peak and valley values of a time series; and (2) reporting and forecasting the current changing pattern (increasing or decreasing trend, and how fast it changes). Reference [23] proposed an unsupervised cluster-based algorithm for modelling normal behaviour in non-stationary data streams and detecting anomalous data points.

C. LEARNING OF TIME-DEPENDENT PARAMETERS
Time-dependent parameters in data analytics models have the potential to provide more accurate and realistic representations of the physical environment [8]. Models that embed time-dependent parameters are flexible enough to capture the complex nature of dynamic systems, thereby yielding better forecasts and a better fit to data than models with constant parameters [7]. However, their estimation poses significant challenges to the reliability of the models due to the uncertainty introduced through different channels, e.g. inaccurate user input, noisy data, and a dynamic and complex environment.
Bayesian and machine learning approaches have proved effective for time-dependent parameter estimation, provided that noise filtering algorithms are applied to estimate the unobservable states related to them [33]. The vast majority of the literature in the IoT context deals with time-dependent parameters related to domain-specific signal processing and monitoring applications [8], [34]-[36].

III. PROPOSED APPROACH FOR SENSOR-DRIVEN LEARNING OF TIME-DEPENDENT PARAMETERS FOR PRESCRIPTIVE ANALYTICS
The proposed approach addresses the learning of time-dependent parameters of prescriptive analytics models by acquiring and processing non-stationary time-series sensor data. It aims to provide accurate estimates of time-dependent parameters that are related to recurring concept drifts, based on noise filtering and changepoint detection algorithms. To do this, it effectively handles the uncertainty derived from user estimates at design time, the non-stationarity of the data, and the sensor noise.
The proposed approach has been developed in a modular manner so that it can be integrated into any prescriptive analytics model in the context of an event-driven architecture. As shown in Fig. 1, Prescriptive Analytics produces optimal decisions. Upon approval, the user implements the prescriptions (recommended actions). During action implementation, Sensor-driven Learning, which is the focus of the current work, acquires data from sensors in order to estimate the time-dependent parameters of the prescriptive analytics models. In this way, it removes the subjectivity of human estimations in the prescriptive model's configuration. The proposed approach consists of three modules: Action Start/End Detection (Section III.A), Noise Filtering of Non-stationary Sensor Data (Section III.B), and Time-dependent Parameters Estimation (Section III.C). For each sensor measurement, Action Start/End Detection detects whether a recommended action (prescription) starts or ends. At the same time, the sensor measurements are processed with Noise Filtering of Non-stationary Sensor Data to remove noise and thus enhance data quality. For each implemented action within a decision horizon, Time-dependent Parameters Estimation estimates the parameters as functions of time and updates their values in the prescriptive analytics models. The following sub-sections describe the three aforementioned algorithms in detail.
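The event-driven wiring of the three modules can be sketched as follows. This is an illustrative skeleton only: `sensor_driven_learning` and the stub callables are hypothetical names, and the module internals stand in for the actual algorithms of Sections III.A-III.C:

```python
def sensor_driven_learning(measurements, detect, denoise, estimate):
    """Route each (timestamp, value) event through the three modules:
    noise filtering, action start/end detection, and, once an action
    completes, time-dependent parameter estimation."""
    estimates, action_data, in_action = [], [], False
    for t, x in measurements:
        x_f = denoise(x)            # Noise Filtering (Section III.B)
        event = detect(t, x_f)      # Start/End Detection (Section III.A):
                                    # 'start', 'end', or None
        if event == 'start':
            in_action, action_data = True, []
        if in_action:
            action_data.append((t, x_f))
        if event == 'end' and in_action:
            # Parameter Estimation (Section III.C) on the completed action
            estimates.append(estimate(action_data))
            in_action = False
    return estimates
```

The stubs would be replaced by the changepoint detector, the Kalman filter, and the curve-fitting step described below; the skeleton only shows how the modules share the per-measurement event flow.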

A. ACTION START/END DETECTION
The Action Start/End Detection algorithm implements a real-time Bayesian changepoint detection [37] to detect changepoints (i.e. times when the probability distribution of a stochastic process or time series changes) as events in sensor measurements that indicate the start or the end of an action until a decision horizon.
The algorithm identifies the change of the system state (no action, action) based on the probability distribution over the possible runs. Rather than performing retrospective segmentation, the algorithm executes causal predictive filtering, generating a recursive message-passing algorithm for the joint distribution over the current run length and the next unseen data. Bayesian inference in recursive anomaly detection algorithms has been proved to achieve high levels of accuracy and effectiveness [38]. The decision horizon indicates the time after which there is no reason for applying the prescriptions. Assuming that a sequence of sensor observations x_1, x_2, ..., x_T may be divided into non-overlapping product partitions, the delineations between partitions are the changepoints. Within each partition, the data are independent and identically distributed (i.i.d.) from some probability distribution. The notation of the algorithm is depicted in Table 1, while the algorithm is depicted in Table 2.
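A minimal sketch of this kind of recursive run-length filtering is given below, under the simplifying assumptions of Gaussian observations with known variance and a constant hazard rate; the algorithm of [37] is more general, and the function name and all parameter values here are illustrative:

```python
import numpy as np

def _gauss_pdf(x, mean, std):
    # Gaussian density, vectorized over per-run-length means/stds.
    return np.exp(-0.5 * ((x - mean) / std) ** 2) / (std * np.sqrt(2 * np.pi))

def bocd_run_length(data, hazard=0.01, mu0=0.0, var0=4.0, varx=1.0):
    """Online Bayesian changepoint detection via run-length filtering.
    Returns the run-length posterior after each observation; a collapse
    of the most probable run length signals a changepoint."""
    T = len(data)
    R = np.zeros((T + 1, T + 1))
    R[0, 0] = 1.0
    mu, var = np.array([mu0]), np.array([var0])   # per-run-length posteriors
    for t, x in enumerate(data):
        # Predictive probability of x under each possible run length.
        pred = _gauss_pdf(x, mu, np.sqrt(var + varx))
        R[t + 1, 1:t + 2] = R[t, :t + 1] * pred * (1 - hazard)  # run grows
        R[t + 1, 0] = (R[t, :t + 1] * pred).sum() * hazard      # run restarts
        R[t + 1] /= R[t + 1].sum()
        # Conjugate Gaussian update of each run's posterior over the mean.
        new_var = 1.0 / (1.0 / var + 1.0 / varx)
        new_mu = new_var * (mu / var + x / varx)
        mu = np.concatenate([[mu0], new_mu])
        var = np.concatenate([[var0], new_var])
    return R
```

In the proposed approach, the detected changepoints would be interpreted as action start/end events within the decision horizon.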

B. NOISE FILTERING OF NON-STATIONARY SENSOR DATA
The Noise Filtering of Non-stationary Sensor Data algorithm implements the Kalman Filter and its non-linear extension, the Unscented Kalman Filter [39], [40], for both uniform and non-uniform sampling.
The Kalman Filter is one of the most widely used methods for filtering, tracking, estimation and prediction. Its main advantage over other noise filtering methods is its computational efficiency [41], [42], making it suitable for processing sensor-generated big data. It should be noted that at the point of transition (changepoint) from the ''no action'' to the ''action'' state, and vice versa, the noise filtering algorithm is stopped and restarted for the new values. The notation of the algorithm is depicted in Table 3, while the algorithm is depicted in Table 4.
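For the linear case, the per-measurement predict/update cycle can be sketched as a scalar Kalman filter with a random-walk state model; the state model and the `q`/`r` values are illustrative assumptions (the paper's notation and algorithm are given in Tables 3 and 4):

```python
import numpy as np

def kalman_denoise(z, q=1e-3, r=0.25, x0=None, p0=1.0):
    """Scalar Kalman filter treating the signal as a random walk:
    x_t = x_{t-1} + process noise (variance q), observed with
    measurement noise (variance r). Returns the filtered estimates."""
    x = z[0] if x0 is None else x0   # initial state estimate
    p = p0                           # initial estimate variance
    out = []
    for m in z:
        p = p + q                    # predict: uncertainty grows by q
        k = p / (p + r)              # Kalman gain
        x = x + k * (m - x)          # update: blend prediction and measurement
        p = (1 - k) * p
        out.append(x)
    return np.asarray(out)
```

At a detected changepoint, the filter would simply be re-initialized (`x0` and `p0` reset), matching the stop-and-restart behaviour described above.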

C. TIME-DEPENDENT PARAMETERS ESTIMATION
The Time-dependent Parameters Estimation algorithm estimates the parameters of the prescriptive analytics models as functions of time, considering that the prescribed actions may have different effects at different implementation times. For the various implementations of each action, the proposed algorithm derives a set of (time, parameter value) points from the corrected/filtered sensor data. To estimate the analytical expression of the parameter function from these points, we apply Curve Fitting using the Levenberg-Marquardt algorithm [43]. Since the functional form of the parameter is not known in advance, a curve comparison algorithm is applied: polynomials of various degrees are compared with respect to the Mean Squared Error (MSE), and the one with the lowest MSE is selected. The notation of the algorithm is depicted in Table 5, while the algorithm is depicted in Table 6.
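The fitting and degree-selection step can be sketched with SciPy's `curve_fit`, whose default unconstrained solver is Levenberg-Marquardt; the function name, degree range and selection by training MSE are illustrative assumptions here:

```python
import numpy as np
from scipy.optimize import curve_fit

def _poly(t, *c):
    # Polynomial with coefficients c[0] + c[1]*t + c[2]*t^2 + ...
    return sum(ci * t ** i for i, ci in enumerate(c))

def fit_parameter_function(t, y, max_degree=4):
    """Fit candidate polynomials to (time, parameter) points with
    Levenberg-Marquardt and keep the one with the lowest MSE.
    Returns (mse, degree, coefficients)."""
    best = None
    for d in range(1, max_degree + 1):
        coeffs, _ = curve_fit(_poly, t, y, p0=np.ones(d + 1), method='lm')
        mse = np.mean((_poly(t, *coeffs) - y) ** 2)
        if best is None or mse < best[0]:
            best = (mse, d, coeffs)
    return best
```

Since lower-degree polynomials are nested in higher-degree ones, the training MSE never increases with degree; in a practical deployment the comparison would typically be made on held-out points or with a complexity penalty.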

IV. DEPLOYMENT IN AN INDUSTRY 4.0 CASE STUDY
The proposed approach is demonstrated in an Industry 4.0 case. Industry 4.0 is an emerging paradigm that aims at bringing interaction and convergence between the physical world and the cyber world, taking advantage of the multitude of data generated in manufacturing environments [45]. In this context, prescriptive analytics and analytics-based decision support are two major challenges in smart manufacturing [46]. The case under examination deals with predictive maintenance in the offshore oil and gas industry. With a typical day rate for a modern oil rig being around 500,000 Euro, reducing undesired downtime is of utmost importance in the oil drilling industry [47]. Since the aforementioned prescriptive analytics model is highly sensitive to its cost-related parameters, which are expressed as functions of implementation time, their accurate estimation can have a significant impact on reducing maintenance costs [47].
The proposed approach was implemented as an event processing agent and integrated into a prescriptive analytics model which recommends the optimal mitigating action (out of a list of alternative actions) and the optimal time for its implementation [44]. The latter is updated based on the processing of non-stationary sensor data with the proposed approach. Specifically, we examine a case in which the failure mode ''gearbox breakdown'' is predicted to occur in 221 hours. This failure mode, with a cost of 155,000 Euro, is assigned to 3 alternative mitigating actions, as shown in Table 7. In the beginning, the input of the proposed approach is the user-defined cost functions. The output of the prescriptive analytics model, depicted in Fig. 2, is to operate at reduced equipment load (a2) starting in 124 hours. The a2 cost function represents the cost due to production loss, which can be measured indirectly through a flow monitoring sensor related to the volume of oil gathered during oil drilling per time unit. According to IEC 60770, flow monitoring sensors have an accuracy of 99.5% in terms of Full Scale Output (FSO). The oil volume is multiplied by its associated cost, and the constant costs are added. Fig. 3 depicts a snapshot of the real-time monitoring and noise filtering functionality. Fig. 4 depicts the output of the changepoint detection: the first peak represents the highest probability density of having a changepoint at that time (due to the action start), while the second peak represents the highest probability of having another changepoint (the action end, given that the action is already implemented).
Based on these results, the cumulative cost of the action over the action implementation time is calculated, as shown in Fig. 5. The updated cost function feeds into the prescriptive analytics model and is exposed to the user in contrast to the initially user-defined one, as shown in Fig. 6. After several executions of the proposed approach, the prescriptive analytics results in a different prescription concerning both the prescribed action and the prescribed time, as shown in Fig. 7. Therefore, the implementation of the prescription of Fig. 2 would lead to a non-optimal outcome due to inaccurate estimations of its time-dependent parameters.

V. EXPERIMENTAL RESULTS
The experimental results deal with the evaluation of the consequences of inaccuracies in the user's input and sensors. They were derived from a comparative and sensitivity analysis by performing extensive simulation experiments to evaluate cases with arbitrary ranges of values (e.g. when there are no constraints about the minimum and maximum values of input parameters). The simulated computational environment generates sensor measurements in the form of events with either uniform (i.e. measurement provided at regular intervals) or non-uniform (i.e. measurement provided at irregular intervals) sampling.
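The simulated event generation can be sketched as follows; the exponential inter-arrival times used for the non-uniform case, the function name and all parameter values are illustrative assumptions, not the paper's actual simulator:

```python
import numpy as np

def generate_sensor_events(n, base=10.0, noise=0.5, uniform=True, seed=0):
    """Generate n (timestamp, measurement) events with either uniform
    (regular-interval) or non-uniform (exponential inter-arrival)
    sampling, around a constant base value with Gaussian sensor noise."""
    rng = np.random.default_rng(seed)
    if uniform:
        ts = np.arange(n, dtype=float)            # regular intervals
    else:
        ts = np.cumsum(rng.exponential(1.0, n))   # irregular intervals
    xs = base + rng.normal(0.0, noise, n)         # noisy measurements
    return list(zip(ts, xs))
```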

A. IMPACT ON USER'S INPUT INACCURACIES
In this sub-section, we present the experimental results about the impact of the proposed approach on the user's input inaccuracy caused by human subjectivity in defining time-dependent parameters. We compared the output of the prescriptive analytics algorithm mentioned in Section IV in the case that it does not incorporate the proposed approach (user-defined time-dependent parameters) and in the case of 100 executions of the proposed approach. Table 8 shows the prescribed action, implementation time and associated expected loss for various timestamps of prediction events in the two aforementioned cases. It should be noted that the prediction events are expressed with respect to the end of the decision horizon. The benefit of the proposed approach concerning user input inaccuracies is multiplied taking into account that sensor-driven infrastructures process a multitude of data and make predictions about various future events. These prediction events trigger various instances of prescriptive analytics algorithms that generate prescriptions for a variety of proactive actions. To demonstrate this multiplication effect, we followed the same procedure with an extensive number of prediction events for 10 different instances. The results are presented in aggregated form in Table 9. For each instance, we present the percentage of cases where the recommended action, the recommended time and the expected loss change, as well as the average difference in the expected loss after the deployment of sensor-driven learning. These changes occur due to the improvement in the accuracy of time-dependent parameter estimation and thus, in the reliability of the generated prescriptions.

B. IMPACT ON SENSOR INACCURACIES
In this sub-section, we present the experimental results about the impact of the proposed approach on sensor measurement inaccuracies due to low data quality (sensor noise).
We performed extensive simulations to examine the time-dependent parameter estimations output by sensor-driven learning for various sensor noise levels. Fig. 8 shows the improvement of the MSE in time-dependent parameter estimations when using noise filtering on sensor measurements. In each diagram, the x-axis represents the noise that has been transferred to the time-dependent parameter from the sensors. The range of the noise has been derived from the sensor accuracy in terms of FSO. The y-axis represents the MSEs of the noisy and the corrected time-dependent parameter. For each time-dependent parameter, we also show the ratio of the MSE of the corrected data to the MSE of the noisy data, i.e. the Corrected-to-Noisy Data Ratio (CNDR), for indicative noise levels. For example, a CNDR of 0.3922 means that the MSE of the corrected data is equal to 39.22% of the MSE of the noisy data.
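The CNDR amounts to a ratio of two MSEs computed against the ground-truth parameter values; a minimal sketch (`cndr` is a hypothetical helper name):

```python
import numpy as np

def cndr(true, corrected, noisy):
    """Corrected-to-Noisy Data Ratio: MSE of the corrected estimates
    divided by MSE of the raw noisy values (lower is better)."""
    mse = lambda a: np.mean((np.asarray(a) - np.asarray(true)) ** 2)
    return mse(corrected) / mse(noisy)
```

For instance, if noise filtering halves every error, the MSE drops by a factor of four and the CNDR is 0.25.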
According to the experimental results shown in Fig. 8, the MSE of the noisy time-dependent parameter is significantly higher than the one calculated by the proposed approach, for various parameters and noise levels. Although the MSE of the time-dependent parameters of the proposed approach increases with the sensor noise, the increase is significantly lower than that of the noisy parameters. Moreover, higher noise levels lead to a lower CNDR, since the MSE of the corrected data becomes significantly lower than the MSE of the noisy data.
For the indicative time-dependent parameters shown in Fig. 8, Table 10 shows the percentages of the cases where the prescribed actions and times changed after noise filtering, along with the average CNDR. The prescribed times are particularly sensitive to sensor noise, while higher values of time-dependent parameters lead to higher percentages of changes in the prescribed actions.

VI. CONCLUSIONS AND FUTURE WORK
In IoT environments, prescriptive analytics is an emerging paradigm that increases data analytics maturity and leads to optimized decision making ahead of time. In this context, feedback and learning mechanisms for tracking the generated prescriptions are crucial enablers for the self-configuration and self-optimization of prescriptive analytics models. However, the non-stationarity of sensor-generated time-series data, as well as the uncertainty derived from the user's input inaccuracy and sensor noise, pose significant challenges, especially to the estimation of time-dependent parameters and thus, to the reliability of prescriptions. The proposed approach can be deployed in streaming computational environments to provide continuous learning of time-dependent parameters for prescriptive analytics models by processing non-stationary sensor data. In this way, the accuracy of estimations increases and the reliability of prescriptions is improved. A significant advantage of the proposed approach is that it does not require historical training datasets, the availability and suitability of which is a major challenge, especially in an enterprise context. On the other hand, this leads to the limitation that changepoints may occur due to factors not related to the prescribed actions. In addition, the current work does not address the design-time configuration of the sensors that are related to specific actions.
Our plans for future work belong in three directions: (i) development of an approach and algorithm for mapping changepoints to prescribed actions; (ii) development of a human feedback mechanism aiming at building an optimized human-machine collaboration through explainability and interpretability; and, (iii) development of information fusion algorithms for combining sensor-driven and human-driven learning of prescriptive analytics in the presence of concept drifts.
NIKOS PAPAGEORGIOU received the Diploma degree in electrical and computer engineering, the M.Sc. degree in techno-economic systems, and the Ph.D. degree in electrical and computer engineering from the National Technical University of Athens (NTUA), Greece, in 1998, 2007, and 2016, respectively.
He is a Senior Researcher with the Institute of Communication and Computer Systems (ICCS), NTUA. He has authored seven journal articles and more than 30 conference papers. He has over 20 years of industrial experience as a software developer, network administrator, and IT consultant. His research interests include proactive computing, event processing, machine learning, cloud computing, and e-collaboration. He has been a member of the Technical Chamber of Greece (TEE-TCG), since 1999. He has received one best paper award at the international conference of Through-Life Engineering Services 2016.

Dr. Magoutas has been a Senior Researcher with the Information Management Unit (IMU), NTUA, since 2006. His work focuses on situation awareness, event-driven computing, proactive decision making, recommender systems, behavioral change support systems, and expert systems. He has authored more than 51 articles, and he has organized workshops dealing with big data, open data, and behavioral change support, while he has been a PC member of more than 16 workshops and conferences. During his time at NTUA, he has been involved in or managed research bids that resulted in grants of over 4 M Euros, while he has managed the work of NTUA in more than twelve EC and national co-funded ICT projects and projects addressing societal challenges.
Dr. Magoutas has been awarded two excellent performance scholarships from the Greek State Scholarships Foundation and the Alexander S. Onassis Public Benefit Foundation during his studies. His three papers received the Best Paper Award at the international conferences of EGOV 2009, Through-Life Engineering Services 2016, and ICE-B 2019.
DIMITRIS APOSTOLOU received the Ph.D. degree in electrical and computer engineering from the National Technical University of Athens, Greece. He has twenty years of experience in management and technology consulting. He is an Associate Professor with the Department of Informatics, University of Piraeus, Greece. His professional experience focuses on the design and management of large-scale IT projects. He has coordinated several EU-funded research and innovation projects, and participated in more than twenty ones. He has more than 50 publications in journals, such as Communications of the ACM, the IEEE INTELLIGENT SYSTEMS, the IEEE COMPUTATIONAL INTELLIGENCE, the International Journal of Information Management, Expert Systems With Applications, the Journal of Knowledge Management, and Internet Research, more than 90 publications in conference proceedings, and one book published by Springer-Verlag. His research interests focus on decision making and intelligent information systems. He is a member of various national scientific and professional associations.
GREGORIS MENTZAS received the Diploma degree in engineering and the Ph.D. degree in operations research and information systems from National Technical University of Athens (NTUA), in 1984 and 1988, respectively. His experience includes twelve years of management consulting in corporate strategy and information systems strategy. He is a Full Professor of management information systems with the School of Electrical and Computer Engineering, NTUA, and the Director of the Information Management Unit (IMU). He has published four books and more than 200 articles in international peer-reviewed journals and conferences. He has led or contributed to more than 50 European research and development projects conducted in collaboration with SAP, IBM, HP, Siemens, Software AG, ATOS, and other leading technology firms. The research carried out by his group has led to the establishment of three Internet technology companies. His research concerns big data management in multi-cloud environments and prescriptive analytics in Industry 4.0. He has received five best papers awards, sits on the editorial board of five international journals, and has served as a chair, a Co-Chair, or a program committee member of more than 55 international conferences. From 2006 to 2009, he has served as a member of the Board of Directors of the Institute of Communication and Computer Systems.