Edge Intelligence for Data Handling and Predictive Maintenance in IIoT

The use of IoT has become pervasive and IoT devices are common in many domains. Industrial IoT (IIoT) utilises IoT devices and sensors to monitor machines and environments to ensure optimal performance of equipment and processes. Predictive Maintenance (PM) which monitors the health of machines to determine the probable failure of components is one IIoT technique which is receiving attention lately. To achieve effective PM, massive amounts of data are collected, processed and ultimately analysed by Machine Learning (ML) algorithms. Traditionally IoT sensors transmit their data readings to the cloud for processing and modelling. Handling and transmitting massive amounts of data between IoT devices and infrastructure has a cost. Edge Computing (EC) in which both sensors and intermediate nodes can process data provides opportunities to reduce data transmission costs and increase processing speed. This article examines IIoT for PM and discusses how and where data can be processed and analysed. Initially, this article presents sampling and data reduction techniques. These techniques allow for a reduction in the amount of data transmitted to the cloud for processing but there are potential accuracy trade-offs when ML algorithms utilise reduced datasets. An alternative approach is to move ML algorithms closer to the data to reduce data transmission. There are three main techniques that utilise the EC paradigm to perform ML and data processing on intermediary nodes. These techniques are categorized according to where data processing occurs: Device and Edge, Edge and Cloud and Device and Cloud (Federated Learning). In addition to exploring traditional approaches, these three state-of-the-art techniques are examined in this article and their benefits and weaknesses are presented. A novel architecture to demonstrate how EC can be utilized both for data reduction and PM in IIoT is also proposed.


I. INTRODUCTION
The Internet of Things (IoT) is envisioned to make our lives easier. Since its inception, almost every sector has somehow exploited it. For example, it is common to find the use of IoT for smart healthcare [1], [2], agriculture [3], smart homes [4], smart grid [5], [6] and smart industry [7]. Smart industry, also referred to as industry 4.0, makes use of information and communication technologies for efficient productivity [8]. In the context of industry, IoT is known as Industrial IoT (IIoT) and it has gained significant research attention recently [9], [10].
In IIoT, different sensors are employed to monitor the performance of equipment or even a complete production process [20]. One IIoT technique, Predictive Maintenance (PM), has recently gained attention. The basic concept of PM is to monitor machine health with the help of sensing data to determine probable future degradation or failure of the machine. PM employs Machine Learning (ML) on the collected data to make predictions. Indeed, the accuracy of the ML models depends mainly on the collected data.
The associate editor coordinating the review of this manuscript and approving it for publication was Giacomo Verticale.
In IIoT, a traditional approach to collect data is to stream it from sensing devices to the cloud where it is processed and modelled. Sensing devices generate enormous amounts of data, continuously or periodically, often in a very short time frame. For example, within a second, thousands of records can be generated by a machine [16]. According to the Cisco cloud index (2013-2018), an automated facility can generate a terabyte of data every hour. To this end, approaches such as sampling, compression, filtering are used to reduce the data size. These techniques allow for a reduction in the amount of data forwarded to the cloud. However, there are potential accuracy trade-offs for the ML models which utilize reduced datasets. In the face of cost and other challenges (latency, bandwidth and energy consumption) incurred by the traditional approach, a new computing paradigm called Edge Computing (EC) [31] has recently emerged. EC provides computation and processing nearer to the data source to reduce the data sent to the cloud for processing [19]. In EC, both sensors and intermediate nodes can process data and provide opportunities to reduce data transmission costs. In this respect, developers have options when wishing to reduce the data and associated costs and latency. They can use the limited processing of EC devices (e.g. sensors) to reduce the data being sent to the cloud using various sampling techniques or perform ML for PM on the EC device or even use a hybrid approach in which ML/PM is carried out using EC and the cloud. Another approach, proposed recently by Google is Federated Learning (FL) which seeks to train Deep Learning (DL) models on an edge device with the cloud serving as a global model aggregator [32]. All of the approaches have various trade-offs in terms of data size, transmission cost and accuracy of the ML/PM, which this article explores.
Literature shows that EC can help in meeting the real-time requirements for IIoT [33]. Authors in [34] have provided a range of applications for a smart factory where EC can play a role. There is interest in the research community in proposing an optimum solution, and so surveys on EC, PM, IIoT or ML have been conducted. However, they typically consider the technological [8], architecture [35], security [36], [37] and systems perspective [13] or focus on the analytics aspect [18] in the IIoT context. This article focuses more on the data and, in particular, discusses the location within an IoT network where data can be processed (data reduction or analysis). The article presents the state-of-the-art by
• Reviewing traditional approaches which help in reducing data in IoT. This includes sampling, compression and fusion. These techniques reduce the data where they are generated, and as a result not only consume fewer communication resources (e.g. network bandwidth) but also require fewer cloud resources for storage and computation.
• Presenting the research contributions which have been proposed to push ML closer to the data source. In this, ML training takes place in the cloud and only the model is pushed to the edge nodes. Such approaches can greatly benefit from powerful cloud resources for training complex ML models such as DL.
• Discussing the recently proposed techniques which exploit EC to implement ML for data processing and PM in IIoT. In this, hybrid approaches in which frameworks use the sensing device and the edge are presented. Techniques based on such an architecture can help in meeting stringent latency requirements in some application domains.
• Reviewing the FL paradigm, recently proposed by Google, in which, rather than aggregating raw data from devices, the cloud aggregates DL models trained locally on the edge. It is particularly beneficial in meeting privacy requirements when data are confidential and cannot be shared with cloud providers in raw form.
• Presenting observations at the end of each section and proposing future directions on using reduced data for training ML models and on where to implement each part of the PM framework in IIoT.
We also propose an Edge-Cloud based architecture that utilises data reduction at the edge informed by network-level information (e.g. congestion) and PM analytics constraints from the cloud (e.g. accuracy) to reduce the data required for analysis tasks. Figure 1 shows how the literature is categorized while Table 2 summarises the acronyms used in this article. The remainder of this article is organized as follows. Section II presents related surveys published since 2018. We build on the review of the literature from 1993 to 2018 presented in [18]. However, we also discuss PM in conjunction with EC, which has emerged recently. In section III, data reduction approaches for IoT which do not employ ML are discussed. We discuss sampling, compression and fusion techniques. Section IV focuses on the role of ML both for data reduction and PM analytics within EC. In particular, we review techniques that rely on Device-Edge interconnection in subsection IV-A, Edge-Cloud interconnection in subsection IV-B and FL for IIoT in subsection IV-C. This reveals where ML can be implemented in the overall architecture of IIoT systems. Section V presents some research challenges and provides future directions to address the need for data reduction and continuous retraining of ML models. Finally, a conclusion is given in section VI.

II. RELATED WORK
Since [18] provides a comprehensive review from 1993 to 2018 for PM, this article builds on that and provides a review of the PM primarily since 2018.
EC is a fundamental pillar of modern IIoT systems and has been discussed in many surveys [8], [12]-[14], [17], [19]. Research shows EC can assist in deploying ML models, analytics and data handling; however, existing surveys lack a discussion of how EC can be realized for these tasks. Table 1 depicts the focus of recent surveys.
Some recent surveys have discussed how ML can be deployed on the edge [13], [14], [17], [38]. [14] discussed the hardware and software frameworks for employing ML at the edge. Likewise, [17] discussed the ML and Artificial Intelligence (AI) implementation in the form of agents. The approach proposed in [13] also covered ML. However, PM analytics in IIoT systems are not considered. Authors of [38] have discussed the role of ML in offloading tasks to the edge. Table 1 summarises the various aspects of the IIoT paradigm that have been reviewed to date.
The importance of PM is seen in recent works [12], [15], [16], [18]; however, these works do not consider the benefits or use of EC. In [12], the authors focused on discussing the building blocks (such as the equipment, their integration in the system and analytics) of an IoT based smart factory. The authors of [15] explore techniques of PM analytics in IIoT. This includes knowledge-based approaches (ontology, rule-based etc.), techniques using ML models, and approaches involving DL models which help in inference and PM analytics. A comprehensive survey of the PM field is given in [18]. The authors selectively covered the literature from 1993 to 2018 in the field of PM. However, like other analytics, PM also involves data. Meeting real-time latency requirements depends on how data are being collected and processed.
Authors in [13] discussed the role of data. However, data reduction mechanisms are not considered. A search for data reduction mechanisms specifically for IIoT systems shows that there are few significant contributions. Therefore, we extended our literature review to consider data reduction techniques within EC in IoT and analytics within the EC paradigm in IIoT systems. This article thus builds on the existing body of knowledge and reviews research efforts made using emerging technologies such as EC and ML to reduce the data as well as to perform PM analytics in IIoT systems.
What differentiates this article from related surveys is that 1) it puts forward the available data reduction mechanisms which are very important for future IoT systems, especially in the case of redundant sensing. 2) Unlike related surveys which cover ML literature implementation from the what perspective in EC, this article instead focuses on the where perspective. It is important to unfold the location in the IoT network where a particular ML framework could be employed. For instance, if the application demands implementation of ML on a sensor node, the ML algorithm would need to be designed for low power devices.
In the next section, the focus is on data reduction approaches that do not employ ML. The techniques discussed are either implemented on a sensor node or an edge node.

III. TRADITIONAL APPROACHES
Since IIoT is new, not much attention has yet been given to data reduction mechanisms. Therefore, the search criterion for recent papers is expanded from IIoT to the more general IoT domain. However, as compared to general IoT, IIoT applications have a variety of sensors and may push more data with higher velocity. In this section, traditional data reduction approaches are discussed. The term traditional, in this article, refers to those techniques which neither utilize ML for data reduction nor are tested for complex IoT analytics such as PM. They are either implemented on a sensing device or the next immediate node, which could be a gateway node. Their main purpose is to reduce the size of data that are forwarded to the cloud for analysis. Table 3 provides a summary of the reviewed approaches.

A. SAMPLING
Sampling refers to how frequently data points are taken from the incoming data. For instance, it describes how often sensed values are forwarded: every second, every minute or even every hour. This is generally used in applications having frequent redundant values. Constant temperature monitoring is an example of this. In this case, deduction or decision making can be done with a reduced number of samples. In this subsection, a few recent sampling techniques are reviewed.
ApproxIoT proposed in [40] works by applying reservoir and random sampling on the data stream and associates weights which indicate the significance of the data values at an aggregator. The problem with such an approach is that the multiplication of the weight with the data point eventually changes the values. This aspect makes it unsuitable for applications that demand actual values or values in a particular range. Unlike this approach, the technique proposed in [41], does not alter the values. It is based on two subsets, maximum and minimum values and the aggregating node uses those subsets to obtain an approximated stream. Such an approach may work well for basic queries but has no mechanism to deal with duplicate values which makes it unsuitable for some IoT applications.
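As an illustration of the reservoir sampling idea mentioned above, the following is a minimal sketch in Python. It is not the ApproxIoT implementation from [40]: the function names `reservoir_sample` and `estimate_sum`, and the single uniform weight per reservoir, are assumptions made here for illustration. In this sketch the weight is applied only when an aggregate such as a sum is estimated, so the stored readings themselves are left unchanged.

```python
import random


def reservoir_sample(stream, k, seed=None):
    """Keep a uniform random sample of at most k items from a stream.

    Classic reservoir sampling: the stream length need not be known
    in advance, and every item ends up in the sample with equal
    probability.
    """
    rng = random.Random(seed)
    reservoir = []
    n_seen = 0
    for item in stream:
        n_seen += 1
        if len(reservoir) < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Replace a stored item with probability k / n_seen.
            j = rng.randrange(n_seen)
            if j < k:
                reservoir[j] = item
    # Assumption: one weight per reservoir; each kept item stands in
    # for n_seen / len(reservoir) original readings.
    weight = n_seen / len(reservoir) if reservoir else 0.0
    return reservoir, weight


def estimate_sum(reservoir, weight):
    """Approximate the sum of the full stream from the weighted sample."""
    return weight * sum(reservoir)
```

For example, `reservoir_sample(range(1000), 50)` keeps 50 readings with a weight of 20, and an aggregator can scale the sample sum by that weight to approximate the sum of the full stream.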
The above approaches are designed and tested on basic queries such as average, sum, etc. However, the field of analytics has now matured and today's IoT systems require advanced analytics. Particularly in IIoT, advanced analytics include PM, in which failure or maintenance of some equipment is detected/predicted ahead of time. Considering such a complex case, the authors in [23] have proposed an Adaptive moving average Window Based Sampling (AWBS) algorithm to reduce the data. The window size varies based on variation in the incoming data. When there is more variation, the window size is reduced so that more values are forwarded to the cloud, and vice versa. Using their proposed algorithm, they reduced one of NASA's datasets [42] to just 6.91% and passed it to an unsupervised Anomaly Detection (AD) algorithm, Local Outlier Factor (LOF) [21]. Results show that the reduced data yield AD performance almost similar to that obtained when the complete dataset is used, or when the data are reduced using the approaches of [40] and [41] to 28.95% and 20%, respectively.
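The adaptive-window idea can be sketched as follows. This is only a simplified illustration and not the AWBS algorithm of [23], whose details are not reproduced here. The function name `adaptive_window_sample`, the halving and doubling rule, the use of the change between consecutive window averages as the measure of variation, and the parameters `w_min`, `w_max` and `threshold` are all assumptions.

```python
from statistics import mean


def adaptive_window_sample(values, w_min=2, w_max=16, threshold=1.0):
    """Forward one moving-average value per window of adaptive size.

    Assumed rule: if the window average changes by more than
    `threshold` compared with the previous window, halve the window
    (forward more values); otherwise double it (forward fewer).
    """
    forwarded = []
    w = w_max
    i = 0
    prev_avg = None
    while i < len(values):
        window = values[i:i + w]
        avg = mean(window)
        forwarded.append(avg)
        i += len(window)
        if prev_avg is not None and abs(avg - prev_avg) > threshold:
            w = max(w_min, w // 2)   # more variation: smaller window
        else:
            w = min(w_max, w * 2)    # stable data: larger window
        prev_avg = avg
    return forwarded
```

With the default parameters, a constant stream of 64 readings is reduced to 4 forwarded values, whereas a stream of the same length that alternates between two levels produces more forwarded values because the window shrinks after each change.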

B. DATA COMPRESSION
Data compression is an important data reduction approach in which data are compressed before transmitting to the cloud for further processing/analysis. The authors in [43] proposed a compression approach that uses the edge storage concept. It exploits existing compressed data points and saves data either at an edge device or in the cloud without inter-device communication at the edge. Similarly, another compression approach called Sensing-data Reconstruction Algorithm under Intelligent-migration Strategy (RdS-ImS) to handle the data streams considering the whole network is proposed in [28]. The authors used the correlation of time-series from different nodes and compressed the data before forwarding. To achieve reliability, a re-transmission mechanism from sensors to edge server is employed. In case communication between an edge node and a sensor is not possible, an edge server estimates the value with the help of a predictive model and sends it to the cloud.
In [44], the authors proposed an energy-efficient approach for compressing multivariate time-series data for IoT devices. The approach tweaks the SZ compression algorithm [22], originally proposed for compressing data of high-performance computing applications. The sensor nodes compress the data using SZ and forward them to the edge node, which reconstructs the data. As the use case, they used the dataset from [45] to determine the stress level of a driver given features such as electrocardiogram signals, respiration and heart rate, to mention a few. Results show that a DL model could effectively predict using compressed data without compromising accuracy. Moreover, the results also show the approach is efficient in terms of computation time and energy consumption of a smart device. However, this approach only works for floating-point values and is tested on labelled data, which are not always available in general IoT applications [46]. Moreover, given the heterogeneity of IoT data, compression designed for one case often performs poorly in another. An adaptive approach for compression is proposed in [47].
The authors have proposed to equip an edge server with several compression approaches and adapt the compression according to the dataset. To determine which one to adopt, they proposed to take a sample of the data and apply compression approaches and select the one which offers a better compression ratio and rate. However, how to select the sample for comparing the approaches is not provided. Moreover, how accurately it compressed the data needs to be evaluated.
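The selection step described above can be sketched with general-purpose codecs from the Python standard library. This is not the implementation of [47]: the codec set (`zlib`, `bz2`, `lzma`), the function name `pick_codec`, and the choice of the leading `sample_size` bytes as the sample are assumptions. Only the compression ratio is compared here; the compression rate (speed) is left out to keep the sketch deterministic.

```python
import bz2
import lzma
import zlib

# Assumed candidate codecs; an edge server could register others.
CODECS = {
    "zlib": zlib.compress,
    "bz2": bz2.compress,
    "lzma": lzma.compress,
}


def pick_codec(data: bytes, sample_size: int = 4096):
    """Compress a sample with every codec and pick the best ratio.

    Returns the name of the selected codec and the ratio
    (original size / compressed size) measured for each candidate.
    """
    # Assumption: the leading bytes are representative of the dataset.
    sample = data[:sample_size]
    ratios = {
        name: len(sample) / len(compress(sample))
        for name, compress in CODECS.items()
    }
    best = max(ratios, key=ratios.get)
    return best, ratios
```

A call such as `pick_codec(payload)` returns the selected codec name, after which the edge server would compress the full payload with `CODECS[name]`. How representative the sample is remains the open question noted above.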

C. DATA FUSION
Apart from compressing an individual data stream [28], there is a technique called fusion in which data from various streams are fused to decrease data redundancy, increase data quality, improve reliability, handle missing data and provide more coverage of the area being monitored [48].
VOLUME 9, 2021
In the research proposed in [49], data are first reduced on the sensor node with the help of Lagrange Polynomials and then sent to the edge server. In the second stage, the edge server reconstructs the data and performs a Kolmogorov-Smirnov test to reduce the data aggregated from several neighbouring nodes before forwarding them to the cloud. Similarly, in-network data reduction using a two-layer architecture on the edge is proposed in [50]. In the first layer, the data are filtered based on the deviation between the actual value and the estimated value, removing the redundancy. For estimating the value, Kalman filtering [51] is employed. This layer passes the data to a Fusion layer which is responsible for gathering data from several sensors, removing redundancy, filling in the missing data and improving reliability. The quality of the data is still one of the challenges which data heterogeneity presents in IoT systems [52]. The importance of data quality increases when the goal is to use reduced data in ML models.
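The deviation-based filtering in the first layer can be illustrated with a scalar Kalman filter. The sketch below is a simplification and not the method of [50]: the random-walk state model, the function name `kalman_filter_forward`, and the values of `threshold`, `q` (process noise) and `r` (measurement noise) are assumptions.

```python
def kalman_filter_forward(readings, threshold=0.5, q=1e-3, r=0.1):
    """Forward only readings that deviate from the Kalman estimate.

    Returns a list of (index, value) pairs that would be passed on
    to the next layer; readings close to the estimate are treated
    as redundant and dropped.
    """
    if not readings:
        return []
    x, p = readings[0], 1.0           # initial state and uncertainty
    forwarded = [(0, readings[0])]    # always forward the first value
    for t, z in enumerate(readings[1:], start=1):
        # Predict: random-walk model, so the estimate is unchanged
        # and only its uncertainty grows.
        p = p + q
        # Filter: forward the reading only if it deviates enough.
        if abs(z - x) > threshold:
            forwarded.append((t, z))
        # Update the estimate with the new measurement.
        k = p / (p + r)               # Kalman gain
        x = x + k * (z - x)
        p = (1 - k) * p
    return forwarded
```

With a constant stream only the first reading is forwarded, while a sudden spike is forwarded because it deviates from the estimate by more than `threshold`. Readings immediately after a spike may also be forwarded while the estimate settles.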

D. OBSERVATIONS ON TRADITIONAL SENSOR-CLOUD ARCHITECTURE
Based on the reviewed research, there are some fundamental observations on traditional Sensor-Cloud architecture which are briefly described in this subsection.
Although Sensor-Cloud architecture reduces the data being sent, stored and processed in the cloud [53], traditional approaches such as sampling, compression and fusion are implemented mostly on the device (sensor) itself and analytics are performed in the cloud. However, they are not evaluated or tested for complex analytics such as PM. Even performance of the AWBS [23] is tested only for detecting abnormality of the data points. Therefore, data reduction at the edge needs further exploration. The limited resources of sensor nodes impact the implementation of sophisticated reduction algorithms on them. For example, perceptual importance point-based algorithms have complexity that a sensor can barely handle [50]. The situation becomes more severe with evolving applications of IoT which involve rich data types such as images. Furthermore, developing real-time analytics in the cloud is almost impossible to achieve.
Different IoT applications demand local analytics. For instance, in the IIoT context, based on local analytics, the decision to turn some equipment ON/OFF quickly in a production environment can avoid a catastrophic situation. Analytics depend on ML algorithms which are computationally expensive for some tiny sensors. Also, the energy consumption of tiny sensors has been one of the important concerns even before ML emerged in IoT. Thus, meeting a real-time goal with sensor-cloud architecture seems ambitious. This calls for EC, which provides computation power near the data source (sensors), eliminating the latency issues of sensor-cloud architecture. Bringing EC into the architecture creates further possibilities of hybrid architectures that have been adopted in several research works recently. The following section examines recent efforts to increase intelligence at the edge through the use of ML. This can be done by using ML to produce intelligent data sampling or by conducting ML/analytics on edge devices.

IV. EDGE COMPUTING BASED & MACHINE LEARNING ENABLED APPROACHES
AI and ML are now fundamental pillars of modern IoT applications. Recently, many efforts have been devoted to this research area. Given the different training time complexity of various ML models, the research community has explored which ML models work best for different IoT applications and contexts. However, not much attention has been given to investigating the location where ML is most suitably implemented, which is of paramount importance for a few reasons. Firstly, training is a computationally expensive task for which the cloud can offer resources. Secondly, moving massive volumes of data to the cloud, storing the data and eventually training ML models on them consumes a lot of resources. Thirdly, where to deploy the prediction model is not consistent in every application. For instance, in a hazardous production facility, prediction on the device or the edge is more important than in the cloud [66] to combat any latency issues.
Unfortunately, deploying ML in an IoT system faces challenges due to constraints of the IoT system. For example, if ML is implemented in the cloud, real-time local decision-making [67] is almost impossible to achieve due to underlying limited bandwidth connectivity between sensing nodes and the cloud. To address the problem, ML can be deployed on the device. However, the limited computing capacity of the sensing nodes is a major challenge. Therefore, a hybrid architecture to implement computation intensive tasks such as training on the cloud and deploying models for prediction on the sensing node has emerged. However, this approach also presents challenges in the case when models require retraining based on new data. In this case, again all of the new data need to be moved to the cloud, incurring costs in terms of latency, energy consumption, and also the use of network resources [68].
Recently, EC, which offers computation by residing between sensing nodes and the cloud, has emerged. Some of the techniques presented in section III utilize EC for data reduction. However, research shows that EC, which offers more computation ability than sensing nodes, can be exploited to implement ML. Numerous research efforts have been proposed using EC for different IoT applications including PM. Therefore, this section reviews the state-of-the-art of deploying ML for data processing and PM analytics in an IoT and IIoT network.
ML offers several advantages including accurate predictions, speed, automation and scalability [69]. Research shows that ML can greatly help with monitoring systems in IIoT [70]. While complex DL models are being developed on the one hand, research on EC is accelerating on the other to provide more computing resources to DL models and so support more applications [71]. Various ready-to-use ML frameworks with EC are presented by the authors in [14]. Before ML was used in IIoT, the cognitive ability of machines (to learn the environment) was limited to predefined heuristics. However, sophisticated ML algorithms have enhanced cognitive ability by finding patterns in the data and making predictions [30].
This section will unfold the benefits and drawbacks of deploying ML models at different locations (device, cloud, edge or a hybrid) in an IoT network. Table 4 shows a summary of ML-based approaches. In particular, it reveals different aspects of particular approaches. Most importantly, it highlights the locations where specific parts of the implementation are performed such as pre-processing (also includes data reduction mechanisms, if used), training of the model and where the final analytics are performed. Figure 2 depicts the three-layer standard architecture of an IIoT employing EC. It shows that the edge layer which lies between the device and the cloud is a suitable location for deploying ML, therefore provides opportunities to implement the frameworks using hybrid architectures.

A. TECHNIQUES USING DEVICE-EDGE ARCHITECTURE
In this section, techniques that rely on sensing nodes and edge nodes are reviewed.
In [80], authors considered the latency requirements in a proposed hybrid architecture. Their proposed hybrid architecture consists of an edge server and sensing device. Similar to [26] and [66], they also utilise image processing as a use case. Image data can also be converted to time-series text data with the help of EC. For instance, in [88], the edge server first performs pre-processing on the fetched image data and transforms it into text-based time-series data. Once the data are ready, they are passed to the Long Short-Term Memory (LSTM). The second component, as well as the novelty of the approach, is the parameter tuning of the LSTM through Particle Swarm Optimization (PSO).
Sometimes, rather than using one model, several models are ensembled to achieve more accuracy. The authors in [83] proposed to deploy a lightweight ML algorithm, Light Gradient Boosting Machine (GBM), on the edge nodes, while DL is deployed on the master nodes, which are edge routers. One of the benefits of the proposed approach is that raw data are not pushed to the master nodes. Instead, Light GBM learns the features from the raw data and passes the learned features to the master node, which further increases the accuracy with more computations (using DL). However, the architecture assumes that an edge router can be used to deploy computationally expensive DL models. To implement such an approach, reduced data can help in retraining DL models on the edge node without connecting to the cloud. Overall, ensemble approaches consume more resources as several models are trained.
An alternative approach to ensemble models is online training in which a single model is trained iteratively. Based on this idea, a big data cleaning technique called Mobile Data Cleaning Model (MDCM), which utilises EC, is proposed in [24]. On the edge server, multidimensional data are cleaned by first employing Angle Based Outlier Detection (ABOD) [25] and then training the ML model. MDCM outperformed the compared traditional cleaning model and ABOD, which are the baseline techniques. However, since MDCM uses ML, it needs to be compared with techniques that also use ML at the edge. Such techniques reduce data at the edge node; however, local transmissions from sensor nodes to edge nodes are not reduced.
To reduce the volume of data transmitted from sensor nodes to the edge server, the authors in [87] have proposed to implement ML on the sensor node. The basic idea is to train the ML model on a computationally powerful device (in experiments they used the edge but argued that the cloud can be used too) and push the model to a sensor node. When a new value is sensed, it is passed to the ML model, which predicts its label. The label is forwarded to the edge node only if it has not been forwarded before. The authors compared different ML algorithms, including Linear Discriminant Analysis (LDA), Quadratic Discriminant Analysis (QDA), Gaussian Naive Bayes (GNB), Support Vector Machines (SVM), Decision Tree (DT) and Random Forest (RF). Results showed that SVM outperformed all of the other algorithms mentioned in reducing the data. Such an approach may work well when a single value makes a difference. However, similar to RdS-ImS [28], it does not support use cases in which a sequence of values (also called a pattern), rather than an individual value, is more important, such as IIoT data [72].
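The forwarding rule of such an on-sensor scheme can be sketched as below. This is an illustration only and not the implementation of [87]: the function `classify` is a hypothetical threshold-based stand-in for a pre-trained model (such as an SVM) pushed to the sensor, and its boundaries and label names are assumptions.

```python
def classify(value, boundaries=(30.0, 70.0)):
    """Hypothetical stand-in for a pre-trained model on the sensor.

    A real deployment would call the pushed model's predict step;
    fixed thresholds are used here to keep the sketch self-contained.
    """
    low, high = boundaries
    if value < low:
        return "low"
    if value < high:
        return "normal"
    return "high"


def forward_new_labels(readings, model=classify):
    """Forward a predicted label only the first time it is seen."""
    already_sent = set()
    forwarded = []
    for value in readings:
        label = model(value)
        if label not in already_sent:
            already_sent.add(label)
            forwarded.append(label)   # stands in for a transmission
    return forwarded
```

For the readings `[10, 12, 50, 55, 90, 11]` this sketch forwards three labels instead of six values. It also makes the limitation noted above visible: the order and repetition of values, which is the pattern information, is lost.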
Another approach with the same goal as [87], has been proposed in [74]. The authors used the importance of the data as a measure to limit the transmission from sensors to the edge server. Two feature selection methods, namely Impurity and Perturb, gauge the importance of the block of data. The already trained ML model, deployed on the edge server, predicts the spatial information (from which sensor to collect) for the next slot of the time based on already aggregated data. The sensor node actively communicates with the edge server to determine if the next block is worth forwarding. However, this approach works on a distributed level i.e., data from which particular sensor are more important in the next time slot. More specifically it does not reduce the data at the stream level (being pushed by one sensor constantly).
In [73], the authors presented two case studies of novelty detection on the edge of IIoT. In the first case, the condition of an electric motor placed on a kitchen hood (fan extractor to clean the air) is monitored. A microphone detects the vibrational signals and sends them to an IoT gateway where an ML classifier is deployed. Each classifier (or resulting label) has a novelty detection algorithm that is executed, and a novelty score is calculated. The second use case is fault detection of a water filtration plant. LOF [21] is used as an AD algorithm. These algorithms are trained on large datasets before detecting possible anomalous behaviour. The performance of these algorithms needs to be explored when reduced datasets are used.
Based on the discussion, sensor nodes are still part of most framework implementations. However, since they have limited computation resources, they have not been used for computationally expensive tasks such as training. Although they have been used for analytics in some cases, they are mostly used for pre-processing tasks or compressing the data as depicted in Table 3.

1) OBSERVATIONS ON DEVICE-EDGE ARCHITECTURE
Frameworks designed on this architecture have several benefits. First, the architecture requires fewer communication resources and places less burden on cloud resources. Second, if designed to work independently from the cloud, techniques can even work when there is no connection at all [66]. Third, the edge offers more context awareness as compared to cloud-based systems [19]. Fourth, with this architecture, meeting real-time requirements is possible. In the IIoT context, for example, real-time AD is very important to avoid a catastrophic situation. Fifth, it also addresses security and privacy concerns, as the edge processes data locally.
While using a Device-Edge architecture decreases dependency on the cloud, it raises a few concerns which need to be considered if adopting this architecture for designing frameworks. Firstly, as inherited from traditional sensor-cloud architecture, it also uses resources of sensor nodes for implementation. Secondly, even though the edge server is there and sensor nodes can offload computationally expensive tasks, offloading is not yet mature and is still being explored [90]-[92]. Thirdly, even when a reduction and analytics framework can be implemented using Device-Edge architecture, some of the data still need to be sent to the cloud. This becomes more important for IIoT, where an organization can have production facilities in different locations and data from all facilities are required to have a broad picture of services.
To handle highly distributed scenarios in which sensing devices are located at different locations, generally, fusion is used. Research shows that the fusion of data from sensors deployed at different locations can impact the accuracy of further analysis or predictions [48] because fusion reduces data. Authors in [93] proposed a three-level architecture for the healthcare industry. The amount of processing of data at the bottom layer where data are being sensed is less than the middle layer where communication nodes are processing data. Similarly, a global aggregator in the cloud which has more computation power is processing more data than the middle layer from different locations.
Finally, even though an edge node has more resources than a sensor node, it still provides far fewer resources than the cloud. Research to deploy computationally expensive DL models on edge nodes is still being conducted [94]-[96]. This calls for another potential architecture that involves the edge and the cloud for implementation. The Edge-Cloud architecture is reviewed in the next section.

B. TECHNIQUES USING EDGE-CLOUD ARCHITECTURE
This section reviews research efforts in which edge devices serve the role of middleware. More specifically, on one side, an edge device is connected to sensor nodes that collect data. On the other side, the edge device is connected to the cloud. Figure 2 depicts the location of the edge as a middle entity. This section reviews the approaches that have deployed ML partially on the cloud and partially on the edge server.
Authors in [66] proposed to monitor the real-time data from sensors deployed on equipment in oil/petroleum wells. The goal is to use ML to monitor the equipment performance, especially in cases where there is no connectivity available between the site and the back-end cloud, thereby solely depending on the edge. The edge gateway first retrieves sensor data, runs analytics, reports abnormal behaviour and periodically (subject to connectivity) connects with the back-end cloud to update the ML model. The approach uses an ensemble that contains different techniques such as Convolutional Neural Network (CNN), Siamese Neural Network (SNN), Autoencoder Neural Network (ANN) and Histogram of Oriented Gradients (HOG). Once all models are trained in the cloud, only one model is pushed to the edge gateway to save its resources. Furthermore, the approach uses the cloud to pre-process the data which are then used to train ML models in the cloud. Thus, the initial data need to be pushed to the back-end to train the model.
When computing servers are deployed on the edge for real-time data processing, accuracy is also of paramount importance. Based on this concept, the authors in [26] proposed Accuracy Maximization offloading with Latency Constraints (AMLC) as an intelligent edge-cloud approach, together with a new metric called service accuracy. The overall computation offloading and service involves the following three steps. First, IoT devices estimate the accuracy that edge servers can provide based on the ML models deployed on them. The edge server whose ML model offers higher accuracy is prioritized. The second step involves estimating the delay incurred in offloading the computation task. In the final step, when a mobile/sensor device is aware of the delays and accuracy of all the servers (edge + cloud), it sorts the servers in descending order of accuracy and selects the first one. Similar to [66], it also works on image-based data.
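The final selection step described above can be illustrated with a minimal sketch. The `Server` record, the `select_server` function and the explicit latency filter are hypothetical names and assumptions introduced here for illustration; they are not taken from [26], which may apply its latency constraint differently.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    accuracy: float   # estimated accuracy of the ML model hosted on this server
    delay_ms: float   # estimated offloading delay (transmission + computation)


def select_server(servers: List[Server], max_delay_ms: float) -> Optional[Server]:
    """Return the most accurate server whose estimated delay meets the bound."""
    # Assumption: servers that violate the latency bound are discarded first.
    feasible = [s for s in servers if s.delay_ms <= max_delay_ms]
    if not feasible:
        return None
    # Sort in descending order of accuracy and select the first one.
    ranked = sorted(feasible, key=lambda s: s.accuracy, reverse=True)
    return ranked[0]
```

Under these assumptions, a tighter latency bound pushes the choice towards a nearby edge server, while a looser bound allows the more accurate cloud model to be selected.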
For sensory time-series data, two techniques with the same use case (packaging industry) are proposed in [30] and [81]. In the former, the authors proposed cognitive ability and DL for better knowledge discovery and decision making in IIoT. Their framework, called Deep Reinforcement Learning Dynamic Adaptive Planning (DRL-DAP), has three layers: perception, transmission, and application. In the perception layer, data are collected from the devices using a RESTful application programming interface at the edge nodes. The transmission layer is responsible for transmitting the data using available technologies such as cellular, long-range wide area network and long-term evolution. The cognitive ability, which helps in eradicating data ambiguity and building data semantics, is also part of this layer. In the last layer, ML optimization models are deployed which help to make intelligent decisions about the production setting being monitored. In [81], the authors proposed Edge-AI, which is implemented on a microcontroller acting as an edge device. Their purpose is to classify vibrational data from sensors placed on a power-train used in the packaging industry. In both techniques, the ML model was trained on a massive amount of data collected and stored in the cloud earlier. However, in both techniques, EC is used for pre-processing and data analytics.
Techniques to monitor electric equipment have also been proposed. A distributed architecture to monitor an IIoT process is proposed in [82]. Temperature sensors are deployed at different locations of the transformer for measuring time-series temperature values and inferring the level of oil present in the transformer. An agent application gets the data from the IoT devices and makes decisions with the given knowledge. An agent also has a local data repository where results during the computation are saved. However, data are also forwarded to a back-end where an ML model is trained and updated. Similarly, in [84], the health of an electric induction motor is monitored by employing an accelerometer to collect vibrational data. The edge node first pre-processes the data by extracting temporal and spectral features, which are then passed to a CNN model, also deployed on the edge node itself, to classify whether the motor is working normally or is faulty. The authors proposed to integrate prediction at the edge node with a marine vessel alarm system in case a fault is predicted. Likewise, a solution to monitor the bearing health within a machine is proposed in [97] by using a traditional cloud-edge architecture. Different sensors such as temperature, rotation speed, vibration and humidity monitor the relevant parameters. The edge performs initial data processing and forwards the data to the cloud, where an ML algorithm predicts the future values and thus the possible equipment failure. However, the solution does not disclose how the edge server performs processing on data streams.
To this end, in [98] a framework called SERENA is proposed for PM using a hybrid cloud-edge computing approach. The sensor nodes send data to an edge gateway where statistical features of the raw data such as average, maximum, and minimum are calculated. Such data are called smart data. The model building module, based on historical smart data, generates the model. This AI/ML model is then used to predict the incoming new data which eventually helps in detecting the abnormal data and thus possible failures. It is worth noting that the ML model building and training take place in the cloud and then the model is pushed to the edge gateway.
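The computation of smart data at the edge gateway can be sketched as follows. This is a minimal illustration only: the function name `smart_data`, the fixed non-overlapping window and the dictionary keys are assumptions made here, not details reported for SERENA in [98].

```python
from statistics import mean
from typing import Dict, Iterable, List


def smart_data(readings: Iterable[float], window: int) -> List[Dict[str, float]]:
    """Summarize raw sensor readings as per-window average, maximum and minimum."""
    if window <= 0:
        raise ValueError("window must be positive")
    values = list(readings)
    features = []
    # Assumption: non-overlapping windows; an incomplete trailing window is dropped.
    for start in range(0, len(values) - window + 1, window):
        chunk = values[start:start + window]
        features.append({"avg": mean(chunk), "max": max(chunk), "min": min(chunk)})
    return features
```

With a window of 100 raw readings, for instance, each window is reduced to three values before being stored or forwarded, which is the kind of reduction that makes the edge gateway useful ahead of cloud-side model building.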
The techniques described so far considered a fixed edge device. However, the authors in [27] proposed Supervised Learning of Genetic Tracking (SLGT) in an edge architecture (fixed + mobile) for an industrial park where resources are moved from one production phase to the next using trolleys. The architecture has three parts, namely front-end, near-end and far-end. As the name suggests, the front-end deals with the actuators and sensors deployed on the equipment such as trolleys moving logistics. EC is deployed in the near-end part in a divided way. To be specific, a fixed edge gateway passively receives the signal transmitted using Bluetooth low energy technology, performs pre-processing and forwards data to the back-end cloud. In addition to the fixed edge gateway, there is a mobile edge gateway that actively listens to the transmitted signal. A K-Nearest Neighbors (KNN) algorithm estimates the location zone where a trolley may be at any given time. Support from network devices has also been investigated. Authors in [85] proposed condition monitoring of an industrial motor. They proposed to take the electric current and vibrational data from sensors, pre-process them at the edge and send only the frequency spectra to the cloud. They assumed that network devices support storing of the raw data, which is not possible in many cases.
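As an illustration of the KNN-based zone estimation mentioned above, the following sketch classifies a trolley into a zone by majority vote among the closest labelled signal fingerprints. The use of received signal strength vectors as features, the function name `knn_zone` and the zone labels are assumptions for illustration; [27] is not quoted here on its exact features or parameters.

```python
from collections import Counter
from math import dist
from typing import List, Sequence, Tuple

# A fingerprint pairs a signal-strength vector (one value per gateway) with a zone label.
Fingerprint = Tuple[Sequence[float], str]


def knn_zone(fingerprints: List[Fingerprint], observed: Sequence[float], k: int = 3) -> str:
    """Estimate the zone of an observed signal vector by k-nearest-neighbour voting."""
    if not fingerprints:
        raise ValueError("at least one labelled fingerprint is required")
    # Rank labelled fingerprints by Euclidean distance to the observation.
    nearest = sorted(fingerprints, key=lambda fp: dist(fp[0], observed))[:k]
    votes = Counter(zone for _, zone in nearest)
    return votes.most_common(1)[0][0]
```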
Based on the discussion and the summary given in Table 4, it is clear that DL algorithms are often used within IIoT. This comes from the fact that DL models are generally trained on the bulk of data and provide high accuracy. A review on deploying DL models on the edge is given in [71]. However, using reduced data to train these models requires further exploration. If they do not provide high accuracy, other models need to be considered or designed. Furthermore, it is also clear that for data reduction, the edge or the device is mostly exploited. However, given the fact that initial training requires much computation, the cloud is still being used in most of the proposed techniques for training the models. In cases where a dedicated edge node is not available, network devices can be exploited too. For instance, authors in [78] describe an architecture called UrbanEdge. They proposed to use network devices such as routers to serve as edge nodes that pre-process the data and forward them to the back-end cloud where DL is used for PM analytics.
The techniques reviewed so far do not learn from positive or negative changes in the environment in which they operate. Fortunately, advances in ML have made it possible to adapt learning based on the environment. The type of ML which fits such a scenario is Reinforcement Learning (RL). In RL, a learning agent takes an action in a given environment. The environment assigns a reward to the agent based on the outcome of its actions. The agent acts repeatedly to maximize the reward, which in other terms means the agent has learned the environment well and knows which action to take in the next step. Authors in [70] have proposed to gather IIoT data using an edge node (called a gateway node) which forwards the data to the cloud, where a well-known RL algorithm called Q-Learning (QL) is used to detect failures. The RL algorithm is responsible for detecting the safety of the equipment in the factory. It generates a detection policy with high accuracy to ensure the safety of the equipment. Another approach based on RL is proposed in [89] for a grid sorter in IIoT. A grid sorter is a device that can move an object in four directions, i.e., left, right, up and down. The local edge node forwards sensor values to the cloud, which trains a global model and returns the model to the edge in a factory. Each edge node in every factory then retrains an adaptive model based on local factory policy. When a grid sorter moves objects, the agent keeps learning in which direction a grid sorter can accurately move the objects.
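The reward-driven learning loop described above can be made concrete with the standard tabular Q-Learning update, Q(s, a) ← Q(s, a) + α [r + γ max_a' Q(s', a') − Q(s, a)]. The sketch below is generic: the state and action encodings, the function name `q_update` and the default values of the learning rate `alpha` and discount factor `gamma` are illustrative assumptions and do not reproduce the specific designs in [70] or [89].

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, Sequence, Tuple

QTable = DefaultDict[Tuple[Hashable, Hashable], float]


def q_update(q: QTable, state: Hashable, action: Hashable, reward: float,
             next_state: Hashable, actions: Sequence[Hashable],
             alpha: float = 0.1, gamma: float = 0.9) -> float:
    """Apply one tabular Q-Learning update and return the new Q(state, action)."""
    # Value of the best action available in the next state (0.0 if unseen).
    best_next = max(q[(next_state, a)] for a in actions)
    # Temporal-difference target: immediate reward plus discounted future value.
    td_target = reward + gamma * best_next
    q[(state, action)] += alpha * (td_target - q[(state, action)])
    return q[(state, action)]


# Example table: unseen state-action pairs default to a value of 0.0.
q_table: QTable = defaultdict(float)
```

For a grid sorter, for example, the actions could be the four movement directions and the reward could indicate whether the object reached its target cell; these choices are hypothetical.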
RL has also helped in managing network resources in IIoT. Authors in [99] leveraged RL to assign actions to networking and control systems in a combined manner under a dynamic IIoT environment. More specifically, based on the data forwarded from sensing nodes about the system, an extended Kalman filter estimates the system's state, which is forwarded to an RL-based agent that decides commands for the networking and control. For the networking, it adjusts the modulation type, and for control systems, it tunes the sampling rate of the sensors (frequency of observations). Similarly, in [100], the authors leveraged RL in combination with blockchain to manage resources of a distributed Software-Defined Network (SDN) framework for IIoT. Here, RL helps in optimizing computation resources which are shared by cryptography tasks of a blockchain-based distributed SDN network and non-cryptography tasks. To manage the IoT network of a smart energy management system, RL is used in combination with EC in [86]. A Deep Neural Network (DNN) model is trained in the cloud and QL is employed on the edge node. Devices in a smart building send scheduling tasks to an RL agent at the edge server, which makes decisions locally and, if further training is required, forwards them to the DNN model in the cloud. Authors in [101] provide a comprehensive review of how RL can be used in blockchain-based IIoT.

1) OBSERVATIONS ON EDGE-CLOUD ARCHITECTURE
Most of the frameworks reviewed in this article followed the Edge-Cloud architecture. This is also evident from Table 4.
Frameworks proposed using this architecture have their advantages. First, they can reap the benefits of both edge and cloud resources. This means that they can support richer data types such as images. They can also train computationally expensive models such as DL, thanks to the abundance of cloud resources. Second, no burden is placed on tiny sensor nodes, as all the computation is either performed on the edge node or offloaded to the cloud. Third, the system scales easily, as applications are globally managed in the cloud and adding another geographical site with the help of an edge node requires less effort. This is also important in the PM use case for IIoT as production/manufacturing facilities can be extended.
Although the Edge-Cloud architecture addresses some of the concerns of Sensor-Cloud and Device-Edge architectures, it also presents a few concerns which can play an important role while designing frameworks. Firstly, deciding which part of the application needs to run where requires careful consideration. To be specific, the parts of the framework deployed on the edge node will meet the real-time requirement and those deployed in the cloud will leverage more computational power. Secondly, it also depends on the underlying networks which connect the edge and the cloud. Lastly, as data privacy and security have been hindering the adoption of IIoT [102], not all production facilities will be willing to store confidential data in the cloud. However, these concerns can be addressed to some extent in the IIoT context. For example, in IIoT, dividing an implementation based on analytics (real-time alarms, PM analytics) can help. The second challenge can be addressed by exploiting network information such as congestion. A potential IIoT architecture proposed in this article uses these concepts and is discussed in Section V.

C. FEDERATED LEARNING AND IIoT
FL is another computing paradigm that was recently proposed by Google [32]. Rather than storing and training ML models on a centralized location (cloud), FL is used to train local models on the edge/end devices (called clients) where raw data are available. The clients then upload their model updates to a central server which computes a global model using a Federated Averaging algorithm. The new global model is then shared with all clients. This approach serves two purposes. Firstly, it meets privacy requirements because raw data are not leaving the source. Secondly, it minimizes the communication burden as only the model updates are forwarded, not the raw data. The approach relies on having sufficient computation resources to train models.
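The server-side aggregation step can be sketched as a sample-weighted average of client parameters, which is the core of Federated Averaging. The sketch below is simplified under stated assumptions: model parameters are represented as flat lists of floats, each client reports how many local samples it trained on, and the function name `federated_average` is illustrative.

```python
from typing import List, Sequence, Tuple

# Each client update is (model parameters, number of local training samples).
ClientUpdate = Tuple[Sequence[float], int]


def federated_average(updates: Sequence[ClientUpdate]) -> List[float]:
    """Compute the global model as the sample-weighted mean of client models."""
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("at least one update with a positive sample count is required")
    dim = len(updates[0][0])
    global_weights = [0.0] * dim
    for weights, n in updates:
        if len(weights) != dim:
            raise ValueError("all client models must have the same number of parameters")
        for i, w in enumerate(weights):
            # Clients with more local data contribute proportionally more.
            global_weights[i] += w * n / total
    return global_weights
```

Only the parameter vectors and sample counts cross the network in this sketch; the raw sensor readings used for local training never leave the clients.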
Since FL was proposed, efforts have been made to apply it in IIoT. For instance, for AD, authors in [103] used an FL approach to train an LSTM model for detecting the anomalous behaviour of a sensor in a smart building. Similarly, research by authors in [104] involved detecting anomalies in IIoT scenarios. In this case, an AD model is collaboratively trained on edge devices which is generalized later. Unlike the work in [103], the approach also captured the most important features, with an attention-based CNN model. These features are then passed to LSTM which predicts the future time-series data. Moreover, the approach also provides a mechanism to limit the number of Stochastic Gradient Descent (SGD) updates which the FL client can send to the server, improving communication efficiency. However, when training with FL, parameter selection for DL networks requires attention. For example, the authors of [105] made a specific effort towards such optimization using a PSO approach.
For the aeronautical industry, the authors in [106] have combined FL with active learning. A DT model is trained on historic data of an aircraft. Then, during the flight, a local model is trained. During training, client nodes obtain labels for uncertain data from the server while maintaining the communication budget. In the air conditioning industry, work has been done to use FL and blockchain to detect device failure in IIoT [107]. To alleviate the issue of class imbalance, a distance-based weighted federated averaging is also proposed. An incentive mechanism to encourage clients to participate in the learning process is also provided, which takes into account the size of the client data and the class (normal or abnormal) of that data. A Raspberry Pi is used as an edge node which is equipped with two ML models, logistic regression and a neural network. This technique is not based on FL as initially proposed by Google, where raw data do not leave the source. However, such an idea, where there is a middle entity in the form of an edge server between sensing devices and the cloud, is also gaining attention. The research work proposed in [108] is another effort based on a new hierarchical version of FL. In this approach, the edge serves as a local aggregator and the cloud as a global aggregator. An edge node aggregates models from sensing nodes and forwards the result to the global aggregator. Therefore, the edge node serves as an FL server for sensing nodes and as an FL client to the global aggregator.
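The two-level aggregation of hierarchical FL can be illustrated with the following sketch, in which each edge node first averages the models of its own sensing nodes and the cloud then averages the edge models. Sample-count weighting, flat parameter lists and the names `weighted_mean` and `hierarchical_round` are assumptions made for illustration and are not taken from [108].

```python
from typing import List, Sequence, Tuple

# An update is (model parameters, number of training samples behind it).
Update = Tuple[Sequence[float], int]


def weighted_mean(updates: Sequence[Update]) -> Tuple[List[float], int]:
    """Return the sample-weighted mean model and the total sample count."""
    total = sum(n for _, n in updates)
    if total <= 0:
        raise ValueError("at least one update with a positive sample count is required")
    merged = [0.0] * len(updates[0][0])
    for weights, n in updates:
        for i, w in enumerate(weights):
            merged[i] += w * n / total
    return merged, total


def hierarchical_round(sites: Sequence[Sequence[Update]]) -> List[float]:
    """Run one aggregation round: sensing nodes -> edge nodes -> cloud."""
    # Each edge node acts as an FL server for its own sensing nodes.
    edge_models = [weighted_mean(site) for site in sites]
    # Each edge node then acts as an FL client of the global aggregator.
    global_model, _ = weighted_mean(edge_models)
    return global_model
```

Because every edge node forwards its total sample count together with its aggregated model, the global result in this sketch equals the flat sample-weighted average over all sensing nodes, while the cloud only receives one update per site.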
FL has also been used to increase security and to protect IIoT from attacks. For instance, authors in [69] have used FL to avoid Distributed Denial of Service (DDoS) attacks in IIoT. Models are trained locally and edge nodes contain detection and analysis modules for DDoS. These modules have traffic policies and any network traffic must pass through them. In case an attack is detected, it is blocked and an update is sent to the cloud. Similarly, authors in [109] proposed an FL-based approach to defend against an attack on DNN models in IIoT, while others have used FL to detect malware in applications of IIoT [110].
While FL promises privacy, it also faces security challenges. For instance, research shows that parameter sharing with the server is sufficient for an attacker to infer knowledge of underlying data. Moreover, since the models are being trained locally, an adversary can attack local models on a client, eventually affecting the global model at the server [111]. Therefore, several efforts are also made to address these issues. For instance, research work done in [112] is specifically focused on the idea of data mining from several sources while sharing the data in a cipher state. The authors claim that using their approach, a client can share the data in cipher state with the server and the server can train the model using data in cipher state. Similarly, the authors of [111] provide a secure gradient aggregation framework. The authors in [113], [114] provide a comprehensive survey on the security and privacy of FL.

1) OBSERVATIONS ON FL
Although FL has gained significant attention since it was proposed, some challenges are yet to be fully resolved. These are discussed below.
Research shows that a very large number of model updates between edge nodes and the server could result in failure of model convergence [115]. In an IIoT scenario, in particular, a scalability issue can arise if every IoT sensing node participates as a learning client. Furthermore, Google's Federated Averaging algorithm does not take into account the heterogeneity that exists in industrial data; e.g., the size and the distribution of datasets on each edge device could differ [107]. Moreover, environmental conditions are sometimes heterogeneous, which has a direct impact on the data being recorded. In such a case, a local model update could report negative knowledge to the global aggregator, as FL is based on data similarity of the participating clients [116]. However, a new version of FL in which a local server aggregates the data from IoT nodes and serves as a client to a global aggregator can help in overcoming such issues [108].
In FL, two further problems may arise. Firstly, when different nodes are heterogeneous in terms of quantity and quality of the data, it is difficult to decide the weight or importance to assign to an update from a particular node. Secondly, the convergence of the global model depends on the slowest node in the network. In the IIoT context, a sensing node can have poor network conditions, which results in a delay in getting updates. In such a case, a naive node-dropping approach can have a detrimental effect on the accuracy of the global model if the model of the slower node was trained on more or better data samples. In this direction, work needs to be done, although some contributions are emerging [117], [118].
Finally, limited resources to deploy DL models on a device, unreliability of wireless channels during frequent client-server updates and the trust of the participating clients to share the trained models with the cloud limit the possible applications of FL [119]. Moreover, designing incentive-based FL frameworks to encourage clients to participate in the learning process is still a challenge [120]. Furthermore, FL is developed for only DL as it takes SGD updates from distributed clients. For using other models, it requires modifications [106]. Based on these observations, deploying DL models on tiny sensors in IIoT seems an inappropriate approach. However, the Edge-Cloud architecture can be leveraged to transmit data from sensors to the edge node (where a DL model can be trained) and pass on the model update to the cloud.
While we reviewed some of the state-of-the-art FL-based contributions, and more have been discussed in [121], [122], it is worth noting that FL has not been studied for the PM use case in IIoT, except for the contribution of the authors in [123], in which they compared two algorithms in FL and non-FL settings. They showed that FL can keep data local and at the same time achieve performance similar to a traditional non-FL approach in which an ML model is trained in a centralized manner. However, more work needs to be done to realize PM using FL in IIoT.

V. CHALLENGES & FUTURE DIRECTIONS
On one hand, given the heterogeneity of data types of IoT systems, one universal data reduction approach seems an infeasible option. On the other hand, an application or scenario-specific data reduction approach is also inconvenient. This is especially true given the growing variety of IoT domains (e.g. smart cities, agriculture, health and environment monitoring). A naive approach could be to design an algorithm based on data types. However, different applications having the same data type generate data at different velocity and volume. Therefore, a more realistic approach would be to design data reduction approaches based on data types and using EC and ML technologies. When reducing the data, considering the accuracy of the prediction model is important.
Based on the different techniques reviewed in the previous section, the role of EC is important for the future of IIoT systems. The underlying reason is the location and computation ability of the edge. Since the edge is very close to the data source, the data do not need to be transmitted from the sensor to the cloud which adds latency and cost to the system. With EC, data will be processed in near real-time. The computation ability of edge devices means they provide more computation resources than the sensor nodes. This results in shifting more computation burden of the ML algorithms from the cloud to the edge. One such example is online/recurring training of the ML models based on new data.
Retraining is especially important when observations from the machine being monitored deteriorate but are less likely to cause it to fail. In this case, new observations which are still normal would need to be passed to ML for retraining. New ML algorithms can be designed which can be retrained based on environmental change. However, it can be costly (computation and energy) if all of the data are passed for retraining. This calls for data reduction approaches which can help in reducing the data size before passing it to the retraining phase. It would incur less computation and energy consumption while maintaining the latency requirement of PM analytics.
Research on data reduction approaches that take the accuracy of the ML models into account, especially for complex analytics such as PM, is needed. This area remains under-explored: even state-of-the-art data reduction techniques have only tested performance on the reduced data for very basic queries and do not support complex analytics such as PM. The accuracy of the models trained on reduced data should also be a concern when optimizing energy consumption and latency. This is more important in DL approaches, which require more data to be trained on. We have seen from Table 4 that most of the time DL models are used for prediction. Moreover, the transmission cost can be reduced if a data reduction mechanism is deployed on the device itself. The literature shows that contextual information learning could be paramount for improving the performance of IoT systems [124], [125]. In IIoT, the approach proposed in [126] learns context based on energy, backlog and conflict of participating nodes. Similarly, the authors of [127] proposed learning for task offloading in low latency and ultra-reliable communication scenarios. Therefore, in future, RL can greatly help in IIoT while leveraging EC.
Based on the reviewed PM work, it is evident that there is no de-facto architecture to be followed. From Table 4, different researchers have exploited different network entities for deploying frameworks for PM based on the application. This shows that the field has not fully matured and demands further exploration. The emergence of EC and an ability to deploy ML algorithms on the edge (sensors and EC), has also provided an opportunity. However, it is still unclear what is the best practice on where to implement the different parts of the application. Firstly, since most of the time DL is used, which requires large amounts of data for training, not much attention is given to other algorithms. Even FL only supports DL algorithms. Using other algorithms requires further efforts. However, if models can be trained on reduced data, then data reduction approaches would be helpful. Secondly, data types also demand attention. If there are observations from sensors or images, research needs to be done to determine which ML algorithms can produce the best results and where they can be implemented.
We propose an abstract level architecture of PM for IIoT, shown in Figure 3. This is based on the de-facto three-layer architecture involving EC. The three layers are Device Layer, Edge Layer, and Cloud Layer. Since this is based on Edge-Cloud architecture, the device layer merely consists of IoT devices that are forwarding data to the edge layer. For example, it can be assumed they are placed in a production environment to monitor some equipment. The cloud layer is responsible for detailed data analytics and defining accuracy constraints. However, the role of the edge layer is worth describing here. The core of the edge layer is a data reduction module. This module 1) reduces data, 2) passes data to local analytics such as AD, and 3) forwards data to the cloud for detailed and long-term analytics such as PM. For intelligent data reduction, this module leverages certain information. Firstly, it gets information about the underlying network such as congestion from connected network nodes. For example, when there is more congestion in the network, a smaller number of data samples can be forwarded to the cloud system and similarly when network conditions improve, a greater number of samples can be forwarded. Exploiting network devices for data reduction has been proposed already [50], [78]. Secondly, the data reduction module gets accuracy constraints from the cloud for the phenomena under observation. When greater accuracy is required in the cloud, more samples can be forwarded and vice versa. However, a more intelligent decision to adopt data reduction would be to use both PM accuracy and network information.
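One possible way for the data reduction module to combine the two inputs is sketched below. The rule used here, which scales the forwarded fraction by the required accuracy and by the remaining network capacity, is purely an illustrative assumption and not a validated policy; the function names, the normalisation of both inputs to the range [0, 1] and the `min_fraction` floor are likewise hypothetical.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sampling_fraction(congestion: float, required_accuracy: float,
                      min_fraction: float = 0.1) -> float:
    """Decide what fraction of samples to forward to the cloud.

    congestion: 0.0 (idle network) to 1.0 (fully congested), from network nodes.
    required_accuracy: 0.0 to 1.0, the accuracy constraint received from the cloud.
    """
    if not 0.0 <= congestion <= 1.0 or not 0.0 <= required_accuracy <= 1.0:
        raise ValueError("congestion and required_accuracy must lie in [0, 1]")
    # Hypothetical rule: more congestion -> fewer samples, higher accuracy -> more.
    fraction = required_accuracy * (1.0 - congestion)
    return max(min_fraction, min(1.0, fraction))


def reduce_samples(samples: Sequence[T], fraction: float) -> List[T]:
    """Keep an evenly spaced subset of the samples (systematic sampling)."""
    n = len(samples)
    if n == 0:
        return []
    keep = max(1, min(n, round(n * fraction)))
    return [samples[(i * n) // keep] for i in range(keep)]
```

In this sketch the same reduced stream could also feed the local AD model, so that the edge layer makes short-term decisions on exactly the data it forwards for long-term PM analytics.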
When reducing data, local retraining on reduced data is also possible. For instance, when the data reduction module is extracting a greater number of samples to forward to the cloud for the PM model, data can also be passed to retrain a local AD model to improve its accuracy. Retraining requires storing data at the edge which has also been proposed by authors in [128]. A retrained local model can be pushed to the cloud and can be integrated with a PM model to further improve PM accuracy.

VI. CONCLUSION
This article presents data processing and PM analytics in the IIoT context. Firstly, simple data reduction approaches which do not use ML, including sampling, compression, and fusion, are discussed. Secondly, frameworks for data processing proposed specifically for IIoT are presented. The IIoT architecture is dissected and presented. In particular, three categories are discussed: 1) Device and Edge, 2) Edge and Cloud, and 3) FL. For these approaches, we discuss which part of each framework is implemented in which location of an IoT system. Finally, some challenges and future directions are presented. Here, a new architecture for implementing data reduction in conjunction with PM analytics is proposed. The proposed architecture is based on a three-layer EC architecture. It proposes to exploit the edge for data reduction, dynamic local short-term decisions and forwarding data to the cloud for detailed data analysis and long-term decisions.