Data Management in Networked Industrial Environments: State of the Art and Open Challenges

Information and communication technologies are permeating all aspects of industrial and manufacturing systems, expediting the generation of large volumes of industrial data. This article surveys the recent literature on data management as it applies to networked industrial environments and identifies several open research challenges for the future. As a first step, we extract important data properties (volume, variety, traffic, criticality) and identify the corresponding data enabling technologies of diverse fundamental industrial use cases, based on practical applications. Secondly, we provide a detailed outline of recent industrial architectural designs with respect to their data management philosophy (data presence, data coordination, data computation) and the extent of their distributiveness. Then, we conduct a holistic survey of the recent literature from which we derive a taxonomy of the latest advances on industrial data enabling technologies and data centric services, spanning all the way from the field level deep in the physical deployments, up to the cloud and applications level. Finally, motivated by the rich conclusions of this critical analysis, we identify interesting open challenges for future research. The concepts presented in this article are based on an exhaustive search of recent scientific publications and thematically cover the largest part of the industrial automation pyramid layers. Our approach is multidisciplinary, as the selected publications were drawn from two fields: the communications, networking and computation field, and the industrial, manufacturing and automation field. The article can help readers gain a deep understanding of how data management is currently applied in networked industrial environments, and select interesting open research opportunities to pursue.


Theofanis P. Raptis, Andrea Passarella, and Marco Conti
Index Terms: Data Management, Industrial Networks, Manufacturing, Industry 4.0.

I. INTRODUCTION
The manufacturing industry needs to lead innovation to face global competitive pressures as intelligent manufacturing emerges across the broad range of manufacturing sectors [1]. The fourth industrial revolution, or Industry 4.0 (I4.0), which is being realized in recent and coming years, is expected to deeply change future manufacturing and production processes, and to lead to smart factories and networked industrial environments that will benefit from its main design principles: interoperability, virtualization, decentralization, distributed control and communication, real-time capability, service orientation, quick and easy maintenance, low cost, and modularity [2]. In modern industrial applications, however, traditional centralized point-to-point control and communication cannot meet the increasingly challenging new requirements [3]. For this reason, most members of the I4.0 community think in terms of decades rather than years as to when the full I4.0 vision will become state of the art [4]. I4.0 is highly heterogeneous; in fact, it is the aggregation point of more than 30 different technology fields [5].
In order to address the upcoming challenges of I4.0, several pivotal technological enablers have emerged (Fig. 1). Novel assembly lines used in the production process are expected to boost the reconfiguration of automated manufacturing systems and provide the robust operation and short production lifecycles that manufacturing firms need to stay competitive in the marketplace [6]. The use of the industrial Internet of Things (IIoT) and industrial cyber-physical systems (ICPS) in industrial settings is expected to revolutionize the way enterprises conduct their business from a holistic viewpoint, i.e., from shop floor to business interactions, from suppliers to customers, and from design to support across the whole product and service lifecycle [7]. The cost decrease brought by integrating industrial robots into the production process towards mass customization is expected to further improve robot transparency and promote human-robot collaboration that resembles human-human collaboration, since the robot will ideally have the same set of skills and requirements as a human co-worker [8]. Wireless sensor and actuator networks (WSAN) are able to provide remote monitoring and control of factory plants and machines, reducing potential equipment failures and improving industrial efficiency and productivity [9]. Networked control systems (NCS), which connect cyberspace to physical space and enable the execution of several tasks from long distances, eliminate unnecessary wiring, reducing the complexity and overall cost of designing and implementing industrial solutions [10]. The improvements coming from novel customized protocol stacks for machine-to-machine (M2M) communication, which achieve multi-gigabyte-per-second data rates, submicrosecond latencies, and ultrahigh reliability, are expected to bring M2M communication closer to the I4.0 requirements [11].
arXiv:1902.06141v1 [cs.NI] 16 Feb 2019
On top of these technological enablers, groundbreaking services will further boost the I4.0 vision (Fig. 2). Big data analytics, machine learning and semantic modeling are expected to ease industrial integration, because typical data integration involves large data volumes, heavy traffic, and many mappings and conversions among different data formats [12]. Decision making, job scheduling and human-in-the-loop approaches are expected to constitute a kind of hybrid control system with a dynamic structure and distributed intelligence, capable of meeting industrial needs and rapid market changes [13]. Augmented reality (AR), virtual reality (VR), camera and vision identification services are expected to mimic the human information processing system in order to exploit and interpret the ambient industrial environment [14]. Prognostics and prediction processes, anomaly detection and fault diagnosis are expected not only to enable the collection of data, but also to support advanced analytics that extract useful insights with high returns on investment in the manufacturing industry [15]. Last but not least, local or global cloud integration, smart energy management and enhanced security solutions are expected to horizontally fortify a more sustainable production process [16].

A. The crucial role of data
The natural evolution of these industrial technological enablers and services leads to the generation of huge amounts of data, of widely different volumes, traffic and criticality. Data will serve as a fundamental resource to move I4.0 from machine automation to information automation and then to knowledge automation. In the past several decades, large amounts of data have been generated in industrial environments through the wide use of networked control systems (NCS). Initially, these data were rarely used for detailed analyses; instead, they served only routine technical checks and process logging. Later, awareness of the importance of extracting information from data came to play a leading role in I4.0 [17], driven by an exponential increase in the number of data sources, both archival and real-time. However, data is not equal to value; to create value from data, one needs data processes that reduce data to actionable items [18].

B. Contributions of this survey article
This article surveys the literature over the period 2015-2018 on data enabling industrial technologies and data centric industrial services from the point of view of data management as it applies to networked industrial environments, and identifies open challenges for the future. A thorough search of two categories of important journals has been conducted, based on two different but complementary groups of scientific fields:
• Communications, Networking and Computation
• Industrial, Manufacturing and Automation
Fig. 3 displays the primary sources of information for this article, identified after an exhaustive literature search. Some articles come from other sources as well, but the list of Fig. 3 represents the sources from which the critical mass of the references of this article was drawn. The choice of reported articles is highly selective: to be included, an article needs to provide new knowledge on a technological enabler, service, architecture or methodology directly applied to industrial environments. For this reason, a large portion of the related literature, which investigates similar concepts in environments other than industrial ones, has purposefully been excluded from the current survey.
Although there are existing surveys which cover some data-centric aspects of industrial processes, like industrial data management, data-driven manufacturing and cloud manufacturing, to the best of our knowledge, there is no existing survey that covers horizontally, in a holistic way, diverse aspects of data management in heterogeneous networked environments of industrial deployments. Consequently, this is the first comprehensive survey which discusses data management in networked industrial environments in a broad view, exposing different use cases, technologies and services that can facilitate the management of distributed data. A comparison to other published surveys is provided in section II. The major contributions of this article are the following:
1) An extraction of data properties (volume, variety, traffic, criticality) and an identification of the corresponding data enabling technologies in different fundamental I4.0 use cases, based on practical applications (section III).
2) A detailed outline of recent I4.0 architectural designs with respect to their data management philosophy (data presence, data coordination, data computation) and the extent of their distributiveness (section IV).
3) A holistic survey and taxonomy of the latest I4.0 data enabling technologies (section V-A) and data centric services (section V-B), spanning all the way from the field level deep in the physical deployments up to the cloud level. This outline is based on an exhaustive search of recent publications and covers the largest part of the I4.0 automation pyramid (Fig. 2).
4) A discussion of interesting open research challenges for the future regarding data management in networked industrial environments (section VI).
To the best of our knowledge, such a practical survey of data properties, management, technologies and services for industrial networked environments, drawn from recent research contributions, does not exist in previous works. The roadmap of this article is displayed in Fig. 4.

II. COMPARISON WITH EXISTING RELATED SURVEY ARTICLES
The purpose of this article is to provide a holistic overview of data management as it applies to networked industrial environments. Although both data management and industrial networks are quite vibrant research fields, they are rarely discussed together in a holistic manner. To the best of our knowledge, this is the first time that the topics of data management in the industrial networking realm are systematically extracted, dissected, categorized and put together in a survey article, hence bridging the gap between these two seemingly disconnected yet highly complementary paradigms. There exist, however, several published works that cover in depth multiple niche areas found in our survey. In fact, some of them explore several data centric aspects, but for focused application areas, services and technologies. This section provides an overview of some of those relevant studies. Table I displays the comparison with other survey articles focusing on networked industrial environments.

A. Industrial data management
The surveys most relevant to this article investigate industrial data management. In [19], the authors present a survey on the IIoT aspects of large-scale petrochemical plants, as well as recent activities in communication standards for the IoT in industry, with a slight flavor of data management. The article addresses the key enabling middleware approaches and highlights the research issues of data management in the IoT for large-scale petrochemical plants. As such, it is entirely focused on this specific use case. In [20], the authors provide a survey of recent developments in data fusion and machine learning for industrial prognosis, including a principled categorization of feature extraction techniques and machine learning methods. This analysis is highly focused on the data centric services of machine learning, data fusion and prognostics. Different from those works, we investigate data management aspects in a much wider spectrum of use cases and data centric services.

B. Data-driven manufacturing
Another group of relevant articles comprises the surveys investigating data-driven manufacturing. In [21], the authors highlight the major specificities of data engineering and the data-processing difficulties inherent to data coming from the manufacturing industry. They specifically emphasize the data centric services of machine learning and deep learning, and consequently the survey is highly focused both in terms of use case and in terms of services. In [22], the authors provide an overview of data-based techniques, with recent developments focused on industrial closed-loop applications like process monitoring and control. Another overview, on model-based control and data-driven control methods, is presented in [23]. Those two articles focus entirely on control-related issues.

C. Cloud manufacturing
In [24] and [26], the authors survey the state of the art in the area of cloud manufacturing, identify recent concepts, implementations and technologies, and discuss potential research trends and opportunities. In [25], the authors review the more specific field of virtualization and cloud-based services for manufacturing systems, and the use of big data analytics for planning and control of manufacturing operations. Although those surveys incorporate some data-related concepts, they focus their investigation on the cloud layer of networked manufacturing environments and explore a specific subset of related technologies and services.

D. Industrial wireless standards
As wireless technologies increasingly penetrate the manufacturing landscape, industrial wireless standards are emerging. [27] discusses key aspects of the four most popular industrial wireless sensor network standards, ZigBee, WirelessHART, ISA100.11a, and WIA-PA, and comparatively examines their detailed design and protocol architectures. [30] reviews the scheduling mechanisms for 802.15.4-TSCH and slow channel hopping MAC in low-power industrial wireless networks. It categorizes the numerous existing solutions according to their objectives (e.g., high reliability, mobility support) and approaches, and identifies some open challenges that are expected to attract much attention over the next few years. All those studies provide an interesting glimpse into the standardization domain for industrial networked environments but, naturally, their focus is highly specific and very different from the holistic, data management oriented approach presented in our survey.

E. IIoT technologies
Because IIoT is a core technological enabler for the realization of I4.0, there is a significant number of surveys that report on various IIoT aspects. [31] provides an overview of the Industrial Internet with emphasis on the architecture, enabling technologies, applications, and existing challenges. More specifically, it investigates the enabling technologies of each layer, ranging from industrial networking, industrial intelligent sensing, cloud computing, big data and smart control to security management. Moreover, it discusses the application domains that are gradually being transformed by Industrial Internet technologies, including energy, health care, manufacturing, [32]. A careful evaluation of industrial systems, deadlines, and possible hazards in industrial atmospheres is discussed. The primary objective of [33] is to explore the state of the art as well as the state of practice of I4.0-related technologies in the construction industry, pointing out the political, economic, social, technological, environmental and legal implications of their adoption. The recent advancements in FPGA technology, emphasizing the novel features that may significantly contribute to the development of more efficient digital systems for industrial applications, are presented in [34]. Various proposed controllers for high-mix semiconductor manufacturing processes are surveyed in [35] from both an application and a theoretical point of view. Remaining challenges and directions for future work are also summarized, with the intent of drawing attention to these problems in the systems and process control communities. In [36], a comprehensive survey of IIoT technologies is presented, including IIoT architectural approaches, applications and characteristics, existing research efforts on control, networking and computing systems in IIoT, as well as challenges and future research needs.
Finally, in [37], the authors provide an overview of the standards used to implement industrial WSANs and discuss the characteristics of the wireless channel in industrial environments. Unlike the current survey, all those articles focus exclusively on a subset of technological enablers, namely IIoT and WSAN technologies.

F. Industrial cognitive radio
This is a specialized group of survey articles, which we report in order to provide a complete list of relevant existing surveys. Their relevance to data management is minimal but, nevertheless, the core technological enabler is already applied to industrial networked environments. [38] summarizes cognitive radio methods relevant to industrial applications, covering cognitive radio architecture, spectrum access and interference management, spectrum sensing, dynamic spectrum access, game theory, and cognitive radio network security. [39] highlights and discusses important QoS requirements of industrial wireless sensor networks (IWSN), as well as the efforts of existing IWSN standards to address the challenges. It also discusses the potential of cognitive radio and spectrum handoff, and how they can help provide real-time, reliable and smooth communication for IWSNs.

G. Scheduling and synchronization
An interesting higher-level application for I4.0 is the scheduling and synchronization of multiple factories. [40] provides a review of multi-factory machine scheduling, classifying the literature according to shop environments, including single machine, parallel machines, flow shop, job shop, and open shop. [41] uses the concept of proximity to analyze synchronization between suppliers and the construction site, and presents a framework for explaining the I4.0 concepts that increase or reduce proximity. The authors find that Industry 4.0 technologies mainly influence the technological, organizational, geographical and cognitive proximity dimensions. [42] reviews recent advances in the analysis and design of fuzzy-model-based nonlinear NCS with various network-induced limitations, such as packet dropouts, time delays, and signal quantization. Under these network-induced constraints, the developments on various control and filtering design issues are surveyed in detail, and some essential technical difficulties are mentioned.

H. Product-service systems
Product-service systems are business models that provide for the cohesive delivery of products and services. Product-service system models are emerging as a means to enable collaborative production and consumption of both products and services, with the aim of pro-environmental outcomes [46]. They are thus an important application at the top of the I4.0 automation pyramid. [43] is dedicated to a systematic survey of the status of product-service systems requirement management. The results of this work provide references for future research in the area of product-service systems development, with the aim of offering integrated and holistic requirements management for product-service systems. It analyzes the state of the art of requirements management for product-service systems by reviewing an extensive literature on requirement identification, analysis, specification, and forecast. [44] reviews multiple defect types of various inspected products, which can serve as a reference for further implementations and improvements. The objective of [45] is to provide a comprehensive literature review on recent research and development in product modeling from three perspectives: product knowledge in product representation, distributed computing in information technology, and product lifecycle in the product development process. Contrary to our survey, this group of articles is distant both from data management and from industrial networking technologies. However, it is worth reporting, as it is a nice example of I4.0 post-production applications.
In summary, our survey attempts to give a holistic review of the state of the art regarding data management as it applies to networked industrial environments. The review is centered around a plethora of technologies and services brought forth by the relevant I4.0 use cases and architectural designs, and provides a more recent view of the industrial data management field. Our article is an ambitious effort to capture the interplay between data management and networked industrial environments, instead of delving exclusively into one particular data centric service or one data enabling technology. The motivation behind this survey is to give researchers coming from both the communications/networking/computation fields and the industrial/manufacturing/automation fields a glance at the intersection between these two domains at a higher level.

III. DATA PROPERTIES OF FUNDAMENTAL I4.0 USE CASES
In this section, we provide a thorough extraction of data properties in different fundamental I4.0 use cases, based on practical applications reported in recent research contributions. To the best of our knowledge, such a practical extraction, drawn from real-world applications and reports, does not exist in previous work for the reported activity period. At the same time, we identify the basic set of technological enablers that are needed to realize those important use cases, and we use them as a compass for the follow-up analysis presented in section V. The extracted data properties of the use cases are displayed in Table II. Our interest is to extract four specific data properties, in order to understand the data requirements of recent I4.0 use cases. The four data properties we focus on are the following:
1) Data volume: The size of the data to be circulated in a network environment is of crucial importance to the network design and to the technological enablers used in the deployment. In industrial networked environments there can be a diversity of data volumes, depending on the scope of each use case. We label as data of small volume the data of lower sizes, such as sensor measurements; of medium volume, the data of higher sizes, such as images or sound files; and of big volume, the data of the highest sizes, such as videos and detailed 3D representations.
2) Data variety: The diversity of the data can also vary according to the use case. We label the data variety as diverse in use cases where different kinds of data are needed, and as uniform in use cases where similar kinds of data are needed. The data variety can significantly affect algorithmic decisions and service provisioning when targeting efficient solutions per use case.
3) Data traffic: Different data varieties, as well as different data generation velocities and use case requirements, can lead to diverse traffic patterns in an industrial networked environment. Although deterministic solutions for traffic regulation have started to mature for various types of wired industrial deployments, the wireless part still faces great challenges, which come hand in hand with strict I4.0 requirements. Communication support for industrial automation is challenging in wireless environments, as the lossy nature of radio links and node unreliability greatly affect the performance of real-time data delivery. We label the data traffic as intense in a network where large amounts of data have to be generated and delivered in short amounts of time, in many cases without predefined global schedules, typically leading to various networking problems that necessitate algorithmic solutions for traffic management. On the other hand, we label the data traffic as mild in a network where data can be circulated without serious problematic phenomena.
4) Data criticality: Data that are not managed according to the underlying I4.0 requirements may adversely affect the performance of system monitoring, control and safety. For example, in a chemical plant, a chemical leakage must be reported within predefined times [47]. This inherent importance separates the data into two major categories, critical and non-critical data. We label the first category as data of high criticality and the second category as data of low criticality.
Based on the extracted data properties, we divide the use cases into two categories: on the one hand, the use cases which necessitate a combination of multiple "heavy" accomplishments in terms of data requirements and, on the other hand, the use cases with "light" data properties. The most important industrial use cases that we identified in the recent literature are the following.
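The four labels defined above lend themselves to a small data structure for recording each use case. The following Python sketch is our own illustration, not part of the surveyed works: the names (`Volume`, `DataProfile`, `heavy_properties`) are hypothetical, a set of labels per property is used so that ranges such as "small / big" can be expressed, and treating only big volume, diverse variety, intense traffic and high criticality as "heavy" is an assumption. Because the heavy/light grouping in this section is qualitative, counting heavy labels is only a reading aid, not the grouping rule itself.

```python
from dataclasses import dataclass
from enum import Enum


class Volume(Enum):
    SMALL = "small"    # e.g., sensor measurements
    MEDIUM = "medium"  # e.g., images or sound files
    BIG = "big"        # e.g., videos, detailed 3D representations


class Variety(Enum):
    UNIFORM = "uniform"  # similar kinds of data
    DIVERSE = "diverse"  # different kinds of data


class Traffic(Enum):
    MILD = "mild"        # data circulates without serious problems
    INTENSE = "intense"  # much data, short delivery times


class Criticality(Enum):
    LOW = "low"
    HIGH = "high"


# Assumption of this sketch: these labels count as "heavy"; MEDIUM volume does not.
HEAVY = {Volume.BIG, Variety.DIVERSE, Traffic.INTENSE, Criticality.HIGH}


@dataclass(frozen=True)
class DataProfile:
    """One use-case row; a set per property allows ranges such as 'small / big'."""
    volume: frozenset
    variety: frozenset
    traffic: frozenset
    criticality: frozenset

    def heavy_properties(self) -> list:
        """Names of the properties for which at least one label is heavy."""
        names = []
        for name in ("volume", "variety", "traffic", "criticality"):
            if getattr(self, name) & HEAVY:
                names.append(name)
        return names


# Oil/gas as characterized in the next subsection: small, uniform, intense, mixed criticality.
oil_gas = DataProfile(
    volume=frozenset({Volume.SMALL}),
    variety=frozenset({Variety.UNIFORM}),
    traffic=frozenset({Traffic.INTENSE}),
    criticality=frozenset({Criticality.HIGH, Criticality.LOW}),
)
print(oil_gas.heavy_properties())  # ['traffic', 'criticality']
```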

A. Use cases necessitating high data efficiency
1) Oil / Gas: Large-scale petrochemical plants incorporate dense deployments of wireless devices, such as RFID tags for machine identification and sensors for monitoring and fault diagnosis of large-scale rotational equipment, and employ IIoT technologies for tight and seamless integration between lower-layer components, such as sensors and actuators, and the higher levels connected to cloud platforms [19]. In order to ensure the safety of production sites in large petrochemical industries [49] and of long interconnected gas networks [50], these sensing devices are positioned around gas pipes, targeting 24/7 monitoring. Data generated by the wireless sensors about parameters and abnormal events are processed for decision making, thereby improving production and predicting maintenance needs and failures of the industrial equipment. Data usually come from sensor devices in small volumes, typically including sensor measurements of various types. Although the variety can be limited to the various sensor readings, there can be increased wireless traffic in the network, as a result of thousands of sensors operating simultaneously, both in real time and periodically. The use case offers a mix of both critical and non-critical applications. An example of the former is a gas leakage, which must be reported as soon as possible. An example of the latter is the predictive maintenance of a set of gas pipes over an interval of several years.
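The mix of critical and non-critical data described above implies that a node forwarding sensor data should not simply send readings in arrival order. A minimal sketch, assuming a hypothetical gateway buffer with just two criticality classes (the class and field names are illustrative, not taken from [19], [49] or [50]), uses Python's `heapq` so that critical messages such as leak alarms leave first, while arrival order is preserved within each class:

```python
import heapq
import itertools
from dataclasses import dataclass


@dataclass
class SensorMessage:
    sensor_id: str
    kind: str        # hypothetical message kind, e.g. "gas_leak" or "vibration"
    critical: bool   # high-criticality data must be forwarded first


class CriticalityQueue:
    """Hypothetical gateway buffer: critical messages first, FIFO within a class."""

    def __init__(self) -> None:
        self._heap: list = []
        self._seq = itertools.count()  # tie-breaker that preserves arrival order

    def push(self, msg: SensorMessage) -> None:
        priority = 0 if msg.critical else 1
        heapq.heappush(self._heap, (priority, next(self._seq), msg))

    def pop(self) -> SensorMessage:
        return heapq.heappop(self._heap)[2]

    def __len__(self) -> int:
        return len(self._heap)


q = CriticalityQueue()
q.push(SensorMessage("pipe-17", "vibration", critical=False))
q.push(SensorMessage("pipe-03", "temperature", critical=False))
q.push(SensorMessage("valve-08", "gas_leak", critical=True))
order = [q.pop().sensor_id for _ in range(len(q))]
print(order)  # ['valve-08', 'pipe-17', 'pipe-03']
```

The queue only decides what a single node sends next; guaranteeing that a leak alarm actually arrives within its deadline would additionally require network-level mechanisms such as reserved transmission slots.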
2) Automotive: In the last two decades, distributed embedded electronic applications have become the norm in a large part of the automotive assembly industry. Due to critical requirements and the distributed nature of the various electronic control units (ECUs) implementing assembly functions, the validation of end-to-end timing constraints in these networked industrial environments has become an important part of the design process of a car [52]. In addition to existing stand-alone solutions, cooperating networked information and control systems are increasingly used as production support tools for coordinating this challenge [53]. The volume of generated data can vary in the automotive production process, which also offers a great range of diversity. For example, there can be small volumes of data (positioning systems with various sensors for determining the exact position of vehicles, tools, resources and processes), as well as big volumes of data (assembly assistance systems that use monitors or data glasses to guide workers during their working process by exploiting audio-visual data). The majority of the generated data is usually distributed via wired deterministic networks, and for this reason the traffic can be regulated in an offline, centralized manner. For the same reason, the data criticality is not significantly high in this type of use case.
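Validating an end-to-end timing constraint amounts to bounding the latency along a cause-effect chain of tasks and network frames and comparing that bound with the deadline. The Python sketch below assumes a simple, pessimistic model that is not taken from [52]: every element of the chain is periodic and independently activated, so new data may wait up to one full period before being picked up and then needs at most the element's worst-case response time. The chain, its names and its timing values are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChainElement:
    name: str
    period_ms: float         # activation period (periodic, independently activated)
    response_time_ms: float  # worst-case response time (task) or transmission time (frame)


def end_to_end_bound_ms(chain: list) -> float:
    """Pessimistic bound: each element may wait one period, then needs its worst case."""
    return sum(e.period_ms + e.response_time_ms for e in chain)


def meets_deadline(chain: list, deadline_ms: float) -> bool:
    return end_to_end_bound_ms(chain) <= deadline_ms


# Hypothetical sensor -> fieldbus -> controller chain at an assembly station.
chain = [
    ChainElement("sensor_task@ECU1", period_ms=10, response_time_ms=2),
    ChainElement("frame@fieldbus", period_ms=5, response_time_ms=1),
    ChainElement("control_task@ECU2", period_ms=20, response_time_ms=4),
]
print(end_to_end_bound_ms(chain))   # 42
print(meets_deadline(chain, 50.0))  # True
```

Summing period plus response time per hop is deliberately conservative; dedicated timing-analysis tools tighten such bounds by accounting for offsets and the actual activation relationships between elements.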
3) Marine Vessels: Today's shipbuilding industry is characterized by one-off manufacturing and complex construction processes; as such, it is difficult to estimate a construction process at the planning stage, and many diverse problems are involved, such as backorders and overloaded capacity between consecutive processes [54]. Data processing can be used for fault detection and diagnosis in such complex industrial processes, starting from the construction stage of a marine vessel and ending with its operation [55]. Sensing technology is a cornerstone for many industrial applications, including preventive equipment maintenance, both inside fabrication plants and onboard marine vessels [56]. Recent advances in the shipbuilding industry introduce production management methodologies and pre-verification in virtual environments. Related tools ease the traffic and criticality constraints of the production phase and lower their intensity [57]. Similar to the automotive industry, the volume of generated data can vary in the marine vessel production process, which also offers a great range of diversity.
4) Asset Tracking: Mass production in manufacturing puts greater emphasis on real-time asset location monitoring, which makes sensor data of paramount importance. When location information can be associated with monitored contextual information, e.g., machine power usage and vibration, it can be used to provide smart monitoring information, such as which components have been machined by a worn or damaged tool [58]. RFID is the most commonly utilized product tracking and automation technology, especially useful in the supply chain industry [59], as well as in more specialized asset tracking applications like the identification of individual farm animals [61]. The generated data can be diverse across all asset tracking applications, but usually only one tracking method is used in each individual application, leading to a uniform data variety. The volume of the data also varies per application, ranging from simple RFID readings in product tracking to images or videos in farm animal identification. The data criticality is low, as the related data processing and calculations are conducted a posteriori.
5) Customized Assembly: Serial assembly lines are mainly used for large-scale production, since they can provide short cycle times and high production rates with high efficiency in terms of cost, time and quality. In pursuit of flexibility, different paradigms, like customized assembly lines, have been investigated in terms of automation level and production system organization [63]. IIoT integrates the key technologies of industrial communication, computing, and control, so as to provide a new way to optimize the management and dynamic scheduling of a wide range of assembly resources [62]. With the technological enablers on flexible assembly lines ranging from IIoT and ICPS to robotic bimanipulators, NCS and moving robots, it is natural that there is a great diversity of data resources to be analyzed. The volumes of data differ significantly from application to application.
For example, in the case of mobile robotic assembly, large volumes of motion data are usually exchanged between the different controllers for further data fusion, while in the case of custom part identification, smaller identification data are needed. This use case family is usually characterized by a high criticality factor, because the assembly process has to be quick and accurate, which in turn affects the related data processes.
B. Other use cases
1) Crane Scheduling: Container terminals seek an optimal trade-off between energy saving and service efficiency improvement. Since the energy consumption and service efficiency of container terminals are mainly determined by the handling cranes, the scheduling of the handling cranes is critical [64]. Moreover, as container vessels grow in size, container terminals face another challenge, i.e., the rapid handling of containers for mega-vessels. Thus, container terminals must shorten the vessel turnaround time, which is an influential factor of their service level [65]. Because the necessary computations are conducted offline, usually via optimization modules, the data properties of this use case are simple. An input module, which is the basis for generating and evaluating crane schedules, consists of two data parts: static data and dynamic data. The static data part includes scheduling parameters such as the handling volume of each container, the time window of each container and the handling efficiency of each crane, as well as parameters used for evaluation, such as the cost of unit energy consumption. The dynamic data include all decision variables, which are generated by the optimization module.
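To make the split between static and dynamic data concrete, the following Python sketch models such an input module together with a simple evaluation step. All names (`Container`, `Crane`, `StaticData`, `DynamicData`, `evaluate`), the linear energy model and the chosen metrics are illustrative assumptions rather than the models of [64] or [65]; in particular, the evaluation does not check whether jobs assigned to the same crane overlap in time.

```python
from dataclasses import dataclass, field

# Hypothetical input module for crane scheduling (illustrative names only).


@dataclass(frozen=True)
class Container:
    handling_volume: float            # static: volume to be handled
    time_window: tuple[float, float]  # static: (earliest start, latest finish)


@dataclass(frozen=True)
class Crane:
    efficiency: float         # static: handled volume per time unit
    energy_per_volume: float  # static: assumed linear energy model


@dataclass(frozen=True)
class StaticData:
    containers: dict[str, Container]
    cranes: dict[str, Crane]
    unit_energy_cost: float   # static, used only for evaluation


@dataclass
class DynamicData:
    # Decision variables written by the optimization module:
    # container id -> (crane id, start time).
    assignment: dict[str, tuple[str, float]] = field(default_factory=dict)


def evaluate(static: StaticData, dynamic: DynamicData) -> dict[str, float]:
    """Score a schedule: makespan, total lateness w.r.t. windows, energy cost.

    Crane conflicts (overlapping jobs on the same crane) are not checked.
    """
    makespan = tardiness = energy = 0.0
    for cid, (crane_id, start) in dynamic.assignment.items():
        container = static.containers[cid]
        crane = static.cranes[crane_id]
        finish = start + container.handling_volume / crane.efficiency
        makespan = max(makespan, finish)
        tardiness += max(0.0, finish - container.time_window[1])
        energy += container.handling_volume * crane.energy_per_volume
    return {
        "makespan": makespan,
        "tardiness": tardiness,
        "energy_cost": energy * static.unit_energy_cost,
    }
```

Keeping the static data immutable (frozen dataclasses) and the decision variables in a separate mutable object mirrors the split described above: an optimizer would only rewrite `DynamicData` between evaluations.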
2) Refrigerated Warehouses: Changing the cold storage temperature set points of refrigerated warehouses can reduce product quality and further increase economic costs for industrial consumers. Reducing grid electricity costs, total maintenance costs, and total energy consumption has recently been a target objective of operations research [66]. This use case is characterized by small volumes of sensor data (mainly temperature), periodically sent to a central control station for long-term planning.
3) Healthcare Monitoring: Industrial manufacturing has recently started embedding new functions in the form of safety monitoring or smart factories. Another recent trend of interest is the combination of heterogeneous services from different fields to provide automated healthcare services in industrial environments [67]. As with typical monitoring use cases, the data come in small volumes from a limited number of heterogeneous sensors, targeting long-term or real-time healthcare optimization.

4) Production Control: Controlling the various stages and processes of production has attracted widespread research interest in various areas, ranging from the shop floor, with vibration control [68] and PLC control design [69], up to the application layer, with economic optimizations [70]. Depending on the layer of industrial integration under consideration, data volumes can be small or large, and the related traffic in the networked environment low or high.
IV. DATA MANAGEMENT TRENDS IN RECENT I4.0 ARCHITECTURAL DESIGNS
In this section, we attempt to place recent architectural innovations in the broader context of networked industrial environments by surveying the fundamentals of recently proposed I4.0 enabling architectures and by extracting the data management philosophy of these architectural alternatives. The primary emphasis of the section is on data-related concepts rather than on specific architectural constructs. A number of research teams have proposed relevant architectures which incorporate, directly or indirectly, some kind of data management interfaces and control mechanisms across one or more architectural layers. For the period under review (2015-2018), the most important I4.0 enabling architectural designs have been presented in [71]-[95].
The data management information is displayed in Table III. We aim to extract three specific data properties in order to understand the recent trends in I4.0 architectural design. We also identify the major supported technological enablers of each architectural design. The three data properties we focus on are the following:

1) Data presence: Data can be acquired from specifically defined, localized sources, or from pervasive data generators. We label the first category as localized data presence. This category usually includes (but is not limited to) data generation sources such as fixed robotic manipulators in a factory environment, stationary network controllers, servers, office workstations, and fieldbus masters. We label the second category as ubiquitous data presence. This category includes (but, again, is not limited to) workers' portable devices, IIoT enablers, sensors and actuators with uncertain communication patterns, and online third-party data sources (e.g., via the Internet).

2) Data coordination: Coordination of the industrial processes, based on the input data, can be performed by global or local process (or network) managers. When local managers are involved, a hierarchy is usually applied, in which coordination is structured among different layers of managers. We label the first case (global managers) as centralized coordination and the second case (local managers participating in hierarchical management) as hierarchical coordination. The most common trade-off between the two types of coordination is balancing the degree of central control over the network against the minimization of important metrics such as end-to-end data delivery delay and energy consumption.
3) Data computation: Computation tasks over the received data can take place either on central entities with significant computational abilities (which may or may not coincide with the coordination managers) or on a large part, or all, of the devices available in the architectural design. We label the first method as concentrated computation and the second as distributed computation. The concentrated computation model implies stronger computational power located on single computational components, while the distributed computation model implies that computation components are located on different networked computers (usually of lower computational ability than in the concentrated case), which communicate and coordinate their actions by passing data to one another. As with typical distributed systems, the three significant characteristics of distributed computation in I4.0 are concurrency of computations, lack of a global clock, and independent failure of the computational devices. Because devices fail independently, a single failure in the distributed case affects only part of the computation, whereas a failure of the central entity in the concentrated case usually has a much higher impact on the industrial processes.

A conclusion drawn from the information extracted from the relevant articles and summarized in Table III is that the architectural trends can be classified into two distinct categories, each with its own data management philosophy. On the one hand, there is a set of architectures dealing mostly with localized data, coordinating the industrial devices in a centralized manner and providing a mix of concentrated and distributed computation. The basic data enabling technologies for those architectural designs are the assembly line and the industrial robots. On the other hand, there is a set of architectures dealing mostly with ubiquitous data presence, with coordination shifted towards a hierarchical scheme, again providing a mix of concentrated and distributed computation.
The basic data enabling technologies for those architectural designs are IIoT / ICPS and WSAN. This division of architectural data management into two categories also reflects the diversity of the two research fields (Communications/Networking/Computation and Industrial/Manufacturing/Automation), as well as the need for convergence between the two fields in order to address the I4.0 requirements with common tools and methodologies. This is identified as an open challenge for the future and is also presented in Section VI-D.
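The three data properties and the two resulting families of architectures can be encoded compactly. The following Python sketch is a hypothetical encoding for illustration (the enum, class and function names, and the "mixed" fallback label, are not taken from any cited work); the example profiles in the usage note follow the descriptions of [72], [78] and [81] in the subsections below.

```python
from dataclasses import dataclass
from enum import Enum

# Hypothetical encoding of the data management taxonomy of Table III.


class Presence(Enum):
    LOCALIZED = "localized"
    UBIQUITOUS = "ubiquitous"


class Coordination(Enum):
    CENTRALIZED = "centralized"
    HIERARCHICAL = "hierarchical"


class Computation(Enum):
    CONCENTRATED = "concentrated"
    DISTRIBUTED = "distributed"


@dataclass(frozen=True)
class DataProfile:
    presence: Presence
    coordination: Coordination
    computation: Computation


def architecture_family(profile: DataProfile) -> str:
    """Map a profile to one of the two observed families.

    Computation is deliberately ignored: both families mix concentrated
    and distributed computation.
    """
    if (profile.presence is Presence.LOCALIZED
            and profile.coordination is Coordination.CENTRALIZED):
        return "assembly line / industrial robots"
    if (profile.presence is Presence.UBIQUITOUS
            and profile.coordination is Coordination.HIERARCHICAL):
        return "IIoT / ICPS / WSAN"
    return "mixed"
```

For example, a localized, centralized, concentrated profile such as that of [72] maps to the first family, a ubiquitous, hierarchical, distributed profile such as that of [81] maps to the second, and a design with pervasive data but centralized coordination, such as [78], falls outside both.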

A. Architectures focusing on assembly line and industrial robots
In [72], the authors introduce an architecture for the design and customization of product families. Specifically, they design a formal computer-assisted approach that addresses the requirements for the design of product family architectures, as identified by academia and industry. The suggested design is based on formal computational models which employ centralized methods, leaving little room for ubiquitous data presence or non-centralized coordination.
In [74], the authors present an architectural design for interoperable end-to-end manufacturing which guarantees seamless interoperability, thus ensuring proper communication and data exchange between all the partners in a manufacturing network throughout the entire manufacturing life cycle, from supplier search to manufacturing execution and monitoring. In terms of data presence, although the data can reside in different physical locations (e.g., different factories), we consider the layout as localized, since it is fully defined beforehand where, when and how the data will be accessed by the platform provided in the architecture.
Cloud manufacturing has been a vibrant field for architectural research. In [77], the authors argue that existing cloud manufacturing models operate in a centralized way through a cloud manufacturing platform, whose management is identified as a critical part of the manufacturing cloud operation, and they strive for decentralization. In fact, they propose a decentralized network architecture which builds upon the concept of autonomous work systems acting as service providers (Fig. 5). In this design, data can be generated from various sources, including third-party online knowledge clouds, and computations can take place in different cloud services, with coordination decentralized among the users. In [93], the authors introduce the concept of a cloud manufacturing framework with auto-scaling capability, aiming at providing a systematic and rapid development approach for building cloud manufacturing systems. In contrast to [77], the design of [93] provides a structured and centralized bulletin board data exchange mechanism, serving specifically defined data. However, because the design involves workers whose number varies over time (due to the auto-scaling mechanism of the cloud manufacturing framework), the data presence can also be considered ubiquitous in this case.
In [82], the authors investigate how to find the optimal manufacturing service composition path in a service composition network. To satisfy the specific demands of manufacturing service composition, they provide a design which solves two problems: how to design an appropriate QoS evaluation model to depict manufacturing service composition based on networked collaboration, and how to improve the existing service composition method to deal with the rapid increase of candidate service composition solutions. The service supporting system they propose is highly centralized, with regulated coordination and computation of the data resources, which come, on the one end, from manufacturing, laboratory and management sources and, on the other end, from service requestors.
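A common way to build a QoS evaluation model for sequential service composition, assumed here rather than taken from [82], is to aggregate the QoS attributes of the services along the path: execution time and cost add up, while reliability (a probability of success) multiplies. The sketch below scores candidate paths with a hypothetical weighted utility over unnormalized attributes and selects the best one by exhaustive search, which illustrates exactly the approach that stops scaling as the number of candidate compositions grows.

```python
from dataclasses import dataclass
from itertools import product
from math import prod

# Hypothetical QoS aggregation for a sequential composition path.


@dataclass(frozen=True)
class Service:
    name: str
    time: float         # execution time
    cost: float         # monetary cost
    reliability: float  # probability of successful execution, in (0, 1]


def aggregate(path: tuple[Service, ...]) -> tuple[float, float, float]:
    """Aggregate QoS of services executed one after another."""
    return (
        sum(s.time for s in path),
        sum(s.cost for s in path),
        prod(s.reliability for s in path),
    )


def utility(path: tuple[Service, ...], w_time: float = 0.4,
            w_cost: float = 0.3, w_rel: float = 0.3) -> float:
    # Assumed scoring: lower time/cost and higher reliability are better.
    time, cost, rel = aggregate(path)
    return w_rel * rel - w_time * time - w_cost * cost


def best_composition(candidates_per_task: list[list[Service]]) -> tuple[Service, ...]:
    """Evaluate every path that picks one candidate service per task."""
    return max(product(*candidates_per_task), key=utility)
```

With k tasks and m candidates per task the search visits m^k paths, which is why scalable composition methods such as the one targeted in [82] are needed in practice.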
In [84], the authors introduce a service-oriented architectural framework that supports a new programming paradigm for designing dynamic distributed manufacturing systems. The framework supports concurrency and reactivity of multiple computing machines that run data computations asynchronously with respect to each other. Each machine potentially runs concurrent software behaviors that need to execute in synchrony with each other. The entire coordination of the operations is regulated by a master controller.
In [86], the authors design an architecture to integrate modules of two industrial standards, IEC 61131-3 and IEC 61499, allowing the exploitation of the benefits of both. The proposed architecture is based on the coexistence of control software of the two standards. As both standards refer to PLCs and control systems, the presence, coordination and computation of data are fundamentally concentrated.
In [92], the authors propose a layered architecture which covers five critical aspects of computer integrated manufacturing, separated into five architectural layers: physical, functional, managerial, informational and control. Although the overall design of this architecture is hierarchical and each layer is a separate entity from the other layers, the intra-layer coordination and computation functions can be considered to rely on central entities.
In [71], the authors present a general framework for mobile robot navigation in industrial environments, in which the open-loop behavior of the robot and the specifications are based on automata. A modular supervisory controller ensures the correct navigation of the robot in the presence of unpredictable obstacles and is obtained by the conjunction of two supervisors: one that forces the robot to follow the path defined by the planner, and a second one that imposes other specifications such as collision prevention, task and movement management, and distinction between permanent and intermittent obstacles. The data-related components are highly centralized, both in the planning and in the supervision process of the robot.
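The conjunction of two supervisors can be illustrated with a minimal sketch in which each supervisor is a deterministic automaton over a shared event alphabet and an event is enabled only if every supervisor enables it in its current state (a synchronous product over a common alphabet). The classes, states and events here are hypothetical toy examples, not the automata of [71].

```python
from dataclasses import dataclass

# Hypothetical toy supervisors over a shared event alphabet.


@dataclass
class Supervisor:
    # transitions[state][event] -> next state; an event missing from the
    # inner dict is disabled in that state.
    transitions: dict[str, dict[str, str]]
    state: str

    def enabled(self) -> set[str]:
        return set(self.transitions[self.state])

    def step(self, event: str) -> None:
        self.state = self.transitions[self.state][event]


class ConjunctiveSupervisor:
    """Enable an event only when every component supervisor enables it."""

    def __init__(self, *supervisors: Supervisor) -> None:
        self.supervisors = supervisors

    def enabled(self) -> set[str]:
        return set.intersection(*(s.enabled() for s in self.supervisors))

    def step(self, event: str) -> None:
        if event not in self.enabled():
            raise ValueError(f"event {event!r} is disabled")
        for s in self.supervisors:
            s.step(event)


# Path follower: allows only the planned moves; obstacle events self-loop.
follower = Supervisor({
    "at_A": {"move_AB": "at_B", "obstacle": "at_A", "clear": "at_A"},
    "at_B": {"move_BC": "at_C", "obstacle": "at_B", "clear": "at_B"},
    "at_C": {"obstacle": "at_C", "clear": "at_C"},
}, state="at_A")
# Safety supervisor: blocks all motion while an obstacle is present.
safety = Supervisor({
    "free": {"move_AB": "free", "move_BC": "free", "obstacle": "blocked"},
    "blocked": {"clear": "free"},
}, state="free")
robot = ConjunctiveSupervisor(follower, safety)
```

Because each specification stays in its own small automaton, a new requirement can be added as another component supervisor without rewriting the existing ones, which is the appeal of the modular construction.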

B. Architectures focusing on IIoT / ICPS, and WSAN
In [81], the authors introduce a hybrid wireless communication and data management architectural design (Fig. 6). The design is called hybrid because it is a multi-tier network architecture in which distributed communication and data entities interact in order to coordinate their decisions in a hierarchical manner and ensure the correct operation of the whole network. Devices scattered in the network deployment can perform local computations, lightening the burden of local and global managers by offloading data and computation. The architecture is designed to support ubiquitous data presence in various types of industrial environments.
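The benefit of letting devices compute locally in a multi-tier design can be sketched as follows: each device reduces its raw samples to a compact partial result (here a sum and a count), local managers merge the partials of their devices, and the global manager only merges one partial per local manager. The three-tier layout, the function names and the choice of the mean as the computed quantity are assumptions for illustration, not details of [81].

```python
from dataclasses import dataclass

# Hypothetical three-tier aggregation: devices -> local managers -> global manager.


@dataclass(frozen=True)
class Partial:
    total: float
    count: int

    def merge(self, other: "Partial") -> "Partial":
        return Partial(self.total + other.total, self.count + other.count)


def device_compute(samples: list[float]) -> Partial:
    # Tier 1: the device reduces raw samples locally instead of forwarding them.
    return Partial(sum(samples), len(samples))


def manager_merge(partials: list[Partial]) -> Partial:
    # Tiers 2 and 3: local and global managers only merge partial results.
    result = Partial(0.0, 0)
    for p in partials:
        result = result.merge(p)
    return result


def hierarchical_mean(clusters: list[list[list[float]]]) -> tuple[float, int]:
    """clusters[i][j] holds the raw samples of device j under local manager i.

    Returns the global mean and the number of messages received by the
    global manager (one per local manager).
    """
    local_results = [
        manager_merge([device_compute(samples) for samples in devices])
        for devices in clusters
    ]
    global_result = manager_merge(local_results)
    return global_result.total / global_result.count, len(local_results)
```

The global manager receives one message per local manager instead of one per raw sample; the same pattern applies to any quantity whose partial results can be merged associatively.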
In [76], the authors present an energy-efficient architecture for IIoT deployments, which consists of an IIoT node domain, RESTful service hosted networks, a cloud server, and user applications. This architecture focuses on the IIoT domain, where large amounts of energy are consumed by large numbers of nodes. The architecture includes three layers: the IIoT layer, the gateway layer, and the control layer (Fig. 7). Unlike other hierarchical deployment schemes such as [81], this architecture does not allow direct communication between IIoT nodes. Moreover, the gateway nodes are always used as central computation entities and the control node as the coordination entity, so that IIoT nodes need not implement sophisticated hardware or run complicated routing mechanisms, thus reducing computational complexity and system cost.

In [78], the authors argue that deterministic industrial networks and best-effort IIoT should converge and support low latency and jitter; based on this argument, they provide an architectural design for a deterministic IIoT core network consisting of many simple deterministic packet switches configured by an SDN control plane. Although data presence is pervasive due to the IIoT support, the determinism requirement imposes highly centralized data coordination and schedule computation.
In [80], the authors propose a closed-loop design in order to facilitate the deployment of fully automated wireless networked control systems. The topology of the architecture consists of a plant system having sensor and actuator nodes, a controller system having input and output nodes, an intermediate network system having interconnected nodes, and wireless communication links for the information transfer between the different nodes (Fig. 8). The data presence in this setting is ubiquitous, as data can be received from a large number of sensors placed in the network. However, both computation and coordination take place centrally at the controller system, which uses the input nodes to receive information and the output nodes to provide controller decisions.
Service-oriented modeling has attracted a lot of attention in the I4.0 architectural design community. In [75], the authors suggest a service-oriented architecture which exposes objects' capabilities by means of web services, thus supporting syntactic and semantic interoperability among different technologies. WSAN devices and legacy subsystems cooperate while being orchestrated by a manager in charge of enforcing a distributed logic. The architecture supports dynamic spectrum management, distributed control logic, object virtualization, WSAN gateways, a SCADA gateway service, and data fusion transport capability. To implement these functionalities, a hierarchical coordination scheme has been followed, with different kinds of managers provided as reusable core software components. The middleware's virtualization layer enables the architecture to support pervasive data access and management.

In [83], the authors suggest another service-oriented architecture, targeting the structured migration of process control systems. They argue that although today's control systems are typically structured in a hierarchical manner, there are nevertheless unresolved challenges with respect to various fundamental migration functionalities. The suggested approach combines distributed computation abilities with per-layer centralized coordination, handling data coming from ubiquitous data sources such as WSANs. A particular note about this design is that the coordination can also be viewed as decentralized if we consider the entire system rather than each architectural layer individually.

In [91], the authors argue that the scope of I4.0 should be defined by considering the major value chains; to achieve this, they introduce a design and the basic process for deriving a reference model for I4.0 service architectures.
The design relies upon the assumption that a reference model should take into account existing reference models for distributed processing as well as those of the Internet of Services and IIoT. This architecture provides computational modularity, which enables distribution through functional decomposition of the system into objects that interact at interfaces.
In [87], [95], the authors introduce two different yet complementary hierarchical data transmission architectural designs for WSANs and smart factories. These architectures constitute an ideal example of pervasive data generation, as data are received from a wide variety of stationary and mobile sources, such as automatic guided vehicles, mobile workers' devices and WSANs. Hierarchical coordination lies at the core of these designs, as does decentralized computation through subnetwork formation, leader election algorithms and mobile intelligence units.
In [88], the authors introduce a distributed modeling framework for plant-wide process monitoring. Based on this framework, the plant-wide monitoring process is decomposed into different blocks, and statistical data models are constructed in those blocks. The data obtained from different blocks are integrated through a centrally located decision fusion algorithm. Owing to the large volume of pervasive plant-wide data generation, the authors note that, unlike in traditional industrial processes, several new data characteristics should be taken into account in the plant-wide process: the data volume is larger, different types of data can be obtained, the sampling rates of process variables differ from each other, and the density of the collected data may be quite low.
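Block-wise monitoring with central decision fusion can be sketched under strong simplifying assumptions: each block monitors a single variable against a three-sigma limit learned from its own normal-operation data, and the fusion centre raises a plant-wide alarm when at least k blocks flag a fault. Both the univariate statistic and the k-out-of-n voting rule are illustrative stand-ins, not the statistical models or the fusion algorithm of [88].

```python
from statistics import mean, stdev

# Hypothetical per-block monitors with centralized k-out-of-n decision fusion.


class BlockMonitor:
    """Per-block monitor trained on normal-operation data of that block."""

    def __init__(self, normal_data: list[float], n_sigma: float = 3.0) -> None:
        self.mu = mean(normal_data)
        self.limit = n_sigma * stdev(normal_data)

    def is_faulty(self, sample: float) -> bool:
        return abs(sample - self.mu) > self.limit


def fuse(decisions: list[bool], k: int) -> bool:
    """Centralized k-out-of-n voting over the block decisions."""
    return sum(decisions) >= k


def plant_alarm(monitors: list[BlockMonitor], samples: list[float], k: int = 2) -> bool:
    # Each block decides locally; only binary decisions reach the fusion centre.
    decisions = [m.is_faulty(x) for m, x in zip(monitors, samples)]
    return fuse(decisions, k)
```

Only one bit per block reaches the fusion centre, which keeps the central step cheap even when the raw plant-wide data volume is large.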
Finally, in [73], rather than presenting a concrete architecture, the authors provide insights into future I4.0 architectures, based on current designs and future trends, focusing on TSN and 5G designs. Although their analysis includes different vertical integration layers (which enable ubiquitous data presence), data coordination and the relevant computations appear to be considered centralized, for the sake of ultra-high reliability.
V. DATA ASPECTS OF I4.0 TECHNOLOGIES AND SERVICES
In this section, we provide a holistic outline of the latest I4.0 data enabling technologies and data-centric services identified through our exhaustive state-of-the-art research, spanning all the way from the field level deep in the physical deployments up to the cloud level. Fig. 2 visually displays the partitioning of the networked industrial environment building blocks into two fundamental planes: data enabling industrial technologies and data-centric industrial services. Each building block may have thematic and functional overlaps with the building blocks that lie in its proximity. This is natural and is due to the interplay between modern technologies and services. The articles on I4.0 technologies and services that we have identified and present in this article are displayed in Fig. 9, which provides a concise classification of the recent research works into the two categories.
A. Data enabling industrial technologies
1) IIoT / ICPS: Industrial networked environments are composed of the physical part, which performs the physical processes, and networks of IIoT devices, which perform the computational processes required to control the physical ones. The cyber part of the system consists of computational processes which receive data from the physical processes, calculate the required outputs and apply them to the physical plant [118], while providing and using data access and data processing services available on the Internet [120]. Since production scheduling is optimized using objective functions based on punctuality criteria such as earliness and tardiness [117], a significant part of these computations takes place at the edge of IIoT deployments, making edge computing a fundamental type of computation, with contributions ranging from adaptive transmission optimization [109] to multiple gateway optimization [110]. Additionally, different IIoT deployments usually incorporate different communication and networking alternatives, such as WirelessHART [105], RPL [126] and 6TiSCH [106], as well as frequent protocol conversions [103], and these must seamlessly exchange data with each other. Consequently, IIoT and ICPS technologies enable intelligent, adaptive control with seamless vertical, horizontal and dynamic data exchange between heterogeneous platforms and networks, through an extensive use of data exchange, coordination and collaboration [119], as well as through recently proposed techniques such as network slicing [114]. Important ICPS operations include fault management [121], clustering analytics [122], reusable software [123], reactive test case generation [124] and modular reconfiguration [125].
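As an illustration of the punctuality-based objectives mentioned above, the following sketch evaluates the weighted earliness-tardiness cost of a job sequence on a single machine (jobs run back to back, with no inserted idle time) and finds the best sequence by brute force. The single-machine model, the weights alpha and beta and the `Job` fields are assumptions made for illustration; practical schedulers, including the one in [117], rely on far more scalable optimization methods.

```python
from dataclasses import dataclass
from itertools import permutations

# Hypothetical single-machine earliness-tardiness model.


@dataclass(frozen=True)
class Job:
    name: str
    processing_time: float
    due_date: float


def earliness_tardiness_cost(sequence, alpha: float = 1.0, beta: float = 2.0) -> float:
    """Sum of alpha * earliness + beta * tardiness, jobs run back to back."""
    t = 0.0
    cost = 0.0
    for job in sequence:
        t += job.processing_time  # completion time of this job
        earliness = max(0.0, job.due_date - t)
        tardiness = max(0.0, t - job.due_date)
        cost += alpha * earliness + beta * tardiness
    return cost


def best_sequence(jobs: list[Job]) -> tuple[Job, ...]:
    # Brute force is only viable for a handful of jobs (n! sequences).
    return min(permutations(jobs), key=earliness_tardiness_cost)
```

Weighting tardiness more heavily than earliness (beta > alpha) expresses that late delivery usually hurts more than finishing early; both weights are tunable assumptions.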
Typical IIoT applications include predictive maintenance [100], where a successful network configuration is able to determine the condition of in-service equipment in order to estimate when maintenance should be performed, and real-time RFID monitoring [96] for tracking products along the assembly line. Other research issues include IIoT topology optimization [97], packet scheduling [102], and IIoT network construction and operation under massive multiple-input multiple-output (MIMO) M2M communication [113].
There have been some interesting recent data-related advancements in the IIoT domain. In [98], the authors identify the need for data access control along the supply chain, especially for product data related to sensitive business issues, and they design a scalable industrial data access control system that addresses this need. In [101], the authors present an industrial data exchange mechanism based on ZeroMQ for ubiquitous data access in rich-sensing pervasive industrial applications. This investigation highlights the major concerns in building a distributed industrial data system in a systematic manner. In [104], the authors observe that most current data clustering techniques can only deal with static data and become infeasible for clustering the significant volume of data in dynamic industrial applications; they introduce an incremental clustering algorithm based on fast search and finding of density peaks with k-medoids, as a way to find the underlying pattern structures embedded in unlabeled data. Driven by the pursuit of green communication, the authors of [116] present a space-reserved cooperative data caching scheme for IIoT, where the cache space in a base station is divided into two parts: one is used to store data prefetched from the servers ahead of the device request time, and the other is reserved to store temporarily buffered data in the wireless transmission queue at the device request time. Timely data delivery is another crucial data management issue in IIoT and has frequently been combined with the optimization of other important metrics. For example, in [112], the authors provide a loss-tolerant data delivery scheme with low energy consumption and end-to-end guarantees. In [127], the authors present a method for identifying and selecting a limited set of proxies in the IIoT network where data needed by the consumer nodes can be cached, so as to guarantee timely data access.
Timely data delivery has also been combined with MAC layer improvements in [115], with incremental time-triggered data flows in [111], and with a fusion of relaying and data aggregation at the source nodes in [99]. In this respect, there are multiple open challenges to address, such as security concerns (the specific case of DDoS mitigation was addressed in [108]) and estimation accuracy [107].
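The proxy selection problem studied in [127] can be illustrated with a simple greedy heuristic, which is an assumption here and not the method of [127]: every consumer must reach some proxy within a hop budget (used as a rough stand-in for access latency), and proxies are added one at a time, each time picking the node that covers the most still-uncovered consumers. Hop distances are computed with breadth-first search on an undirected topology given as an adjacency list.

```python
from collections import deque

# Hypothetical greedy proxy selection under a hop budget.


def hop_distances(graph: dict[str, list[str]], source: str) -> dict[str, int]:
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def select_proxies(graph: dict[str, list[str]], consumers: set[str],
                   max_hops: int) -> list[str]:
    """Greedy covering: each consumer must be within max_hops of a proxy."""
    coverage = {
        node: {c for c, d in hop_distances(graph, node).items()
               if c in consumers and d <= max_hops}
        for node in graph
    }
    uncovered = set(consumers)
    proxies: list[str] = []
    while uncovered:
        best = max(graph, key=lambda n: len(coverage[n] & uncovered))
        gained = coverage[best] & uncovered
        if not gained:
            raise ValueError("some consumers cannot be covered")
        proxies.append(best)
        uncovered -= gained
    return proxies
```

A tighter hop budget generally requires more proxies, so the budget trades caching cost against access delay.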
2) WSAN: WSANs are defined as groups of spatially dispersed and dedicated sensors and actuators for indoor [135] and outdoor [131] monitoring and recording of the physical conditions of the industrial environment. WSANs cooperatively deliver the collected data to a central location via single-hop or multi-hop communication [150]. WSANs measure environmental conditions such as temperature, sound, pollution levels and humidity. In fact, WSANs are the basis for establishing a supervisory control and data acquisition system, with the benefits of extending the network boundaries and enhancing the network scalability of industrial environments [151]. Recent research in the data-driven industrial WSAN literature has focused on a number of emerging problems. Localization using the available plant data in WSAN-enabled industrial environments is one of the problems addressed, both in terms of finding optimal sensor placement locations in the industrial space (with Delaunay triangulations [129] or particle swarm optimization [149]) and of effectively localizing mobile robots [142]. The industrial environment in which WSANs operate is very challenging because of dust, heat, water, electromagnetic interference, and interference from other wireless devices, which make it difficult for current WSANs to guarantee reliable real-time communication. For this reason, several communication-oriented performance improvements have been achieved. Such improvements include reliable communication slot assignment [128], autonomous channel switching for spectrum sharing [130], synchronization for nodes with imprecise timers [138], and real-time link quality estimation [144]. Cooperative data relaying schemes also facilitate secure and interference-free data management, with recent approaches employing fountain-coding-aided transmissions [132] and belief-function-based cooperation [134].
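Multi-hop delivery of the collected data to a central location is commonly organized as a convergecast tree rooted at the sink. The sketch below is a generic illustration rather than any of the cited schemes: it builds a minimum-hop routing tree with breadth-first search and counts how many transmissions each node performs per round when every sensor forwards one reading without in-network aggregation. The function names and the topology format (an adjacency list) are assumptions.

```python
from collections import deque

# Hypothetical minimum-hop convergecast tree for a WSAN.


def build_routing_tree(links: dict[str, list[str]], sink: str) -> dict[str, str]:
    """Return a parent map: each node forwards its data to parent[node]."""
    parent: dict[str, str] = {}
    visited = {sink}
    queue = deque([sink])
    while queue:
        node = queue.popleft()
        for neighbor in links[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                parent[neighbor] = node
                queue.append(neighbor)
    return parent


def transmissions_per_round(parent: dict[str, str], sink: str) -> dict[str, int]:
    """Each non-sink node sends its own reading and relays every descendant's."""
    load = {node: 0 for node in parent}
    for source in parent:
        node = source
        while node != sink:
            load[node] += 1  # node transmits the packet one hop towards the sink
            node = parent[node]
    return load
```

The per-node counts show why nodes close to the sink carry the heaviest relaying load and drain their energy first, one motivation for the data aggregation and routing improvements surveyed here.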
Other interesting data-driven problems identified for industrial WSANs include neighbor discovery with mobile nodes based on distributed topology data [141], network isolation avoidance based on local energy data [145], distributed node clustering based on (among others) node similarity data [139], and coverage data hole healing [148]. Data routing improvements are also traditionally a core research aspect, with recent approaches targeting network stability based on nodal data [143] and reliable, SNR-assured, anti-jamming data transfers [147]. Cross-layer optimization frameworks have also been proposed for this technological enabler, such as SchedEx-GA [133] (spanning the MAC and network layers), which attempts to identify a network configuration that fulfills all application-specific process requirements over a topology, and CLOC [137], which attempts to maximize the minimum resource redundancy of the network under system stability and schedulability constraints. Last but not least, data-driven learning with sensing data [136] and delay and energy improvements with empirical data [140], [146] have also emerged as important research directions, especially with the introduction of local clouds in the production process.
3) NCS: NCS are control systems wherein the control loops are closed through a communication network. An NCS uses a network as a communication medium to connect the plant to a central controller [153]. The defining feature of an NCS is that control and feedback signals are exchanged among the system's components in the form of data packets through a network. A key benefit of NCS is that they connect cyberspace to physical space, enabling the execution of several tasks from a long distance. In addition, networked control systems eliminate unnecessary wiring, reducing the complexity and the overall cost of designing and implementing control systems. They can also be easily modified or upgraded by adding sensors and actuators. Typical communication networks include fieldbuses such as CAN and LON and wired connections such as IP/Ethernet. Automated or semi-automated verification of access control is a necessary building block in NCS [152], and sampled-data control has been shown to guarantee their synchronization while reducing the updating frequency of the controller and the network communication burden [161]. Due to the difficulty of observing the full relationship among numerous NCS components, high-dimensional and sparse matrices describing partial relationships among them have recently been introduced [159]. NCS can also be used to connect different plants, with solutions provided to achieve given specifications when there are communication delays and losses in the communication networks linking central network controllers and the plants [154]. Data-driven network control is known
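The effect of network-induced delay on a loop closed over a network can be shown with a minimal discrete-time sketch. It assumes a hypothetical scalar unstable plant x[k+1] = a x[k] + b u[k] and a static feedback law u[k] = -K x[k-d], where the controller only holds the measurement taken d sampling periods earlier and outputs zero until the first measurement arrives. With the values a = 1.2, b = 1 and K = 0.9 (chosen for illustration only), the loop is stable for d = 0 and d = 1 but unstable for d = 2.

```python
# Hypothetical scalar networked control loop with a d-step measurement delay.


def simulate(a: float, b: float, K: float, delay: int,
             steps: int = 200, x0: float = 1.0) -> list[float]:
    """Return the state trajectory x[0..steps] of the delayed feedback loop."""
    xs = [x0]  # xs[k] is the plant state at sample k
    for k in range(steps):
        # Until the first measurement is delivered (k < delay), the controller
        # has nothing to act on and outputs zero.
        measured = xs[k - delay] if k >= delay else 0.0
        u = -K * measured
        xs.append(a * xs[k] + b * u)
    return xs
```

For d = 2 the closed loop obeys x[k+1] = 1.2 x[k] - 0.9 x[k-2], whose characteristic polynomial z^3 - 1.2 z^2 + 0.9 has a complex root pair of modulus about 1.14, so the oscillation grows: the same controller that works over an ideal link fails once the network adds two samples of delay.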
4) Industrial Robots: Robot systems have been widely used in industry and also play an important role in human social life [189]. Industrial robot research can be classified into two categories (Fig. 10): stationary robots and mobile robots.
Tracking control of robot manipulators is a fundamental and significant problem in the robotics industry [191]. Tracking control of robotic manipulators with uncertain kinematics and dynamics (gravitational torque, friction torque, moment of inertia and disturbance) is addressed using data-driven observer-based control designs [174], some of which provide convergence of tracking errors [175]. Preplanned path tracking corrections of robotic [164] or teleoperated manipulators [167] can be achieved through iterative learning control algorithms. Smaller robotic parts of larger constructs can be controlled in a distributed manner through redundant actuation (an example is provided in [170] for the tracking control of a joint). Energy- and power-efficient methods have also been presented for a number of cost functions [180]. Manipulability optimization of redundant manipulators has been shown to be achievable through dynamic neural networks [182]. Neural control has also been applied to bimanual robots (which are able to perform more complicated tasks than a single manipulator), resulting in guaranteed stability and precision [184] or in reduced vibrations [186]. Data delivery delay is another important aspect subject to minimization; it has been shown to decrease with practical and adaptive time-delay control schemes [190]. Coordination and cooperation control of networked mobile manipulators over a jointly connected topology with time delays is another topic that requires fast data delivery in the network [179]. Modular design has proven helpful in configuring multi-robot cooperation (for example, in [187] for sewing personalized stent grafts).
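Iterative learning control exploits the repetitive nature of such tasks: after each trial, the input for the next trial is corrected using the tracking error recorded in the previous one. The following minimal P-type ILC sketch assumes a hypothetical first-order discrete-time plant y[t+1] = a y[t] + b u[t], starting from rest in every trial, and the update u_next[t] = u[t] + L e[t+1]; it is not the algorithm of [164] or [167]. With the chosen values (a = 0.2, b = 1, L = 0.8), the worst-case tracking error shrinks by at least a factor of 0.4 per trial.

```python
# Hypothetical P-type iterative learning control on a first-order plant.


def run_trial(u: list[float], a: float, b: float) -> list[float]:
    """Simulate one trial from rest; return the outputs y[1..N]."""
    y, outputs = 0.0, []
    for u_t in u:
        y = a * y + b * u_t
        outputs.append(y)
    return outputs


def ilc(reference: list[float], a: float = 0.2, b: float = 1.0,
        gain: float = 0.8, trials: int = 30) -> list[float]:
    """Run P-type ILC; return the max absolute tracking error of each trial."""
    u = [0.0] * len(reference)
    max_errors = []
    for _ in range(trials):
        y = run_trial(u, a, b)
        errors = [r - y_t for r, y_t in zip(reference, y)]
        max_errors.append(max(abs(e) for e in errors))
        # Learning update: errors[t] is the error at sample t+1, caused by u[t].
        u = [u_t + gain * e for u_t, e in zip(u, errors)]
    return max_errors
```

The factor 0.4 follows from the trial-to-trial error map: the diagonal term |1 - L b| = 0.2 plus the coupling L b (a + a^2 + ...) = 0.8 x 0.25 = 0.2 bounds every row sum of the error transition matrix.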
Localization of mobile robots in industrial environments is a classic topic that will remain challenging in the I4.0 era. Mobile robots operating in indoor environments [169] can be localized by combining data from heterogeneous sensors, and those operating in outdoor environments [172] by combining ambient data (movement dynamics, velocity data, RSSI). High-precision probabilistic localization of mobile robotic fish can be achieved using visual and inertial cues [165]. Robot navigation is another major topic for data-driven research. Online navigation of humanoid robots has been shown to be feasible through multi-objective evolutionary approaches [166]. Wall-following trajectory control of hexapod robots can be realized via data-driven fuzzy control learned through differential evolution [168], and the relevant uncertainties can be addressed by decentralizing this control with dynamic controllers [171]. Homing (a mobile robot returning to a reference home position) using only visual information can be implemented by extracting coarse location data with respect to the reference position using a bit-encoding algorithm [173]. Autonomous exploration using mobile climbing robots allows dangerous tasks to be completed more quickly and more safely than is possible with human inspectors [181]. Wireless charging helps mobile robots become increasingly autonomous and navigate more easily [188]. Beyond navigation, several approaches regarding other robot properties have been presented, such as balancing and velocity control with in-wheel motors [163], human behavior transfer to robots through learning by imitation/demonstration [183], and visual servo regulation with simultaneous depth identification [192]. Robot collaboration and data sharing is also an emerging research issue. Teleoperation control frameworks for multiple coordinated mobile robots have been proposed using a brain-machine interface [177].
A particularly interesting topic in the field of mobile robot collaboration is collaborative and adaptive data sharing. Collaborative robots are multirobot systems working together on the same industrial task, such as robotic assembly. To collaborate efficiently, robots need not only to sense environmental data locally but also to share these data with their neighbors immediately. However, there is a dilemma between the large amount of sensory data and the limited wireless bandwidth. The related problem of maximizing the throughput of sensory data sharing in collaborative robots has been studied in [176]. Another interesting topic that again necessitates distributed data exchange is the consensus problem, which has attracted a fair amount of research interest and aims at driving a group of mobile robots to agree on a quantity of interest, such as the rendezvous position, velocity, or heading direction [178]. Multiple robots can also collaboratively achieve a common coverage goal efficiently, which can improve work capacity, share coverage tasks, and reduce completion time [185].
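The consensus problem can be illustrated with the standard discrete-time averaging protocol, in which each robot repeatedly moves toward its neighbors: x_i(k+1) = x_i(k) + ε Σ_{j∈N_i} (x_j(k) - x_i(k)). The sketch is not the protocol of [178]; the neighbor lists, the step size ε, the initial positions, and the function names are hypothetical. For a connected undirected communication graph and ε smaller than the inverse of the maximum node degree, all positions converge to the average of the initial positions, which then serves as the rendezvous point.

```python
def consensus_step(positions, neighbors, epsilon):
    """One synchronous update x_i <- x_i + eps * sum_{j in N_i} (x_j - x_i) in 2-D."""
    updated = {}
    for i, (xi, yi) in positions.items():
        dx = sum(positions[j][0] - xi for j in neighbors[i])
        dy = sum(positions[j][1] - yi for j in neighbors[i])
        updated[i] = (xi + epsilon * dx, yi + epsilon * dy)
    return updated


def rendezvous(positions, neighbors, epsilon=0.3, steps=200):
    """Iterate the protocol; epsilon should be below 1 / (maximum node degree)."""
    for _ in range(steps):
        positions = consensus_step(positions, neighbors, epsilon)
    return positions


if __name__ == "__main__":
    start = {0: (0.0, 0.0), 1: (4.0, 0.0), 2: (4.0, 4.0), 3: (0.0, 4.0)}
    links = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}  # hypothetical line topology 0-1-2-3
    print(rendezvous(start, links))
```

In this example the maximum degree is 2, so ε = 0.3 satisfies the condition, and the four robots meet at the centroid (2, 2) of their starting positions using only neighbor-to-neighbor data exchanges.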
5) Assembly Line: The assembly process is composed of several data-intensive stages, namely, resource identification, resource recognition, data collection, data transmission, data mining, and feedback control [212]. Flexibility is critical for manufacturing firms to respond to demand uncertainty and achieve product customization. For example, in automotive plants, vehicles with multiple styles, models, and options can be made on the same production line. Similarly, computers with different configurations are assembled on the same line [206]. Similar observations hold for many other manufacturing sectors, such as appliances, electronics, furniture, and food, which are usually described by model-based processes [199]. However, replacing a resource or introducing a new product variant often requires manual integration work and considerable downtime. For this reason, automated manufacturing systems need to adapt increasingly fast to such changes [200]. Data already play a crucial role in customized manufacturing, as advanced systems are needed that analyze the assembly and use the plethora of data available on the shopfloor to generate highly flexible assembly sequences.
To increase the required flexibility and boost data availability in the production process, assembly lines are evolving and incorporating new technological improvements. Some fundamental data-enabling advancements for modern assembly lines include sensor data acquisition systems producing large amounts of small-volume data [210], (3D) CAD/CAM systems and models producing considerable amounts of large-volume data [202], simulation-based systems [220] for rearranging manufacturing facilities to minimize material handling and costs, producing complex mathematical data [209], digital twins of physical products producing assembly orchestration data [216], and integrated ICPS producing coupled cyber-physical data [213]. In [196], the authors introduce a knowledge-based approach that exploits distributed declarative data and cloud computing, targeting data and software exchange and reuse and maximizing the potential to facilitate new business models for industrial solutions.
Real-time data operations for flexible manufacturing are becoming increasingly popular, now lie at the core of the production process, and use different kinds of data. Real-time performance assessment of manufacturing systems, which monitors continuous and discrete variables of different machines, is based on data extracted from factory machines [218]. Real-time monitoring of the production process is based on data (feature) extraction and selection (for example, fifteen features are extracted for high-power disk laser welding in [219]). Real-time production exception diagnosis is based on sensor data streams [207]. Real-time geometrical redefinitions of products in the assembly line are based on 3D data from CAD systems and models [215]. The same holds for real-time capturing, structuring, and assessing of the design rationale of product design [205]. Finally, real-time coded aperture techniques targeting the alignment of industrial machinery produce high-resolution image data [195].
Some specialized recent contributions on assembly line improvements include the following. In [194], the authors argue that the diversity and uncertainty of data on the dimensions, damage degree, and remaining life of used parts make remanufacturing process route decisions more complicated, and they propose a model for finding the optimal remanufacturing route. Due to similar uncertainties of complex mechanical products, the authors of [198] suggest an adaptive assembly quality control system to improve the products' assembly precision, stability, and efficiency. In [197], the authors combine a visual product architecture representation with PLM system data to support the development of a product family. In [193], the authors introduce efficient automation and control for the conventional cable manufacturing industry, in which a conventional stranding plant takes up approximately 300-400 m² of space. Last but not least, considering that kitting (supplying the required parts for a single assembly in pre-set containers) provides an alternative to the currently dominant practice of continuous supply line stocking, the authors of [221] analyze the value of model-based kitting for additive manufacturing.
Several theoretical frameworks have also been proposed. Modeling industrial machines with probabilistic Boolean networks enables the study of the relationship between machine components, their reliability, and their function [217]. Manufacturing systems with batches and duplications can be effectively modeled by timed event graphs and then studied using algebraic tools [214]. Time-varying properties of industrial processes can also be captured by data-driven autoregressive models and estimated with suitable recursive algorithms [208]. Key features of product manufacturing can be improved via weighted-coupled network-based quality control methods [203]. Petri net modeling can augment the performance of event-driven systems, such as intelligent part dispatching, using temporal data [201]. Integrated process planning and system configuration for machining on rotary transfer machines can be effectively realized with sophisticated optimization tools [211]. Finally, automatic adaptation of assembly models can be modeled with attributed kinematic graphs [204].
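As an example of the recursive estimation of autoregressive models mentioned for [208], the sketch below fits an AR model with recursive least squares (RLS) and an exponential forgetting factor λ, so that older samples are progressively discounted and slowly time-varying coefficients can be tracked. This is a generic textbook RLS, not necessarily the algorithm of [208]; the model order, λ, the initialization constant, and the synthetic data generator are assumptions.

```python
import random


def rls_ar(series, order=2, forgetting=0.98, delta=1000.0):
    """Estimate theta in y(t) = sum_k theta[k-1] * y(t-k) + e(t) with exponentially weighted RLS.

    forgetting (lambda, 0 < lambda <= 1) discounts old samples; delta scales the
    initial covariance P = delta * I (a large value means a weak prior).
    """
    theta = [0.0] * order
    P = [[delta if i == j else 0.0 for j in range(order)] for i in range(order)]
    for t in range(order, len(series)):
        phi = [series[t - k] for k in range(1, order + 1)]  # regressor of past outputs
        P_phi = [sum(P[i][j] * phi[j] for j in range(order)) for i in range(order)]
        denom = forgetting + sum(phi[i] * P_phi[i] for i in range(order))
        gain = [v / denom for v in P_phi]
        error = series[t] - sum(theta[i] * phi[i] for i in range(order))  # a priori error
        theta = [theta[i] + gain[i] * error for i in range(order)]
        # P <- (P - gain * phi^T P) / lambda; phi^T P equals (P phi)^T because P is symmetric
        P = [[(P[i][j] - gain[i] * P_phi[j]) / forgetting for j in range(order)]
             for i in range(order)]
    return theta


def generate_ar2(a1, a2, n, seed=0):
    """Synthetic AR(2) data y(t) = a1*y(t-1) + a2*y(t-2) + N(0, 1) noise (illustrative only)."""
    rng = random.Random(seed)
    y = [0.0, 0.0]
    for _ in range(n - 2):
        y.append(a1 * y[-1] + a2 * y[-2] + rng.gauss(0.0, 1.0))
    return y


if __name__ == "__main__":
    data = generate_ar2(1.5, -0.7, 3000, seed=42)
    print(rls_ar(data, order=2, forgetting=0.999))  # estimates near the true 1.5 and -0.7
```

A smaller λ shortens the effective memory (roughly 1/(1 - λ) samples), which tracks parameter changes faster at the price of noisier estimates.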
6) M2M communication: Industrial M2M communication refers to direct communication between industrial networked devices over any communication channel, wired or wireless. Emerging smart factories are envisioned to be seamlessly integrated with diverse communication technologies. Consequently, production, networking, and communication will become tightly integrated, and cooperation among different sites of a factory, or even among different factories, will become easily possible [232]. Research on this technological enabler focuses less on large-scale network optimization (which is investigated under the other technological enablers) and more on device-to-device communication links, channels, transmissions, and one-hop data exchanges. Contributions range from the lower technological level of circuit network model design [224] up to the higher technological levels of antenna design [243], filtering [247], multiplexing [249], interference management [235], and others. Particular attention has been paid to guaranteeing the QoS of data delivery over the communication media through various methods, such as function splitting between delay-constrained data delivery and resource allocation [227], redundant communication schemes [245], or precise communication and network modeling [239]. Optical communications have also started penetrating the industrial sector, especially for moderate and high data rates with enhanced security (due to the spatial confinement of optical links), for both short [225] and longer ranges [252]; however, their full potential remains to be unlocked, as optical equipment is still expensive [254].
The M2M communication configuration has a direct impact on the efficiency of industrial network data management, especially on specific sensitive data-related metrics. These metrics are fundamental to I4.0, as they guarantee the smooth operation of resource-intensive industrial applications. Some indicative examples where the communication scheme is highly beneficial are the following: self-triggered sampling schemes for NCS targeting low data losses and delays [228], management of statistical dependences in the channel gains of industrial WSANs targeting efficient data routing [229], phase-sensitive sensing and communication targeting safety-critical data distribution [256], mmW deployments targeting a large number of data hops [242], field-oriented network control decoupling targeting effective machine operation [238], and optimized cooperative multiple access techniques targeting efficient resource sharing [253].
A useful standardized data-enabling communication mechanism is a recent extension of IEEE 802.15.4. Several studies have highlighted that the IEEE 802.15.4 communication standard presents a number of limitations, such as low reliability, unbounded packet delays, and no protection against interference, which prevent its adoption in applications with stringent data reliability and latency requirements [28]. For this reason, IEEE released the 802.15.4e amendment, which introduces a number of enhancements to the MAC layer of the original standard to overcome such limitations. Since this release, there has been a constant flow of research on improving various aspects of the amendment. This line of research accounts for a large number of works on the M2M communication technological enabler, concentrated on three of the main 802.15.4e MAC operation modes: Time Slotted Channel Hopping (TSCH), Deterministic and Synchronous Multi-channel Extension (DSME), and Low Latency Deterministic Network (LLDN) (for more details on the functions of these modes, the reader can consult [28]). Regarding the TSCH mode, the main research focus has recently been on synchronization, with some techniques using learning and prediction data from neighboring nodes [223] and others using mutual synchronization of distributed nodes [237], as well as on fast network joining algorithms [250]. Regarding the DSME mode, improved network formation has been studied in [244]. Regarding the LLDN mode, significant efforts have been invested in making the standard suitable for ultra-low latency applications, where critical data need to be delivered with high reliability [233].
Table IV. Selected references by communication technology
IEEE 802.15.4e: [223], [233], [237], [244], [250]
IEEE 802.11(a/n): [230], [234], [236], [240], [241], [251], [255]
EtherCAT: [226]
CAN: [222]
OPC-UA: [231], [257]
ISA100.11a: [248]
WirelessHART: [246]
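In TSCH, a link is a cell identified by a timeslot and a channel offset, and the physical channel used in a given timeslot is derived from the Absolute Slot Number (ASN) as channel = hoppingSequence[(ASN + channelOffset) mod length]. The sketch below computes this mapping. The ordered list of the sixteen 2.4 GHz channels (11 to 26) is an assumption made for readability, since deployments typically configure a pseudo-random hopping sequence; the function name and the example schedule are hypothetical.

```python
# Assumed hopping sequence: the sixteen IEEE 802.15.4 2.4 GHz channels in plain order.
HOPPING_SEQUENCE = list(range(11, 27))


def tsch_channel(asn, channel_offset, hopping_sequence=HOPPING_SEQUENCE):
    """Physical channel of a cell: sequence[(ASN + channelOffset) mod len(sequence)]."""
    return hopping_sequence[(asn + channel_offset) % len(hopping_sequence)]


if __name__ == "__main__":
    slotframe_length, timeslot, channel_offset = 101, 5, 3  # hypothetical schedule
    for n in range(4):
        asn = n * slotframe_length + timeslot  # ASN of this cell in slotframe n
        print(n, tsch_channel(asn, channel_offset))
```

Because the example slotframe length of 101 is coprime with the sequence length of 16, a cell with a fixed timeslot and channel offset visits all sixteen channels over sixteen consecutive slotframes, which is what gives TSCH its frequency diversity against interference and multipath fading.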
Another widely used data-enabling technology for data management in industrial environments is IEEE 802.11 WLAN and its various amendments. The IEEE 802.11 standard has proven effective, since it provides satisfactory performance for several industrial applications with tight requirements in terms of both timeliness and reliability [230]. In particular, the possibility of implementing both ad hoc data management schemes and infrastructure configurations makes it very convenient. Here, the emphasis is put on three important aspects. The first is seamless redundancy to improve reliability, through reference architectures [255], experimental campaigns [241], and joint interference prevention [251]. The second concerns soft real-time control applications, where the relevant constraints are met through efficient bandwidth management [236] and enhanced communication determinism [240]. The third is dynamic rate selection algorithms, where data are delivered within their deadlines while transmission errors are minimized [234].
Other data-enabling communication technologies include CAN, with jitterless communication via stuff-bit prevention [222]; OPC-UA, with increased throughput via RESTful architectures [231], [257]; EtherCAT, with very short cycle times via priority-driven swapping-based scheduling of aperiodic real-time data [226]; ISA100.11a, with increased reliability via adaptive channel diversity [248]; and WirelessHART for harsh industrial environments [246]. Table IV displays an overview of selected references regarding specific communication technologies.
B. Data centric industrial services
1) AR / VR: There have been very few works on augmented reality (AR) and virtual reality (VR) services. Typically, these services require large volumes of video data, which are processed centrally with high computational overhead. In [258], the authors introduce a context-aware AR-assisted maintenance system, in which industrial users can add and spatially arrange various contents (e.g., texts, images, and CAD models) and specify the logical relationships between the AR contents and the maintenance contexts. The data in this system are stored in a context database of the context management module. A context sensing module acquires raw data from the users and from various physical sensors in the environment and interprets them to obtain low-level contexts. The sensor interpreter obtains and interprets data from the physical sensors; for example, it processes the raw images captured by the cameras and outputs the marker ID and transformation matrix. The data processing is conducted offline on large volumes of acquired data. In [259], the authors apply AR technologies to improve occupational safety in industrial environments. The application is installed on workers' mobile devices, which serve as the input and output of the system. All the necessary data are stored in a central database that the application accesses whenever required. The system is personalized to each worker's skills by taking into account their professional training and work experience; on this basis, it determines the amount of data to be displayed, helping even less skilled workers to perform a task. Therefore, in this case, although the data presence is localized, the data processing is distributed.
2) Camera / Vision: Some works use camera and vision technologies for efficient pattern recognition, fault estimation, and template matching. In [260], the authors develop a data-driven decoupling feedforward control scheme with iterative tuning to address the crosstalk problem in MIMO motion control systems. The scheme is data-driven in the sense that, unlike the typical model-based approaches of this field, it relies on iterative tuning that uses the available data to overcome the practical obstacles in obtaining an accurate dynamic model. The authors show that, through the beneficial use of data and with only one measurement data collection, the decoupling control scheme can reduce the effect of crosstalk by two orders of magnitude (from 10⁻⁸ to 10⁻¹⁰). In [261], the authors present two estimator designs for WSANs in multi-target tracking under signal transmission faults caused by uncertainties in the surrounding environmental conditions. In [262], the authors describe a model-based template matching system that is robust to rotation and scaling variations. The system takes image data as input, and the authors test it on three diverse datasets with different categories of image data: logos and badges, image patches, and PCB components.
3) Prognostics: Prognostics engineers deal with data collected about the past, present, or expected future behavior of assets and have to come up with efficient data-driven solutions. Generally, data-driven prognostics modeling goes through two necessary steps: learning and testing. First, raw data are collected from machinery and preprocessed to extract useful features from which the degradation behavior is learned. Second, in the test phase, the learned model is used to predict future behavior and to validate model performance. An example of prognostics operations in industrial environments is systems health management, an enabling discipline that uses sensors to assess the health of systems, diagnose anomalous behavior, and predict the remaining useful performance over the life of the asset [268]. In [263], the authors present a new feature extraction approach based on vibration data, targeting accurate prognostics for machinery health monitoring. The main breakthrough of the paper is the mapping of raw vibration data into monotonic features with early trends, which can be easily predicted. Data collection and processing are concentrated on central computation entities. The contribution is naturally data-driven, and the authors strive for a good balance between model accuracy and complexity. Prognostics are also widely applied in network-based industrial processes, for example in [264], where combined fault-tolerant and predictive control is introduced, and in [267], where a weighted linear dynamic system for nonlinear dynamic feature extraction is proposed. In those works, the authors try to identify the considerable redundancy and the strong correlations in the data and to manage the random noise present in the data.
Other interesting data-driven industrial prognostics applications include [265], which presents an extended prediction self-adaptive controller employing graphical programming of industrial devices for controlling fast processes, and [266], which investigates fault prediction of power converters in industrial power conversion systems.
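The learning and testing steps of data-driven prognostics described above can be sketched in a few lines: a monotonic health indicator is extracted from raw vibration windows, a degradation trend is learned from it, and the trend is extrapolated to a failure threshold to estimate the remaining useful life (RUL). This is an illustrative assumption, not the feature mapping of [263]: the running-maximum RMS indicator, the linear trend, the threshold, and all function names are hypothetical.

```python
import math


def rms(window):
    """Root mean square of one window of raw vibration samples."""
    return math.sqrt(sum(v * v for v in window) / len(window))


def monotonic_feature(windows):
    """Running maximum of window RMS: a simple monotonic health indicator (assumption)."""
    feature, current = [], 0.0
    for window in windows:
        current = max(current, rms(window))
        feature.append(current)
    return feature


def fit_line(ts, ys):
    """Ordinary least-squares fit ys ~ slope * ts + intercept (the learning step)."""
    n = len(ts)
    mean_t, mean_y = sum(ts) / n, sum(ys) / n
    sxx = sum((t - mean_t) ** 2 for t in ts)
    sxy = sum((t - mean_t) * (y - mean_y) for t, y in zip(ts, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_t


def remaining_useful_life(feature, threshold):
    """Extrapolate the fitted trend to the failure threshold (the prediction step)."""
    ts = list(range(len(feature)))
    slope, intercept = fit_line(ts, feature)
    if slope <= 0:
        return math.inf  # no degradation trend detected
    crossing = (threshold - intercept) / slope
    return max(0.0, crossing - ts[-1])


if __name__ == "__main__":
    # Hypothetical vibration windows whose amplitude grows by 0.1 per window.
    windows = [[(1 + 0.1 * k) * s for s in (1, -1, 1, -1)] for k in range(10)]
    print(remaining_useful_life(monotonic_feature(windows), threshold=3.0))
```

For these example windows, the indicator grows linearly from 1.0 by 0.1 per window, so with a threshold of 3.0 the predicted failure occurs at window 20 and the estimated RUL after ten windows is 11 windows. Real degradation is rarely linear, which is why the choice of feature and trend model matters.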

4) Anomalies Detection:
From the perspective of data management, current anomaly detection approaches are either centralized and complicated or restricted by strict assumptions, which makes them difficult to apply to practical large-scale networked industrial systems. Accommodating the high data capture rates and the total data volume generated by the complex WSANs that typically monitor industrial systems is one of the main challenges for online anomaly detection. The paper [271] outlines such centralized data-driven anomaly detection systems for ICPS using several use cases from industry; these systems extract most of the knowledge needed for the diagnosis task from data. Another ICPS-enabled work is [273], in which the authors present an anomaly detection approach for ICPS based on zone partitioning. Additionally, in [272], an online two-dimensional changepoint detection algorithm for sensor-based anomaly detection is proposed. Notably, in [269], the authors introduce a distributed general anomaly detection scheme that uses graph theory and exploits spatiotemporal correlations of physical processes to carry out real-time anomaly detection for large-scale networked industrial sensing systems. Finally, in [270], a work of a different flavor, the authors present the concept of early problem identification in collaborative engineering with different product data modeling standards.
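Online changepoint detection on a sensor stream can be illustrated with the classical two-sided CUSUM test, which accumulates deviations from a nominal mean beyond a drift allowance and raises an alarm when either cumulative sum exceeds a threshold. This is a swapped-in textbook technique, not the two-dimensional algorithm of [272]; the nominal mean, drift, threshold, sensor stream, and function name are hypothetical.

```python
def cusum_detect(stream, target_mean, drift, threshold):
    """Two-sided CUSUM. Returns the index of the first alarm, or None if no change is flagged."""
    g_up = g_down = 0.0
    for t, x in enumerate(stream):
        # Accumulate only deviations larger than the drift allowance; clip at zero.
        g_up = max(0.0, g_up + (x - target_mean - drift))
        g_down = max(0.0, g_down + (target_mean - x - drift))
        if g_up > threshold or g_down > threshold:
            return t
    return None


if __name__ == "__main__":
    readings = [10.0] * 50 + [12.0] * 20  # hypothetical sensor stream with a step at index 50
    print(cusum_detect(readings, target_mean=10.0, drift=0.5, threshold=5.0))  # 53
```

With a nominal mean of 10, a drift of 0.5, and a threshold of 5, an upward step to 12 at index 50 adds 1.5 per sample to the upper sum and is flagged at index 53. Each update needs constant memory, which suits the high data rates mentioned above, but it monitors a single stream and ignores the spatiotemporal correlations exploited by schemes such as [269].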
5) Fault diagnosis: Fault detection, isolation, and reconstruction methods are essential to improve the reliability and safety of automatic control systems. In [274], the authors develop a model-based fault location method for intermittent connection problems in controller area networks. Such networks transmit time-critical data; hence, the reliability of the network not only has a direct impact on system performance but also affects the safety of system operations. In [275], the authors introduce a condition monitoring and fault diagnosis scheme for electric motors in harsh industrial applications. The authors also note that, for a real industrial implementation, a small amount of additional memory might be required, since the proposed scheme assumes prior knowledge of various data in the motor current spectrum. Sufficient data acquisition bandwidth is also required, particularly for high-frequency signal detection. In [276], the authors discuss some basic properties of the failure rate of redundant reliability systems in industrial electronics applications. They note that evaluating the reliability of single components is a data-related problem and far from easy, precisely because failure data are scarce. In [277], the authors design a fault isolation technique based on the k-nearest neighbor rule for industrial processes. A notable data-related aspect of this paper is that the technique isolates sensor faults based only on normal data, without any fault information. In [278], a reconstruction-based method is proposed to monitor nonlinear industrial processes and isolate their fault types. This method includes numerous data operations (such as normal data decomposition and faulty data decomposition), and in the experimental section, monitoring data from an electro-fused magnesia furnace are used to show its effectiveness.
In [279], the authors suggest a component analysis algorithm for fault monitoring in industrial processes, and in [280], a threshold-free error detection scheme for WSANs is proposed. The authors use various data-oriented techniques, such as exploiting the spatial and temporal relationships among sensor data streams, data correlations, and the mapping of residual data streams.
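The idea of relying only on normal data, as in [277], can be illustrated with a k-nearest-neighbor monitoring statistic: the sum of squared distances from a new sample to its k nearest normal samples is compared with a control limit learned from normal data, and the per-variable share of that sum points to the variable most likely at fault. This is a simplified sketch under our own assumptions (Euclidean distance, a leave-one-out empirical quantile as the control limit, and a contribution-based isolation rule), not the exact isolation technique of [277]; all function names and data are hypothetical.

```python
import math


def knn_statistic(sample, normal_data, k):
    """Sum of squared distances from sample to its k nearest normal samples."""
    distances = sorted(math.dist(sample, ref) for ref in normal_data)
    return sum(d * d for d in distances[:k])


def knn_contributions(sample, normal_data, k):
    """Per-variable share of the statistic; the largest entry suggests the faulty variable."""
    neighbors = sorted(normal_data, key=lambda ref: math.dist(sample, ref))[:k]
    return [sum((sample[v] - ref[v]) ** 2 for ref in neighbors) for v in range(len(sample))]


def fit_control_limit(normal_data, k, quantile=0.99):
    """Empirical quantile of leave-one-out statistics computed on normal data only."""
    stats = sorted(
        knn_statistic(x, normal_data[:i] + normal_data[i + 1:], k)
        for i, x in enumerate(normal_data)
    )
    return stats[min(len(stats) - 1, int(quantile * len(stats)))]


if __name__ == "__main__":
    normal = [(0.1 * i, 0.1 * j) for i in range(10) for j in range(10)]  # hypothetical normal data
    limit = fit_control_limit(normal, k=3)
    faulty = (0.45, 5.0)  # second sensor reading far outside its normal range
    print(knn_statistic(faulty, normal, 3) > limit, knn_contributions(faulty, normal, 3))
```

No fault examples are needed at any point: both the control limit and the contributions are computed against normal data alone. The brute-force neighbor search scales linearly with the size of the normal dataset per query, so large deployments would typically use an index structure instead.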
6) Multi-Agent Systems: Multi-agent systems have been presented as a suitable service to develop modular, flexible, robust, and adaptive large-scale production lines. However, classical multi-agent systems are defined by a static hierarchy of data structures, which makes them very difficult to modify [282]. For example, in [283], the authors present a software platform structured around a central data repository containing engineering data and information from ongoing and completed line design projects. The central data repository is used by software agents that allow the seamless update and use of engineering data. Also, in [284], the authors investigate, in a centralized setting, the tracking control problem of networked multi-agent systems with multiple delays and new characterizations of impulses. Many recent works focus on the decentralization of industrial functions and the distribution of data over a community of distributed, autonomous, and cooperative agents. Distributed agent data and services make it possible to achieve important features, namely modularity, flexibility, robustness, adaptability, reconfigurability, and responsiveness [288]. Some recent examples follow. In [281], the authors develop a multi-agent system for process and quality control in a washing machine factory. They agentify the factory's production line and distribute the various types of data among different kinds of agents. In [285], the authors model manufacturing machines as agents that can collect production data and control the machines in a distributed fashion. Given self-organization capabilities, the machines can be reconfigured for different tasks to achieve the highest resource efficiency, and manufacturing processes are monitored and adjusted by a self-adaptive model when exceptions occur.
In [286], the authors propose modeling and synthesis procedures to obtain optimal decentralized industrial controllers in state-feedback form for distributed agents. The work in [287] presents a multi-agent method for industrial process integration that implements coordination optimization mechanisms enabling distributed agent data exchanges by means of cultural algorithms. In [289], the authors introduce noncooperative agents that make decisions based on capacity allocation and the data of all other agents, thus creating a decentralized feedback loop.
7) Decision Making: The integration of the ubiquitous sensing capabilities of IIoT with the industrial infrastructure of I4.0 can enable the automation of decision making inside and outside the shop-floor. The data collected by IIoT systems in smart industries can be used to replace manual employee evaluation systems, which are highly prone to bias. In [291], the authors develop a large-scale data-driven multitask learning and decision-making system, which can quickly coordinate machine actions online for large-scale custom manufacturing tasks. In [292], the authors present a self-organized system with data-based feedback, coordination, and improved decision-making ability. In [290], the authors propose a model for automated performance evaluation of employees in a smart industry. The model uses the data collected by embedded sensors in the smart industrial system to identify various industrial activities of employees. The identified activities are then classified as positive, negative, or neutral. Here, the word "decision" refers to the action taken in response to the performance of employees. The proposed model consists of an IIoT network, an information processing system, and a central database system. The data collected by the IIoT network are stored in the database and used by the information processing system to infer the requested results. Another interesting data-enabling entity in this paper is the data conversion block, which classifies a particular activity as positive, negative, or neutral and calculates the profit or loss corresponding to a positive or negative activity, respectively. Finally, a decision-making block automates the decision-making process using game-theoretic tools.
8) Job Scheduling: Job scheduling has traditionally been considered a core field in manufacturing research.
The field spans from the single machine scheduling problem, which is the simplest type of industrial scheduling problem, to multiple machine scheduling, multiple assembly line scheduling, and even inter-factory job scheduling. Examples of single machine scheduling are [293], where nested partitioning-based integration of process planning and scheduling in a flexible manufacturing environment is introduced; [298], where the authors study the single machine scheduling problem with deadlines, in which the processing times are described by uncertain variables with known uncertainty distributions; and [295], where the recovery policy of job-shop manufacturing systems is evaluated. Also, in [296], the authors propose a software composition method for automated machines that exploits their mechatronic modularity, and they demonstrate that the desired behavior of a certain class of machines can be composed of the behaviors of its mechatronic components, including fully decentralized scheduling and operation control. Multiple machine job scheduling has been presented in [294], where the authors address the problem of scheduling multi-robot cells with residency constraints and multiple part types; in [299], where the authors consider the serial batching scheduling problem, in which a group of machines can process multiple jobs continuously to reduce the processing times of the second and subsequent jobs; and in [300], where the authors study a two-machine scheduling problem in fuzzy environments. Multiple assembly line scheduling is presented in [301], where the authors investigate robust order scheduling problems in the fashion industry by considering preproduction events and the uncertainties in the daily production quantity.
Inter-factory scheduling is presented in [297], where production planning with remanufacturing and back-ordering is discussed for multiple factories that cooperate to produce new or remanufactured products.
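For the deterministic single machine case, a classical result (Jackson's rule) states that sequencing jobs in nondecreasing order of due date, the Earliest Due Date (EDD) rule, minimizes the maximum lateness. The sketch below applies this rule; it does not model the uncertain processing times of [298], and the job data, class, and function names are hypothetical.

```python
from typing import List, NamedTuple


class Job(NamedTuple):
    name: str
    processing_time: float
    due_date: float


def edd_schedule(jobs: List[Job]) -> List[Job]:
    """Earliest Due Date: sort jobs by nondecreasing due date."""
    return sorted(jobs, key=lambda job: job.due_date)


def max_lateness(sequence: List[Job]) -> float:
    """Maximum of (completion time - due date) when jobs run back to back from time 0."""
    time, worst = 0.0, float("-inf")
    for job in sequence:
        time += job.processing_time
        worst = max(worst, time - job.due_date)
    return worst


if __name__ == "__main__":
    jobs = [Job("A", 3, 6), Job("B", 2, 4), Job("C", 4, 12), Job("D", 1, 5)]
    order = edd_schedule(jobs)
    print([job.name for job in order], max_lateness(order))  # ['B', 'D', 'A', 'C'] 0.0
```

For the four example jobs, EDD yields the order B, D, A, C with a maximum lateness of 0, whereas processing them in the given order A, B, C, D produces a maximum lateness of 5. Once processing times become uncertain, as in [298], such a simple sorting rule no longer guarantees optimality in general.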

9) Machine Learning:
Machine learning services are by definition data-driven and are used on top of the technological enablers to further enhance industrial applications. An outline of recent industrial machine learning services and the corresponding technical methods is given in Table V. For IIoT technologies, emphasis has been put on data-driven schemes for predicting missing QoS values based on kernel least mean square algorithms [302] and on intelligent IIoT traffic classification using search strategies for fast correlation-based feature selection [310]. WSANs benefit from sensing features that provide high-accuracy measurements while reducing the required manufacturing precision (e.g., capacitive displacement sensing in [305]). Machine learning also benefits industrial robot enablers, for example through iterative learning procedures with reinforcement for high-accuracy force tracking in robotized tasks [311]. Applications in the assembly line focus on process modeling and include data-based methods for automatically selecting optimal operational indices for unit processes in an industrial plant from measured data (without knowledge of the dynamical models of the unit processes) [303], data-driven approaches for nonlinear process monitoring under the framework of locally weighted learning [304] or using radial basis function networks [313], as well as adaptive process monitoring and fault diagnosis through recursive slow feature analysis [314].
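Kernel least mean square (KLMS), mentioned above for predicting missing QoS values [302], is the LMS algorithm run in a reproducing kernel Hilbert space: each training sample becomes a kernel center whose coefficient is the step size times the a priori prediction error. The sketch below is a minimal KLMS regressor with a Gaussian kernel and no sparsification; it is not the scheme of [302], and the feature encoding, step size, kernel width, and class name are assumptions. In a QoS setting, the inputs could be hypothetical user and service features and the target an observed response time.

```python
import math


class KernelLMS:
    """Minimal kernel least mean square regressor with a Gaussian kernel (no sparsification)."""

    def __init__(self, step_size=0.5, kernel_width=0.5):
        self.step_size = step_size
        self.kernel_width = kernel_width
        self.centers = []       # past inputs, one kernel center per update
        self.coefficients = []  # step_size * a priori error for each center

    def _kernel(self, a, b):
        return math.exp(-math.dist(a, b) ** 2 / (2 * self.kernel_width ** 2))

    def predict(self, x):
        return sum(c * self._kernel(center, x)
                   for center, c in zip(self.centers, self.coefficients))

    def update(self, x, target):
        """Add one sample online; returns the a priori error (before the update)."""
        error = target - self.predict(x)
        self.centers.append(tuple(x))
        self.coefficients.append(self.step_size * error)
        return error


if __name__ == "__main__":
    grid = [(-3 + 6 * i / 49,) for i in range(50)]  # hypothetical 1-D inputs
    model = KernelLMS()
    for epoch in range(10):
        errors = [abs(model.update(x, math.sin(x[0]))) for x in grid]
        print(epoch, sum(errors) / len(errors))
```

Because this basic model grows by one center per update, practical variants add sparsification or a fixed budget of centers to bound memory and prediction cost.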
Data classification is an active research problem in the industrial data mining and machine learning communities and spans all technological enablers [306]. Deep learning, one of the most important tools of current industrial computational intelligence, achieves high performance in predicting numerous parameters and attributes of industrial applications. However, training a deep learning model efficiently is a nontrivial task, since such a model often includes a large number of parameters. In [307], the authors introduce an efficient deep learning model to predict cloud virtual machine workloads for industrial NCS deployments. In [308], the authors apply deep learning to semi-supervised process data with a hierarchical extreme learning machine in an industrial soft sensor application. Spatiotemporal features from sensors can also be learned through deep neural networks [309]. In [312], the authors propose a deep learning network that learns features adaptively from raw mechanical data without prior knowledge.

10) Big Data Analytics:
The enormous amount of real-time data used for the analysis of various industrial applications has led to a trend in I4.0 environments towards big data as a relevant element in the development of next generation industrial systems. Big data analytics offer many opportunities to evaluate data in all layers of industrial installations, for example, to identify end-user preferences, to better understand the behavior of technological enablers, or to relate issues derived from combined, statistical processing of data. The common trend in many current industrial applications is to transfer IIoT data from the physical locations where they are generated to some global cloud platform, where knowledge is extracted from raw data and used to support IIoT applications. However, there are concerns about whether this approach will be sustainable in the long run. As [319] notes, several big data processes (such as deep learning) require expensive computational resources, including high performance computing units and large memory, to train a deep computation model with a large number of parameters, which limits their effectiveness and efficiency for big data feature learning in industrial informatics. In addition, real-time delay constraints might require that data elaboration or storage be performed at the edge, i.e., close to where it is needed, rather than in remote data centers. For these reasons, decentralized generic big data frameworks for industrial edge deployments, like the one displayed in Fig. 11 and as envisioned in recent approaches such as [323], [317] and [318], are becoming increasingly common. I4.0 trends thus push towards computation decentralization, mainly for reasons of data ownership as well as wireless network capacity.
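The decentralization idea, in which edge nodes process their own data segments and only compact summaries travel upstream, can be illustrated with a minimal sketch. Each hypothetical edge node computes the count, mean, and sum of squared deviations of its local segment, and a central node merges these partial results exactly, without seeing the raw data. The merge uses the standard pairwise (Chan et al.) combination of means and variances, chosen here as an assumption for illustration; it is not taken from the frameworks in [317], [318], or [323].

```python
from dataclasses import dataclass
from functools import reduce


@dataclass
class Summary:
    n: int       # number of samples
    mean: float  # sample mean
    m2: float    # sum of squared deviations from the mean


def summarize(segment):
    """Computed locally at an edge node over its own data segment."""
    n = len(segment)
    mean = sum(segment) / n
    m2 = sum((x - mean) ** 2 for x in segment)
    return Summary(n, mean, m2)


def merge(a, b):
    """Combine two partial summaries without access to the raw data."""
    n = a.n + b.n
    delta = b.mean - a.mean
    mean = a.mean + delta * b.n / n
    m2 = a.m2 + b.m2 + delta ** 2 * a.n * b.n / n
    return Summary(n, mean, m2)


# Hypothetical sensor readings held by three edge nodes.
segments = [[20.1, 20.4, 19.8], [21.0, 20.7], [19.9, 20.2, 20.0, 20.3]]
total = reduce(merge, (summarize(s) for s in segments))
variance = total.m2 / (total.n - 1)   # sample variance of all readings
print(total.n, round(total.mean, 3), round(variance, 4))
```

Only three numbers per node cross the network, which is the kind of traffic reduction that motivates keeping raw industrial data at the edge.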
Some representative examples of this computation decentralization and of maintaining the data at the edge for distributed operations are the following. In [315], the authors design and test a real-time big data gathering algorithm based on indoor WSANs for risk analysis of industrial operations. In [316], the authors show different approaches that a classical manufacturing systems company can consider when applying data mining techniques to address the requirements that come with the IIoT technological enabler. In [318], a distributed and parallel big data analytics system for modeling and monitoring large-scale plant-wide processes is introduced. In [320], the authors explore the development of an industrial big data implementation that improves computing performance by splitting the analytics into different segments that the engine may process in parallel using a hierarchical model. There are also hybrid big data approaches, which employ two kinds of computation and data communication: localized real-time processing and global offline computation. In [317], a manufacturing big data solution for active preventive maintenance in manufacturing environments is implemented. Another hybrid approach is [321], which introduces a concentric computing model paradigm composed of sensing systems, outer and inner gateway processors, and central processors for the deployment of big data analytics applications in IIoT. In [323], the authors analyze the relationship between data processing and energy consumption by investigating the content correlation of the captured data. Traditional centralized approaches are presented in [322], where the authors develop a big data toolbox for manufacturing prediction tasks to bridge the gap between machine learning research and concrete industrial requirements, and in [324], where the authors use big data services to design a new method for product design, manufacturing, and service driven by digital twins. Table VI displays the extent of centrality that the various recent approaches have adopted in terms of computation for big data analytics.

Table VI
Computation and data analytics | Articles
Concentrated (cloud / offline) | [319], [322], [324]
Distributed (edge / real-time) | [315], [316], [318], [320]
Hybrid | [317], [321], [323]

11) Ontologies / Semantics:
In industrial automation, ontology services encompass the representation, formal naming, and definition of the categories, properties, and relations between the data and entities that substantiate various industrial processes. This is expected to further automate many tasks in the life cycle of industrial systems, from design to commissioning and operation [342]. These services frequently rely on synergies of industrial standards, such as IEC 61850 [336] and IEC 61499 [339], which are used to represent specifications and the resulting software models. Because semantic data modeling usually deals with data irregularity and diversity, sophisticated dynamic modeling methods have been derived [338]. With regard to IIoT and ICPS, OPC-UA and semantic web technologies are able to achieve integration at various levels [345].
UML-based approaches can fully automate the generation of the IIoT-compliant layer that is required for cyber-physical components to be effectively integrated in the shop floor [333]. To achieve rapid response to changes from both high-level control systems and the plant environment, self-manageable ontological agents can improve flexibility and interoperability [337], and process engineering can be automated using a knowledge-based assistance system [341]. IIoT gateways have already been integrated with dynamic and flexible rule-based control strategies [344]. Model-driven NCS enable increased usability [325] and model checking [326]. In the assembly line, knowledge-based ontology services can assist complementary content customization [327], mechanical design knowledge [328], and semantic web service composition [329]. Recognition, semantic annotation and calculation of the spatial relationships of a factory's digital facilities [330], as well as the model-based synthesis of its automation functionalities [331], are other emerging topics of interest. Ontology services also come in handy in cloud manufacturing, taking advantage of semantic links to enable automated integration and distributed updating in resource service clouds [335].

Table VII
Data enabling technology | Security services
IIoT / ICPS | covert attack for service degradation [356]; quantification of the impact of cyberattacks on the physical part [358]; legal aspects; blockchain-based remote user authentication with fine-grained access control [360]; certificateless searchable public key encryption with multiple keywords [361]
WSAN | intercept behavior in the presence of an eavesdropping attacker [354]
NCS | energy efficient intrusion detection [353]; lightweight secure authentication mechanism for broadcast mode communication [355]; dynamic cybersecurity risk assessment [357]
Industrial Robots | -
Assembly Line | -
M2M communication | application-layer traffic filtering [363]; sensor-cloud trust-based communication [362]
Ontology services can also support the development of global production network systems [332] and business integration [343] in a more general sense, as well as CAD assembly model retrieval (using multi-source semantic information and a weighted bipartite graph [340]) and visual exploration systems [334].
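The notion of an ontology as named categories, properties, and relations can be made concrete with a minimal sketch that stores facts as subject-predicate-object triples and infers class membership through a transitive subclass relation. All class and instance names below are hypothetical, and the sketch implements neither OWL/RDF tooling nor the IEC 61499 or IEC 61850 information models.

```python
# Facts as (subject, predicate, object) triples; all names are hypothetical.
TRIPLES = {
    ("WeldingRobot", "subClassOf", "IndustrialRobot"),
    ("IndustrialRobot", "subClassOf", "ProductionResource"),
    ("ConveyorBelt", "subClassOf", "ProductionResource"),
    ("robot_07", "type", "WeldingRobot"),
    ("conveyor_02", "type", "ConveyorBelt"),
    ("robot_07", "feeds", "conveyor_02"),
}


def superclasses(cls, triples):
    """All classes reachable from cls via subClassOf (transitive closure)."""
    result, frontier = set(), [cls]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == "subClassOf" and o not in result:
                result.add(o)
                frontier.append(o)
    return result


def instances_of(cls, triples):
    """Entities whose asserted type is cls or any subclass of cls."""
    return {
        s for s, p, o in triples
        if p == "type" and (o == cls or cls in superclasses(o, triples))
    }


print(sorted(instances_of("ProductionResource", TRIPLES)))
```

Querying a general category returns entities that were only asserted as members of more specific classes; this kind of inference is what lets semantic services integrate heterogeneous shop-floor data.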

12) Human-in-the-loop:
Human-in-the-loop services will be an indispensable component of most I4.0 approaches and applications related to large-scale ICPS and assembly line networked environments. This is because large and complex industrial environments necessitate advanced planning and scheduling, careful coordination, efficient communication and reliable activity monitoring, which are essential ingredients for productivity and safety. An area of notable recent interest to researchers is human tracking and localization in industrial facilities. Approaches in this field vary widely in terms of the volumes of data they generate and use. In [346], the authors propose an approach that leverages the inertial sensors embedded in smartphones, uses WiFi fingerprints based on the angle of arrival, and exploits the ubiquitous presence of diverse data to assist in human localization, thus utilizing data of small volumes. Similarly, in [347], the authors propose a real-time system for human body motion sensing with special focus on joint body localization and fall detection. The proposed system continuously monitors and processes ambient data propagated by industry-compliant radio devices through supporting M2M communication functions. In [349], the authors propose a positioning system for tracking people in highly dynamic industrial environments, such as construction sites. The proposed system leverages the existing CCTV camera infrastructure installed in the industrial environment, along with radio and inertial sensors within each worker's smartphone, to accurately track multiple people. In this case, therefore, the data volume varies according to the data generation source. Even larger volumes of data are used in [351], where the authors employ video analytics to implement a motion detection framework based on motion blobs and provide a feature-based person tracking system.
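Fingerprint-based localization, one of the ingredients combined in [346], is often explained with a nearest-neighbor sketch: a radio map stores signal measurements recorded at known positions, and a new measurement is located by averaging the positions of its k most similar fingerprints. The sketch below assumes hypothetical received signal strength (RSS) values from three access points; it is a generic k-nearest-neighbor illustration, not the angle-of-arrival method of [346].

```python
import math

# Hypothetical radio map: (x, y) position in metres -> RSS (dBm) from 3 access points.
RADIO_MAP = {
    (0.0, 0.0): (-40, -70, -75),
    (10.0, 0.0): (-70, -42, -72),
    (0.0, 10.0): (-72, -74, -41),
    (10.0, 10.0): (-65, -60, -58),
}


def locate(measurement, radio_map, k=2):
    """Estimate a position as the mean of the k closest fingerprints.

    Similarity is Euclidean distance in signal space.
    """
    ranked = sorted(
        radio_map.items(),
        key=lambda item: math.dist(item[1], measurement),
    )
    nearest = [pos for pos, _ in ranked[:k]]
    x = sum(p[0] for p in nearest) / len(nearest)
    y = sum(p[1] for p in nearest) / len(nearest)
    return x, y


print(locate((-45, -68, -74), RADIO_MAP, k=1))
```

The data volume of such a scheme is small (a few numbers per measurement), which contrasts with the camera-based tracking approaches described above.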
Other human-in-the-loop concepts are mobile apps developed to support the customer integration in the product design phase and subsequently the design of the manufacturing network [348], cross-disciplinary mobile crowdsensing of pervasive sensor data applied in industrial processes [350], as well as automated methodologies for worker path generation and safety assessment [352].
13) Security:
Security aspects in factory automation and industrial operations have become a hot topic in recent years, as monitoring and control tasks become increasingly complex. Also, ICPS are vulnerable to external attacks due to the tight integration of their cyber and physical parts. In fact, security incidents such as targeted distributed denial of service (DDoS) attacks on power grids and hacking of factory NCS are on the increase [359]. Data management in such systems is crucial, as the growing scale of the deployments can hinder effective management of security risks, partly due to the complexity of managing large volumes of data and risks manifesting across interdependent systems. Security has recently been studied across most of the technological enablers presented in this article. Table VII displays the services that have been presented for security provisioning across the different technologies. In [356], a covert attack for service degradation of ICPS is proposed, which is planned based on the intelligence gathered by another system identification attack. In [358], a risk assessment method is presented that targets the quantification of the impact of cyberattacks on the physical part of ICPS; the method helps carry out appropriate attack mitigation measures. In [360], the authors establish secure remote user authentication with fine-grained access control for IIoT by proposing a blockchain-based framework. The framework leverages the underpinning characteristics of blockchain as well as several cryptographic materials to realize a decentralized, privacy-preserving solution. In [361], the authors design a secure-channel-free certificateless searchable public key encryption scheme with multiple keywords for IIoT.
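Application-layer filtering of industrial protocol traffic, which [363] evaluates for Modbus/TCP, can be illustrated with a minimal sketch that parses the 7-byte MBAP header and the function code of a frame and admits only an allow-list of function codes. The read-only allow-list and the drop rules are illustrative assumptions, not the filtering policy or the performance model of [363].

```python
import struct

# Read-only Modbus function codes admitted by this hypothetical policy.
ALLOWED_FUNCTION_CODES = {
    0x01,  # Read Coils
    0x02,  # Read Discrete Inputs
    0x03,  # Read Holding Registers
    0x04,  # Read Input Registers
}


def filter_modbus_tcp(frame: bytes) -> bool:
    """Return True if the frame should be forwarded, False if it is dropped."""
    if len(frame) < 8:                       # MBAP header (7 bytes) + function code
        return False
    _transaction_id, protocol_id, length, _unit_id = struct.unpack(">HHHB", frame[:7])
    if protocol_id != 0:                     # protocol identifier 0 means Modbus
        return False
    if length != len(frame) - 6:             # length counts unit id + PDU bytes
        return False
    function_code = frame[7]
    return function_code in ALLOWED_FUNCTION_CODES


# Read Holding Registers (start address 0, quantity 2): forwarded.
read_req = struct.pack(">HHHBBHH", 1, 0, 6, 1, 0x03, 0, 2)
# Write Single Register (address 5, value 42): dropped by this policy.
write_req = struct.pack(">HHHBBHH", 2, 0, 6, 1, 0x06, 5, 42)
print(filter_modbus_tcp(read_req), filter_modbus_tcp(write_req))
```

Inspecting the function code rather than only ports and addresses is what distinguishes application-layer filtering from conventional packet filtering: both frames above use the same TCP connection, yet only the read request passes.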
In [354], the authors study the intercept behavior of an industrial WSAN consisting of a sink node and multiple sensors in the presence of an eavesdropping attacker, where the sensors wirelessly transmit their sensed data. In [353], the authors present an energy efficient intrusion detection and mitigation system for NCS security. The system is data oriented in the sense that it employs data-based selective encryption to reduce energy consumption, and it detects when an attack starts and ends. In [355], the authors present a lightweight secure authentication mechanism for broadcast mode communication in NCS. In [357], a fuzzy probability Bayesian network approach for dynamic cybersecurity risk assessment in NCS is proposed. In [363], the authors present a performance model for industrial M2M communication that performs advanced application-layer filtering of traffic generated by protocols widely used in industrial deployments (Modbus/TCP). In [362], the authors investigate trust-based communication for industrial deployments, devoting attention to sensor-cloud communication. They propose three types of trust-based M2M communication mechanisms for sensor-cloud and show, with numerical results, that trust-based communication can greatly enhance the performance of sensor-cloud.

14) Energy Management:
Energy management for the IIoT and WSANs has naturally received significant attention, as in many cases the devices operate on limited battery supplies (Table VIII). On the IIoT side, there have been energy efficiency improvements in QoS-aware service composition [372] (similarly for ICPS [386]), robust authentication protocols [392], routing and data collection [393], [394], as well as resource allocation and utilization [395] (similarly for ICPS [398]). On the backbone of IIoT networks, where Ethernet is used as an enabler, energy efficiency has also been a timely topic [375].
Specifically, in [366], the authors investigate the IEEE 802.3az amendment, known as Energy Efficient Ethernet (EEE), and address its application to Real-Time Ethernet (RTE) networks in factory automation. Additionally, in [367], the same authors discuss some data service aspects of the EEE/RTE interplay.
On the WSAN side, energy efficiency efforts focus on specific data-intensive operations. Industrial low-power WSAN protocols are key enablers of the I4.0 revolution, but energy consumption still limits ubiquitous deployments of perpetual and unattended devices [370]. Real-time usage data as well as historical data can help identify whether various WSAN components are functioning properly [379]. Energy efficiency in routing and data collection has traditionally been supported either through joint data transmission and wireless charging [369], or through adjustable data sampling rates [396] and distributed and collaborative sleep scheduling [377]. Other energy efficient approaches include integrity checks in the network [373], node localization [376], data loss minimization [378], and connected target coverage [381]. Energy efficient approaches for WSANs of particular interest with respect to the data management mechanisms employed are the following. In [371], the authors apply compressed sensing to break redundant data collection (and thus save significant amounts of energy), differentiating the available sensed data into principal and redundant data through an online learning component and a local control component. In [374], the authors consider both global and local data storage in the WSAN and expose the inherent difficulties of each case (defining data importance degrees and the ability to read data streams).
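Adjustable data sampling rates, one of the energy-saving techniques for WSAN data collection listed above [396], can be sketched with a simple rule: a node lengthens its sampling interval while consecutive readings change little and shortens it when they change a lot. The threshold, the interval bounds, and the doubling/halving policy are hypothetical choices for illustration and are not the scheme of [396].

```python
def next_interval(interval, prev, curr, threshold=0.5,
                  min_interval=1.0, max_interval=60.0):
    """Return the next sampling interval (seconds) for a hypothetical sensor node.

    A small change between readings suggests a stable process, so the node
    samples less often (saving energy); a large change triggers faster sampling.
    """
    if abs(curr - prev) < threshold:
        interval *= 2.0
    else:
        interval /= 2.0
    return min(max(interval, min_interval), max_interval)


readings = [20.0, 20.1, 20.2, 20.1, 23.5, 25.0, 25.1]
interval = 4.0
for prev, curr in zip(readings, readings[1:]):
    interval = next_interval(interval, prev, curr)
    print(f"change={curr - prev:+.1f}  next interval={interval:.1f}s")
```

Fewer samples mean fewer radio transmissions, which usually dominate a sensor node's energy budget; the bounds keep the node from either sampling too rarely to notice a change or draining its battery during a transient.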
Energy optimization of industrial robotic cells and assembly lines is also essential for sustainable production in the long term. A holistic approach that considers a robotic cell as a whole, aiming to minimize its energy consumption, is proposed in [384]. Dynamic low-power reconfiguration [364] and machine energy consumption minimization [365] are key objectives of novel assembly lines. In [62], the authors discuss how dynamic energy management in manufacturing systems can not only solve current technical issues in manufacturing but also aid the integration of additional energy equipment into energy systems. The important role of data in this process is demonstrated in [383], where the collected data are shown to improve energy consumption awareness and to allow manufacturing energy management systems to perform further analysis and identify where to act in the manufacturing process in order to reduce energy consumption. Several energy management and energy consumption optimization methods for the assembly line have appeared in the recent literature, the most notable focusing on production control [385], forecasting models with neural networks [387], mobile service composition [388], real-time demand bidding [389], ontological modeling [390], process parameter modeling [391], machine energy consumption profiling [6], and concurrent energy data collection [399]. For M2M communication, [368] presents methodologies and models that reliably dimension energy scavenger properties to M2M communication requirements and network needs, allowing industries to optimize the adoption of these technologies while keeping technical risks low. MAC layer power management schemes that achieve the user-specified reliability with minimal power consumption at the node are also of interest to the M2M communication community [397]. Interestingly, no significant contributions on energy management have been found for the NCS data enabling technology.

Table VIII
Data enabling technology | Articles on energy management
IIoT / ICPS | [372], [386], [392], [393], [366], [367], [375], [394], [395], [398]
WSAN | [369]-[371], [377], [396], [373], [374], [376], [378], [379], [381]
NCS | -
Industrial Robots | [384]
Assembly Line | [62], [364], [365], [383], [385], [387], [6], [388]-[391], [399]
M2M Communication | [368], [397]
15) Cloud:
Cloud manufacturing has lately gained a fair share of attention from the automation and manufacturing communities. Cloud manufacturing transforms manufacturing resources, capabilities and data into manufacturing services, which can be managed and operated in an intelligent and unified way to enable the full sharing and circulation of manufacturing resources and capabilities. Cloud services in the supply chain can greatly reduce the time and costs incurred in deploying automation systems, which are quite complex and require large human effort to build [410]. Cloud manufacturing can be divided into two categories. The first category concerns deploying manufacturing software on local or global clouds, i.e., a "manufacturing version" of cloud computing. The second category has a broader scope, cutting across production, management, design and engineering abilities in a manufacturing business. Unlike classic computing and data storage, manufacturing involves physical equipment, monitors, materials and so on. In this kind of cloud manufacturing, both material and non-material facilities are implemented on the cloud in order to support the whole supply chain. The great majority of recent works can be classified in the first category. Cloud manufacturing solutions can also be categorized according to the locality of the cloud. In the vast majority of the recent literature, the cloud infrastructure is centrally placed, with large public clouds usually delivering data over the Internet. Table IX displays the types of data sources and the cloud locality in cloud manufacturing. As shown in the table, a large portion of works employ global clouds.

Table IX
Data enabling technology | Article | Data source | Cloud locality
 | [409] | manufacturing resources | global
Assembly Line | [402] | manufacturing services | global
Assembly Line | [403] | shared memories | global
WSAN | [401] | mobile network nodes | global
NCS | [406] | network services | global
NCS | [405] | virtual resources | hybrid
IIoT | [407] | network devices | local
In [400], the authors target manufacturing resource composition and propose an approach that better copes with the temporal relationships between resource services in a business process. In [404], the authors design a cloud resource sharing scheme based on the Gale-Shapley algorithm and analyze it in the context of fluctuating resource supply and demand. In [408], the authors present an agent-adapter-based method for manufacturing clouds that enables manufacturing with various physically connected machines from geographically distributed locations over the Internet. In [409], the authors suggest a multi-granularity resource virtualization and sharing method for cloud manufacturing. In [402], the authors introduce service clustering network-based service composition: services are first clustered into abstract services, and then a clustering network of the abstract services is established. In [403], the authors design an effective load-adjusted allocation algorithm that enhances memory reusability and improves server performance by balancing workloads. In [401], the authors consider industrial WSANs with mobile nodes and propose a fixed-path mobile node handover strategy, assisted by cloud services and an ant colony algorithm. In [406], the authors propose a cloud-based decision support system for self-healing in distributed automation systems using fault tree analysis. Fewer recent works employ hybrid or local clouds. In [405], the authors study how to maximize the profit of a local (private) cloud in hybrid architectures that combine local and global clouds, while guaranteeing the service delay bound of delay-tolerant tasks. In [407], the authors suggest an embedded cloud database service method for distributed IIoT monitoring.
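Since [404] builds cloud resource sharing on the Gale-Shapley algorithm, a minimal sketch of the underlying deferred-acceptance procedure may help. Here hypothetical manufacturing tasks propose to hypothetical resource providers, each provider has a single slot, and both sides rank the other by preference. The result is a stable matching, in which no task and provider both prefer each other over their assigned partners. The one-slot capacity model and the preference lists are assumptions; the fluctuating supply and demand analyzed in [404] is not modeled.

```python
def gale_shapley(proposer_prefs, acceptor_prefs):
    """Deferred acceptance: proposers propose in order of preference.

    Both arguments map a name to a list of names on the other side, best first.
    Assumes complete lists and equal numbers on both sides.
    Returns a dict acceptor -> proposer.
    """
    rank = {a: {p: i for i, p in enumerate(prefs)}
            for a, prefs in acceptor_prefs.items()}
    next_choice = {p: 0 for p in proposer_prefs}   # index of next acceptor to try
    match = {}                                     # acceptor -> proposer
    free = list(proposer_prefs)
    while free:
        p = free.pop()
        a = proposer_prefs[p][next_choice[p]]
        next_choice[p] += 1
        current = match.get(a)
        if current is None:
            match[a] = p                           # acceptor was unmatched
        elif rank[a][p] < rank[a][current]:        # acceptor prefers the new proposer
            match[a] = p
            free.append(current)
        else:
            free.append(p)                         # rejected, will try next choice
    return match


tasks = {                      # hypothetical manufacturing tasks
    "milling_job": ["cloud_A", "cloud_B", "cloud_C"],
    "printing_job": ["cloud_A", "cloud_C", "cloud_B"],
    "assembly_job": ["cloud_B", "cloud_A", "cloud_C"],
}
providers = {                  # hypothetical resource providers
    "cloud_A": ["printing_job", "milling_job", "assembly_job"],
    "cloud_B": ["milling_job", "assembly_job", "printing_job"],
    "cloud_C": ["assembly_job", "printing_job", "milling_job"],
}
print(gale_shapley(tasks, providers))
```

The proposing side obtains its best achievable stable partner, so which side proposes (tasks or providers) is itself a design choice in a resource sharing scheme.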

VI. OPEN RESEARCH CHALLENGES
In this section, we identify some open research challenges on data management in industrial networked environments and their inherent tradeoffs. We focus our attention on a wide variety of thematic topics pertaining to the data management requirements presented in the previous sections. These notes provide crisp insights for the design of future data management applications.

A. Energy efficient data delivery with small delays
Ensuring energy efficient, low-latency data delivery in industrial networked environments is of capital importance and is receiving increasing attention in academia and industry. However, in current industrial configurations, the computation of data exchange and distribution schedules is quite primitive and highly centralized. Usually, the generated data are transferred to a central network controller using wireless or wired links. The controller analyzes the received information and, if needed, reconfigures the network paths and data forwarding mechanisms, and changes the behavior of the physical environment through actuator devices. Traditional data distribution schemes are usually implemented over relevant industrial protocols and standards, such as WirelessHART, IEEE 802.15.4e and 6TiSCH. Such entirely centralized and offline computation of data distribution schedules can become inefficient in terms of end-to-end latency. Additionally, in industrial environments, the topology and connectivity of the network may vary due to link and sensor node failures. Also, highly dynamic conditions can make communication performance differ substantially from the conditions under which the central schedule was computed, possibly causing sub-optimal performance and failure to meet energy requirements. These dynamic network topologies may cause a portion of industrial nodes to malfunction. With the increasing number of battery-powered devices involved, industrial networks may consume substantial amounts of energy, more than would be needed if local, distributed computations were used. To address these emerging I4.0 challenges, novel data management layers have to be engineered over the device and networking planes of industrial deployments.
These layers have to operate independently of, and complementary to, the routing process, distributing the data in the network in a decentralized manner while respecting the strict I4.0 requirements. Indeed, not all data need to be transferred to central network controllers prior to delivery to the data consumers (as traditional industrial routing approaches usually impose); data can also be stored and managed locally at selected data cache nodes (Fig. 12), exploiting, when needed, additional levels of information.
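As an illustration of storing data at selected cache nodes instead of routing everything through a central controller, the following sketch picks, for one data item, the candidate node that minimizes the worst-case hop distance to the nodes consuming that item, using breadth-first search on an unweighted topology. The topology, the candidate set, and the min-max hop objective are hypothetical assumptions; practical schemes would also weigh energy, latency bounds, and link dynamics.

```python
from collections import deque

# Hypothetical industrial network topology (undirected adjacency lists).
TOPOLOGY = {
    "controller": ["n1", "n2"],
    "n1": ["controller", "n3", "n4"],
    "n2": ["controller", "n5"],
    "n3": ["n1", "n4"],
    "n4": ["n1", "n3", "n5"],
    "n5": ["n2", "n4"],
}


def hop_distances(graph, source):
    """Breadth-first search: hop count from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


def choose_cache(graph, candidates, consumers):
    """Candidate node minimizing the maximum hop distance to all consumers."""
    def worst_case(candidate):
        dist = hop_distances(graph, candidate)
        return max(dist.get(c, float("inf")) for c in consumers)
    return min(candidates, key=worst_case)


print(choose_cache(TOPOLOGY, ["controller", "n1", "n4"], ["n3", "n5"]))
```

In this toy topology the central controller is two hops from each consumer, while a cache at `n4` serves both in one hop, which is the kind of latency (and transmission energy) saving that motivates local data caching.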

B. Data distribution in local and mobile clouds
As shown in Table IX, the most common current approach for collecting and processing large volumes of data for cloud manufacturing purposes is based on the assumption that some network infrastructure is able to support the collection and delivery of all these data toward the cloud, which is intended to be the back-end for processing and getting value from such data. In general IIoT/ICPS environments, this backbone is a wideband cellular network such as LTE. Manufacturing environments may use the same approach, or more localized wideband infrastructures such as WiFi. In any case, an approach relying exclusively on global cloud providers for holistic industrial data services has limitations from two main standpoints. On the one hand, wideband wireless networks may not provide sufficient bandwidth to support the data traffic demand. On the other hand, relying only on global clouds may cause manufacturing stakeholders to lose control over their data, as the data will be transferred to data centers beyond the control of the data owner. In addition, meeting the manufacturing stakeholders' requirements in terms of storage and computation capacity may significantly increase the cost they incur for ICT services; if reduced, this cost could be more profitably invested in the core production process. To overcome these issues, a paradigm shift is needed in the way the gathered data are managed and processed. To this end, local and mobile cloud technologies would need to be employed to implement a multi-layer cloud infrastructure (Fig. 13). This will enable the exploitation of not only global cloud services, but also local resources available at the stationary and mobile devices of industrial deployments. In such an environment, a number of mobile devices (e.g., the devices of various operators working at the manufacturing premises) are available, and their computation and storage resources are typically underutilized. Instead of relying exclusively on storage and computation services provided by a global cloud provider, the storage and computation tasks can be distributed among these local devices, which will therefore form a local (and in some cases mobile) cloud. In this paradigm, global cloud services can be used only when (i) global information is needed to better analyze the status of the production process, or (ii) local resources are saturated and additional capacity is needed. For example, storage available at local devices might only suffice for storing information about parts produced in a limited time window in the past. Older data may be stored on a global cloud storage service, possibly in encrypted form. However, data related to the most recently produced parts would still be available locally, and could be accessed without transferring them back and forth between local devices and global cloud data centers. The resulting solution will be a multi-layer cloud platform, whereby global and local resources are used elastically and synergistically, depending on the needs of the virtual metrology service.
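The time-window split described above, with recent part data kept on local devices and older data offloaded to a global cloud, can be sketched as a two-tier store. The `GlobalStore` class is a hypothetical in-memory stand-in for a global cloud storage service (no real cloud API is used), the window length is an assumed parameter, and encryption before offloading is omitted.

```python
from collections import OrderedDict


class GlobalStore:
    """Hypothetical stand-in for a global cloud storage service."""

    def __init__(self):
        self.records = {}

    def put(self, key, value):
        self.records[key] = value

    def get(self, key):
        return self.records.get(key)


class TwoTierStore:
    """Keeps records produced within `window` seconds locally; offloads older ones."""

    def __init__(self, window, global_store):
        self.window = window
        self.local = OrderedDict()   # part_id -> (timestamp, data), oldest first
        self.global_store = global_store

    def add(self, part_id, timestamp, data):
        # Assumes records arrive in non-decreasing timestamp order.
        self.local[part_id] = (timestamp, data)
        self._offload(now=timestamp)

    def _offload(self, now):
        while self.local:
            part_id, (ts, data) = next(iter(self.local.items()))
            if now - ts <= self.window:
                break                # remaining records are recent enough
            self.local.popitem(last=False)
            self.global_store.put(part_id, (ts, data))

    def get(self, part_id):
        if part_id in self.local:    # served locally, no cloud round trip
            return self.local[part_id][1]
        record = self.global_store.get(part_id)
        return None if record is None else record[1]


cloud = GlobalStore()
store = TwoTierStore(window=3600, global_store=cloud)
store.add("part-1", 0, {"diameter_mm": 12.01})
store.add("part-2", 1800, {"diameter_mm": 11.98})
store.add("part-3", 5000, {"diameter_mm": 12.03})
print(list(store.local), list(cloud.records))
```

Reads of recent parts never leave the local devices, while a lookup for an older part falls through transparently to the global tier, mirroring the elastic use of local and global resources described in the text.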
C. Distributed, real-time data security for industrial robots and assembly line
As shown in Table VII, a substantial body of work already addresses data security for IIoT/ICPS, WSANs, NCS and M2M communication. However, the absence of security mechanisms for the technological enablers of the assembly line and industrial robots is notable. Moreover, the decentralization of the production process, the integration with IIoT technologies (whose nature makes them vulnerable) and the introduction of open and ubiquitous data leave assembly lines and robots further exposed to external threats. To date, security has not been a concern for the (in many cases legacy) assembly lines and industrial robots. Yet, practitioners have recognized that the open and uncontrollable nature of the M2M communication enabler exposes these systems to a variety of possible security threats and vulnerabilities. Security solutions will also need to operate in a distributed manner, because centralized solutions require transmitting data to the central controller, which may result in data loss and delayed threat detection decisions, particularly in large-scale deployments. In contrast, distributed solutions are much more agile and robust to data transmission failures and, more importantly, scale to larger sizes. For example, industrial anomaly detection for malicious attacks (e.g., false data injection) can be performed either at the central controller or at local distributed devices [269]. Finally, following the same example, since real-time information is critical and even a single abnormal security behavior may lead to a catastrophic cascade of failures throughout the whole system, abnormalities should be detected as early as possible to minimize potential damage. To achieve this, real-time data security solutions that provide online threat detection are needed. Such solutions should be able to determine whether each observation is anomalous as soon as the local data observations are collected.
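A minimal sketch of per-observation, online anomaly flagging that a local device could run as soon as each measurement arrives is shown below. It maintains a running mean and variance with Welford's algorithm and flags observations whose z-score exceeds a threshold. The threshold, the warm-up length, and the choice to exclude flagged values from the running statistics are assumptions; practical detectors for attacks such as false data injection generally rely on richer models than a univariate z-score.

```python
import math


class OnlineDetector:
    """Streaming z-score detector using Welford's running mean/variance."""

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold
        self.warmup = warmup       # observations learned before flagging starts
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0              # sum of squared deviations from the mean

    def observe(self, x):
        """Return True if x is flagged as anomalous; otherwise learn from it."""
        if self.n >= self.warmup:
            std = math.sqrt(self.m2 / (self.n - 1))
            if std > 0 and abs(x - self.mean) / std > self.threshold:
                return True        # anomalous values are not learned
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return False


detector = OnlineDetector(threshold=3.0, warmup=5)
stream = [50.1, 49.8, 50.3, 50.0, 49.9, 50.2, 58.7, 50.1]
flags = [detector.observe(x) for x in stream]
print(flags)
```

Each decision costs constant time and memory, so it can be taken locally at the moment of collection, without first shipping the data to a central controller.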

D. Convergence between industrial / automation / manufacturing and communication / networking / computation
NCS currently provide deterministic services for the assembly line and industrial robots, while the IIoT and WSANs provide best effort services for the entire automation pyramid. Also, as shown in Table III, recent architectural trends for assembly line and industrial robot installations focus on centralized data management, while the trends for IIoT and WSANs push towards decentralization, mostly due to emerging data ubiquity. It has already been argued that a convergence should occur, and that future converged industrial deployments should support both best effort and deterministic services, with very low latency and jitter [78]. This convergence is further motivated, and will be further extended, by the pervasiveness and variety of data sources on the shop floor. Consequently, industrial automation providers face a challenge and can significantly benefit from communication/networking technologies and services. If they are not able to find powerful and flexible computing services that enable them to store and process "as required" the manufacturing information they have generated, they will never be able to leverage faster and more complete control of the production process in the digital domain to gain a competitive advantage. If they continue to perform the analysis as they currently do, i.e., in the physical domain, they will continue to suffer a negative impact on production yield and costs.

VII. CONCLUSIONS
In this survey article, we reviewed the recent literature (2015-2018) on data management as it applies to networked industrial environments. Of particular interest to our review have been the data enabling technologies and the data centric services that both the Communications/Networking/Computation field and the Industrial/Manufacturing/Automation field are providing in order to boost production performance and address the emerging I4.0 requirements. We first focused the survey on recent practical use cases and emerging architectural trends, where we noted the convergence that should occur between the two scientific fields to enable an efficient future data management approach. Then, we performed an exhaustive survey of the most relevant and acclaimed research journals and derived a taxonomy of the recent works on technologies and services. Finally, after this holistic research, we identified several interesting open challenges for the future: energy efficient data delivery with small delays; data distribution in local and mobile clouds; distributed, real-time data security for industrial robots and assembly line; and convergence between the two main scientific fields.