Data Fusion for Intelligent Crowd Monitoring and Management Systems: A Survey

Intelligent Crowd Monitoring and Management Systems (ICMMSs) have become effective resources for strengthening safety and security and for enhancing early-warning capabilities to manage emergencies in crowded situations in smart cities and at mass gathering events. The main advantage of such systems is their ability to detect multiple features associated with crowd gatherings, as they combine multi-source sensors, multi-modal data, and powerful intelligent and analytical methods. Unlike traditional crowd monitoring systems, which make use of simplex forms of different data types, an ICMMS can collect, fuse, process, and analyze data and information associated with crowded scenarios in large quantities for accurate global assessment and enhanced decision-making. Therefore, data fusion is introduced as an enabler to decrease data quantity, reduce data dimensions, and improve data quality. In this paper, we first survey the literature on data fusion applications in crowd monitoring systems, as we are developing a state-of-the-art ICMMS with data fusion as a major platform enabler. Next, we discuss some popular data fusion architectures and classifications from different perspectives. Based on this, we propose a multi-sensor, multi-modal, and multi-dimensional ICMMS architecture based on data fusion. Then, we identify the data fusion processes in the ICMMS and classify them into sensor fusion, feature-based data fusion, and decision fusion. Relevant algorithms, applications, and examples of the three classes are elaborated. Finally, future data fusion research directions are discussed.


I. INTRODUCTION
With the development of sensor technology, communication technology, and big data science [1], smart city-oriented intelligent applications [2] have become important services in human life. People's living standards have risen with improved infrastructure and intelligent applications such as smart home furnishing, smart buildings, and VR/AR experiences. With the increasing expansion and prosperity of urban business zones, many people choose to go shopping and seek entertainment there. These large central business zones have become representative of the city image and are the zones with the most economic vitality. Besides, there are also crowd gatherings of different scales for special events such as theater performances, concerts, music festivals, and religious events. These events carry serious potential safety hazards and crowd management challenges, including stampedes, abnormal behaviors [3], and abnormal incidents (fire, adverse weather, poisonous gas, and explosions). Once an emergency happens where crowds are gathered, the risks are amplified. Targeted emergency response, treatment methods, and rescue measures must then take place in a timely manner.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhibo Wang.
Crowd monitoring is a way to guarantee crowd safety. The main function of crowd monitoring technology is to acquire important information, such as crowd density and the number of people. By estimating the number of people, the degree of crowd gathering can be judged for accurate and effective management and planning at a monitored site [4].
VOLUME 9, 2021 This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/
Traditional crowd monitoring systems (CMSs) depend on vision-based monitoring technology [5], the most commonly used method of which is closed-circuit television (CCTV) monitoring. CCTV is an image communication system [6] that can transmit video flow from specific areas and broadcast videos to fixed-loop devices. In other words, the signal can be transmitted from the data source to a prearranged specific broadcast device connected to the source. With the development of monitoring technology, vision-based crowd monitoring technology has passed through the following development stages: 1) in "one-to-one" monitoring, each monitor corresponds to one CCTV camera. Devices in this mode are fixed and inflexible; 2) in circuit switching for monitoring, wiring and operation are complex, while network expansion performance is poor; and 3) in multimedia monitoring, the video can be switched smoothly while the visualization can be well controlled. At present, the widely used CCTV video monitoring system has the following obvious disadvantages: 1) the sensors are of a single type. Usually, vision-based monitoring technology only uses video or image data, which means that some ground environment data (such as temperature, humidity, gas, and sound waves) are omitted. Single-type data result in uncertain analysis; 2) manual subjective analysis is required. The subjective human judgment and assessment of the video flow data collected by the camera and monitor consume too much manpower. Quantitative analysis cannot be performed on important indexes including crowd density grade, crowd flow speed, people counting, or abnormal incidents; and 3) there are video surveillance blind areas.
Usually, to avoid disturbing crowd activities when capturing crowd situations in a certain zone, the monitoring camera is installed in a high position. However, when the crowd condition of a certain block needs to be captured, the fixed installation and limited resolution of the camera make this difficult. Besides, in cases of obstacles or bad weather in the monitored area, a camera cannot clearly capture abnormal behaviors or crowd incidents. Sensors on the ground (ultrasonic, temperature, humidity, and smoke sensors) and in the air (far-infrared and near-infrared cameras, UAV-based LiDAR and cameras) can be effective for monitoring blind areas [7]. 4) There is insufficient timeliness and intelligence for decision-making. Responding to abnormal incidents such as crowding, trampling, fights, fire, hail, and violent attacks depends on real-time monitoring and a high-efficiency crowd evacuation and management mechanism using artificial intelligence (AI) and communication technologies. Thus, diversified sensors, deep data analysis, and high-speed decision transmission are core parts of a CMS [8].
Therefore, intelligent crowd monitoring and management systems (ICMMSs) are effective means to strengthen public security, innovate social governance, raise the management level, and improve early emergency warning by intelligently recognizing and analyzing the behaviors of crowds in different areas. An ICMMS is also an important part of developing a smart city.
However, as mentioned above, there are many heterogeneous sensors in an ICMMS. This means the system must analyze a large quantity of multi-source and multi-modal data deeply and accurately in real time to make an effective and intelligent decision in a short time. It is especially important to process and analyze data in such an autonomous system. Data fusion is a kind of information fusion technology that associates, correlates, and combines the information from multiple sensors to obtain more timely and accurate decision-making support. Since the 1970s, data fusion has been widely used in many fields, including automated manufacturing, battlefield command, resource management, and smart cities. From low-level data collection to high-level services, data fusion offers feasible and high-efficiency support for deep fusion and mining of massive multi-source data in heterogeneous networks.
The remainder of this paper is organized as follows: Section II introduces popular autonomous applications based on data fusion that have significant application in ICMMSs. Section III introduces commonly used data fusion architectures and technology classifications. Section IV proposes our data fusion-based, multi-sensor, multi-modal, and multi-dimensional ICMMS architecture, as well as describing its requirements and challenges. Section V describes the detailed technologies and some practical algorithms of sensor fusion, feature-level data fusion, and decision fusion in ICMMS. Section VI introduces open issues and future research directions for data fusion. Section VII summarizes the paper.

II. DATA FUSION FOR AUTONOMOUS APPLICATIONS
At present, data fusion is widely used in various intelligent applications, and studies on it continue to emerge. In the field of automation, multi-sensor data fusion has become a potential technical support for smartphones, portable devices, and the Internet of Things in large-scale fine-grained operation and intelligent decision-making services. Starting from ICMMS applications, this paper discusses data fusion in cutting-edge automation applications that can be parts of ICMMSs, as shown in Table 1.
• Robotics and industrial control: Industrial control is very important for Industry 4.0. For applications such as robots [12] and smart factories, which need powerful control systems, data fusion can help to receive signals and data [9] from multi-source sensors (vision [11], force, touch, etc.), and analyze and identify them [10]. According to the decision-making action, the mechanical device can be guided to carry out subsequent precision operation.
• Unmanned vehicles and autonomous driving: Traditional vehicles rely on the environmental perception ability of the eyes [14], behavioral decision-making ability of the brain, and vehicular control ability of the driver's limbs when being driven on the road [13]. Autonomous vehicles fully automate the above three human capabilities. This requires the deployment of a large number of devices on the vehicle for environmental awareness (cameras, LiDAR [16], GPS positioning devices, ultrasonic sensors), locating [15], decision-making (machine learning and deep learning [29] computing units) and mechanical control devices. With the support of data fusion technology, these multi-modal data can provide more accurate analysis and decision-making services, and help ensure driving safety.
• Unmanned aerial vehicles: UAVs are widely used in aerial photography, logistics transportation, disaster rescue, air monitoring, and other fields [17]. To carry out tasks smoothly, a UAV usually needs to be equipped with a camera, LiDAR [20], an image transmission module, a computer vision module, etc. In many functions, such as target recognition [18], path planning [19], and obstacle avoidance, in-depth analysis based on multi-modal data fusion can help to make more accurate judgments.
• Automatic target recognition and situation awareness: This is the basis of many automation applications and an important function of ICMMS [21]. The innovative application of data fusion technology in different fields (UAVs' perception and obstacle avoidance [24] in the surrounding environment; complementary image target recognition [23] between CCTV and UAV cameras, CCTV and infrared cameras, CCTV and LiDAR, and multi-source sensing of large-scale sensors [22]) is an important guarantee for the intelligent development of ICMMS [21].
• Military automation: The emergence of high-precision weapons marked a qualitative leap in the speed and scale of military operations [28]. However, it also brought challenges such as complex combat processes, dynamic changes of battlefield situation, and difficulties in real-time command and control [27]. Therefore, the whole process from intelligence collection and combat data collection, to information transmission and command communication [26], to precise command and situation feedback relies heavily on military automation [25]. Using data fusion technology, the military command system can filter, screen, integrate, and mine more effective information in massive intelligence, greatly improve the efficiency of data processing, and reduce error cost.

III. DATA FUSION ARCHITECTURE AND TECHNOLOGY CLASSIFICATION
With the development of computer technology including hardware, computing, communication, and storage, data fusion has become widely used. Since 1975, when data fusion was found to have a significant influence on target detection, tracking, and positioning in the military field, researchers have explored data fusion technology and classification. Currently, there are many widely used data fusion architectures and accepted classification standards, as shown in Fig. 1. This section introduces the popular classification methods in detail.

A. JOINT DIRECTORS OF LABORATORIES CLASSIFICATION FOR DATA FUSION
In 1984, the United States Department of Defense founded the Data Fusion Joint Directors of Laboratories and proposed their JDL model. Through gradual improvement and popularization, the model has become the de facto standard for defense information fusion systems in America. White proposed an extended JDL model [30] in 1991. He divided the data fusion process into five levels as follows:
• Level 0 - Source preprocessing: This is the lowest-level data fusion process. It preprocesses the source (sensor) data (dimensionality reduction, normalization, interpolation, and denoising) to provide high-quality, low-volume data for subsequent steps.
• Level 1 - Object refinement: The data output from Level 0 are further optimized at this level. The process includes data classification, object refinement, positioning, and recognition. Commonly used object refinement methods include spatiotemporal information alignment, correlation, clustering, grouping, state estimation, error elimination and reduction, and feature fusion or combination. After this stage, the output information has a consistent data structure.
• Level 2 - Situation assessment: According to the information provided by Level 1 and the observed events, higher-level reasoning and evaluation of the current environment and situation can be carried out. Specifically, the spatial-temporal relationships between sensors or data can be used to identify events and activities, widen the global perspective of the monitoring environment, determine the importance of the entity objects in the environment, and ultimately carry out situation assessment.
• Level 3 - Impact assessment: Based on a large amount of uncertain information and possible actions, Level 3 evaluates the output of Level 2 and analyzes the advantages and disadvantages of various actions. Specifically, it assesses the influence of the output of Level 2 (including activities, incidents, environment, and conditions) on the system, predicts the result, and analyzes the risks (including predicting the future state and estimating the probability of risks and vulnerabilities).
• Level 4 - Process refinement: Level 4 optimizes and improves the whole process from Level 0 to Level 3, including resource management, task scheduling, and priority ranking. Level 4 is a repeating process. It monitors system performance, recognizes potential information sources, and determines the optimal sensor deployment for the whole fusion process.
Some auxiliary support components are also defined in the JDL model [31], [32] as follows:
• Sources: Responsible for providing various input data for the system, including local or distributed sensor data, prior knowledge in different professional fields, a massive database, and human feedback information.
• Human-computer interaction (HCI): This part is essential for the complete operation of the system. HCI realizes the process of inputting information and obtains feedback from system operators (or users), including information queries, operation instructions, results, and decision information.
• Data management: Usually, the database management system can meet the requirements of data storage, fusion result storage, and rapid interaction. The processing modules from Level 0 to Level 4 interact with the data management module constantly to realize diversified data retrieval, access, security, backup, and compression.
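As a rough illustration of the Level 0 operations named above (interpolation, denoising, and normalization), the following Python sketch chains three simple steps over a one-dimensional sensor series. The function names, the median filter as the denoising step, and min-max scaling as the normalization step are assumptions chosen for illustration; the JDL model does not prescribe specific algorithms.

```python
from statistics import median
from typing import List, Optional


def interpolate_gaps(samples: List[Optional[float]]) -> List[float]:
    """Fill missing readings (None) by linear interpolation between known neighbours."""
    known = [i for i, v in enumerate(samples) if v is not None]
    if not known:
        raise ValueError("no valid samples to interpolate from")
    filled = []
    for i, v in enumerate(samples):
        if v is not None:
            filled.append(float(v))
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # gap at the start: copy the first known value
            filled.append(float(samples[right]))
        elif right is None:         # gap at the end: copy the last known value
            filled.append(float(samples[left]))
        else:
            w = (i - left) / (right - left)
            filled.append(samples[left] + w * (samples[right] - samples[left]))
    return filled


def median_filter(samples: List[float], window: int = 3) -> List[float]:
    """Suppress isolated spikes with a sliding median (edges padded by repetition)."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    padded = [samples[0]] * half + list(samples) + [samples[-1]] * half
    return [median(padded[i:i + window]) for i in range(len(samples))]


def min_max_normalize(samples: List[float]) -> List[float]:
    """Rescale values to the range [0, 1]."""
    lo, hi = min(samples), max(samples)
    if hi == lo:
        return [0.0 for _ in samples]
    return [(v - lo) / (hi - lo) for v in samples]


def level0_preprocess(samples: List[Optional[float]]) -> List[float]:
    """Hypothetical Level 0 chain: interpolate, then denoise, then normalize."""
    return min_max_normalize(median_filter(interpolate_gaps(samples)))
```

For example, `level0_preprocess([1.0, None, 3.0, 50.0, 5.0])` fills the missing second reading, removes the isolated spike of 50.0, and rescales the result to [0, 1].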

B. LUO AND KAY'S ABSTRACT LEVEL CLASSIFICATION FOR DATA FUSION
In 1988, Luo and Kay researched problems relevant to a multi-sensor integrated system, described an all-purpose paradigm and method of high-efficiency integration and intelligent application, and defined the concept, potential superiority, and challenges of multi-sensor fusion [33], [34]. According to their hierarchical fusion scheme, the serial process of multi-sensor perception, analysis, and decision making is divided into the signal level, pixel level, feature level, and symbol level [35]. The purpose is to transform sensor data from their original form into high-quality useful information and assist in decision making and evaluation.
• Signal level: The signal level refers to the direct input and fusion output of the sensor data/signal. The input data of this level must be signals under the same specification or mode (such as ultrasonic sensor data, acoustic data, and LiDAR data), which are converted into high-precision signal data through denoising, filtering, and other operations. As a low-level fusion, it can be applied in real-time scenarios or signal preprocessing.
• Pixel level: This level of fusion is usually applied when the input data type is an image. Common image sensors include HD cameras, infrared cameras, and remote sensing cameras. Pixel-level fusion of the image data collected by these sensors is helpful for image preprocessing, segmentation, classification, searching, and target extraction.
• Feature level: The feature level takes the features extracted from the original data after specific preprocessing as input, then outputs more accurate or more complete high-level features through feature fusion. The new features help to improve the accuracy and intelligence of system decisions. Some commonly used feature fusion techniques include IHS transform, artificial neural network, and image feature registration.
• Symbol level: Similar to decision fusion in Section III-C, by taking features or data as input, this level of fusion can obtain symbol-level state or event representation. It can also combine or fuse the decisions based on multiple data sources to obtain a more comprehensive or intelligent decision. As the highest level of fusion, the new decision is helpful to improve the performance of the system in prediction, evaluation, classification, action, and other functions.
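To make the signal level more concrete, the sketch below fuses several readings of the same quantity from same-modality sensors using inverse-variance weighting. This is one common textbook technique, used here as an assumption; it is not a method attributed to Luo and Kay in the text above, and the function name and the (value, variance) input format are hypothetical.

```python
from typing import List, Tuple


def fuse_inverse_variance(readings: List[Tuple[float, float]]) -> Tuple[float, float]:
    """Fuse (value, variance) pairs that measure the same quantity.

    Each reading is weighted by 1/variance, so more precise sensors
    contribute more. Returns the fused value and its variance.
    """
    if not readings:
        raise ValueError("at least one reading is required")
    if any(var <= 0 for _, var in readings):
        raise ValueError("variances must be positive")
    weights = [1.0 / var for _, var in readings]
    total = sum(weights)
    fused_value = sum(w * value for w, (value, _) in zip(weights, readings)) / total
    fused_variance = 1.0 / total
    return fused_value, fused_variance
```

With two equally reliable sensors the result is the plain average, and the fused variance is half that of either sensor, which is the sense in which signal-level fusion yields higher-precision data of the same type.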

C. DASARATHY CLASSIFICATION FOR DATA FUSION
To provide an unambiguous classification standard for many uncertain or ambiguous data fusion types, Dasarathy divided data fusion architecture into five processes based on I/O characterization and categories of data, features, and decision-making in 1997 [36], [37]. The characteristics of these five categories are summarized according to the nature of the input entity and output results for each data fusion process. Dasarathy's classification method [38], [39] is widely used at present.
• Data in-data out (DAI-DAO) fusion: Corresponding to sensor fusion, DAI-DAO is the lowest level of data fusion. The output result is obtained after multi-source fusion of the input raw data. The result is still data, but the quality is high. That is, the reliability, integrity, and consistency of the data are improved.
• Data in-feature out (DAI-FEO) fusion: In this mode, the input source data are deeply fused to extract features. These features (unique or universal) can describe different situations of a system or different forms of scenes and entities.
• Feature in-feature out (FEI-FEO) fusion: Features from the previous layer or different data sources are further fused or combined in this layer to obtain new features, namely, feature-level fusion. This obtains high-level description of features or more accurate features.
• Feature in-decision out (FEI-DEO) fusion: It is not enough to only describe the characteristics of the objects. As the interaction interface between the system and the user, the output of a decision is also a very important component. At this level, the input features (simple or high-level) are processed and analyzed as the basis of system decision-making. At present, the data fusion of most systems mainly involves the three processes of DAI-DAO, DAI-FEO, and FEI-DEO.
• Decision in-decision out (DEI-DEO) fusion: An intelligent system should provide both simple low-level decisions and high-precision global decisions. At this level, different decision sources from the previous level (system evaluation and the decision of a single event or state) can be fused or combined to obtain higher-level decisions.
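The five categories above are determined entirely by the kind of input and the kind of output of a fusion process. A minimal Python sketch of that mapping follows; the `Kind` enum and the function name are hypothetical helpers introduced for illustration.

```python
from enum import Enum


class Kind(Enum):
    DATA = "data"
    FEATURE = "feature"
    DECISION = "decision"


# The five input/output pairs listed in Dasarathy's classification.
_CATEGORIES = {
    (Kind.DATA, Kind.DATA): "DAI-DAO",
    (Kind.DATA, Kind.FEATURE): "DAI-FEO",
    (Kind.FEATURE, Kind.FEATURE): "FEI-FEO",
    (Kind.FEATURE, Kind.DECISION): "FEI-DEO",
    (Kind.DECISION, Kind.DECISION): "DEI-DEO",
}


def dasarathy_category(input_kind: Kind, output_kind: Kind) -> str:
    """Return the category label for a fusion process, or raise if the
    input/output pair is not one of the five listed categories."""
    try:
        return _CATEGORIES[(input_kind, output_kind)]
    except KeyError:
        raise ValueError(
            f"{input_kind.value} in, {output_kind.value} out is not a listed category"
        ) from None
```

For instance, a module that takes raw LiDAR and camera data and outputs crowd-density features would be labeled `dasarathy_category(Kind.DATA, Kind.FEATURE)`, that is, DAI-FEO.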

D. CLASSIFICATION OF DATA FUSION IN DIFFERENT STRUCTURES
With the increase of data fusion applications and the complexity of the heterogeneous Internet of Things, the choice of fusion location is very important. Therefore, some research has been conducted on data fusion classification according to different system structures [7], [40], [41]. There are four main classification modes as follows:
• Centralized architecture and fusion: In the era of single-chip microcomputers, the demand for data was small. The sensors were very close to the CPU, and the fusion module was located directly in the CPU to obtain the sensor data for centralized fusion and calculation. This is helpful for integrated fusion and computing. However, with the commercial use of 5G, the development of communication technology has brought about an increase in data demand. This has a significant impact on channel bandwidth, data preprocessing (alignment, registration, correlation, denoising, and so on), storage space, transmission, and calculation delay. Therefore, it is not reliable to use a centralized fusion architecture directly in a complex system.
• Decentralized architecture and fusion: Different from centralized fusion, in decentralized fusion, each node carries out peer-to-peer communication with other nodes in a decentralized structure. Nodes do not process the data before sending, but fuse their perceived data with data received from the other nodes. One commonly used decentralized fusion method is based on Fisher and Shannon measurements. However, this fusion method is unfavorable in terms of communication resources, node computing performance and cost, and system robustness.
• Distributed architecture and fusion: With the popularity of cloud computing and edge computing [42], an increasing number of systems have adopted distributed computing and fusion. The advantage of this fusion method is that each source node performs simple data fusion operations (such as signal level fusion, data association, and state estimation) before sending data. This greatly reduces the amount of data, ensures data quality, and reduces communication costs. When the new data are transmitted to other sensor nodes or fusion nodes, the system can further complete higher-level fusion and analysis, obtain a global environmental perspective, and make dynamic system-level decisions.
• Hybrid architecture and fusion: Data fusion in the hybrid mode includes the three methods mentioned above, and is more suitable for intelligent systems and applications with a wide range, large amounts of data, and complex functions. Through cloud control, distributed nodes receive data from decentralized nodes or single-source sensors and perform a data fusion process to complete hierarchical fusion schemes with different levels.
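The data-reduction argument for the distributed architecture can be sketched in a few lines of Python. In this hypothetical setup, each source node sends only a (count, mean) summary of its readings instead of the raw samples, and the fusion node combines the summaries into a global mean. The function names and the choice of the mean as the local fusion operation are assumptions for illustration only.

```python
from typing import List, Tuple

Summary = Tuple[int, float]  # (number of readings, local mean)


def local_summary(readings: List[float]) -> Summary:
    """Simple fusion performed at the source node before sending."""
    if not readings:
        raise ValueError("a node needs at least one reading")
    return len(readings), sum(readings) / len(readings)


def fuse_summaries(summaries: List[Summary]) -> float:
    """Fusion node: count-weighted mean of the local means."""
    total = sum(n for n, _ in summaries)
    return sum(n * m for n, m in summaries) / total


def values_transmitted(nodes: List[List[float]], distributed: bool) -> int:
    """Number of scalar values sent to the fusion node under each scheme."""
    if distributed:
        return 2 * len(nodes)           # one (count, mean) pair per node
    return sum(len(r) for r in nodes)   # every raw reading is sent
```

The count-weighted mean of the local means equals the mean that a centralized node would compute from all raw readings, so in this simple case the reduction in transmitted data comes at no loss of accuracy. Fusion operations that are not decomposable in this way generally trade some information for the lower communication cost.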

E. LAU'S MULTI-PERSPECTIVE CLASSIFICATION FOR DATA FUSION
Lau [43] reviewed the above classification methods and investigated recent data fusion literature in the field of smart cities. Because each article focuses on a different aspect of fusion, most articles have their own understanding of data fusion classification, which makes a single definition very difficult. Therefore, Lau believes that it is necessary to classify data fusion from different perspectives, which will help to broaden the definition of data fusion. In 2019, he proposed a very comprehensive universal multi-perspective classification [2]. The advantage of this classification method is that it can comprehensively evaluate the literature or an application from all aspects of data fusion and quantitatively evaluate the depth and breadth of its data fusion on temporal and spatial scales [44].
• Data fusion objectives: Data fusion technologies are classified according to the objectives of the smart city applications, including fixing problematic data, improving data reliability, extracting higher-level information, and increasing data completeness. These goals are also the advantages of data fusion. Therefore, after adding other goals, such as improving decision intelligence, optimizing storage, and transmission efficiency, this goal-based classification standard is generally applicable to other data fusion applications.
• Data fusion techniques: This category includes data association, state estimation, decision fusion, classification, prediction/regression, unsupervised machine learning, dimension reduction, statistical inference and analytics, and visualization. It covers most of the commonly used low-level data fusion technologies and data-mining technologies. Low-level data fusion is mainly used to generate high-quality data at the same level. High-level data fusion can fuse simple inputs from multiple data sources to create abundant high-level information.
• Data input and output types: This category directly uses the I/O characterization-based classification mode of Dasarathy (Section III-C).
• Data source types: There are four kinds of common data sources in smart city applications: physical data sources, cyber data sources, participatory sources, and hybrid data sources. They are classified according to data source without considering communication media. This method aims at data classification in specific fields. In different applications, the data fusion classification method based on the type of data source can make it clearer what data are used in a system or application, although this method is not universally applicable.
• Data fusion scales: This is a classification of data fusion scale in the city, including sensor-level fusion, building-wide fusion, inter-building fusion, and city-wide fusion.
• Platform architectures: This category is made according to the position of computing nodes [45], including edge computation, fog/mist computation, cloud computation, and hybrid computation. In all systems that require communication, the location of hardware devices or computing nodes has a great impact on data storage, analysis, and services. Therefore, this method is universal for data fusion applications classified by communication and computing methods.

IV. ICMMS ARCHITECTURE AND CHALLENGES BASED ON DATA FUSION

A. DATA FUSION-BASED ICMMS ARCHITECTURE
Compared with a single-information-source system, ICMMSs with data fusion [46] have great advantages in many aspects, such as space coverage, monitoring time span, data redundancy, data source reliability, system robustness, data complexity, storage resources, computing performance requirements, and application services [47]. However, existing CMS data fusion architectures are not comprehensive enough. For example, [7] designed a perception system based on homogeneous data fusion and heterogeneous data fusion for multiple sensors including GPS, LiDAR, IMU, and stereo vision cameras. [48] proposed a method for analyzing crowd flow characteristics among multi-scale public places based on multi-source data fusion. [49] designed an estimation algorithm for pedestrian flow rate based on real-time Wi-Fi traces. Most designs only focus on simplex sensor fusion, i.e., data fusion without data perception or decision-making.
To solve these problems, we propose a multi-sensor, multi-modal, and multi-dimensional ICMMS based on a three-layer data fusion structure including sensor fusion, feature-based data fusion, and decision fusion. The system architecture is shown in Fig. 2.
More specifically, multi-source perception and sensor fusion can expand the temporal and spatial coverage range of ICMMSs. Cleaning, reducing the dimension of, and integrating data can eliminate data redundancy and guarantee the completeness of the data [50], and lower the complexity of computing. Data feature fusion helps to reduce the requirements of application services for system storage resources and computing performance, and it can provide additional complete and in-depth features [51]. Decision-level fusion or deep information mining and prediction provide users with high-level intelligence of the system [52].

B. REQUIREMENTS AND CHALLENGES
The architecture of an ICMMS contains multiple sensors, multiple functions, and man-machine interaction decisions. This means that the system's data fusion procedure must support the intelligent and automated handling of multi-source and multi-modal data, from the bottom sensors to the upper services. We have summarized the requirements and challenges of data fusion in an ICMMS.
• Network temporal and spatial expansion: To enhance the robustness of an ICMMS, the entire hardware network architecture must support distributed and historical data collection, which requires multi-source data fusion to extend the spatial coverage of a single sensor. Furthermore, as the storage and computing capabilities of the edge cloud and cloud improve, the integration of all the historical data further realizes the temporal expansion of the system [53].
• Data reliability: In an ICMMS, data reliability includes the sensor node's reliability, data authenticity, and system security [54]. The security measures of sensor nodes in the system (such as gas sensor, ultrasonic sensor, CCTV camera, infrared camera, LiDAR, and UAV equipped with a high-definition camera) include anti-interference, anti-intrusion, and anti-failure features. System security can be guaranteed by, for example, anti-attack measures and authority control. Data security measures usually include encrypted transmission and storage, data backup, digital certificates, and so on.
• Data consistency: An ICMMS uses distributed data sensing and centralized data storage and analysis. For accurate data analysis, it is necessary to guarantee 1) the consistent perception of different sensor data sources to the same incident, 2) data consistency [55] in transmission from the data source to edge cloud, and 3) the consistency or normalization of data assessment indexes.
• Data integrity: The data collected by sensors may be missing, redundant, or noisy. Incomplete or defective data must be completed or estimated through other data sources or context information collected by the sensors [56].
• Deep information hiding: The data subset in a distributed system may describe well the status of the subsystem or the characteristics of incidents in a single area. However, the information description is very limited from a global perspective. The ICMMS must allocate global resources, give early warnings, and make decisions according to the monitored dynamic crowd status. Therefore, the fusion of local features in the data analysis stage can help to uncover hidden information for the system's decision-making layer.
• Decision accuracy: The data fusion mentioned above helps to uncover deep information, while decision-making layer fusion is closer to applications and provides accurate and effective decisions to users [57]. In an ICMMS, accurate decisions include precise alarms for fire disasters, evacuation suggestions, early warnings for abnormal incidents, and early warnings for abnormal behaviors.
• Service intelligence: For users, whether the system is intelligent directly affects the service experience. How to present the accurate decision of the ICMMS more intelligently is the most important thing for user-oriented services and applications. In the decision fusion stage [58], visualization and simulation methods such as a video monitoring screen, sensor data chart display, 3D modeling of the site and the crowd, future crowd prediction, and a simulation dynamic diagram, will further improve the system intelligence.
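The first data consistency requirement above, consistent perception of the same incident by different sensor sources, can be checked in a simple way by comparing each source's estimate against the median of all sources. The sketch below is a hypothetical illustration: the function name, the source names, and the fixed tolerance are assumptions, and a deployed system would likely use a statistically motivated threshold.

```python
from statistics import median
from typing import Dict, List


def inconsistent_sources(estimates: Dict[str, float], tolerance: float) -> List[str]:
    """Return the names of sources whose estimate of the same quantity
    deviates from the median of all sources by more than `tolerance`."""
    if not estimates:
        raise ValueError("at least one estimate is required")
    center = median(estimates.values())
    return sorted(name for name, value in estimates.items()
                  if abs(value - center) > tolerance)
```

For example, if CCTV, Wi-Fi, and UAV sources report head counts of 120, 118, and 300 for the same zone, a tolerance of 20 flags only the UAV estimate for review before the counts are fused.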

V. DATA FUSION TECHNOLOGIES FOR AN ICMMS
To address the complexity of ICMMS scenarios, this paper proposes a 3-layer data fusion architecture based on the location, purpose, and characteristics of data fusion, comprising sensor fusion, feature-based data fusion, and decision fusion, as shown in Fig. 3. For each fusion process, this paper introduces the commonly used algorithms for different functions. Table 2 summarizes the comparison of the different data fusion layers [7], [48], [59]. Sensor fusion uses original signal or pixel data and requires distributed algorithms that run in the infrastructure layer close to the IoT devices. The processed data or features can be used further in feature-based data fusion processes to obtain high-level features or decisions. This type of architecture can be centralized or decentralized. The output of the decision fusion layer can provide the highest level of decisions for applications. However, at this layer the data pre-processing/pre-fusion requirements and the level of information loss are also higher. Examples are given below.

A. SENSOR FUSION TECHNOLOGIES
A variety of sensors are deployed in the ICMMS to collect different types of sensor data. The signal or data quality and other issues must be considered in the device terminal to achieve more efficient data sensing, data acquisition and time synchronization. Sensor fusion technologies [60] can be introduced to preprocess and prepare data before data analysis, including the following:
• Time registration: When the data source (sensor) performs the acquisition task, the other related components in the whole communication system keep the synchronous change of data so that the data transmission time is consistent.
• Data cleaning: The measurement parameters are combined to improve the classification of target detection and increase the accuracy of situation estimation. Multi-sensor data usually provide complementary information on the monitoring area. The purpose of sensor fusion is to obtain more accurate and complete data based on these raw data to provide more complex and detailed scene representation.
• Redundant data deduplication: Aimed at simplifying the redundant information (environmental monitoring data, event status, retransmission misinformation, and so on) caused by multi-sensor acquisition, fusion methods, such as the neighbor algorithm and data association, can be used to obtain the correlations between data sources and remove redundant data. Thus, the data consistency and integrity are maintained and the size of the data packet as well as communication costs are reduced.
• Noise and error elimination: In an unstable state or abnormal environment, fusion methods, such as Kalman filter, IHS transform, wavelet transform, principal component transform (PCT), and K-T transform, can be used to remove noise and eliminate errors.
In the following subsections, we will introduce applied examples of sensor fusion technologies in ICMMSs: noise elimination, redundant data deduplication, and low-quality data filtering.

1) NOISE ELIMINATION
Noise usually appears in signal or image data, and each requires different noise elimination methods.
1) Signal data noise elimination: At first, the Fourier transform was used to process signals. However, it is only applicable to periodic (or approximately periodic) data. Thus, new noise elimination algorithms have been developed, such as filter methods and the wavelet transform.
2) Image data noise elimination: The image data of an ICMMS contain different types of noise. According to the probability density of the noise, they can be divided into Gaussian noise, Rayleigh noise, gamma noise, exponential noise, and uniform noise. To remove them, spatial filtering (neighborhood averaging, median filtering, and low-pass filtering), transform domain filtering (Fourier transform, Walsh-Hadamard transform, cosine transform, K-L transform, wavelet transform, etc.), partial differential equations, variational methods, and morphological noise filtering can be used.
In practice, data collected by different sensors have different resolutions. Therefore, it is necessary to solve this multi-resolution data fusion problem to make better use of the complementary information from data with different resolutions and achieve a better fusion effect. This subsection combines Kalman filtering and wavelet transform as an example to introduce multi-source data fusion for eliminating noise and removing redundancy.
The Kalman filter [61], [62] is mainly used to fuse low-level, real-time, dynamic, redundant multi-sensor data and to eliminate noise. It is especially suitable for cases in which the errors of the system and the sensors conform to a Gaussian white noise model. A Kalman filter has strong estimation ability for non-stationary signals, and it can process all frequency components of the signal at the same time. Because it is recursive, the system does not need to store or process much data, which makes it very suitable for terminal sensor fusion. The time complexity of the Kalman filter is $O(m^{2.376} + n^2)$, where m is the dimension of the observation and n is the number of states. However, when a single Kalman filter is used for data statistics of a multi-sensor combination system, the reliability and real-time performance are poor. Therefore, the wavelet transform can be introduced. The wavelet transform offers high resolution for different data: it can focus on any detail of the analyzed object by gradually refining the step size in the time and frequency domains for high-frequency components. Therefore, the combination of the wavelet transform and the Kalman filter can achieve good fusion.
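As an illustration of the Kalman step only (the wavelet stage is omitted), the following is a minimal sketch of a scalar Kalman filter that fuses redundant readings from several sensors observing the same quantity. It assumes a random-walk state model and known per-sensor noise variances; the function name `kalman_fuse` and all parameter values are hypothetical and are not taken from [61], [62].

```python
import random


def kalman_fuse(measurements, variances, process_var=1e-3, x0=0.0, p0=1.0):
    """Scalar Kalman filter over a random-walk state (illustrative sketch).

    measurements: list of time steps; each step is a list with one reading
                  per sensor (None if that sensor delivered nothing).
    variances:    assumed measurement-noise variance of each sensor.
    Returns the list of fused estimates and the final estimate variance.
    """
    x, p = x0, p0
    estimates = []
    for readings in measurements:
        # Predict: the state is assumed to drift slowly, so the
        # uncertainty grows by the process variance.
        p = p + process_var
        # Update: apply each available sensor reading in turn.
        for z, r in zip(readings, variances):
            if z is None:
                continue
            k = p / (p + r)       # Kalman gain
            x = x + k * (z - x)   # corrected estimate
            p = (1.0 - k) * p     # corrected variance
        estimates.append(x)
    return estimates, p


if __name__ == "__main__":
    rng = random.Random(0)
    truth = 20.0  # e.g. an ambient temperature, chosen for the demo
    steps = [[truth + rng.gauss(0, 0.5), truth + rng.gauss(0, 2.0)]
             for _ in range(200)]
    est, var = kalman_fuse(steps, [0.25, 4.0], p0=100.0)
    print(f"fused estimate: {est[-1]:.2f}, variance: {var:.4f}")
```

Only the current estimate and its variance are kept between steps, which is why this kind of filter fits resource-constrained terminal devices.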

2) REDUNDANT DATA DEDUPLICATION
The image data sources in an ICMMS can include CCTV cameras, infrared cameras, and cameras on UAVs. The information collected by sensors, whether of the same or different kinds, may come from the same scene. When large volumes of data are sent from the edge to the data center, they occupy significant communication resources and cause communication delays. Therefore, redundant data can be deduplicated in the edge network, close to the devices. The redundant data can be removed using a perceptual hashing algorithm, in which duplicate images are retrieved and eliminated using the low-frequency information of an image. The algorithm for redundant image retrieval is shown in Fig. 4.
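The idea of hashing the low-frequency content of an image can be sketched as follows. This is one common perceptual-hash variant (DCT, keep the lowest 8 × 8 frequencies, threshold at the median), not necessarily the exact procedure of Fig. 4. It assumes each frame has already been converted to grayscale and downscaled to a small square matrix; `synthetic_frame` is a hypothetical stand-in for that preprocessing.

```python
import math
import random


def synthetic_frame(seed, size=32):
    """Hypothetical stand-in for a decoded, downscaled grayscale frame."""
    rng = random.Random(seed)
    return [[rng.randint(0, 255) for _ in range(size)] for _ in range(size)]


def dct_low_freq(gray, keep=8):
    """Return the keep x keep lowest-frequency 2D DCT-II coefficients
    of a square grayscale matrix (unnormalized)."""
    n = len(gray)
    basis = [[math.cos(math.pi * (2 * i + 1) * k / (2 * n)) for i in range(n)]
             for k in range(keep)]
    # The 2D DCT is separable: transform along rows, then along columns.
    rows = [[sum(row[i] * basis[v][i] for i in range(n)) for v in range(keep)]
            for row in gray]
    return [[sum(rows[j][v] * basis[u][j] for j in range(n))
             for v in range(keep)] for u in range(keep)]


def phash(gray, keep=8):
    """Bit list: 1 where a low-frequency coefficient exceeds the median.
    The DC term is skipped so a uniform brightness shift has no effect."""
    coeffs = [c for row in dct_low_freq(gray, keep) for c in row][1:]
    median = sorted(coeffs)[len(coeffs) // 2]
    return [1 if c > median else 0 for c in coeffs]


def hamming(h1, h2):
    """Number of differing bits between two hashes."""
    return sum(a != b for a, b in zip(h1, h2))


if __name__ == "__main__":
    frame = synthetic_frame(1)
    brighter = [[p + 10 for p in row] for row in frame]
    other = synthetic_frame(2)
    print("same scene, brighter:", hamming(phash(frame), phash(brighter)))
    print("different scene:     ", hamming(phash(frame), phash(other)))
```

An edge node could treat two frames as duplicates when their Hamming distance falls below a threshold; the threshold itself is a deployment choice and is not specified in the source.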

3) LOW-QUALITY DATA FILTERING
Monitoring images are significant to visual detection in an ICMMS. This section introduces the method of building an image quality evaluator to filter low-quality image data. The BRISQUE algorithm [63] can be used to train an offline evaluator, which is configured at the software layer for calling. BRISQUE is a no-reference image quality evaluation algorithm that operates in the spatial domain. Its principle is to extract mean subtracted contrast normalized (MSCN) coefficients from an image. The MSCN coefficients are fitted to an asymmetric generalized Gaussian distribution (AGGD). The extracted distribution features are input into a support vector machine (SVM) for regression, and the evaluation results for the image quality can be obtained. The process of establishing the evaluator is shown in Fig. 5. Regarding the overall computational complexity of BRISQUE compared with other algorithms (PSNR, BLIINDS-II and DIIVINE), it takes only 1 second to compute each quality measure on an image of resolution 512 × 768 on a 1.8 GHz single-core PC with 2 GB of RAM.
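The first step of this pipeline, computing the MSCN coefficients, can be sketched in a few lines. The sketch below assumes a 7 × 7 Gaussian window with standard deviation 7/6 and a stabilizing constant of 1, which are commonly used settings; the AGGD fitting and SVM regression stages are not shown, and the function names are hypothetical.

```python
import math


def gaussian_kernel(radius=3, sigma=7.0 / 6.0):
    """Normalized (2*radius+1) x (2*radius+1) Gaussian window."""
    w = [[math.exp(-(dx * dx + dy * dy) / (2 * sigma * sigma))
          for dx in range(-radius, radius + 1)]
         for dy in range(-radius, radius + 1)]
    total = sum(map(sum, w))
    return [[v / total for v in row] for row in w]


def mscn(gray, c=1.0):
    """Mean subtracted contrast normalized coefficients of a grayscale
    image given as a list of rows. Borders are handled by clamping."""
    kernel = gaussian_kernel()
    r = len(kernel) // 2
    h, w = len(gray), len(gray[0])

    def window(y, x):
        # Yield (weight, pixel) pairs of the local neighborhood.
        for dy in range(-r, r + 1):
            yy = min(max(y + dy, 0), h - 1)
            for dx in range(-r, r + 1):
                xx = min(max(x + dx, 0), w - 1)
                yield kernel[dy + r][dx + r], gray[yy][xx]

    out = []
    for y in range(h):
        row = []
        for x in range(w):
            mu = sum(wt * p for wt, p in window(y, x))           # local mean
            var = sum(wt * (p - mu) ** 2 for wt, p in window(y, x))
            row.append((gray[y][x] - mu) / (math.sqrt(var) + c))
        out.append(row)
    return out


if __name__ == "__main__":
    img = [[0.0] * 9 for _ in range(9)]
    img[4][4] = 255.0  # a single bright pixel on a dark background
    coeffs = mscn(img)
    print(f"center: {coeffs[4][4]:.3f}, neighbor: {coeffs[4][5]:.3f}")
```

Statistics of these coefficients, rather than the raw pixels, are what the evaluator would pass on to the distribution fitting and regression stages.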

B. FEATURE-BASED DATA FUSION TECHNOLOGIES
In an ICMMS, different data from different sensors are also called multi-modal data. We can use the following commonly used stochastic methods and artificial intelligence methods to integrate these data at high quality and mine the information deeply [64]:
• Stochastic methods: Stochastic methods include the weighted average method, Kalman filter method, multi-Bayesian estimation method, evidence reasoning, and production rules. The multi-Bayesian estimation method is a common way to fuse high-level information from multiple sensors in a static environment. It combines multi-modal information according to probability, measures its uncertainty, and expresses it as a conditional probability. Then, the final fusion value is provided, which is a feature description of the environment that fuses all of the information. The sensor data can be fused directly when the observation coordinates of the sensor group are consistent. But in most cases, the sensor measurement data should be fused indirectly by Bayesian estimation.
• Artificial intelligence methods: Artificial intelligence methods include regression, classification, Bayesian network, clustering, dimension reduction, fuzzy logic theory, neural networks, rough set theory, expert systems, deep learning, reinforcement learning and label-less learning [65]. With the rapid development of communication and computing technologies, the amount of data has increased. New data fusion technologies based on artificial intelligence are likely to play an increasingly important role in multi-modal data fusion.
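The Bayesian estimation idea described above can be illustrated with a small discrete example. The sketch assumes that the sensor observations are conditionally independent given the true state, so that their likelihoods can simply be multiplied; the state names, the prior, and the likelihood values are invented for the illustration.

```python
def bayes_fuse(prior, likelihoods):
    """Fuse several sensor observations into one posterior.

    prior:       dict mapping state -> P(state).
    likelihoods: list of dicts, one per sensor, mapping
                 state -> P(observation of that sensor | state).
    Assumes the sensors are conditionally independent given the state.
    """
    posterior = dict(prior)
    for lik in likelihoods:
        for state in posterior:
            posterior[state] *= lik[state]
    total = sum(posterior.values())
    return {state: v / total for state, v in posterior.items()}


if __name__ == "__main__":
    # Hypothetical crowd-density states and sensor likelihoods.
    prior = {"low": 0.5, "medium": 0.3, "high": 0.2}
    camera = {"low": 0.1, "medium": 0.3, "high": 0.6}
    wifi_probe = {"low": 0.2, "medium": 0.3, "high": 0.5}
    fused = bayes_fuse(prior, [camera, wifi_probe])
    for state, p in fused.items():
        print(f"{state}: {p:.3f}")
```

Although the prior favors a low density, the two observations together shift most of the probability mass to the high-density state, which is the complementary effect that multi-modal fusion aims for.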

1) DATA FUSION FOR CAMERA AND LIDAR DATA
Due to the deployment of a high-definition camera and LiDAR in an ICMMS, it is necessary to fuse the two kinds of data to obtain consistent information. A prior work [66] researched the output fusion from a LiDAR scanner and a wide-angle monocular image sensor for free space detection. The spatial resolutions of the output of a LiDAR scanner and an image sensor are different. They should be aligned with each other. A geometric model can be used to align the two sensor outputs in space, and then a resolution matching algorithm based on Gaussian process (GP) regression can be used to interpolate the missing data with quantifiable uncertainty. This has reference significance for ICMMS, which deals with uncertain sensing of free space detection scenes.
To solve the problems of high sparsity, irregular distribution, occlusion, and the fuzzy structure of 3D point cloud obtained by mobile LiDAR, a prior work [67] studied how to effectively detect 3D objects in point clouds in a large-scale building environment without pretraining a 3D CNN model. The authors fused the vision and range information into the probability framework based on a truncated cone and projected the image-based target detection results and LiDAR SLAM results onto a three-dimensional probability map to optimize the target location and boundary box estimation.
Another study [68] converted the sparse depth map of LiDAR data into a dense depth map so that the two sensors were aligned with each other at the data level; then, the YOLOv3 real-time target detection model was used to detect objects in the color images and dense depth maps. Finally, a data fusion method based on boundary box fusion and improved D-S evidence theory was constructed. The results of the previous steps were fused to obtain the final location and distance information. The detection speed of the proposed fusion method was 0.057 s using an Intel Xeon E5-2670 CPU and an NVIDIA GeForce GTX 1080Ti GPU, which was 7, 35, 53, and 73 times faster than the MS-CNN, SubCNN, 3DOP and Mono3D methods, respectively.
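To make the evidence-combination step more concrete, the following sketch implements the classic Dempster rule of combination for two sources. It is the textbook rule, not the improved variant of [68], and the mass values assigned to the camera and LiDAR detectors are invented for the illustration.

```python
def dempster_combine(m1, m2):
    """Combine two mass functions with Dempster's rule.

    Each mass function maps a frozenset of hypotheses to its mass;
    the masses of each function are assumed to sum to 1.
    """
    combined = {}
    conflict = 0.0
    for a, wa in m1.items():
        for b, wb in m2.items():
            inter = a & b
            if inter:
                combined[inter] = combined.get(inter, 0.0) + wa * wb
            else:
                conflict += wa * wb  # mass assigned to contradictory pairs
    if conflict >= 1.0:
        raise ValueError("sources are in total conflict")
    # Renormalize by the non-conflicting mass.
    return {k: v / (1.0 - conflict) for k, v in combined.items()}


if __name__ == "__main__":
    person = frozenset({"person"})
    vehicle = frozenset({"vehicle"})
    unknown = person | vehicle  # the whole frame of discernment
    camera = {person: 0.7, vehicle: 0.1, unknown: 0.2}
    lidar = {person: 0.6, vehicle: 0.2, unknown: 0.2}
    fused = dempster_combine(camera, lidar)
    for hypothesis, mass in fused.items():
        print(sorted(hypothesis), round(mass, 3))
```

In this example both detectors lean toward "person", and the combined belief in that hypothesis (0.85) is higher than either source reports on its own.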

2) DATA FUSION FOR VISIBLE AND NEAR INFRARED CAMERAS
In an ICMMS scene, weather has a significant impact on an HD camera. Therefore, infrared cameras are often deployed to obtain images of scenes in extreme weather, such as fog, rainstorms, and blizzards. However, infrared image data do not provide color information. Combining these two kinds of cameras is necessary to obtain complete details of the global scene. This requires high-precision fusion of the multi-modal image data. Prior work [69] researched a method of fusing sensor data in the visible spectrum (VS) and near-infrared (NIR) wave band. The method makes full use of the complementary details offered by VS and NIR images. While preserving the spectrum of the VS image, the missing spatial details are adaptively injected into the VS images. The VS and NIR data are fused as follows:
1) The NIR wave band is compared with the RGB data $I_{RGB}$ from the VS images. The missing spatial details in the VS images are identified by the fusion map $F(x) = \max(0, LC(I_{NIR}(x)) - LC(I_{RGB}(x)))$.
2) Spatial details are extracted from the NIR spectrum. A high-pass filter $g$ is used to extract the high-frequency components (spatial details) of $I_{NIR}$, where $g = \delta - h$. Here $\delta$ is the unit impulse and $h$ is a prototype Gaussian filter with a radial cut-off frequency given in cycles per picture height (c/ph). The kernel size is $k \times k$.
3) The spatial details are fused and injected into the VS images. Finally, the spatial details are weighted according to the fusion map $F$ and injected into the VS images to obtain enhanced images, that is, $J_{RGB}(x) = I_{RGB}(x) + F(x)\,(g * I_{NIR})(x)$, where $*$ denotes convolution. The proposed method takes only about 0.7 s to fuse an image of size 682 × 1024, which is 2.5 times faster than the color-transfer method.
In other related works, the authors [70], [71] researched two other image data fusion methods. Targeting depth prediction in the field of automatic driving, one study [70] proposed a CNN-based infrared-visible image fusion network (IVFuseNet), which comprises a common feature fusion subnet, a full feature fusion subnet, and a high-resolution reconstruction subnet to make full use of the complementary details of visible and infrared images. Another study [71] treated the visual and infrared images as the real and imaginary parts of a complex function and proposed an algorithm to fuse images from vision and infrared cameras.

C. DECISION FUSION TECHNOLOGIES
There are three subsystems in an ICMMS: the infrastructure, AI, and visualization. Each subsystem or detection area has a small range of analysis, evaluation, and action decisions. However, one-sided decision-making is not conducive for the edge cloud or cloud to grasp the global state and the advantages and disadvantages of the algorithm. Therefore, both subsystem decision fusion and global decision fusion [72], [73] are indispensable for a complex system to complete intelligent services and applications.
The commonly used decision fusion methods can be divided into three categories: 1) statistical-based fusion methods, including probabilistic reasoning, Bayesian reasoning, and D-S evidence; 2) information theory-based fusion methods, including parameter template matching, clustering analysis, adaptive neural network, and information entropy; and 3) cognitive model-based fusion methods, including logic template matching, fuzzy logic theory, and expert systems. Some of these methods overlap with feature-based data fusion because the data fusion results can be used as the final decision in some simple systems. For example, in our ICMMS, the hazardous gas or smoke data collected by gas sensors can be used to obtain an assessment and alarm information from the surrounding environment after probabilistic reasoning or parameter template matching and fusion. In this section, simple examples of the above three decision fusion methods are discussed.
Bayesian inference is a commonly used statistical-based fusion algorithm. The decision fusion method used in a multi-sensor detection task in a prior study [74] is very suitable for ICMMS scenarios. The authors simultaneously interpreted the outputs of different types of sensors to detect activities in event domains. Because the sensors are arranged so that their fields of view overlap, no single sensor can identify a specific activity alone. However, a set of sensors can isolate specific activities by fusing multiple sensor detections. The authors organize and maintain a variety of hypotheses about activities in the area monitored by the sensors. Fusion rules based on likelihood estimation can be intractable or difficult to solve, so an important research direction is to optimize the probabilistic reasoning algorithm and make it applicable to complex situations with unknown (uncertain) signal/noise data. From this perspective, [75] described distributed detection scenarios based on Bayesian inference, which is also an important reference for our work. The authors studied the design of fusion rules for the distributed detection problem, described the problem using a hierarchical model, and proposed a Gibbs sampler to realize fusion based on posterior probability, together with a fusion rule design method based on a Bayesian inference tool. The computational complexity of the whole process is high but manageable.
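For the simplest version of the distributed detection problem, in which each sensor's detection and false-alarm probabilities are assumed to be known, the Bayesian fusion rule reduces to a sum of log-likelihood ratios over the local binary decisions. The sketch below shows this classic rule; it is not the Gibbs-sampler approach of [75], which targets the harder case where these probabilities are unknown. All names and numbers are illustrative.

```python
import math


def fuse_decisions(decisions, p_detect, p_false_alarm, prior_event=0.5):
    """Fuse local binary decisions at a fusion center.

    decisions:     0/1 decisions reported by the sensors.
    p_detect:      P(decision = 1 | event present) for each sensor.
    p_false_alarm: P(decision = 1 | event absent) for each sensor.
    Assumes the local decisions are conditionally independent.
    Returns (global decision, posterior probability of the event).
    """
    llr = math.log(prior_event / (1.0 - prior_event))
    for u, pd, pf in zip(decisions, p_detect, p_false_alarm):
        if u:
            llr += math.log(pd / pf)
        else:
            llr += math.log((1.0 - pd) / (1.0 - pf))
    posterior = 1.0 / (1.0 + math.exp(-llr))
    return posterior > 0.5, posterior


if __name__ == "__main__":
    # Three hypothetical smoke detectors of decreasing reliability.
    pd = [0.9, 0.8, 0.7]
    pf = [0.1, 0.2, 0.3]
    alarm, prob = fuse_decisions([1, 1, 0], pd, pf)
    print(f"alarm={alarm}, P(event)={prob:.3f}")
```

The rule automatically gives reliable sensors a larger say: a dissenting vote from the weakest detector does not override the agreement of the two stronger ones.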
Regarding the information theory-based fusion methods, one study [76] researched a multi-source classification method based on a neural network and statistical modeling. The first scheme modeled each data source separately using a statistical method and applied multiple decision fusion schemes to combine the information from each data source. The second scheme described in the study used weighted consistency theory, in which the weight of a single data source reflects its reliability; the weights were optimized to improve the accuracy of the combined classification. Other decision fusion schemes are based on a two-stage method. Voting is used in the first stage: if most classifiers of the data sources do not agree on the classification of a sample, the sample is rejected. In the second stage, these samples are classified by a neural network. The method is applied to the classification of multi-source and very high-dimensional data sets.
Among the cognitive model-based fusion methods [77], a prior study [78] used fuzzy logic to dynamically change the weight of multi-source data features [79]. In addition, to account for changes in the data acquisition process (such as lighting, noise, and user device interaction), a multi-biometric authentication system based on fuzzy logic decision fusion was proposed, which can be used in real-time dynamic data acquisition.
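The two-stage scheme can be sketched as follows, under one reading of the description above: a sample is accepted when more than half of the per-source classifiers agree, and otherwise it is handed to a second-stage classifier. The `fallback` callable stands in for the neural network and, like the agreement threshold, is a hypothetical choice for the illustration.

```python
from collections import Counter


def two_stage_classify(votes, fallback, sample, min_agreement=0.5):
    """Stage 1: majority vote over per-source class labels.
    Stage 2: if agreement is insufficient, defer to a fallback classifier.

    votes:    list of labels, one per data-source classifier.
    fallback: callable taking the sample and returning a label
              (a placeholder for the second-stage neural network).
    Returns (label, stage that produced it).
    """
    label, count = Counter(votes).most_common(1)[0]
    if count / len(votes) > min_agreement:
        return label, "vote"
    return fallback(sample), "fallback"


if __name__ == "__main__":
    def second_stage(sample):
        # Placeholder rule standing in for a trained neural network.
        return "crowded" if sample["density"] > 4.0 else "normal"

    sample = {"density": 5.2}
    print(two_stage_classify(["crowded", "crowded", "normal"],
                             second_stage, sample))
    print(two_stage_classify(["crowded", "normal", "sparse"],
                             second_stage, sample))
```

Keeping the expensive second-stage model for the disputed samples only is what makes such a cascade attractive when most samples are easy.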

VI. OPEN ISSUES AND RESEARCH DIRECTION
Although detailed studies have been carried out on the three aspects of data fusion (i.e., sensors, computing nodes, and cloud applications), these methods are not complete in ICMMS scenarios that involve sensing and collecting a large amount of data with efficient, low-cost real-time transmission; safe and stable storage; high-precision analysis, deep information mining, and intelligent application. Therefore, in the future, in-depth research can be conducted on data fusion technology and methods in each stage of the data value chain [80], as shown in Fig. 6.
Research is needed into the following topics:
• Data fusion for data generation: Research and discussion about data generation, multi-modal sensors, and data types, including radar, ultrasonic, infrared / thermal imaging cameras, CCTV cameras, and Global Positioning System (GPS).
• Data fusion for data acquisition: Some sensor-fusion algorithms exist for cleaning and filtering data, including the central limit theorem, Kalman filter, Bayesian networks, and Dempster-Shafer.
• Data fusion for data transmission: Both centralized and decentralized architectures require data fusion algorithms for data transmission, including time synchronization and data integrity.
• Data fusion for data storage: It is very important to build a data fusion server on the edge cloud and remote cloud in a complex system, including data integration along with a storage system (data synchronization in the database), data security, and privacy mechanisms.
• Data fusion for data analytics: Due to the multi-modal and massive amounts of data obtained by an ICMMS, extended research on data-fusion based analysis algorithms is necessary, including data preprocessing, AI-based data analysis, and data-fusion enhanced AI algorithms for data analysis.
• Data fusion for data applications: Regarding the objective-level classification, we will research more visualization and simulation-based data fusion technologies for data display, heterogeneous smart IoT applications, and human enhanced fusion.

VII. CONCLUSION
This paper surveys data-fusion based ICMMSs, where we first introduced the motivation for fusing data in an ICMMS, including its advantages, applications, requirements, and challenges. Then, we investigated five popular data fusion classification architectures: JDL classification, abstract-level classification, I/O characterization-based classification, architecture-based classification, and multi-perspective classification. This has enabled us to explore different perspective and data fusion such as locations, purposes, technologies, and architectures for data fusion. Based on the widely used general architectures, we proposed a multi-sensor, multi-modal and multi-dimensional ICMMS architecture based on data fusion. The data fusion process in ICMMS can be divided into three processes: sensor fusion, feature-based data fusion, and decision fusion. For each fusion process, aiming at the challenges of network spatial-temporal expansion, data reliability, data consistency, data integrity, deep information hiding, decision accuracy, and service intelligence, this paper classifies the commonly used fusion technologies and gives practical use cases from the algorithm process to the result output (noise elimination, redundant data deduplication, low-quality data filtering, multi-modal data fusion, multi-source decision fusion, etc). Finally, this paper summarized the future research directions and open issues in the field of data fusion for data generation, acquisition, transmission, storage, analytics, and application.