Design and Performance Evaluation of an AI-Based W-Band Suspicious Object Detection System for Moving Persons in the IoT Paradigm

The threat of terrorism has spread all over the world, and the situation has become grave. Suspicious object detection in the Internet of Things (IoT) is an effective way to respond to global terrorist attacks. The traditional solution requires performing security checks one by one at the entrance of each gate, resulting in bottlenecks and crowding. In the IoT paradigm, it is necessary to be able to perform suspicious object detection on moving people. Artificial intelligence (AI) and millimeter-wave imaging are advanced technologies in the global security field. However, suspicious object detection for moving persons in the IoT, which requires the integration of many different imaging technologies, is still a challenge in both academia and industry. Furthermore, increasing the recognition rate of suspicious objects and controlling network congestion are two main issues for such a suspicious object detection system. In this paper, an AI-based W-band suspicious object detection system for moving persons in the IoT paradigm is designed and implemented. In this system, we establish a suspicious object database to support AI technology for improving the probability of identifying suspicious objects. Moreover, we propose an efficient transmission mechanism to reduce system network congestion since a massive amount of data will be generated by 4K cameras during real-time monitoring. The evaluation results indicate that the advantages and efficiency of the proposed scheme are significant.


I. INTRODUCTION
In the war of counterterrorism, the international community is still facing a series of serious new challenges and new issues that need to be resolved. In the face of the technological and ideological trends of international terrorism, the use of advanced technologies such as artificial intelligence (AI) [1], 5G [2], [3], information-centric networking [4]-[6], blockchain [7], Internet of Things (IoT) technology and other means to counteract terrorism is a top priority, and the focus should be on improving antiterrorism systems' early warning mechanisms, rapid response mechanisms and consequence processing mechanisms. At present, a new generation of equipment for criminal investigation, technical investigation, anti-explosive measures, identification and security, continuous upgrading of computer hardware and software, and efficient wireless and satellite communications is enriching the front line of antiterrorism operations. Among them, suspicious object detection systems are an effective way to respond to global terrorist attacks and should be given high priority.
The associate editor coordinating the review of this manuscript and approving it for publication was Zhenyu Zhou.
Taking public transportation security checks as an example, there are several different methods [8], [9] of performing security checks of both people and carry-on items, such as metal detector gates, X-ray checkers for carry-on items, explosive detectors and human body imagers. Human body imagers can be divided into different categories depending on the different kinds of technology they use, including X-ray fluoroscopy, X-ray backscatter imaging, and millimeter-wave (MMW) imaging technologies. X-ray fluoroscopy is widely used in the medical field and is a very effective way to obtain real-time moving images of the interior of an object such as the human body. However, it requires a large amount of radiation and thus is not an ideal method for security checks. X-ray backscatter imaging is an advanced X-ray-based imaging technology that is much safer than fluoroscopy. This technology has been widely used in security checks in public areas for many years. MMW imaging also follows a similar imaging principle as that of backscatter machines. However, MMW scanners emit a special class of microwaves instead of X-rays; these microwaves pass through clothing and bounce off the person's skin as well as any potentially threatening objects, allowing them to be discovered. These traditional solutions require people to line up and wait to be checked one by one. This process greatly increases the waiting time and creates a bottleneck in the security check area. Therefore, with the increasing worldwide population and the expansion of international communication, it is necessary to develop a suspicious object detection system that does not require people to stop at a particular point.
To address this need for a ''no-stop'' suspicious object detection system, many countries began to develop fast and enhanced suspicious object detection technologies for security checks several years ago [10], [11]. An early study showed the importance of avoiding the interruptions caused by X-ray screening operations at security checkpoints [12]. To this end, new technologies are required to improve the security detection accuracy and the system performance. IoT technology [13], sensor networks [14], [15] and machine learning technologies [16], [17] were all applied in the early work in this field. Moreover, current technologies such as MMW imaging and AI networks are expected to be key elements to improve suspicious object detection [18].
With the increasing numbers of nodes in IoT sensor networks [19] and the higher resolution of surveillance images [20], the networks for suspicious object detection systems are facing pressure in terms of network resources and processing power. The aim for future suspicious object detection systems is to achieve high performance and low congestion. Therefore, one of the important points in developing a new type of suspicious object detection system is to build it on the basis of a low-congestion principle to efficiently control network traffic [21].
The contributions of this paper are listed as follows.
1) We design and implement an AI-based W-band (75 GHz-110 GHz) suspicious object detection system for moving persons in the IoT paradigm. Compared with the traditional solutions, which require security checks to be performed one by one at the entrance of each gate and consequently result in bottlenecks and crowding, the proposed system enables automatic, no-stop inspections on the basis of W-band unidentified object detection for locations with large numbers of people, such as subway/airport lobbies, shopping malls, and concert venues.
2) A suspicious object database based on simulations and active/passive imagers has been established to support AI-based suspicious object recognition technology. A performance evaluation shows that it can significantly increase the probability of identifying suspicious objects.
3) We propose a low-congestion suspicious object/person network system to ensure efficient and safe data transmission for real-time monitoring data. This system can automatically track a suspicious person between different areas in real time. Moreover, an efficient transmission mechanism is proposed to reduce network congestion. Evaluation results show that the advantages and efficiency of the proposed scheme are significant.
The remainder of this paper is organized as follows. Section II presents previous related studies, including research on surveillance network systems, W-band unidentified object detection, artificial intelligence for image recognition, and low-congestion video transmission for the IoT paradigm. In section III, we present our proposal of a suspicious object detection system for moving persons, including the objective, the system architecture, the suspicious object database to support AI-based recognition technologies, and the suspicious object/person network system. Then, in section IV, we evaluate this suspicious object detection system from two perspectives: AI-based suspicious object recognition and the suspicious object/person network system. Finally, we conclude our work.

II. RELATED WORK
A. SURVEILLANCE NETWORK SYSTEMS
As an essential technology that is currently changing our lives, IoT devices such as sensor networks have been widely researched in recent years. A video surveillance network is generally divided into three levels: front-end access, media exchange, and user access. More specifically, such a network is composed of a front-end coding unit, central business platform, network recording unit, client unit, and decoding unit. Historically, the security surveillance industry has experienced four stages of development: the analog surveillance era, the digital surveillance era, the network high-definition era, and the current intelligent surveillance era. Each industry update relies on upstream technological innovation and component cost reduction.
Compared with traditional video surveillance, modern surveillance network systems make it more convenient for computers to perform automatic processing of video information, such as compression, storage, analysis and display, to achieve automated operation. This enables remote monitoring through such a network platform, even from thousands of kilometers away. The ability to use advanced digital software systems to complete a large amount of data analysis in a few minutes improves monitoring efficiency and yields more realistic and clearer digital images, facilitating more convenient and practical monitoring management and maintenance.

B. W-BAND UNIDENTIFIED OBJECT DETECTION
The MMW band is generally defined as the frequency band between 30 GHz and 300 GHz, which lies between the infrared and microwave regions in the frequency spectrum. Compared with visible light and infrared light, this spectral band offers a certain penetrability of most nonmetallic objects while also providing a resolution that microwaves cannot; therefore, technology operating in this band has recently become a focus of research in academia and industry. MMW body imaging technology is an advanced technology in the global security field. It has been used in passenger security screening at airports in the United States, the United Kingdom, the Netherlands, Australia, and Japan, but previously, only the United States and the European Union had access to related technology standards. Such a device can effectively detect suspicious objects hidden on various parts of the human body under the cover of clothing without requiring direct contact with the body, especially nonmetallic items, and the shapes, sizes, and positions of suspicious objects can be determined from the acquired images. In addition, MMW body imaging equipment is harmless to humans and has strong penetrating power. Its transmission power is less than one-thousandth of the electromagnetic wave radiation of mobile phones. It can accurately identify objects carried on the human body, effectively improving objectivity and accuracy, reducing the labor of security inspectors and improving security efficiency.
There are two main approaches to MMW imaging: passive and active. Passive imaging uses the millimeter waves emitted by the human body itself to perform imaging. Such imaging is effective in an outdoor setting, but it is difficult to obtain a satisfactory image indoors because the brightness temperature differences within the scene are limited. The principle of active imaging is similar to that of a camera flash: an MMW source emits radiation onto the object to be detected, and the field strength reflected by the object is detected for imaging. This approach can produce more satisfactory results for indoor imaging. Such MMW cameras can be widely used in airport security inspection, airborne imaging radar, industrial inspection, and other fields.

C. ARTIFICIAL INTELLIGENCE FOR IMAGE DETECTION
As AI devices become increasingly involved in our everyday lives, machine learning is enabling improvements in image tagging and object identification. Computer vision is the science of computer systems recognizing and analyzing different images and scenes. A key component of computer vision is object detection. Object detection is used to perform various AI tasks, such as facial recognition, vehicle detection, security scanning, and self-driving. Successful object detection has been demonstrated using several AI algorithms. The most well-documented algorithms related to object detection processes include Regions with Convolutional Neural Networks (R-CNN), Fast R-CNN, and Faster R-CNN. Given an image, R-CNN uses selective search to generate approximately 2000 region proposals from which to compute features by means of a convolutional neural network (CNN). Region proposals are regions that include potential objects. Each region proposal is warped into a 227 × 227 RGB image patch for input to a CNN. Feature extraction is then performed in the CNN layers, and the results are passed to multiple binary classifiers to determine the class of each particular region. A CNN is designed to focus on pixels within an image that are located next to each other. Each image is passed through the network as an input and is then sent back as an output in which each object is classified. A challenge in AI systems is to achieve accurate object detection in the case of fast-moving objects such as vehicles.
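As an illustration of how a region proposal is resized to a fixed 227 × 227 patch before entering the CNN, the following is a minimal sketch rather than the R-CNN reference implementation. Selective search is not implemented here; the box coordinates are hypothetical, and nearest-neighbour resampling is assumed purely for simplicity.

```python
import numpy as np

def warp_region(image, box, size=227):
    """Crop box = (top, left, bottom, right) from an H x W x 3 image and
    resize the crop to size x size by nearest-neighbour sampling
    (an assumed, simplified stand-in for the warping step)."""
    top, left, bottom, right = box
    patch = image[top:bottom, left:right]
    # Map every output row/column back to a source row/column of the crop.
    rows = np.arange(size) * patch.shape[0] // size
    cols = np.arange(size) * patch.shape[1] // size
    return patch[rows[:, None], cols]

image = np.zeros((480, 640, 3), dtype=np.uint8)   # placeholder RGB image
proposal = (10, 20, 110, 220)                     # hypothetical region proposal
cnn_input = warp_region(image, proposal)
```

Because the crop is stretched independently along each axis, the aspect ratio of the proposal is not preserved; this is what makes a fixed-size CNN input possible for arbitrarily shaped regions.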
Based on future improvements in imaging technology and deep-learning-based object detection, we anticipate future suspicious object detection systems with the ability to alert about hidden threats and help security groups prevent terrorism [22].

D. LOW-CONGESTION VIDEO TRANSMISSION FOR THE IoT
Many kinds of data are collected and exchanged in a surveillance network, such as image data, video data, and processed information data. Streaming video accounts for the largest data volume yet carries the least useful information per byte: a video stream is continuous, and most of its content is irrelevant to the detection task. Therefore, reducing the volume of video data is a direct way to save network bandwidth resources and improve overall network efficiency.
There are currently many ways to reduce the data volume involved in video transmission, such as different types of video encoding. Reducing the raw video data volume was the legacy approach used when people were still required to manually surveil all video and extract information. Now, with the increased usage of AI technologies [20] and IoT devices, it is easier and more efficient to process video data in a decentralized network [23] for application in a surveillance system. In addition, a previous study has shown that it is more efficient to allow IoT-based edge nodes to process the raw data [24].

III. PROPOSED SYSTEM DESIGN
A. OBJECTIVE
The objective of this paper is to design and implement an AI-based W-band suspicious object detection system for moving persons in the IoT paradigm. First, we attempt to use current AI-based suspicious object recognition technology to improve the recognition rate of the proposed suspicious object detection system. To achieve this goal, we build a suspicious object database by means of simulations and experiments. Then, we design and implement a low-congestion suspicious object/person network system. It can effectively transmit the monitoring information generated by visible light cameras and hybrid imagers. Moreover, the proposed system can effectively reduce network congestion.
In recent years, the threat of terrorism has spread throughout the world, and the situation has become dire. To enhance the safety of public places, highly efficient suspicious object detection should be automatically performed. However, it is not wise to perform security checks one by one at the entrance of each gate since this will result in bottlenecks and crowding, especially in subway/airport lobbies, shopping malls, concert venues, etc. It is therefore necessary to perform automated suspicious object detection on moving people. As illustrated in Figure 1, the goal of this paper is to develop sensing/imaging technology that operates in the W-band to recognize suspicious objects and visualize humans or hidden suspicious objects from a distance. This system can be used in places with large numbers of people to automatically detect suspicious objects while greatly reducing the processing time.
Figure 2 presents an overview of the proposed suspicious object detection system for moving persons, which has two main components: (1) Suspicious object recognition by means of various MMW imagers. This is achieved by combining a suspicious object database with various recognition methods, such as AI technologies. Suspicious objects can be identified by means of both active and passive MMW imagers. (2) A suspicious person/object network system. This system will continue tracking a suspicious person's movements through different areas by means of surveillance cameras. During the monitoring process, the MMW images generated in (1) will be associated with the surveillance camera images generated in (2). This allows security personnel to clearly see both the suspicious person's facial features and the suspicious objects he or she is carrying.

B. SYSTEM ARCHITECTURE
To achieve the goal of ''ensuring a sufficient security level without stopping the flow of people,'' the proposed suspicious object detection system uses a staged screening method (primary screening/secondary screening) to recognize suspicious objects concealed by humans (Figure 3). In the primary screening stage, a W-band radar is used to measure suspicious objects approximately 15 meters away, and for the secondary screening stage, a W-band imager will be developed to measure suspicious objects concealed by humans within 5 meters. By integrating the sensing/imaging information with the existing visual surveillance camera images during these two screening processes, suspicious objects concealed by moving persons can be visually recognized.

1) PRIMARY SCREENING
In the primary screening stage, we use W-band active radars in combination with multiple visible light cameras to recognize whether a suspicious person is hiding suspicious objects (metal, etc.) from up to 15 meters away. If so, our system will provide the security personnel with an image of the suspicious person's face and other related information, and the suspicious person will be guided to a secondary screening point.

2) SECONDARY SCREENING
In the secondary screening stage, hybrid passive and active imagers are used to identify what kinds of suspicious objects a suspicious person is carrying.
During this two-stage screening process, each suspicious person is tracked by surveillance cameras, and corresponding face and whole-body images are recorded by our system to allow security personnel to quickly obtain the location of a suspicious person and request a direct inspection. In brief, the entire safety inspection process using the proposed suspicious object detection system is as follows. First, visible light cameras should be able to detect each person, and active radars are used to conduct a primary screening of the people in the area to check whether they are carrying metal objects. Second, the system tracks each suspicious person, and the identified suspicious people are guided to secondary screening points. Then, it is necessary to associate the MMW images collected in the secondary screening stage with the corresponding visible light images. Finally, the suspicious objects are identified from the MMW images obtained during secondary screening; during this process, AI technologies are utilized to improve the identification probability. Moreover, this system supports the tracking of suspicious people in different areas.
The two main contributions of this paper are the suspicious object database and the suspicious object/person network system. Specific technical details will be described in later sections. Since a large number of 4K cameras will be used in this suspicious object detection system, massive amounts of data will be generated during real-time monitoring. To improve transmission efficiency, we propose an efficient transmission mechanism for use in the suspicious object/person network system to reduce network congestion.

C. SUSPICIOUS OBJECT DATABASE TO SUPPORT AI-BASED RECOGNITION TECHNOLOGIES
For this suspicious object detection system, it is crucial to establish a suspicious object database to assist AI-based recognition technologies to improve the identification probability. First, it is necessary to build a suspicious object database by means of simulation and the collection of images from active/passive imagers.

1) IMAGE GENERATION VIA SIMULATION
To improve public security, passive and active imaging technologies have become a key solution for overcoming the challenges faced by suspicious object detection systems, such as concealed object detection. Imaging technologies can be classified into passive and active sensors. In this section, we present a passive imaging simulator. Since the amplitude of the detected radiation depends on the target object's emissivity and temperature, these two parameters are considered the main parameters for the radiometric calculation. Then, we implement the preamp receiver, which includes a preamplifier for direct detection. Finally, we evaluate the simulation results by using raw data images from a real imager. The detailed simulation process is described in Figure 4.
First, we design the parameters for the input data [25], which consist of images that include the human body and various objects, to be used as training data for image recognition by a neural network and for further research related to this project. Here, we consider the radiometric parameters as the key parameters for passive imaging. The parameter values are listed in Table 1, where T is the temperature of the human body or a metallic object, ε is the emissivity, ρ is the reflectivity, and T_room is room temperature. In this research, we consider knives, scissors, forks and bottles as the concealed objects represented in the input data or training data. After designing the parameters, we create the dataset for the subsequent simulation process. The created dataset includes 245280 images, and the dataset size is 21.8 gigabytes. All input images are converted to grayscale at a frame size of 192 (H) × 512 (V) pixels, which matches the actual image size of the passive imager.
Second, we implement the passive imaging receiver using MATLAB. The receiver is capable of direct detection with a preamplifier and includes a radio frequency amplifier (RF Amp), a square-law detector and a low-pass filter (LPF). We first set the parameters of the receiver, which we call ''Receiverpream'': the receiver gain (20 dB), the noise figure (5 dB), the reference temperature (290 K), the sampling rate (1e6) and the carrier frequency (300 MHz). During the receiving process, the RF amplifier amplifies the brightness temperature signal of the sensed object and passes it to the square-law detector. Then, the detector adds noise to the received signal before converting it into AC and DC components. Finally, the LPF discriminates the received signal from the noise signal by attenuating frequencies higher than the preferred cutoff frequency.
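The direct-detection chain described above (RF amplifier, square-law detector, LPF) can be sketched in a few lines. This is an illustrative Python model, not the MATLAB receiver used in the paper. It assumes that the receiver bandwidth equals the stated sampling rate, that the gain and noise figure are expressed in dB, that the input is thermal noise at the scene brightness temperature, and that a moving average stands in for the LPF; the function name and all default arguments other than the stated receiver parameters are hypothetical.

```python
import numpy as np

K_B = 1.380649e-23  # Boltzmann constant in J/K

def receiver_preamp(brightness_temp_k, bandwidth_hz=1e6, gain_db=20.0,
                    noise_figure_db=5.0, ref_temp_k=290.0,
                    n_samples=4096, lpf_taps=256, rng=None):
    """Return the low-pass-filtered detector output for one scene temperature."""
    rng = np.random.default_rng(0) if rng is None else rng
    gain = 10.0 ** (gain_db / 10.0)
    noise_factor = 10.0 ** (noise_figure_db / 10.0)
    # Equivalent receiver noise temperature: T_rx = (F - 1) * T_ref.
    t_rx = (noise_factor - 1.0) * ref_temp_k
    # Noise power at the amplifier output: P = k * (T_b + T_rx) * B * G.
    power = K_B * (brightness_temp_k + t_rx) * bandwidth_hz * gain
    rf = rng.normal(0.0, np.sqrt(power), n_samples)   # amplified noise-like signal
    detected = rf ** 2                                # square-law detector
    kernel = np.ones(lpf_taps) / lpf_taps             # moving-average LPF (assumed)
    smoothed = np.convolve(detected, kernel, mode="valid")
    return smoothed.mean()                            # DC level tracks input power

body = receiver_preamp(310.0)    # assumed body temperature in K
room = receiver_preamp(295.0)    # assumed room temperature in K
```

With the default seed, both calls draw the same noise samples, so the ratio of the two outputs equals the ratio of the total noise temperatures, (310 + T_rx) / (295 + T_rx). Because T_rx is large compared with the scene contrast, the ratio is close to one, which illustrates why frame averaging helps a passive imager.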
In this process [25], the radiation temperature T_r(i, j) will first be calculated based on the physical method for passive imaging simulation (Figure 5): where ρ(i, j) is the reflectivity of the object, T_1 is room temperature and T_2 is body temperature. After that, according to equation 1, the lens blur T_o(i, j) is calculated as shown below: where m(x, y) is a mask operator. Then, the output of the receiver with the amplifier, l(i, j), can be expressed as shown below: where a is the coefficient for the amplitude of the amplifier, k(i) is the variation parameter of the amplitude for sensor i, off(i) is the variation parameter of the offset for sensor i and r is Gaussian noise.
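One set of expressions consistent with the symbol definitions above is given here as an assumption, not as the authors' exact equations. It assumes that the emissivity equals 1 − ρ(i, j), that the mask operator is applied as a weighted sum over neighbouring pixels, and that the amplitude variation is multiplicative while the offset and noise are additive.

```latex
\begin{align}
T_r(i,j) &= \rho(i,j)\,T_1 + \bigl(1-\rho(i,j)\bigr)\,T_2 \\
T_o(i,j) &= \sum_{x}\sum_{y} m(x,y)\,T_r(i+x,\,j+y) \\
l(i,j)   &= a\,k(i)\,T_o(i,j) + \mathrm{off}(i) + r
\end{align}
```

Under these assumptions, a highly reflective object (ρ close to 1) appears at nearly room temperature against the warmer body, which is the contrast that a passive imager relies on.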
As the final step, the received signals are converted into the original image by using an image processing technique. We adopt a backprojection algorithm for image reconstruction based on the Radon transform, the inverse Radon transform and the projection-slice theorem. Here, the image reconstructed by averaging over N frames, J(i, j), can be expressed as shown below, where N is the number of frames. Finally, the adjusted value L(i, j) is obtained to fit the digital data for the output image, where J_low is the minimum temperature value and J_high is the maximum temperature value.
After the simulation, we can compare the results obtained using a dataset collected from a real imager with the results obtained using our created dataset. We analyze the results obtained using the two different input datasets, i.e., the raw dataset generated by the real imager and our created dataset, as shown in Figure 6. According to the results, the proposed passive imaging simulator can produce good results that are very similar to the original images for each dataset. On the other hand, the created dataset yields significantly improved simulation results.
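The simulation chain described in this subsection (radiation temperature, lens blur, per-sensor variation, frame averaging and scaling to digital values) can be sketched end to end as follows. This is a minimal sketch under stated assumptions, not the authors' simulator. The emissivity is taken as 1 − ρ, the mask is a 3 × 3 average, one sensor is assumed per image row, the final scaling is a linear stretch to 8 bits, and plain averaging replaces the backprojection step; the function name and all numerical defaults are hypothetical.

```python
import numpy as np

def simulate_passive_image(reflectivity, t_room=295.0, t_body=310.0,
                           n_frames=8, amp=1.0, noise_std=0.5, rng=None):
    """Turn a reflectivity map into an 8-bit simulated passive MMW image."""
    rng = np.random.default_rng(0) if rng is None else rng
    rho = np.asarray(reflectivity, dtype=float)
    height, width = rho.shape
    # Radiation temperature: reflected room temperature plus emitted body temperature.
    t_r = rho * t_room + (1.0 - rho) * t_body
    # Lens blur: 3 x 3 averaging mask applied with edge padding (assumed mask).
    mask = np.full((3, 3), 1.0 / 9.0)
    padded = np.pad(t_r, 1, mode="edge")
    t_o = np.zeros_like(t_r)
    for dx in range(3):
        for dy in range(3):
            t_o += mask[dx, dy] * padded[dx:dx + height, dy:dy + width]
    # Per-sensor amplitude and offset variation (one sensor per row, assumed).
    k = 1.0 + 0.001 * rng.standard_normal((height, 1))
    off = 0.1 * rng.standard_normal((height, 1))
    # Receiver output for each frame, with additive Gaussian noise.
    frames = [amp * k * t_o + off + noise_std * rng.standard_normal(rho.shape)
              for _ in range(n_frames)]
    j = np.mean(frames, axis=0)                       # average over N frames
    j_low, j_high = j.min(), j.max()
    # Linear stretch of [J_low, J_high] onto 0..255 (assumed output range).
    return np.round(255.0 * (j - j_low) / (j_high - j_low)).astype(np.uint8)

rho_map = np.zeros((192, 512))          # body everywhere, no reflection
rho_map[80:110, 200:260] = 0.9          # hypothetical metallic object
image = simulate_passive_image(rho_map)
```

In the resulting image the reflective object appears darker than the surrounding body because it reflects the cooler room temperature.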

2) SUSPICIOUS OBJECT DATABASE
AI-based suspicious object recognition technology can be used to improve the identification probability for suspicious objects in the proposed suspicious object detection system. To achieve this purpose, a suspicious object database should be built for network training and evaluation through both simulation and image collection with active/passive imagers.

a: GENERATION OF A SUSPICIOUS OBJECT DATABASE VIA SIMULATION
Passive sensor images can be generated via simulation to build a suspicious object database. There are two kinds of parameters that should be considered, and a prototype is illustrated in Figure 7. First, we should consider the whole simulation environment, including temperature, reflection, blur, variation, and noise. Second, the parameters of each suspicious item should be considered, including the suspicious object's type and size as well as various rotations and transformations. Accordingly, as summarized in Table 2, we generated 44 kinds of bottles, 41 kinds of forks, 37 kinds of knives and 45 kinds of scissors via simulation, resulting in a total of 10516, 9799, 8843, and 10755 samples, respectively.

b: GENERATION OF A SUSPICIOUS OBJECT DATABASE BY MEANS OF ACTIVE/PASSIVE IMAGERS
In addition to simulation, we can also generate real experimental images through active or passive MMW imaging for suspicious object databases. As shown in Figure 8, we built a simple anechoic chamber with active/passive imagers (left panel) to generate image data. The right panel shows an example of generated passive imager data. Using this chamber, we performed an experiment with subjects carrying different numbers and types of suspicious objects (knives, bombs, guns, liquids, phones, etc.). The relative positions of the subjects and the orientations of the suspicious objects were also varied. Usually, suspicious objects are hidden in clothes or bags; therefore, we also used different kinds of items (cloth, cotton, etc.) to wrap the suspicious objects during this experiment. Accordingly, as summarized in Table 3, we generated 52 samples using the active imager and 1009 samples using the passive imager. In this MMW imaging system, the passive imager, which is based on the prototype in [26], captures one image every 2 seconds. In contrast, the active imager is a very primitive prototype that consists of a single antenna, a receiver and a mechanical scanning structure, and it takes 6 hours to generate one sample. Therefore, the numbers of samples obtained from the active and passive imagers differ significantly.
VOLUME 8, 2020

3) AI-BASED SUSPICIOUS OBJECT RECOGNITION ALGORITHM
To improve the probability of identifying suspicious objects in the proposed suspicious object detection system, we use AI technology based on our developed suspicious object database. Among the available AI technologies, CNNs represent one of the main categories of deep learning for performing image recognition and image classification. Two critical features of CNNs that distinguish them from other neural networks are their reduced computational complexity and translational invariance. A CNN comprises two main sections. The first section is used to extract features and includes convolutional layers, pooling layers, and batch normalization layers. The second section works in the same way as a traditional neural network and is used to perform classification; it includes one or more flatten layers and one or more fully connected layers. In this paper, we directly use CNN technology, meaning that the performance of the AI component depends on the CNN. In addition, we consider the application of AI with and without a noisy environment in this work. Because CNNs are a mature technology for image recognition and this paper focuses on the construction of the suspicious object database rather than the CNN, the technical details of the CNN are not described here.
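To make the two-section structure described above concrete, the following NumPy sketch runs a forward pass through one convolutional layer, a normalization step, a pooling layer, a flatten step and one fully connected layer. It is not the network used in the paper. The layer sizes, the random (untrained) weights, the four class names and the per-channel normalization used in place of a trained batch normalization layer are all assumptions for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

CLASSES = ["bottle", "fork", "knife", "scissors"]   # assumed label set

def conv2d(x, kernels):
    """Valid cross-correlation of x (H, W) with kernels (K, kh, kw)."""
    windows = sliding_window_view(x, kernels.shape[1:])   # (H', W', kh, kw)
    return np.einsum("hwij,kij->khw", windows, kernels)   # (K, H', W')

def normalize(x, eps=1e-5):
    """Per-channel normalization; stands in for batch normalization here."""
    mean = x.mean(axis=(1, 2), keepdims=True)
    var = x.var(axis=(1, 2), keepdims=True)
    return (x - mean) / np.sqrt(var + eps)

def max_pool(x, size):
    """Non-overlapping max pooling; trailing rows/columns that do not fit are dropped."""
    k, h, w = x.shape
    h, w = h - h % size, w - w % size
    blocks = x[:, :h, :w].reshape(k, h // size, size, w // size, size)
    return blocks.max(axis=(2, 4))

def forward(image, kernels, weights, bias):
    # Section 1: feature extraction (convolution, normalization, ReLU, pooling).
    features = max_pool(np.maximum(normalize(conv2d(image, kernels)), 0.0), size=8)
    # Section 2: classification (flatten, fully connected layer, softmax).
    logits = weights @ features.reshape(-1) + bias
    exp = np.exp(logits - logits.max())
    return exp / exp.sum()

rng = np.random.default_rng(0)
image = rng.random((192, 512))                       # placeholder grayscale input
kernels = 0.1 * rng.standard_normal((4, 5, 5))       # 4 untrained 5 x 5 filters
n_features = 4 * ((192 - 4) // 8) * ((512 - 4) // 8)
weights = 0.01 * rng.standard_normal((len(CLASSES), n_features))
bias = np.zeros(len(CLASSES))
probs = forward(image, kernels, weights, bias)
```

Because the same small filters slide over the whole image, the number of convolutional parameters does not grow with image size, and a shifted object produces correspondingly shifted feature responses; these are the reduced computational complexity and translational behaviour mentioned above.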

D. SUSPICIOUS OBJECT/PERSON NETWORK SYSTEM
1) SYSTEM STRUCTURE
In recent years, the threat of terrorism has spread all over the world; hence, strengthening suspicious object detection technologies has become an urgent issue. To this end, we are developing a low-congestion suspicious object detection system for moving persons that integrates various sensors and imagers. Figure 3 has already described the whole system architecture, which includes AI-based suspicious object recognition technology and a suspicious object/person network system. In this section, we will introduce the suspicious object/person network system in detail. As shown in Figure 9, the operation process of this system mainly includes the following five steps: a) person detection, b) tracking, c) association, d) suspicious object recognition, and e) tracking through different areas.

a: PERSON DETECTION
This step is performed only in the primary screening stage. The purpose of this step is mainly to detect people by means of visible light cameras (4K cameras) and to initially recognize suspicious persons who are carrying metal objects by means of active radar detection. At the same time, each suspicious person's location is recorded. In this step, corresponding information such as the ''area ID,'' ''person ID,'' ''image,'' ''location (x,y),'' ''time,'' ''suspicious person type'' and ''suspicious person confidence'' is generated.
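The information generated in this step can be pictured as one record per detected person. The sketch below is an assumed representation only: the field names follow the quoted item names, while the types, the units and the example values are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional, Tuple

@dataclass
class PersonDetection:
    area_id: int                        # "area ID"
    person_id: int                      # "person ID" (local to the area)
    image: bytes                        # "image": encoded person image
    location: Tuple[float, float]       # "location (x,y)"
    time: datetime                      # "time"
    suspicious_person_type: Optional[str] = None     # e.g. "metal" (assumed value)
    suspicious_person_confidence: float = 0.0        # assumed to lie in [0, 1]

record = PersonDetection(
    area_id=1,
    person_id=17,
    image=b"",
    location=(3.2, 11.5),
    time=datetime(2020, 1, 1, tzinfo=timezone.utc),
    suspicious_person_type="metal",
    suspicious_person_confidence=0.87,
)
```

Keeping the suspicious-person fields optional lets the same record describe people for whom the active radar reported nothing.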

b: TRACKING
A large amount of traffic will be generated in the network system if we transmit the original 4K video. Therefore, it is necessary to develop a content-based extraction and production process for the video data in the suspicious object/person network system to reduce network congestion. In this step, person-only regions are cropped from different RGB images in multiple frames and integrated to reduce network traffic. Meanwhile, corresponding information such as ''groups of

c: ASSOCIATION
Suspicious persons detected to be carrying metal in the primary screening stage will be guided by security personnel to a secondary screening point. The MMW images generated during secondary screening are then associated with the corresponding camera RGB images. Meanwhile, ''MMW images'' are recorded as new information for each suspicious person.
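The person-only cropping used in the tracking step can be illustrated with a short sketch that compares the size of the cropped regions with the size of a raw 4K frame. The bounding boxes are hypothetical detector outputs, the frame is uncompressed RGB, and the function names are illustrative and not part of the proposed system.

```python
import numpy as np

def crop_persons(frame, boxes):
    """Return person-only patches for boxes given as (top, left, bottom, right)."""
    return [frame[t:b, l:r].copy() for (t, l, b, r) in boxes]

def traffic_ratio(frame, patches):
    """Fraction of the raw frame's bytes occupied by the cropped patches."""
    return sum(p.nbytes for p in patches) / frame.nbytes

frame = np.zeros((2160, 3840, 3), dtype=np.uint8)             # one raw 4K RGB frame
boxes = [(500, 1000, 1400, 1300), (600, 2500, 1500, 2800)]    # hypothetical detections
patches = crop_persons(frame, boxes)
ratio = traffic_ratio(frame, patches)
```

For these two assumed 900 × 300 pixel regions, the patches occupy about 6.5% of the raw frame, which shows why transmitting only person regions relieves the network before any video coding is applied.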

d: SUSPICIOUS OBJECT RECOGNITION
By means of the suspicious object database in combination with AI-based suspicious object recognition technology, each suspicious object is accurately identified, and the information of the ''dangerous object type'' and ''suspicious person confidence'' is recorded.

e: TRACKING THROUGH DIFFERENT AREAS
Since a suspicious person's movements may pass through different monitoring areas, this system supports monitoring a given suspicious person in different areas. The overall scope of this system is to design and implement a suspicious object/person detection and tracking system. This section describes the recognition, tracking and communication platform, which is a central part of our project. As shown in Figure 10, the system should be designed with flexible structures to handle a maximum of 100 RGB cameras, 100 infrared cameras, 100 external suspicious person detection systems, and 100 hybrid imagers. For instance, it should be possible to add additional PCs for the Person Detection & Tracking process, as shown in Figure 10. Figure 11 presents the detailed function modules of the suspicious object/person network system. Moreover, we also provide an example of information flow in the proposed suspicious object/person network system (Figure 12).
To track a suspicious person through different areas, a local person ID is generated when the person is first detected (for example, at the airport entrance). At the same time, the end point of tracking (for example, the boarding gate) is predetermined. A local person ID exists independently in each area, but the global person ID must be unified; for this purpose, it is necessary to associate local person IDs in different areas. Therefore, each pair of adjacent areas is defined with some overlap so that local person IDs can be associated. Facial recognition or some other image-feature-based identification method is needed to handle complicated situations; however, these approaches are outside the scope of the first system. Additionally, location data should be defined using both a global location and a local location. In the example of an airport, the global location indicates the absolute position in the whole airport, and the local location indicates the position within the surveillance range of one camera. Given the area ID and local location, the global location can be calculated using the predetermined allocation parameters. In the table in Figure 11, only two (local) areas are described. However, the overall location data, which can be acquired from every frame of camera data, should be recorded.
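The relation between local and global identifiers and locations can be illustrated with a minimal sketch. It assumes that the allocation parameters of an area reduce to a planar offset of the area's local origin, and that two local IDs are associated when their global positions coincide within a tolerance inside the overlap region. The names `AreaAllocation`, `ALLOCATION`, `to_global`, `same_person` and `GlobalIdRegistry`, as well as all numeric values, are hypothetical and are not taken from the implemented system.

```python
from dataclasses import dataclass
import math


@dataclass(frozen=True)
class AreaAllocation:
    """Hypothetical allocation parameters: global position of an area's local origin."""
    origin_x: float  # metres
    origin_y: float  # metres


# Hypothetical layout with two adjacent areas.
ALLOCATION = {
    "area-1": AreaAllocation(0.0, 0.0),
    "area-2": AreaAllocation(12.0, 0.0),
}


def to_global(area_id, local_xy):
    """Convert a local (x, y) position inside an area to a global position."""
    alloc = ALLOCATION[area_id]
    return (alloc.origin_x + local_xy[0], alloc.origin_y + local_xy[1])


def same_person(global_a, global_b, tolerance_m=0.5):
    """Assumed association rule: positions in the overlap agree within a tolerance."""
    return math.dist(global_a, global_b) <= tolerance_m


class GlobalIdRegistry:
    """Maps (area_id, local_person_id) pairs to a unified global person ID."""

    def __init__(self):
        self._global_of = {}
        self._next_id = 1

    def register(self, area_id, local_id):
        """Assign a new global ID the first time a local ID is seen."""
        key = (area_id, local_id)
        if key not in self._global_of:
            self._global_of[key] = self._next_id
            self._next_id += 1
        return self._global_of[key]

    def associate(self, known_key, new_key):
        """Give a local ID in a neighbouring area the global ID of a known one."""
        self._global_of[new_key] = self._global_of[known_key]
        return self._global_of[new_key]
```

In this sketch, a person first seen in `area-1` receives a global ID through `register`, and when `same_person` holds for a track in the overlap with `area-2`, `associate` attaches the new local ID to the same global ID.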

2) CONTENT-BASED ASSOCIATION OF VARIOUS SENSORS IN THE SUSPICIOUS OBJECT/PERSON NETWORK SYSTEM
In the suspicious object detection system, a large number of sensors will be used. It is vital to associate different types of sensor data acquired from the same object. To address this issue, we study a method of estimating a person's position from each type of sensor data. Figure 13 illustrates a scenario similar to our system. It includes an RGB camera and an MMW imager, which can detect a person at distances of 6 m to 15 m and 1 m to 5 m, respectively. Our purpose in this section is to integrate the data
Face detection [27] is used to estimate the person's position from an image. As shown in Figure 14, based on the initial vertical position $V_{\mathrm{face}}$ of the rectangle obtained via face detection and the height $L$ of the face rectangle, the foot position $V_{\mathrm{foot}}$ of the detected person in the image can be calculated as shown in equation (6):
$$V_{\mathrm{foot}} = V_{\mathrm{face}} + kL \tag{6}$$
where $k$ is the proportion coefficient. Based on a previously published human dimension database [28], the average height of adolescents is 1699.1 mm, and the average morphological face height is 121.1 mm. Therefore, $k$ can be simply considered to be 14 (1699.1/121.1 ≈ 14.03).
Here, the morphological face height is the distance from the nose to the lowest point of the lower jaw. In this study, this distance is estimated from the face detection area. It is considered equivalent to the distance from the eyebrows to the mouth. The horizontal position of the foot in the image, $h_{\mathrm{foot}}$, is taken to be the midpoint of the face rectangle. Then, a projection transformation matrix is obtained for the image captured by the visible light camera, and the position $(h_{\mathrm{foot}}, V_{\mathrm{foot}})$ in the image can be converted into the position $(x, y)$ relative to the camera.
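As an illustration, the position estimate can be sketched in two steps. The sketch assumes that the foot row is the top of the face rectangle plus k times its height, with image rows increasing downward, and that the projection transformation is a 3 × 3 homography applied in homogeneous coordinates. The function names and the matrix `H` are hypothetical; in practice, the matrix would come from a camera calibration that is not described here.

```python
K_PROPORTION = 14.0  # k from the text: 1699.1 mm / 121.1 mm, rounded


def foot_pixel(face_left, face_top, face_width, face_height, k=K_PROPORTION):
    """Estimate the foot position (h_foot, v_foot) in pixels from a face rectangle.

    Assumes image rows increase downward, so the foot lies below the face.
    """
    h_foot = face_left + face_width / 2.0   # midpoint of the face rectangle
    v_foot = face_top + k * face_height     # V_foot = V_face + k * L
    return h_foot, v_foot


def apply_homography(H, h, v):
    """Map an image point (h, v) to ground coordinates (x, y) with a 3x3 matrix H."""
    x = H[0][0] * h + H[0][1] * v + H[0][2]
    y = H[1][0] * h + H[1][1] * v + H[1][2]
    w = H[2][0] * h + H[2][1] * v + H[2][2]
    if w == 0:
        raise ValueError("point maps to infinity under this homography")
    return x / w, y / w


def estimate_position(face_rect, H, k=K_PROPORTION):
    """Combine both steps: face rectangle -> foot pixel -> position from the camera."""
    h_foot, v_foot = foot_pixel(*face_rect, k=k)
    return apply_homography(H, h_foot, v_foot)
```

Because the estimate relies on an average height and face size, persons who deviate from these averages will be placed nearer or farther than they actually are.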

3) REDUCTION OF NETWORK TRAFFIC VOLUME FOR THE SUSPICIOUS OBJECT/PERSON NETWORK SYSTEM
To reduce risks for soft targets in public areas, a suspicious object detection system that can handle a large number of cameras and various sensors, including 4K cameras and MMW imagers, is needed. In this section, we study an efficient transmission mechanism to reduce network congestion in the system, since a massive amount of data will be generated by the 4K cameras during real-time monitoring.
As illustrated in Figure 3, the operation process of the suspicious object detection system mainly includes five steps: person detection, tracking, association, dangerous object recognition and tracking through different areas. Network traffic reduction is implemented in the person detection and tracking steps. Many related techniques have already been developed and proven useful for current network systems [29]. We have reviewed and selected several effective methods to reduce network traffic during data transmission.
Figure 15 illustrates the flow chart for the reduction of network congestion. This system uses 4K RGB cameras in combination with a large number of sensors to detect suspicious objects and track suspicious persons. The 4K RGB cameras generate 3840 × 2160 resolution images at 5 fps. These images are then passed to the ''Person Detection Function'' (Figure 16(a)) and the ''Tracking Function'' (Figure 16(b)). We set several options for each process to seek an effective method of network congestion reduction. For the Person Detection Function, we provide three methods of reducing network traffic: 1) crop a person-only region from the whole image, 2) compress the image obtained from 1), or 3) resize the cropped image from 1) to dimensions smaller than 64 × 256 pixels and then compress the resulting image. We consider resizing because the size of a cropped person-only image varies with the distance between the camera and the person. Resizing is an effective way to reduce the image size, especially when the person is close to the camera. For the Tracking Function, we propose two methods of alleviating network congestion: a) select consecutive frames but remove images with noise and detection errors or b) in addition to a), select only the latest frame as a representative image if a group of images look sufficiently alike.
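Two of these options can be sketched without reference to a specific image library: the resize step of option 3) and the frame selection of options a) and b). The sketch assumes that the aspect ratio is preserved when resizing, that each tracked frame carries a validity flag set by the tracker, and that likeness is judged by a caller-supplied predicate. The names `fit_within`, `select_frames` and `similar`, and the scalar `feature` field, are hypothetical; cropping and compression themselves are omitted.

```python
MAX_W, MAX_H = 64, 256  # upper bound on the resized person-only image, from the text


def fit_within(width, height, max_w=MAX_W, max_h=MAX_H):
    """Return (w, h) scaled down to fit inside max_w x max_h, keeping aspect ratio.

    Images that already fit are left unchanged (never upscaled).
    """
    scale = min(max_w / width, max_h / height, 1.0)
    new_w = min(max_w, max(1, round(width * scale)))
    new_h = min(max_h, max(1, round(height * scale)))
    return new_w, new_h


def select_frames(frames, similar):
    """Apply option a) and then option b) to one tracked person's frames.

    frames:  dicts in time order, each with a boolean 'valid' flag (assumed to be
             False for noise or detection errors).
    similar: predicate deciding whether two frames look sufficiently alike.
    Returns (kept, representatives).
    """
    kept = [f for f in frames if f["valid"]]      # a) drop noise and detection errors
    representatives = []
    for frame in kept:
        if representatives and similar(representatives[-1], frame):
            representatives[-1] = frame           # b) keep only the latest look-alike
        else:
            representatives.append(frame)
    return kept, representatives


def similar_by_feature(a, b, threshold=0.05):
    """Hypothetical likeness test on a scalar appearance feature."""
    return abs(a["feature"] - b["feature"]) < threshold
```

Since each representative is compared with its immediate successor, a slow drift in appearance collapses into a single frame; comparing against the first frame of each run would be a stricter alternative.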

IV. EVALUATION

A. EVALUATION OF AI-BASED RECOGNITION TECHNOLOGY
This section describes the implementation of an image recognition system using a CNN. Figure 17 shows the configuration of the CNN used to evaluate the proposed suspicious object detection system. In addition to the input and output layers, it includes five main types of layers: convolution, pooling, batch normalization, flatten, and fully connected. The network parameters for each layer are given in Table 4. In this experiment, we used four types of suspicious objects for the training and evaluation of the CNN: bottles, forks, knives and scissors. All images had a resolution of 32 × 32 pixels to match the image size of the actual passive imager, and three types of deterioration were considered: normal, blurred and noisy (examples are shown in Figure 18). Table 5 specifies the numbers of images used for the training and evaluation of the CNN. We used a total of 39913 images for training, including 10516 bottle images, 9799 fork images, 8843 knife images and 10755 scissors images. For evaluation, we used 44 bottle images, 41 fork images, 37 knife images and 45 scissors images, for a total of 167 images.
Training was performed by using images with different deterioration types (normal, blurred, and noisy), and the image recognition performance was evaluated using the evaluation images for each type of deterioration. Table 6 shows the evaluation results for different deterioration types. According to these results, the recall is best (92.2%) when normal images are used for both training and evaluation. Generally, the recall is higher when the same deterioration type is used for both training and evaluation, but the noisy case is different. When we use noisy images for training, the recall is 75.4%, 74.9% and 73.1% when evaluation is performed using normal, blurred, and noisy images, respectively; that is, the recall is lowest on the noisy evaluation images.
In actual suspicious object detection systems, most of the obtained images will be blurred or noisy. Table 7 lists the object identification results obtained with blur and noise. In this case, the CNN was trained on blurred and noisy images, and the evaluation was also performed using blurred and noisy images. The recall and precision were determined for each object category separately. As seen from this table, the average recall for the four categories is 80.2%, and the average precision is 81.5%. Therefore, the target accuracy of at least 50% was achieved.
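For reference, per-category recall and precision and their unweighted averages over categories can be computed as in the following minimal sketch. It assumes that the reported averages are plain means over the categories, which the text does not state explicitly. The function names are hypothetical, and no values from Table 7 are reproduced.

```python
from collections import Counter


def per_class_metrics(y_true, y_pred, classes):
    """Return {category: (recall, precision)} from true and predicted labels."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fn[truth] += 1   # the true category was missed
            fp[pred] += 1    # the predicted category received a false hit
    metrics = {}
    for c in classes:
        actual = tp[c] + fn[c]      # images that truly belong to c
        predicted = tp[c] + fp[c]   # images that were labelled as c
        recall = tp[c] / actual if actual else 0.0
        precision = tp[c] / predicted if predicted else 0.0
        metrics[c] = (recall, precision)
    return metrics


def macro_average(metrics):
    """Unweighted mean of recall and of precision over all categories."""
    n = len(metrics)
    mean_recall = sum(r for r, _ in metrics.values()) / n
    mean_precision = sum(p for _, p in metrics.values()) / n
    return mean_recall, mean_precision
```

Recall answers how many images of a category were found, whereas precision answers how many images assigned to a category actually belong to it, so the two averages generally differ.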
Moreover, the ROC curve calculated based on Table 7 was used to evaluate the system. The calculation method is as follows. When an image is input for recognition, the system outputs a similarity for each of the four kinds of suspicious objects (knives, bottles, forks, and scissors). The similarity for the category that matches the input category is labeled OK, and the similarities for the other categories are labeled NG. From the recognition results for multiple input images, the probability density functions of the OK and NG sets are calculated, and the threshold value of the probability is varied to obtain the false positive rate and true positive rate (Figure 19). Figure 20 shows the ROC curve. In this figure, the horizontal axis is the false positive rate, and the vertical axis is the true positive rate; the values from Table 7 are plotted in this coordinate system to obtain the ROC curve.
Three cases are considered. (a) The conditions of the evaluation images are the same as those of the training images, as is the case in Table 7. (b) The images used for evaluation have a background gray value of 130, an object gray value of 100, and a noise level 20 dB or 10 dB stronger than that in (a) and are subjected to a 21 × 21 blur filter to enhance the blur. (c) The images used for evaluation have a background gray value of 255, an object gray value of 0, a noise level of 10 dB, and enhanced blur due to a 21 × 21 blur filter.
From Figure 20, we can draw the following conclusions. In case (a), if the true positive rate (accuracy) of the recognition results is approximately 90%, the false positive rate (rate of incorrect answers that are regarded as correct) is approximately 15%. Similarly, if the true positive rate of the recognition results is approximately 50%, the false positive rate is reduced to approximately 2%. With different thresholds, this system can provide corresponding true positive rates and false positive rates. The threshold should be chosen based on the specific needs of different situations. With higher image degradation, as represented in (b) and (c), the accuracy of recognition decreases. When the true positive rate is 50%, the false positive rate is 2% in (a), 20% in (b) and 24% in (c). According to these results, to achieve high accuracy, the suspicious object detection system should be used to evaluate images acquired under conditions as similar as possible to those of the images used for training and with as little deterioration as possible.
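The OK/NG construction and the threshold sweep can be sketched as follows. Note that this sketch counts empirical fractions of the OK and NG similarities above each threshold, rather than first fitting probability density functions as described above; it is a simplified stand-in, and the function names and the dictionary representation of the similarity outputs are hypothetical.

```python
def split_ok_ng(similarities, true_labels):
    """Split per-image similarity outputs into OK and NG score lists.

    similarities: one dict per image mapping category -> similarity.
    true_labels:  the input category of each image.
    """
    ok, ng = [], []
    for scores, truth in zip(similarities, true_labels):
        for category, score in scores.items():
            if category == truth:
                ok.append(score)   # similarity for the matching category
            else:
                ng.append(score)   # similarities for all other categories
    return ok, ng


def roc_points(ok, ng, thresholds):
    """Return one (false positive rate, true positive rate) pair per threshold."""
    points = []
    for t in thresholds:
        tpr = sum(s >= t for s in ok) / len(ok)   # OK scores accepted at t
        fpr = sum(s >= t for s in ng) / len(ng)   # NG scores wrongly accepted at t
        points.append((fpr, tpr))
    return points
```

Raising the threshold lowers both rates at once, which is the trade-off visible in the operating points quoted for case (a).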

B. EVALUATION OF THE SUSPICIOUS OBJECT/PERSON NETWORK SYSTEM

1) SYSTEM EVALUATION
In the suspicious object/person network system, three types of processing time (Ta, Tb, and Td) were measured to evaluate the system's performance, as shown in Figure 21. The processing time Ta, for (a) tracking processing, is the time required to detect person information and to generate the integrated and aggregated packet for a detected person. The processing time Tb, for (b) association processing, is the time required to aggregate the hybrid imager data into an integrated packet associated with the same person. The processing time Td, for (c) data management, is the time required for the suspicious object recognition result data produced by the AI-based recognition server to be received, saved and distributed. Tc is the processing time of the suspicious object detection server, which is outside the scope of the network system and is assumed to be at most 1 second. Since the network system starts operating when a person comes within 5 m, i.e., the working distance of the hybrid imagers, and the average walking speed is assumed to be 1 m/s, the target total time from Ta to Td should be within 5 seconds. We used a 4K-resolution clip of 6 people walking for this evaluation. The whole process generated 8 tracking packets, and Ta, Tb and Td are shown in Figure 22. Although one Ta sample exceeds 1 second, the average remains within 1 second (0.826 s). The results for Tb and Td are all within 1 second; thus, the overall evaluation time is acceptable for the suspicious object/person network system.
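The timing target follows from simple arithmetic, which the sketch below makes explicit. The 5 m working distance, the 1 m/s walking speed, the 1-second bound on Tc and the 0.826 s average for Ta are taken from the text; using 1.0 s as an upper bound for Tb and Td is an assumption based on the statement that all samples were within 1 second. The function names are hypothetical.

```python
def time_budget_s(working_distance_m=5.0, walking_speed_mps=1.0):
    """Time available before a walking person covers the imager's working distance."""
    return working_distance_m / walking_speed_mps


def total_processing_s(ta, tb, tc, td):
    """End-to-end processing time from Ta to Td, in seconds."""
    return ta + tb + tc + td


def within_budget(ta, tb, tc, td, budget_s):
    """True if the end-to-end processing time fits into the budget."""
    return total_processing_s(ta, tb, tc, td) <= budget_s


# Average Ta from the text; Tb, Tc and Td set to their assumed 1-second bounds.
BUDGET_S = time_budget_s()
EXAMPLE_TOTAL_S = total_processing_s(0.826, 1.0, 1.0, 1.0)
```

With these bounds, the total stays below the 5-second budget, leaving some margin for an occasional Ta sample above 1 second.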

2) CONTENT-BASED ASSOCIATION OF VARIOUS SENSORS IN THE SUSPICIOUS OBJECT/PERSON NETWORK SYSTEM
Regarding the integration of data from various sensors, we evaluate the accuracy of position estimation for visible
Figure 23(b) indicates the correct distance value obtained from the perspective projection transformation of the evaluated video. It is compared with the estimated distance value calculated using the proposed method to evaluate the distance estimation error. Figure 24 illustrates the distance estimation errors for persons at different distances from the camera. The six persons are represented by differently colored dots. Table 8 shows the root mean square errors (RMSEs) of the estimated positions for the six persons. These results confirm that the position of a walking person can be estimated via face detection with an error of approximately 1 m. In this study, we investigated how to estimate the position of a person from the coordinates obtained through face detection based on an average height and face size. We believe that it is possible to correlate the outputs of different sensors based on a person's position.
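The RMSE used for this comparison can be computed per person as in the following minimal sketch, which assumes one estimated and one reference distance per evaluated frame. The function name is hypothetical, and no values from Table 8 are reproduced.

```python
import math


def rmse(estimated, reference):
    """Root mean square error between estimated and reference distances (metres)."""
    if len(estimated) != len(reference) or not estimated:
        raise ValueError("inputs must be non-empty and of equal length")
    squared_errors = [(e - r) ** 2 for e, r in zip(estimated, reference)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))
```

Because the errors are squared before averaging, a few frames with large deviations raise the RMSE more than many frames with small ones.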

3) REDUCTION OF NETWORK TRAFFIC VOLUME FOR THE SUSPICIOUS OBJECT/PERSON NETWORK SYSTEM
We used a 20-second, 34.6 megabyte (MB) mp4-encoded 4K (3840 × 2160 resolution) video depicting six persons walking (samples from this video are shown in Figure 16(a)) to evaluate the proposed transmission mechanism for network traffic reduction. In a total of 96 frames, the ''person detection'' process yielded 207 extracted regions, and in the ''tracking'' process, these regions were integrated into 14 groups based on different persons. During tracking, 90 regions identified as noise were eliminated. The evaluation results for network traffic reduction are shown in Table 9. Instead of sending the full-length 4K video (34.6 MB), the packet size can be reduced to 0.5 MB, a reduction of approximately 98.6%, if we resize and compress the cropped images in addition to selecting only the latest frame of each group as a representative image. These results show that the network traffic can be efficiently reduced for the suspicious object/person network system.
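The headline figures of this evaluation can be checked with a short calculation. The sketch uses only numbers stated in the text (34.6 MB, 0.5 MB, 207 regions, 90 noise regions); the function name is hypothetical.

```python
def reduction_ratio(original_mb, reduced_mb):
    """Fraction of traffic saved when sending reduced_mb instead of original_mb."""
    return 1.0 - reduced_mb / original_mb


# Figures stated in the text.
SAVED_FRACTION = reduction_ratio(34.6, 0.5)   # roughly 0.986
REMAINING_REGIONS = 207 - 90                  # regions left after noise removal
```

The intermediate options listed in Table 9 can be compared in the same way by substituting their packet sizes for the 0.5 MB value.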

V. CONCLUSION
In this paper, we have designed and implemented an AI-based W-band suspicious object detection system for moving persons. By means of W-band unidentified object detection, this system can perform non-stop suspicious object detection automatically, making it suitable for densely populated places such as subway/airport lobbies, shopping malls, and concert venues. With the goal of using a CNN to improve the probability of identifying suspicious objects, a suspicious object database has been established by means of simulations and the collection of images from active/passive imagers. Moreover, we have proposed an efficient transmission mechanism to reduce network congestion in the system, since a massive amount of data will be generated by the 4K cameras during real-time monitoring. The evaluation results show that the designed suspicious object detection system can achieve good performance in suspicious object detection for moving persons. In addition, the established suspicious object database can support CNN analysis to significantly increase the probability of identifying suspicious objects, and the proposed low-congestion transmission mechanism can improve the network transmission efficiency.