Laser-Based Algorithms Meeting Privacy in Surveillance: A Survey

Privacy is a key concern in surveillance systems. Video cameras provide rich color information, which raises the question of how privacy can be secured. At the same time, privacy protection should not hinder the finding of objects or people in specific cases. A laser scanner takes away the rich color information. It operates with an eye-safe, invisible laser beam, yet it provides a robust object recognition map. Images can be interpreted by humans, but laser-based systems need software applications to explain the data. Camera-based surveillance systems do not address the problem of protecting private life. In contrast, laser-based surveillance systems inherently ensure the privacy of people, as they record only laser-scanned data points rather than real-world videos. In this paper, first, the privacy issues of both surveillance systems are compared to establish their significance. Second, a qualitative performance comparison between laser-based and RGB camera-based systems is made to suggest that laser-based algorithms should be used instead of common RGB cameras. Third, a succinct survey of laser-based algorithms for detecting and tracking movers is conducted. Finally, a superiority measure of the leading laser-based people-vehicles related algorithms is computed on the basis of statistical test scores, considering the ineffectualness metrics (e.g., errors and failures) of each algorithm.


I. INTRODUCTION
Detecting and tracking movers (e.g., pedestrians and vehicles) is an important issue for surveillance systems and traffic analysis in conurbations. Surveillance systems in both public and private spaces are often expected to detect and track unusual activities [1] or behaviour of movers [2] to ensure a high degree of security and safety. A non-automated, human-operated surveillance system is very expensive and error-prone. These problems can be reduced by an automated surveillance system. Any automated surveillance system demands smart algorithms to process the data obtained by sensors and to prepare useful information for making sound decisions. Due to algorithmic assumptions and the large amount of data to be processed, an existing smart algorithm cannot reach the desired level of applicability in all cases. Thus, a smarter algorithm is developed, and the series of ever smarter algorithms for high-quality surveillance keeps continuing.
The associate editor coordinating the review of this manuscript and approving it for publication was Jiachen Yang.
Nowadays, like home automation [3]-[5], traffic automation has become one of the key factors for a smart city [6]. Sidewalk occupancy is a serious problem in urban life [7], [8]. If sidewalks are occupied, pedestrians tend to walk on the street, which leads to many potential traffic hazards. A civil engineer would like to know how sidewalks along streets can be built to give maximum comfort and safety to the dwellers. A smart city planner would design roads for autonomous cars and fair traffic flows. In smart cities, connected cars can pair with automated traffic management systems to provide a flawless driving experience for commuters. Obtaining precise trajectories of movers from the surveillance system is one of the key requirements for accomplishing such tasks smoothly. Indeed, it is challenging to obtain trajectories of workable quality for individual persons and vehicles, with a view to studying traffic and vision related activities, from an automated surveillance system with sundry video cameras or laser (Light Amplification by Stimulated Emission of Radiation) scanners. Images from a video camera-based surveillance system can be interpreted by any human. This option is missing in a laser-based surveillance system, where software applications are needed to explain the associated data. Even so, a smart surveillance system with laser scanners appears to be more competent and convenient than one with video cameras.
In essence, surveillance requires proper identification of and searching for objects by law enforcement agencies. A crucial issue in a surveillance system is the privacy of people. In principle, a surveillance system should be smart and should also protect the privacy of people. However, privacy protection should not hinder the identification of objects or people by law enforcement agencies under specific conditions (e.g., crime scenes, a search for a stolen vehicle, or a missing person). A camera is a popular image sensor for recording visual images. In surveillance, many different kinds of cameras (e.g., action, infrared, and so on) can be used. The human eye is sensitive to the red, green, and blue (RGB) bands of light. Many surveillance cameras capture the same RGB bands that our eyes see, producing color images to be analyzed by humans and/or software. An RGB camera uses a standard CMOS (Complementary Metal Oxide Semiconductor) sensor through which colored images of persons and objects are obtained [9]. The majority of surveillance cameras today feature RGB and infrared (IR) sensors as standard [10]. Surveillance cameras mostly work on IP (Internet Protocol) networks, which can link cameras in remote areas to the assigned security location. Beyond cameras and lasers, other sensors including RADAR (RAdio Detection And Ranging) [11]-[14], IMU (Inertial Measurement Unit) [15]-[17], GPS (Global Positioning System) [18]-[20], GNSS-R (Global Navigation Satellite System-Reflectometry) [21]-[23], SONAR (Sound Navigation And Ranging) [24]-[26], DMC (Digital Magnetic Compass) [27], fiber optic [12], [28], [29], and temperature measuring devices [30]-[32] are used in surveillance systems. Yet, based on the availability and adoption of mover monitoring algorithms, the surveillance of crowds and/or vehicles can be roughly divided into video camera-based and laser-based surveillance systems.
Almost all video camera-based surveillance systems provide rich color information, provided that the illumination conditions remain stable. How can the privacy of people be secured in such systems? Intuitively, such systems offer very limited privacy through so-called privacy masking. Laser-based surveillance systems, on the other hand, offer a good solution to these problems. Common laser scanner brands include SICK [33], Velodyne [34], IBEO [35], and Hokuyo [36]. For example, Fig. 1 (a) shows two SICK devices, namely the LMS-511 and the LD-MRS. The LD-MRS has a 110° scanning range. It scans in 4 layers at various heights. Its maximum detection distance is 250 meters. Its angular resolution can be 0.125°, 0.25°, or 0.5° [33].
A laser scanner operates with an eye-safe laser beam that human eyes cannot see. Fig. 1 (b) illustrates schematically how laser beams emitted from two LMS-511 devices hit human legs. A laser scanner does not give color information as a camera does. Instead, it provides only data points of objects such as heads, chests, hands, legs, trees, walls, vehicles (e.g., Fig. 1 (c)), bicycles, or other regions of interest (ROI). Hence, data processing becomes quicker and easier than with video cameras, and the approach also has special advantages in protecting the privacy of people.
During the past two decades, an enormous amount of research has been dedicated to proposing sundry laser-based algorithms for recognizing and/or tracking movers from laser-scanned data points using various laser scanners. Accordingly, several short survey reports can be found in the literature. For example, Zhao et al. [37] surveyed the suggested rules for designing secure communication systems using chaotic lasers; Bianchini et al. [38] compared laser scanner surveys with low-cost surveys; Wan et al. [39] presented a survey adjustment method for laser tracker relocation; Zhong et al. [40] addressed a combination of stop-and-go and electro-tricycle laser scanning systems for rural cadastral surveys; Barbarella et al. [41] focused on uncertainty in terrestrial laser scanner surveys of landslides; Deng et al. [42] discussed a method of fusing panorama images and three-dimensional (3D) laser point clouds for railway surveying; and Wang et al. [43] presented a survey of mobile laser scanning applications and key techniques over urban areas. Nevertheless, due to the rapid progress of the field, such surveys are to some extent outdated. Additionally, a developed algorithm may function well for a specific surveillance plan, but it might not function for other applications. Hence, it is extremely difficult to find a generic algorithm for solving many problems in diverse applications. Still, a superiority measure based on statistical tests of existing state-of-the-art laser-based tracking algorithms can help to understand how the algorithms rank, in ascending or descending order of performance, for solving certain kinds of problems. Be that as it may, the existing surveys in the literature pay no attention to measuring such superiority among the available laser-based algorithms.
VOLUME 9, 2021
The first aim of this paper is to focus on the privacy issues of people for both video camera-based and laser-based surveillance systems. Its second aim is to make a qualitative performance comparison between laser-based and RGB camera-based systems. Such a comparison helps to establish that one system is conditionally superior to its alternative. Its third aim is to provide a thorough overview of the advances in algorithms for laser-based detection and tracking of movers. Its final aim is to work out a superiority measure of the leading alternative people-vehicles laser-based algorithms by applying statistical tests to the unfulfillment metrics of the algorithms (e.g., see TABLE 7 and Fig. 11). The errors of each selected algorithm have been evaluated on an identical dataset (namely that of Galip et al. [7]), whereas the failure metrics have been taken from the data analysis and the manuscript of each selected algorithm. To conduct the statistical tests, we have used available statistical software applications from the University of Granada [44].
The main scope of this paper covers applications related to smart cities [45], [46], urban environment monitoring [47], [48], autonomous vehicles [49], [50], advanced driver assistance systems (so-called ADAS) [51], [52], robotic vision systems [53], [54], visual sensor systems [55], risk analysis [56], [57], and intelligent traffic flow and analysis [6], [58]. This paper is organized as follows. Section II focuses on the significance of camera-based and laser-based surveillance with respect to privacy; Section III qualitatively compares the performance of laser-based and RGB camera-based systems; Section IV briefly surveys the state-of-the-art algorithms; Section V qualitatively discusses selected people-vehicles related algorithms; Section VI estimates the ineffectualness metrics of those algorithms; Section VII derives a superiority measure using statistical tests; Section VIII points to some future works and challenges; and Section IX concludes the paper.

II. JUXTAPOSITION OF TWO SURVEILLANCE SYSTEMS
Surveillance, crowd control, and privacy are three key aspects of crowd analysis [59]-[66]. A surveillance system should be smart, and it should protect privacy. Surveillance plays a huge part in today's society, with cameras all around us. Our daily lives are subject to higher levels of security each day.
Roughly, surveillance systems for crowds and/or vehicles can be classified into two main groups: (i) camera-based surveillance and the related privacy of people, and (ii) laser-based surveillance and the associated privacy of people.

A. SIGNIFICANCE OF CAMERA-BASED SURVEILLANCE
An early-warning camera system could anticipate dangerous situations as they arise when large crowds gather. Surveillance cameras (e.g., CCTV, PTZ, etc.) have prevented, and will continue to prevent, many crimes. Nowadays, CCTV (closed circuit television) is used as a generic term for a variety of video surveillance technologies. Surveillance cameras keep personal property safe. A CCTV system protects against property theft and vandalism. It is very difficult to get away with stealing something if cameras are filming at all times, so the thief will often get caught, either before or while committing the crime. The police can identify criminals recorded by cameras. Through surveillance cameras, the police can both prevent crimes from happening and quickly solve criminal cases with material evidence. A CCTV system may reduce the fear of crime and increase public participation in public space. Beyond a reduction in crime, other benefits accrue from a CCTV system, including aid to police investigations, provision of medical assistance, place management, and information gathering. Gips [67] and Hess [68] reported a trend toward local jurisdictions legislating CCTV use. For example, in Chicago and Milwaukee, bars and nightclubs are required to post surveillance cameras on their premises. Baltimore County has required all shopping centers to install CCTV. In El Cerrito, California, an ordinance has been proposed that would require 73 local businesses, including liquor stores, convenience stores, takeout restaurants, banks, shopping centers, check cashing establishments, pawnshops, secondhand brokers, and firearms dealers, to install surveillance cameras at all structural entrances and exits, in customer and employee parking areas, and at entrances and exits to parking areas [67], [68].
Moreover, the National Violent Death Reporting System [69] shows that "in the United States more than seven people per hour die a violent death". CCTV systems usually help to reduce violence notably.
However, some people say that we should not have surveillance cameras in public places because they violate privacy. We should consider the impact of a CCTV system from a societal point of view. It has been suggested that ever-increasing surveillance can make the local environment a less pleasant place to live [70]. Benjamin Franklin (17 January 1706 - 17 April 1790), one of the founding fathers of the United States, once said [71]: "Those who would give up Essential Liberty, to purchase a little temporary Safety, deserve neither Liberty nor Safety." This quote frequently comes up in the context of new technology and concerns about government surveillance. In the United States, privacy issues related to the use of CCTV surveillance are first and foremost considered in regard to the Fourth Amendment of the United States Constitution, which protects a citizen from unreasonable searches and seizures by law enforcement and other government agencies. Possible solutions to this debate include privacy masking and laser-based monitoring. In the privacy masking method, each surveillance camera with privacy masking capability can selectively block portions of the video image for the purpose of protecting privacy. For example, PTZ (pan-tilt-zoom) cameras may be used to monitor a parking lot adjacent to an apartment building with the images of the windows in the building masked. This is a feature of the system configuration (software or hardware) and can be very complex and costly. Besides, in spite of the advantages of employing cameras with variable (PTZ) or large (omnidirectional) fields of view, cameras still have restricted applicability in large-area surveillance, and they remain prone to the occlusion problem due to their fixed optical centers [72].
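To make the idea of privacy masking concrete, the following Python sketch blanks out a rectangular region of a frame before it is stored or displayed. It is only a minimal illustration: the function name `apply_privacy_mask`, the representation of a frame as a list of pixel rows, and the rectangle parameters are assumptions made here and do not correspond to any particular camera product.

```python
def apply_privacy_mask(frame, top, left, height, width, fill=0):
    """Return a copy of `frame` with one rectangle overwritten by `fill`.

    Hypothetical helper: `frame` is assumed to be a list of rows, each row
    a list of pixel values. The rectangle is clipped to the frame borders.
    """
    masked = [row[:] for row in frame]  # copy so the original frame is untouched
    for y in range(max(top, 0), min(top + height, len(masked))):
        row = masked[y]
        for x in range(max(left, 0), min(left + width, len(row))):
            row[x] = fill
    return masked

# A 4 x 6 frame of constant brightness; mask a 2 x 3 window region.
frame = [[9] * 6 for _ in range(4)]
masked = apply_privacy_mask(frame, top=1, left=2, height=2, width=3)
```

A real system would have to track the masked region as a PTZ camera moves, which is one reason such configurations can become complex.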

B. SIGNIFICANCE OF LASER-BASED SURVEILLANCE
Although some problems of protecting private life can be solved by CCTV systems with appropriate masks, several major problems remain: (a) Normally they capture images of whole objects (except masked regions, if applicable), and hence they need a high-speed processor for data processing; (b) If the illumination changes, the quality of the videos changes dramatically; (c) Sometimes masking is extremely difficult, and hence CCTV cannot generally be placed everywhere to monitor the activities of people.
On the other hand, laser-based monitoring systems offer a good solution to these problems. Laser scanners provide information about objects (e.g., heads, hands, legs, trees, walls, and vehicles) such as the distance and angle relative to the device and the echo-pulse width. They scan a two-dimensional (2D) area by sending out beams, each of which hits an object, and they return the distances and angles at which the laser beams hit. The laser beam is invisible to human eyes and is not harmful to them. Laser scanners do not record real-world videos but only scanned data points; hence, data processing becomes faster and easier. They also solve the problem of protecting private life. They can be placed everywhere to monitor the activities of people and objects. For instance, we cannot easily monitor a person with heart disease by putting a CCTV camera in his/her bathroom, but we can monitor such a patient by using laser scanners. The most common argument against installing CCTV cameras in bars and clubs pertains to privacy. Businesses cannot install CCTV cameras in explicitly private areas (e.g., restrooms). Many people feel that the entire bar or club should be deemed private. Patrons claim that they go out to have a drink and relax, and that they have trouble relaxing if a CCTV camera is watching and recording. A system with laser scanners can solve this problem easily. Besides, a system with laser scanners is more convenient and efficient than one with cameras.
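As a minimal illustration of the distance-and-angle output described above, the following Python sketch converts one 2D scan into Cartesian points in the scanner frame. The function name `scan_to_points`, its parameters, the encoding of a missing echo, and the sample ranges are all assumptions made for illustration; they are not taken from any vendor interface.

```python
import math

def scan_to_points(ranges_m, start_angle_deg, step_deg, max_range_m=None):
    """Convert a 2D laser scan (one range per beam) to (x, y) points.

    Hypothetical helper: beam i is assumed to be fired at
    start_angle_deg + i * step_deg, measured in the scanner frame.
    """
    points = []
    for i, r in enumerate(ranges_m):
        # Skip beams without a valid echo (assumed encoded as None or <= 0).
        if r is None or r <= 0:
            continue
        # Optionally drop echoes beyond the range of interest.
        if max_range_m is not None and r > max_range_m:
            continue
        theta = math.radians(start_angle_deg + i * step_deg)
        points.append((r * math.cos(theta), r * math.sin(theta)))
    return points

# Three beams at 0, 45 and 90 degrees; the middle beam returned no echo.
points = scan_to_points([2.0, None, 1.0], start_angle_deg=0.0, step_deg=45.0)
```

The result is a plain list of coordinates with no color or texture, which is the property the privacy argument above relies on.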

III. QUALITATIVE PERFORMANCE OF LASER-BASED AND RGB CAMERA-BASED SYSTEMS

A. BLINDNESS AND GHOST OBJECTS
A smart vision or surveillance system may consist of laser-based technology, RGB camera-based technology, or a hybrid of the two. Ideally, such a system should detect and track all events occurring within its range. Basically, such a system can go wrong in two key ways, namely false negatives (so-called blindness) and false positives (also called ghost objects). In the case of a false negative, the system fails to detect an event or an item that must be detected to avoid a potential hazard. For example, with a false negative a self-driving car would be unable to safely avoid hitting an obstacle in its way. In the case of a false positive, the system sees an event or an item that is in reality absent. For example, a false positive may cause a self-driving car to jam on the brakes or swerve. This is very annoying to its occupants. It may injure occupants who are not wearing seat belts. It may also cause accidents if the vehicle swerves dangerously or brakes very hard. Generally, these kinds of problems undermine the safety and reliability of the system. If they occur very frequently, users give up on the system. A good system should almost never produce a false negative or a false positive.
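The two failure modes can be quantified from detection counts once ground truth is available. The Python sketch below is an illustrative assumption rather than a metric defined in this paper: it computes the miss rate (share of real objects not detected, i.e., blindness) and the false discovery rate (share of reported objects that are ghosts). The function name and the sample counts are hypothetical.

```python
def detection_error_rates(true_positives, false_positives, false_negatives):
    """Return (miss_rate, false_discovery_rate) from detection counts."""
    actual = true_positives + false_negatives    # objects really present
    reported = true_positives + false_positives  # objects the system reported
    miss_rate = false_negatives / actual if actual else 0.0
    false_discovery_rate = false_positives / reported if reported else 0.0
    return miss_rate, false_discovery_rate

# Hypothetical counts: 90 correct detections, 10 ghost objects, 30 misses.
miss_rate, ghost_rate = detection_error_rates(
    true_positives=90, false_positives=10, false_negatives=30
)
# miss_rate is 30 / 120 = 0.25 and ghost_rate is 10 / 100 = 0.1
```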

B. LOCALIZATION OF OBJECTS
Laser-based systems can work regardless of the natural illumination. They can accurately localize objects via their 3D reflections. Routinely, they require extensive data processing in software to create images and identify objects. They are monochromatic and cannot differentiate objects based on color. Besides, for far-away objects the laser may have only a few beams intersecting the object, which makes reliable detection problematic. Unlike laser-based systems, standard RGB camera-based systems can make detection decisions based on texture, shape, and color. RGB stereo cameras can be used to detect objects and estimate their 3D positions. Still, stereo cameras need extensive processing and often have problems estimating depth if objects lack textural cues. Most existing models for calibrating depth and the relative pose between a depth camera and an RGB camera are not universally applicable to sundry RGB-D cameras [73]. Usually, an RGB-D camera has difficulty obtaining depth data from shiny and dark surfaces, as the IR rays reflected from these surfaces are weak or scattered. This results in lost pixels in the depth map [74]. In addition, RGB-IR cameras suffer from three common problems, namely pixel multiplexing, channel crosstalk, and chromatic aberrations [75]. An RGB camera-based system can be used to accurately localize objects in the image itself (e.g., to find bounding boxes and categorize objects) [76]. Even so, the resulting localization projected into 3D space is poor compared to that of a laser-based system [76].

C. RELIABLE DETECTION
Lasers are not fooled by shadows, bright sunlight, oncoming lights from other sources, or the change between day and night. Laser-based systems have been hailed for being able to see objects even in bad lighting conditions, but they may not always be reliable.
As a laser scanner sees only the parts of an object currently facing the scanner, it is usual to obtain different point clouds from the same object as it moves. This issue may lead to a significant degradation of tracking performance. Besides, due to laser absorption by glass-like surfaces or due to occlusion, an object can be divided into a few segments. This makes object detection and tracking much harder, especially when dealing with merging objects and tracking groups [77]. Assuming a predefined object shape can mitigate this problem, but it faces limitations when applied to other objects [78], [79]. For example, a defined geometric shape of an object (e.g., two dots [78] as a pedestrian) may be detected correctly from a pool of similarly shaped objects, but it cannot work well when the shape changes (e.g., three dots [78] as a car). Analogously, if we employ the motion of laser point clouds (e.g., [80]) to segment and track vehicles of various types, it does not work well for pedestrians, since slowly moving pedestrians do not provide enough motion cues. Wang [81] discussed an example in which, for a 2D laser scanner mounted on a moving platform, occlusion and viewpoint changes give the appearance of dynamic behaviour even in a purely static scene. This confusion makes reliable detection of the truly dynamic objects arduous without incurring a high false alarm rate [81], [82]. Mertz et al. [83] suggested that a good prediction algorithm (e.g., a Kalman filter or a particle filter) can solve a temporary occlusion problem of an object. On the other hand, pedestrian detection at night using an RGB camera provides insufficient information [84]. Numerous surveillance systems are applied in autonomous vehicles, head counters, and search and rescue operations. Yet, these systems are limited in night surveillance due to their use of RGB cameras [85].
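How a prediction step can bridge a temporary occlusion may be sketched with a one-dimensional constant-velocity Kalman filter. The Python code below is a simplified illustration under stated assumptions and is not the tracker of Mertz et al. [83]: the class name, the diagonal process noise, all noise values, and the noise-free synthetic measurements are hypothetical. While the mover is occluded (measurement `None`), only the prediction step runs, so the track keeps advancing at the estimated velocity.

```python
class ConstantVelocityKalman1D:
    """1D Kalman filter with state (position p, velocity v); position is measured."""

    def __init__(self, position, velocity=0.0, pos_var=1.0, vel_var=1.0,
                 process_var=0.01, measurement_var=0.05):
        self.p, self.v = position, velocity
        self.P = [[pos_var, 0.0], [0.0, vel_var]]      # state covariance
        self.q, self.r = process_var, measurement_var  # assumed noise levels

    def predict(self, dt):
        # x <- F x and P <- F P F^T + Q, with F = [[1, dt], [0, 1]]
        # and Q = diag(q, q) (a simplifying assumption).
        P = self.P
        self.p += self.v * dt
        p00 = P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        self.P = [[p00, p01], [p10, p11]]

    def update(self, z):
        # Measurement model H = [1, 0]: only the position is observed.
        P = self.P
        s = P[0][0] + self.r                  # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s     # Kalman gain
        y = z - self.p                        # innovation
        self.p += k0 * y
        self.v += k1 * y
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]

def track(measurements, dt):
    """Filter a sequence of positions; None marks an occluded frame."""
    kf = ConstantVelocityKalman1D(position=measurements[0])
    estimates = []
    for z in measurements[1:]:
        kf.predict(dt)
        if z is not None:       # occluded frames get prediction only
            kf.update(z)
        estimates.append(kf.p)
    return estimates

# A mover at 1 m/s sampled every 0.1 s, then occluded for five frames.
measurements = [0.1 * k for k in range(21)] + [None] * 5
estimates = track(measurements, dt=0.1)
```

The uncertainty in `P` grows during the occluded frames, so a longer occlusion would eventually require re-detection rather than prediction alone.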

D. COST AND PRIVACY CONCERNS
Interference and jamming are two potential problems with laser-based systems. For example, in a smart city application, if a large number of autonomous vehicles generate laser beams simultaneously, this could cause interference and potentially blind the vehicles. In consequence, manufacturers will need to make an extra effort to prevent this latent interference. In addition, RGB camera-based systems are far better suited for reading street signs and interpreting colors. Laser-based systems are already getting cheaper. Even so, RGB cameras are much less expensive than laser scanners. Laser-based systems are currently very bulky compared to RGB camera-based systems. For instance, the RGB camera-based systems implemented on current Tesla vehicles, which capture and share images and video of a crash or other safety related incident with the automaker, are almost invisible. Nonetheless, Tesla's in-car cameras have heightened privacy concerns [86].

E. TECHNOLOGY FUSION
One feasible solution to the debate over the employment of laser-based and RGB camera-based systems is to combine both technologies. Such hybrid systems would reduce privacy concerns to some degree. To a certain extent, such hybrid systems would be helpful for the specialized identification of things including birds, traffic lights, traffic cones, and road debris. For example, if a flock of birds appears in the way of a self-driving car, the car will not be slowed down immediately. The laser will see the birds, and the RGB camera will give extra information about what to do.
Recently, hybrid systems that cooperatively use tracking along with semantics and soft computing have been successfully proposed to support data explanation and to help object detection and event interpretation. For example, Cavaliere et al. [87] built ontological knowledge on the tracking and environmental data to support the comprehension of video scenes, and Gomez-Romero et al. [88] improved tracking results by exploiting ontology reasoning on contextual information. Bozorgi et al. [89] integrated data obtained from a 2D laser and a 3D camera for tracking human trajectories. Zhao et al. [78] integrated a video camera with their LMS291 laser scanner to evaluate their processing results for tracking and classifying moving objects. Azim et al. [90] performed detection and classification of moving objects from 3D laser data. They used images from their camera to manually label the data for training the classifiers. Mertz et al. [83] applied both laser scanners and video cameras for moving object detection. The video cameras helped considerably in analyzing the collected data. Even so, those cameras were not involved in creating warnings (e.g., for the bus driver). Besides, a malfunction of their retraction mechanism sometimes misaligned the laser scanner and resulted in hundreds of false alarms. Kim et al. [91] installed an IBEO LUX2010 and a camera on a Kia K900 car for object segmentation. They aimed to compensate for the drawbacks of the laser scanner and also to improve the recognition accuracy. On average, they encountered a failure rate of about 20%.
However, hybrid systems require further efforts to reach a high level of applicability. This is largely due to their algorithmic assumptions, calculation of stable features, high computational cost, higher hardware requirements, reasoning about the geometry of occlusions, and fusion of data from multiple sensors.

F. A DIFFICULT CHOICE BETWEEN TWO OPTIONS
It is interesting to note that the two most used modalities, laser scanners and RGB cameras, are completely contrasting sensors with their own strengths and weaknesses. For example, laser scanners play an important role in obstacle detection and tracking, but they are very sensitive to heavy rain, snow, and fog; whereas RGB cameras are often used to obtain a semantic interpretation of the scene, but they are immensely sensitive to ambient light, night, day, clouds, shadows, and sunlight. These issues can cause significant numbers of false positives and/or false negatives in both laser-based and RGB camera-based systems with respect to their associated ground truths. Consequently, the performance characteristics of the installed algorithms (e.g., time efficiency, space efficiency, complexity theory, function dominance, and asymptotic dominance) become common influential factors for both systems. Both systems can use artificial intelligence techniques to analyze data with a high level of accuracy. As the employed algorithms get better, the obtained results show higher accuracy and precision in object detection and tracking. For example, with a smarter algorithm a self-driving car can make better decisions that spell the difference between an accident and safe driving. Depending on the complexity of the employed algorithms, such a decision may be made faster in a laser-based system than in an RGB camera-based system. To supply the car with surrounding information at every moment, a laser-based system requires heavy data processing in on-board software to create 3D maps and identify objects. This provides a 360 degree view that helps the car drive in any type of condition. On the other hand, RGB camera-based systems are analogous to how our brain processes the stereo vision from our eyes to calculate distance and location.
Explicitly, RGB camera-based systems must first ingest the images and then analyze them to calculate the distance and speed of objects, which demands far more computational power. Some smart surveillance systems are based on RGB cameras, which can only cover a small area; moreover, due to the occlusion caused by their fixed optical centers [72], it is very difficult for them to work robustly in exceptionally crowded real-world scenarios, including subway stations, public squares, and intersections [92]. Unlike a camera or a radar, a laser scanner can be used as the sole sensor for some systems (e.g., ADAS) without being combined with other sensors [91].
One of the key advantages of the laser is its accuracy and precision. Laser ranging is highly accurate compared to RGB cameras. In fact, RGB cameras provide full visual images, and they do not rely on ranging and detection as the laser does. Critics say, however, that RGB cameras still cannot see well enough to avoid hazards, mainly when weather conditions are demanding. To avoid large numbers of false positives and false negatives, RGB cameras should be able to see in any type of condition exactly as a human does. In general, laser-based algorithms have been proposed to avoid the limited range and field of view of video cameras. Besides, when the issue of privacy protection comes into the spotlight, laser-based algorithms gain extra credit over the alternative RGB camera-based algorithms. Therefore, laser-based algorithms should be used instead of common video recording RGB cameras. In the same vein, leading automotive manufacturers (e.g., Waymo, Uber, and Toyota) are nowadays implementing laser-based systems in their vehicles [93]. In the field of defensive countermeasures, laser-based technology has revolutionized the entire paradigm of destructive weapons by enabling a wider range of airborne and ground-based weapons able to precisely deliver large-scale destruction to electronic systems, combat troops, optical devices, high-speed approaching missiles, and even physical installations [94].

IV. REVIEW OF STATE-OF-THE-ART LASER-BASED ALGORITHMS
Laser scanners are mostly eye-safe, compact, and lightweight, with full-circle fields of view. Mobile laser scanning (MLS) systems can be mounted on vehicles, trolleys, boats, robots, and backpacks [43], [95]. The main components of such a system include 3D laser scanners, a global navigation satellite system, an inertial measurement unit, and cameras. The SICK laser range measurement devices send a laser beam every 0.25° within their respective scanning planes, which yields 761 measurements in one time frame, since they scan between −5° and +185° [33], [96]. Velodyne sensors have a range of up to 300 meters. They can be used for immediate object detection without additional sensor fusion [97]. The IBEO LUX laser scanner is a unique full-range sensor applied for object detection and classification to support ADAS applications [98]. Hokuyo laser scanners are mostly used in automated guided vehicle (AGV), unmanned aerial vehicle (UAV), and mobile robot applications [36].
The existing algorithms for detecting and/or tracking objects from laser scanner data points can be roughly categorized into four groups, as shown in Fig. 2. TABLEs 1, 2, 3, and 4 summarize them. The abbreviation N/A in those tables stands for either not-available or no-answer.
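The figure of 761 measurements per frame follows directly from the scanning range and the angular step, since both end angles are measured. The short Python check below reproduces the arithmetic; the function name `beams_per_scan` is a hypothetical helper introduced only for this illustration.

```python
def beams_per_scan(start_deg, end_deg, step_deg):
    """Number of beams when both the start and the end angle are measured."""
    return round((end_deg - start_deg) / step_deg) + 1

# Scanning from -5 to +185 degrees in 0.25 degree steps: 190 / 0.25 + 1 beams.
n_beams = beams_per_scan(-5.0, 185.0, 0.25)
```

With a coarser step of 0.5°, the same range would give 381 beams, which illustrates the trade-off between angular resolution and data volume per frame.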

V. PROMINENT PEOPLE-VEHICLES RELATED ALGORITHMS
A. SELECTED ALGORITHMS AND FLOW DIAGRAMS
Detection and tracking of moving vehicles with a laser scanner is of interest for autonomous driving applications. Yet, algorithms that detect and/or track both people and vehicles are of interest to a wider range of surveillance tasks than algorithms that track only people or only vehicles. In this subsection, we focus on the people-vehicles related algorithms in TABLE 1 rather than those in TABLE 2, TABLE 3, or TABLE 4. However, not all algorithms in TABLE 1 have been taken into account, mainly due to three problems: (i) the accuracy and precision [173] of the algorithms are not explicitly provided by the authors of the associated manuscripts (e.g., Wang et al. [100], Lindstrom et al. [103], Asvadi et al. [105], and Kanaki et al. [106]); (ii) implementation difficulties (e.g., Lehtomaki et al. [104]); and (iii) the computational complexity of the N × N post-hoc nonparametric procedures for calculating p-values grows as a comparatively high-order polynomial as the number of algorithms and datasets increases. As a result, we have chosen eight key people-vehicles related algorithms from TABLE 1 for our results analysis and superiority measure. Fig. 3 compares their simplified flow diagrams. Interested readers can find the details of each algorithm in the respective reference. As a sample, Fig. 4 shows the graphical abstract of the algorithm of Sharif [8], where (a) points to the LMS-511 laser scanner and a real world video frame; (b) depicts the obtained blobs (colored in blue) for all laser scanned data points per frame; (c) denotes the foreground data points, colored in red; (d) shows the extracted movers, marked by white points, where the L-shaped structure belongs to a vehicle while the others are most likely pedestrians; (e) displays the recognition record of the SVM; and (f) shows the trajectories of movers over several frames.

B. QUALITATIVE DESCRIPTION OF ALGORITHMS
Galip et al. [7] used the Hungarian method [2], [174]-[176] and the Kalman filter [177] to get trajectories of movers from their own laser scanned dataset. However, detection of movers relied on various thresholds, and estimating multiple thresholds is often a daunting task. Azim et al. [90] suggested an algorithm to detect moving objects (e.g., bus, car, bike, and pedestrian). However, their algorithm cannot separate individual pedestrians walking together in a group, and trees, light poles, and street signs were often wrongly detected as moving objects. To overcome the threshold estimation problem of Galip et al. [7], Sharif et al. [99] relied on supervised learning methods (e.g., SVM) along with the Hungarian method and the Kalman filter to recognize movers and get better trajectories from the dataset of Galip et al. [7]. Zhao et al. [78] tracked and classified moving objects at an intersection by spatially and temporally processing laser scanned data points. Moving objects are classified into pedestrians (0-axis objects), bicycles (1-axis objects), and vehicles (2-axis objects). They claimed that their algorithm reached a success ratio above 95% for tracking and classification on 10 minutes of laser data at an intersection. Their experiment showed that the classification results of 1-axis objects are rather sensitive to the definition of the likelihood measure, a problem left for further study. Some failure cases were reported, for example, when heavy vehicles crossing the intersection or pedestrians waiting for the signal blocked the measurement of another vehicle. Mertz et al. [83] successfully detected and tracked several movers from laser scanned data points. Nevertheless, the main errors of their algorithm include over-segmentation and under-segmentation, association problems, and false and missed detections. Their algorithm fails to detect a target if it is occluded, if it has poor reflectivity, or if objects are so close to each other that it is unclear whether to segment the data as one or more objects.
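Several of the algorithms above use the Hungarian method to associate existing tracks with new detections, which is a minimum-cost assignment problem. The sketch below illustrates only the problem being solved: it uses an exhaustive search over permutations as a stand-in for the Hungarian method, so it is practical only for a handful of objects. The function name, the Euclidean cost, and the sample points are assumptions for illustration and are not taken from any of the cited works.

```python
from itertools import permutations
from math import dist


def associate(tracks, detections):
    """Return (total_cost, pairs) minimising the summed Euclidean distance.

    Brute force over permutations: a stand-in for the Hungarian method,
    which solves the same problem in polynomial time.
    Assumes len(tracks) <= len(detections).
    """
    best_cost, best_pairs = float("inf"), []
    for perm in permutations(range(len(detections)), len(tracks)):
        cost = sum(dist(tracks[i], detections[j]) for i, j in enumerate(perm))
        if cost < best_cost:
            best_cost, best_pairs = cost, list(enumerate(perm))
    return best_cost, best_pairs


# Hypothetical predicted track positions and new detections (metres).
tracks = [(0.0, 0.0), (5.0, 5.0)]
detections = [(5.2, 4.9), (0.1, -0.1), (9.0, 9.0)]
cost, pairs = associate(tracks, detections)
print(pairs)  # [(0, 1), (1, 0)]: track 0 -> detection 1, track 1 -> detection 0
```

The unmatched third detection would typically start a new track, although how each cited algorithm handles unmatched detections is not specified here.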
Both Galip et al. [7] and Sharif et al. [99] used the Kalman filter and the same dataset of Galip et al. [7]. The Kalman filter is a linear quadratic estimator. It is arguably the best choice for estimating a linear system with Gaussian noise, and it has low computational requirements. But if the system does not fit well into a linear model, or if the sensor uncertainty [4] does not fit a Gaussian model, then performance degrades drastically. If the linearity or Gaussian conditions do not hold, its variants (e.g., the extended Kalman filter and the unscented Kalman filter) can be used. However, those variants cannot give a reasonable estimate for highly nonlinear and non-Gaussian problems. Besides, the data points of movers from laser scanners behave very differently in some regions than in others. In such cases, the Kalman filter is not a good choice, and the particle filter [178] is a better solution. Admittedly, the particle filter gets exponentially worse as the number of state variables of a model grows. Even so, a particle filter can handle almost any kind of model by discretizing the underlying problem into separate particles. Each particle is one possible state of the model. A sufficiently large number of particles can represent any kind of probability distribution. Inspired by these facts, Sharif [8] proposed SVM along with the Hungarian method and the particle filter to get trajectories of movers. On the same dataset (i.e., that of Galip et al. [7]), the algorithm of Sharif [8] reported the greatest reduction in error rates.
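To make the linear-Gaussian assumption concrete, the following is a minimal sketch of one predict/update cycle of a scalar Kalman filter. It assumes a random-walk state model and illustrative noise variances; it is not the tracker of Galip et al. [7] or Sharif et al. [99], whose state models are not given in this section.

```python
def kalman_step(x, p, z, q=0.01, r=0.25):
    """One predict/update cycle of a scalar Kalman filter.

    x, p : prior state estimate and its variance
    z    : new measurement
    q, r : process and measurement noise variances (illustrative values)
    Assumes a random-walk model, x_k = x_{k-1} + Gaussian noise.
    """
    # Predict: the state is carried over and its uncertainty grows by q.
    x_pred, p_pred = x, p + q
    # Update: blend prediction and measurement using the Kalman gain.
    k = p_pred / (p_pred + r)
    x_new = x_pred + k * (z - x_pred)
    p_new = (1.0 - k) * p_pred
    return x_new, p_new


# Hypothetical noisy range readings of a target near 1.0 m.
x, p = 0.0, 1.0
for z in [1.1, 0.9, 1.05, 0.98]:
    x, p = kalman_step(x, p, z)
print(round(x, 3), round(p, 3))
```

The estimate variance shrinks with every measurement only because the model is linear and the noise is assumed Gaussian; when either assumption fails, the gain computed above is no longer appropriate, which is the motivation for the particle filter discussed in the text.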
Wang et al. [82] formulated a unified framework that jointly estimates the pose of the sensor while focusing on the detection and tracking of moving objects. They applied the EMST-EGBIS (Euclidean Minimum Spanning Tree - Efficient Graph Based Image Segmentation) clustering technique to produce perceptually coherent clusters. Only instantaneously moving objects (i.e., no parked or momentarily stationary vehicles) can be detected and tracked by their system. Two modes of failure are reported for their algorithm: a recoverable case, where the system recovers from incorrect states despite an initial tracking failure; and an unrecoverable case, where an object is erroneously tracked or missed until it moves out of the field of view of the sensor. If an unexpected object is observed, or if the object class cannot be detected with confidence, then the system can fall back to model-free tracking. Kim et al. [91] separated objects using segmentation and outlier elimination techniques. Their algorithm worked reasonably well under complex urban road conditions. Still, when outliers occur frequently (e.g., in rain or when the car goes uphill), the algorithm can fail to eliminate them. The inlier survival ratio is a sensitive factor of their algorithm, because an inlier accidentally removed by the algorithm could lead to a serious accident.

VI. ESTIMATION OF INEFFECTUALNESS METRICS
A. LABELED DATASET
Galip et al. [7] used an Ethernet cable to connect the laser scanners (both LMS-511 and LD-MRS) to the computer. Data were captured by the SOPAS Engineering Tool, a program developed by SICK AG (Aktiengesellschaft). Since there was more than one laser scanner, the coordinates of the points were transformed by taking one laser scanner as the reference. Afterwards, the distances were converted into X-Y coordinates [1], together with their timestamps, using MATLAB. In the end, Galip et al. [7] employed a total of 550 ground truth images to conduct their experiment. A total of 258 pedestrians and 292 vehicles were labeled properly.
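The two preprocessing steps described above, converting range readings to X-Y coordinates and expressing the points of a second scanner in the frame of the reference scanner, can be sketched as follows. The original processing was done in MATLAB; this Python sketch is only an illustration, and the function names, the default angular settings (borrowed from the SICK figures in Section IV), and the mounting pose are assumptions.

```python
import math


def scan_to_xy(ranges, start_deg=-5.0, step_deg=0.25):
    """Convert one sweep of range readings (metres) to (x, y) points."""
    points = []
    for i, r in enumerate(ranges):
        a = math.radians(start_deg + i * step_deg)
        points.append((r * math.cos(a), r * math.sin(a)))
    return points


def to_reference_frame(points, tx, ty, yaw_deg):
    """Rotate by yaw and translate by (tx, ty), i.e. the pose of this
    scanner expressed in the reference scanner's frame (hypothetical)."""
    c = math.cos(math.radians(yaw_deg))
    s = math.sin(math.radians(yaw_deg))
    return [(c * x - s * y + tx, s * x + c * y + ty) for x, y in points]


# Two readings of 2 m at 0 and 90 degrees, from a scanner mounted 1 m
# along x and rotated by 90 degrees relative to the reference scanner.
pts = scan_to_xy([2.0, 2.0], start_deg=0.0, step_deg=90.0)
print(to_reference_frame(pts, tx=1.0, ty=0.0, yaw_deg=90.0))
```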

B. CODING AND PARAMETERS
Algorithms were implemented in MATLAB. An HP 64-bit workstation with 8 GB of RAM and an Intel Core i5-7200U CPU running Windows 10 Pro was used throughout the experimentation to evaluate the algorithms. Standard parameters of each algorithm, if applicable, were employed. For example, in the case of Sharif [8], 25 pedestrians and 25 vehicles were randomly selected for training and the rest for testing. A polynomial kernel of order 3, a Gaussian radial basis function kernel with a scaling factor of 1, and a multilayer perceptron kernel with scale [1 1] were considered.

C. GROUND TRUTHS AND ALGORITHMIC OUTPUTS
Listing 1 demonstrates sample tracking output of each algorithm for pedestrians (Ped) and vehicles (Veh) with respect to the ground truths (GrdTrh) of each frame from the first 500 frames of the dataset of Galip et al. [7]. It gives the ground truths and the outputs of each algorithm for one frame per line, from line 3 to line 102, sampling every fifth frame (i.e., frame 1 at line 3, frame 5 at line 4, frame 10 at line 5, frame 15 at line 6, etc.). Thus, we may reduce the results from 500 frames to (500/5 =) 100 frames without losing significant performance. The data of Listing 1 are depicted in Fig. 5 for pedestrians and Fig. 6 for vehicles. Basically, Figs. 5 and 6 portray the outcomes of the mainstream laser-based people-vehicles algorithms on identical ground. Evidently, these algorithms failed to correctly identify a number of objects as compared to the ground truth. The main reason for this shortcoming is that the existing laser-based algorithms usually use segmentation of laser point clouds or bounding boxes of laser segments to represent objects. It is noticeable that the average algorithmic performance of vehicle detection and/or tracking is better than that of pedestrians. A likely reason is that vehicles are rigid bodies and are not confused with one another as easily as humans are. TABLE 5 describes the qualitative and quantitative analysis of the data in Listing 1, where $t_p$ is the number of true positive movers, $f_p$ the number of false positive movers, $f_n$ the number of false negative movers, and $t_n$ the number of true negative movers (with $t_n = 0$); the recall rate is $R_r = t_p/(t_p + f_n)$, the precision rate is $P_r = t_p/(t_p + f_p)$, the accuracy is $\mathrm{ACC} = (t_p + t_n)/(t_p + f_p + f_n + t_n)$, and AUC is the area under the receiver operating characteristic curve, computed with the trapezoidal numerical integration method [179].
The values of $R_r$, $P_r$, ACC, and AUC for pedestrians and vehicles are […] many laser-based applications including smart cities, ADAS, and intelligent traffic analysis. Nonetheless, future developments should take into account their existing algorithmic assumptions and other shortcomings to propose smarter algorithms.
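The recall, precision, and accuracy definitions above, together with trapezoidal integration for the AUC, can be sketched as follows. The counts and curve points are hypothetical and are not taken from TABLE 5; the function names are illustrative.

```python
def detection_metrics(tp, fp, fn, tn=0):
    """Recall, precision and accuracy from detection counts (tn = 0 here)."""
    recall = tp / (tp + fn)
    precision = tp / (tp + fp)
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    return recall, precision, accuracy


def trapezoid_area(xs, ys):
    """Area under a piecewise-linear curve by the trapezoidal rule."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2.0
               for i in range(len(xs) - 1))


# Hypothetical counts for one algorithm on one class of movers.
r, p, acc = detection_metrics(tp=80, fp=10, fn=20)
print(round(r, 3), round(p, 3), round(acc, 3))  # 0.8 0.889 0.727

# Hypothetical ROC points (false positive rate, true positive rate).
print(trapezoid_area([0.0, 0.5, 1.0], [0.0, 0.8, 1.0]))
```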

D. ESTIMATION OF ALGORITHMIC ERRORS AND FAILURES
To estimate conventional errors from Figs. 5 and 6, we have computed several statistical measures: the root mean squared error (RMSE), the coefficient of variation of the root mean squared error (CV(RMSE)), the mean absolute error (MAE), and the mean absolute percentage error (MAPE). Their formulae are given in Eqs. 1 and 2, where G, A, and i indicate ground truth, algorithmic detection, and frame number, respectively. The errors are reported in TABLE 6. The failures of the $R_r$, $P_r$, ACC, and AUC achievements are defined as $F_{rr} = 1 - (pR_r + vR_r)/2$, $F_{pr} = 1 - (pP_r + vP_r)/2$, $F_{acc} = 1 - (pA_c + vA_c)/2$, and $F_{auc} = 1 - (pA_u + vA_u)/2$, respectively, using the data in TABLE 5. The failure of achievement of an algorithm ($F_{aa}$) is defined by (3).
For example, the accuracy of the algorithm of Azim et al. [90] is 86% (as the authors claimed), thus its $F_{aa}$ is (100% − 86%)/100 = 0.14, and so on. From the data in TABLE 7, it is extremely difficult to say accurately which algorithm outperforms its alternatives.
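The error and failure measures can be illustrated with the sketch below. It assumes the standard textbook definitions of RMSE, CV(RMSE), MAE, and MAPE, with CV(RMSE) normalized by the mean of the ground truth; these may differ in detail from Eqs. 1 and 2. The function names and the sample counts are hypothetical.

```python
import math


def error_metrics(G, A):
    """RMSE, CV(RMSE) in %, MAE and MAPE in % (standard definitions).

    G, A : per-frame ground truth and algorithmic detection counts.
    Assumes every ground truth value is non-zero (needed by MAPE).
    """
    n = len(G)
    diffs = [g - a for g, a in zip(G, A)]
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    cv_rmse = 100.0 * rmse / (sum(G) / n)
    mae = sum(abs(d) for d in diffs) / n
    mape = 100.0 / n * sum(abs(d) / g for d, g in zip(diffs, G))
    return rmse, cv_rmse, mae, mape


def failure(ped_value, veh_value):
    """F = 1 - (p + v)/2, e.g. with pedestrian and vehicle recall rates."""
    return 1.0 - (ped_value + veh_value) / 2.0


def failure_of_achievement(accuracy_percent):
    """F_aa = (100% - accuracy) / 100."""
    return (100.0 - accuracy_percent) / 100.0


# Hypothetical counts over three frames.
print(error_metrics([4, 5, 6], [3, 5, 8]))
# The example from the text: 86% accuracy gives F_aa = 0.14.
print(failure_of_achievement(86.0))
```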

VII. SUPERIORITY MEASURE USING STATISTICAL TESTS
A. MULTIPLE COMPARISON WITH STATISTICAL TESTS
Fig. 11 depicts the performance evaluation of the algorithms in terms of the numerical values of the ineffectualness metrics from TABLE 7. From this graph, it is extremely hard to rank the algorithms. How can it be demonstrated that one algorithm is superior to its alternatives?
Statistically, it is possible to show that one algorithm is better than its alternatives.
Usually, multiple comparisons with a control algorithm can be employed to demonstrate statistically that one algorithm is better than its alternatives in areas related to computer science and engineering [180]. The key advantage of nonparametric tests [181] is that they can deal with both probabilistic and non-probabilistic methods without imposing any restriction. We have used the data from TABLE 7 to conduct statistical tests for multiple comparisons, along with a set of post-hoc procedures to compare a control algorithm with the others (i.e., 1 × N comparisons) and to perform all possible pairwise comparisons (i.e., N × N comparisons). For these purposes, we have used the open source statistical software from the University of Granada [44].
To conduct a statistical test of significance, the p-value of the test statistic and the level of significance α play important roles. The p-value and α are easily confused because both are probabilities, i.e., values between zero and one. The p-value states how extreme the test statistic computed from the data in TABLE 7 is. The α states how extreme the observed results must be to reject the null hypothesis of a significance test. A smaller p-value indicates that the observed sample is more unlikely under the null hypothesis. In statistical significance testing, the p-value is the probability of obtaining a test statistic at least as extreme as the one actually observed, assuming that the null hypothesis is true [182]. Critics of p-values argue that the criterion used to determine statistical significance rests on an arbitrary choice of level (e.g., p = 0.05) [183]. If significance testing is applied to hypotheses that are known in advance to be false, then a non-significant result will simply reflect an insufficient sample size. Any p-value depends exclusively on the information obtained from a given experiment.
The Friedman test [184] and its derivatives (e.g., the Iman-Davenport test [185]) are among the most well-known nonparametric tests for multiple comparisons. Consequently, we have performed the Friedman test [184]. A useful characteristic of the Friedman test [184] is that it ranks a set of algorithms in descending order of performance. However, it can only tell us whether differences exist among all the samples of results under comparison. Its alternatives, e.g., Friedman's aligned rank test [186] and the Quade test [187], can give further information. Thus, we have performed both Friedman's aligned rank test [186] and the Quade test [187]. They differ in how the rankings are computed, and they may provide better results depending on the features of a given experimental study. After rejecting the null hypotheses, we proceeded to post-hoc procedures to find the specific pairs of algorithms that differ significantly.
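As an illustration of how the Friedman ranking works, the sketch below ranks the algorithms within each dataset (rank 1 for the lowest error), averages the ranks, and computes the Friedman chi-square statistic and the Iman-Davenport F statistic using their standard formulas. The toy error table and the function names are assumptions; the study itself used the University of Granada software [44], not this code. With k algorithms and n datasets, the degrees of freedom are k − 1 for chi-square, and k − 1 and (k − 1)(n − 1) for F, which gives 7, and 7 and 56, for k = 8 and n = 9.

```python
def rank_row(values):
    """Ranks within one dataset (1 = smallest value); ties share the mean rank."""
    order = sorted(range(len(values)), key=lambda j: values[j])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2.0 + 1.0  # mean of positions i..j, 1-based
        for m in range(i, j + 1):
            ranks[order[m]] = avg
        i = j + 1
    return ranks


def friedman(table):
    """table[d][a] = error of algorithm a on dataset d.

    Returns (average ranks per algorithm, Friedman chi-square statistic).
    """
    n, k = len(table), len(table[0])
    rank_rows = [rank_row(row) for row in table]
    avg_ranks = [sum(r[a] for r in rank_rows) / n for a in range(k)]
    chi2 = 12.0 * n / (k * (k + 1)) * (
        sum(R * R for R in avg_ranks) - k * (k + 1) ** 2 / 4.0)
    return avg_ranks, chi2


def iman_davenport(chi2, n, k):
    """Iman-Davenport F statistic derived from the Friedman chi-square."""
    return (n - 1) * chi2 / (n * (k - 1) - chi2)


# Hypothetical errors: 4 datasets (rows) x 3 algorithms (columns).
errors = [[0.10, 0.20, 0.30],
          [0.15, 0.25, 0.20],
          [0.05, 0.30, 0.40],
          [0.12, 0.22, 0.35]]
avg_ranks, chi2 = friedman(errors)
print(avg_ranks, chi2, iman_davenport(chi2, n=4, k=3))
```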
In the case of 1 × N comparisons, the post-hoc procedures comprise Bonferroni-Dunn's [188], Holm's [189], Hochberg's [190], Hommel's [191], [192], Holland's [193], Rom's [194], Finner's [195], and Li's [196] procedures; whereas in the case of N × N comparisons, they consist of Nemenyi's [197], Shaffer's [198], and Bergmann-Hommel's [199] procedures. In Bonferroni-Dunn's procedure [188], the performance of two algorithms is significantly different if the corresponding average rankings differ by at least the critical difference. A better one is Holm's procedure [189]. It tests the hypotheses sequentially, ordered by their p-values from smallest to largest. At each successive step, a hypothesis is rejected if its p-value is less than α divided by the number of algorithms minus the step number; as soon as one hypothesis cannot be rejected, it and all hypotheses with larger p-values are retained. Holm's procedure [189] thus adjusts α in a step-down manner. Similarly, both Holland's [193] and Finner's [195] procedures adjust α in a step-down manner. Hochberg's procedure [190], in contrast, works in the opposite, step-up direction to Holland's procedure [193]. It compares the largest p-value with α, the next largest with α/2, and so on, until it encounters a hypothesis that can be rejected. Rom [194] suggested a modification to Hochberg's step-up procedure [190] to increase its power. In turn, Li [196] recommended a two-step rejection procedure.
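The step-down and step-up logic of Holm's and Hochberg's procedures can be sketched as follows. With m hypotheses (m = number of algorithms − 1 in a 1 × N comparison), Holm's thresholds are α/m, α/(m − 1), ..., α, applied from the smallest p-value upwards, while Hochberg's are α, α/2, ..., α/m, applied from the largest p-value downwards. The sketch uses the common "less than or equal" form of the comparison, and the p-values are hypothetical.

```python
def holm(p_values, alpha=0.05):
    """Step-down: test from the smallest p-value; stop at the first failure.

    Returns a list of booleans (True = rejected) in the input order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    rejected = [False] * m
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (m - step):
            rejected[i] = True
        else:
            break  # this and all larger p-values are retained
    return rejected


def hochberg(p_values, alpha=0.05):
    """Step-up: test from the largest p-value; the first rejection also
    rejects every hypothesis with a smaller p-value."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i], reverse=True)
    rejected = [False] * m
    for step, i in enumerate(order):
        if p_values[i] <= alpha / (step + 1):
            for j in order[step:]:
                rejected[j] = True
            break
    return rejected


# Hypothetical unadjusted p-values of three comparisons with a control.
p = [0.01, 0.04, 0.03]
print(holm(p))      # [True, False, False]
print(hochberg(p))  # [True, True, True]
```

On this example Hochberg's procedure rejects more hypotheses than Holm's at the same α, which illustrates why step-up procedures are generally regarded as more powerful.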
Nemenyi's procedure [197] is the simplest one for all possible pairwise comparisons. It adjusts the value of α in a single step by dividing it by the number of comparisons performed. It is simple but less practical. Shaffer's static routine [198] follows Holm's step-down method [189]. At a given stage, it rejects a hypothesis if the p-value is less than α divided by the maximum number of hypotheses which can be true provided that all previous hypotheses are false. Bergmann et al.'s [199] procedure provides the best performance, but it is very sophisticated and computationally expensive. It consists of finding all the possible exhaustive sets of hypotheses for a certain comparison and all elementary hypotheses which cannot be rejected. The details of the procedure are described in Bergmann et al. [199] and Garcia et al. [200], and a rapid algorithm to conduct this test is demonstrated in Hommel et al. [192].
FIGURE 10. Plot of the errors that occurred for vehicles in Table 6.

B. AVERAGE RANKING OF ALGORITHMS
To obtain the test results, the Friedman [184], Friedman's aligned rank [186], and Quade [187] nonparametric statistical tests are applied to the results of the eight algorithms in TABLE 7. Explicitly, the statistical tests are applied to a matrix of dimension 8 × 9, where 8 is the number of algorithms and 9 is the number of parameters of each algorithm (treated as 9 datasets in the statistical software environment [44]). TABLE 8 shows the average rankings computed by the Friedman [184], Friedman's aligned rank [186], and Quade [187] tests. The aim of applying these tests is to determine whether there are significant differences among the algorithms over the data in TABLE 7. These tests rank the algorithms for each individual dataset, i.e., the best performing algorithm receives the highest rank of 1, the second best gets the rank of 2, and so on. The mathematical equations and further explanation of these nonparametric procedures can be found in Quade [187] and Westfall et al. [201]. Fig. 12 visualizes the average rankings using the data in TABLE 8. From Fig. 12, it is noticeable that the algorithm of Sharif [8] is the best performing one, with the longest bars of 0.6428, 0.0783, and 0.5844 for the Friedman test [184], Friedman's aligned rank test [186], and the Quade test [187], respectively. This hints that the algorithm of Sharif [8] performs well on the underlying problem of detecting and tracking both pedestrians and vehicles from laser scanned data points. The Friedman [184] statistic, considering reduction performance (distributed according to chi-square with 7 degrees of freedom), is 40.861111. The aligned Friedman [186] statistic, considering reduction performance (distributed according to the F-distribution with 7 and 56 degrees of freedom), is 34.336679. The Iman-Davenport [185] statistic, considering reduction performance (distributed according to the F-distribution with 7 and 56 degrees of freedom), is […].

$$F_{aa} = \frac{100\% - (\text{Accuracy of the corresponding algorithm in TABLE 1})}{100} \qquad (3)$$

VIII. FUTURE WORKS AND CHALLENGES
Laser-based algorithms have emerged as alternatives to camera-based algorithms. The solution to the privacy problem is embedded in laser-based detection and tracking algorithms, whereas camera-based algorithms need special masking to maintain privacy. Despite these facts, one of the major challenges of working with laser scanners is the difficulty of recognizing objects using only the relatively sparse information that laser scanners provide. From TABLEs 1, 2, 3, and 4 and the associated discussion, we can conclude that the detection of objects has been done by clustering laser scanned data points in depth images or 3D laser scans. Future work should go beyond this practice by proposing new algorithms that use techniques other than clustering. Proposing such algorithms is a real challenge for the laser-vision research community.
The existing laser-based tracking algorithms take full advantage of the Kalman filter and its extensions (e.g., extended and unscented). Accordingly, in the literature, tracking of movers has been performed with the Kalman filter about five times more often than with the particle filter. The particle filter takes second position among all filters applied in laser-based tracking algorithms. This is because the particle filter is generally more computationally expensive than the Kalman filter. Even so, the particle filter handles non-Gaussian problems better. Besides, the most common variants of the Kalman filter cannot provide a reasonable estimate for highly nonlinear and non-Gaussian problems. Consequently, additional particle filter based algorithms are likely to be proposed in the long run.
So far, we have performed various nonparametric statistical tests for eight key algorithms from TABLE 1 that detect and/or track both people and vehicles. But we have not performed any statistical tests for the people-only tracking algorithms in TABLE 2, the vehicle-only tracking algorithms in TABLE 3, or the diverse-object detection or tracking algorithms in TABLE 4. Therefore, a key question remains for these tables: which algorithm is superior to its alternatives? The obstacles behind this unsolved problem are mainly the unavailability of common datasets and, to a lesser extent, the difficulty of implementing a multitude of algorithms. Besides, it is not possible to run statistical tests with a single parameter (e.g., accuracy). Different authors used their own suitable datasets with diverse sizes and conditions. As a result, the accuracy obtained by their algorithms varies widely with the dataset. A common dataset would help to judge algorithms on common ground. Unfortunately, no such dataset exists up to now. In general, it is a challenging task to build common datasets for testing many algorithms on an identical basis. Future work should predominantly address this issue. In addition, carefully optimized code can give better performance [203]. However, the code of the implemented algorithms is not optimized. Consequently, the code would be optimized by using manual and software optimization techniques [204] to obtain an optimal execution time for each algorithm.

IX. CONCLUSION
We provided an overview of methods to classify objects using laser scanners instead of common video recording RGB cameras. We highlighted a special feature of laser scanners: they cannot see or capture identifiable features of objects and humans. Therefore, laser-based algorithms inherently provide privacy protection, whereas RGB camera-based algorithms require privacy masking. Privacy protection should not hinder the discovery of objects or people under specific circumstances. It has been suggested that laser-based algorithms should be used instead of common RGB cameras. The Kalman filter has been applied widely in laser-based algorithms due to its lower computational cost. We noticed that a common characteristic of the existing laser-based algorithms is that the detection of objects has been performed by clustering laser scanned point clouds in depth images or 3D laser scans. We conducted a thorough review of the laser-based detection and tracking algorithms. Covering a variety of solution methods, we also highlighted the comparative strengths and weaknesses of existing approaches. Furthermore, the rigorous statistical analysis helped boost confidence in the practical results and confirmed their statistical significance. This analysis also helped interpret the insights and shed some light on why certain algorithms performed better than others. Future work would include proposing new, smarter algorithms for laser-based intelligent surveillance and datasets for statistical tests.

ACKNOWLEDGMENT
The author would like to thank the anonymous reviewers for their appreciative and constructive comments on the draft of this article.