Evaluation of Thermal Imaging on Embedded GPU Platforms for Application in Vehicular Assistance Systems

This study evaluates the real-time performance of thermal object detection for smart and safe vehicular systems by deploying trained networks on GPU and single-board edge-GPU computing platforms for onboard automotive sensor suite testing. A novel large-scale thermal dataset comprising more than 35,000 distinct frames is acquired, processed, and open-sourced, covering challenging weather and environmental scenarios. The dataset is recorded with a low-cost yet effective uncooled LWIR thermal camera, mounted both stand-alone and on an electric vehicle to minimize mechanical vibrations. State-of-the-art YOLO-V5 network variants are trained using four public datasets as well as the newly acquired local dataset, employing the SGD optimizer for optimal generalization of the DNNs. The effectiveness of the trained networks is validated on extensive test data using quantitative metrics that include precision, recall, mean average precision, and frames per second. The smaller YOLO network variant is further optimized using the TensorRT inference accelerator to explicitly boost the frame rate. The optimized network engine increases the frame rate by 3.5 times on low-power edge devices, achieving 11 FPS on the Nvidia Jetson Nano and 60 FPS on the Nvidia Jetson Xavier NX development boards.


I. INTRODUCTION
Thermal imaging is the digital interpretation of the infrared radiation emitted by objects. Thermal imaging cameras with microbolometer focal plane arrays (FPA) are a type of uncooled detector that provides low-cost solutions for acquiring thermal images in different weather and environmental conditions. These cameras, when integrated with AI-based imaging pipelines, can be used for various real-world applications. In this work, the core focus is to design an intelligent thermal object detection-based video analysis system for the automotive sensor suite that is effective in all light conditions, thus enabling safer and more reliable road journeys. Unlike other video solutions such as visible imaging, which mainly relies on reflected light and therefore has a greater chance of being blocked by visual impediments, thermal imaging does not require any external lighting to capture quality images and can see through visual obscurants such as dust, light fog, smoke, or other occlusions. Moreover, the integration of AI-based thermal imaging systems provides a multitude of advantages, from better analytics with fewer false alarms to increased coverage, redundancy, and a higher return on investment.
In this research work, we have focused on utilizing thermal data to design an efficient AI-based object detection and classification pipeline for Advanced Driver-Assistance Systems. Such a thermal imaging-based forward sensing (F-sense) system provides enhanced safety and security features, enabling the driver to better scrutinize the complete roadside environment. For this purpose, we have used a state-of-the-art (SoA) end-to-end deep learning framework, YOLO-V5, on thermal data. In the first phase, a novel thermal dataset is acquired for training and validating the different network variants of YOLO-V5. The data is captured using a prototype low-cost uncooled LWIR thermal camera specifically designed under the ECSEL HELIAUS research project [32]. The raw thermal data is processed using shutterless camera calibration, automatic gain control, bad-pixel removal, and temporal denoising methods.
Furthermore, the trained network variants are deployed and tested on two state-of-the-art embedded GPU platforms, the NVIDIA Jetson Nano [23] and the NVIDIA Jetson Xavier NX [25], to study their real-time onboard feasibility in terms of various quantitative metrics, inference time, FPS, and hardware sensor temperatures.
The core contributions of the proposed research work are summarized below:
• Preparation and annotation of a large open-access dataset of thermal images captured in different weather and environmental conditions.
• A detailed comparative evaluation of SoA object detection based on a modified YOLO-V5 network, fine-tuned for thermal images using this newly acquired dataset.
• Model optimization using the TensorRT inference accelerator to implement a fast inference network on SoA embedded GPU boards (Jetson Nano, Xavier NX), with comparative evaluations.

II. BACKGROUND
ADAS (Advanced Driver Assistance Systems) are AI-based intelligent systems integrated with core vehicular systems to assist the driver by providing a wide range of digital features for safe and reliable road journeys. Such systems are designed by employing an array of electronic sensors and optics, such as different types of cameras, to identify surrounding impediments and driver faults, and to react automatically.
The second part of this section will mainly summarize the existing published thermal datasets along with their respective attributes. These datasets can be effectively used for training and testing machine learning algorithms for object detection in the thermal spectrum for ADAS. The complete dataset details are provided in Table I.

A. Related Literature
Numerous studies address the implementation of object detection algorithms using AI-based conventional machine learning as well as deep learning methods. Such optical imaging-based systems can be deployed and effectively used as forward sensing methods for ADAS. Advanced Driver-Assistance Systems (ADAS) are an active area of research that seeks to make road trips safer and more secure. Real-time object detection plays a critical role in warning the driver, allowing timely decisions [8]. Ziyatdinov et al. [8] proposed an automated system to detect road signs. This method uses the GTSRB dataset [20] to train conventional machine learning algorithms, including SVM, k-NN, and decision tree classifiers. The results showed that SVM and k-nearest neighbour (k-NN) outperform all other classifiers. Autonomous cars on the road require the ability to consistently perceive and comprehend their surroundings [9]. Oliver et al. [9] presented a procedure using the Bernoulli particle filter, which is suitable for object identification because it can handle a wide range of sensor measurements as well as object appearance and disappearance. Gang Yan et al. [10] proposed a method using HOG to extract features and AdaBoost and SVM classifiers to detect vehicles in real-time. The histogram of oriented gradients (HOG) is a feature extraction technique used for object detection in computer vision and machine learning. The study concluded that the AdaBoost classification technique performed slightly better than SVM since it uses an ensemble method. The authors in [11] proposed another approach to detect vehicles on the road, again using HOG filters to extract features from the frames and then classifying them with support vector machine and decision tree algorithms. SVM achieved 93.75% accuracy, outperforming the decision tree in classifying vehicles. These are some of the conventional machine learning object detection techniques used for driver assistance systems to date. The main drawback of traditional machine learning techniques is that the features are extracted and predefined prior to training and testing. When dealing with high-dimensional data with many classes, conventional machine learning techniques are often ineffective [21].
Deep learning approaches have emerged as more reliable and effective solutions than these classic approaches. There are many state-of-the-art pre-trained deep learning classifiers and object detection models that can be retrained and rapidly deployed to design efficient forward sensing algorithms [22]. The YOLO (You Only Look Once) object detector provides sufficient performance to operate at real-time speeds on conventional video data without compromising overall detector precision [15]. Veta et al. [12] presented a technique for detecting objects at a distance by employing YOLO on low-quality thermal images. Another study [13] focused on pedestrian detection in thermal images using histogram of oriented gradients (HOG) and YOLO methods on the FLIR [7] dataset, reporting 70% accuracy on test data using the intersection-over-union criterion. Further, Rumi et al. [14] proposed a real-time human detection technique using YOLO-v3 on the KAIST [5] thermal dataset, achieving 95.5% average precision on test data. The authors in [16] proposed a human detection system using the YOLO object detector, trained on their custom dataset recorded in different weather conditions with a FLIR ThermaCAM P10 thermal camera.
Focusing on roadside objects, the authors in [17] used the YOLO-v2 object detection model to enhance the recognition of tiny vehicle objects by combining low-level and high-level image features. In [18], the authors proposed a deep learning-based vehicle occupancy detection system for a parking lot using a thermal camera. In this study the authors established that YOLO, Yolo-Conv, GoogleNet, and ResNet18 are computationally more efficient, take less processing time, and are suitable for real-time object detection. One of the most recent studies [24] evaluates the efficacy of typical state-of-the-art object detectors.

B. Object Detection on Edge Devices
AI on edge devices benefits us in various ways: it speeds up decision-making, makes data processing more reliable, enhances the user experience with hyper-personalization, and cuts costs. While machine learning models have shown immense strength in diverse consumer electronics applications, the increased prevalence of AI on the edge has driven the growth of special-purpose embedded boards for various applications. Such embedded boards can achieve AI inference at high frames per second (FPS) with low power usage. Some of these boards include the Nvidia Jetson Nano, Nvidia Xavier, Google Coral, AWS DeepLens, and Intel AI-Stick. The authors in [26][27] proposed a Raspberry Pi-based edge computing system to detect thermal objects. Sen Cao et al. [28] developed a roadside object detector using the KITTI dataset [29] by training an efficient and lightweight neural network on the Nvidia Jetson TX2 embedded GPU [28].
In another study [30], the authors proposed deep learning-based smart task scheduling for self-driving vehicles. This task management module was implemented on multicore SoCs (Odroid XU4 and Nvidia Jetson).
The overall goal of this study is to analyse the real-time performance feasibility of the Thermal-YOLO object detector by deploying it on edge devices. Different network variants of the YOLO-V5 framework are trained and fine-tuned on thermal image data and implemented on the Nvidia Jetson Nano [23] and Nvidia Jetson Xavier NX [25]. These two platforms, although from the same manufacturer, provide very different levels of performance and may be regarded as close to the current SoA for embedded neural inference.

III. THERMAL DATA ACQUISITION AT SCALE FOR ADAS
This section will mainly cover the thermal data collection process using the LWIR prototype thermal imaging camera. The overall dataset consists of more than 35K distinct thermal frames acquired in different weather and environmental conditions. The data collection process includes shutterless camera calibration and thermal data processing [36] using the Lynred Display Kit (LDK) [1], the data collection methods, and the overall dataset attributes across different weather and environmental conditions for comprehensive data formation.

A. Prototype Thermal Camera
For the proposed research work we have utilized a microbolometer-based uncooled thermal imaging camera developed under the HELIAUS project [32]. The main characteristics of this camera are its low cost, light weight, and sleek compact design, which allow it to be easily integrated into artificially intelligent imaging pipelines for building effective in-cabin driver-passenger monitoring and road monitoring systems for ADAS. It captures high-quality thermal frames with low power consumption, proving the agility of its configurations and data processing algorithms in real-time. Fig. 1 shows the prototype thermal camera. The technical specifications are as follows: the camera is a long-wave infrared (LWIR) type with a spectral range of 8-14 µm and a resolution of 640 × 480 pixels. The focal length (f) is 7.5 mm, the F-number is 1.2, the pixel pitch is 17 µm, and the power consumption is less than 950 mW. The camera connects via a high-speed USB 3.0 (micro-USB) port. The data is recorded using a specifically designed toolbox. The complete camera calibration process along with the data processing pipeline is explained in the next section.

B. Shutterless Calibration and Real-time Data Processing
This section will highlight the thermal camera calibration process for the shutterless camera configuration, along with real-time data processing methods for converting the raw thermal data into refined outputs. Shutterless technology allows uncooled IR engines and thermal imaging sensors to operate continuously without the need for a mechanical shutter for Non-Uniformity Correction (NUC) operations. This technology provides proven and effective results in poor visibility conditions, ensuring good quality thermal frames in real-time testing situations. For calibration, we have used a low-cost blackbody source to provide three constant reference temperatures, referred to as T-ambient1-BB1 (hot uniform scene at 40 degrees centigrade), T-ambient1-BB2 (cold uniform scene at 20 degrees centigrade), and T-ambient2-BB1 (either a hot or cold uniform scene but at a different temperature). The imager can store up to 50 snapshots and select the best uniform temperature scenes for calibration purposes. Fig. 2 shows the blackbody used for the thermal camera calibration. Once the uniform temperature images are recorded, they are loaded into the camera SDK, as shown in Fig. 3, to calibrate the shutterless camera stream. Fig. 4 shows a thermal frame captured with the prototype IR camera before and after applying the shutterless calibration algorithms. In the next phase, various real-time image processing-based correction methods are applied to convert the original thermal data into good-quality thermal frames. Fig. 5 shows the complete image processing pipeline.
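As a conceptual illustration of reference-based calibration, the sketch below implements a standard two-point non-uniformity correction computed from two uniform blackbody scenes. It is a simplified stand-in for the three-reference shutterless scheme described above, not the camera SDK's actual algorithm; the function names are hypothetical and plain Python lists stand in for real image buffers.

```python
def two_point_nuc(bb_cold, bb_hot, t_cold, t_hot):
    """Per-pixel gain and offset derived from two uniform blackbody scenes.

    bb_cold/bb_hot are raw frames of uniform scenes at known temperatures
    t_cold/t_hot; each pixel's response is assumed linear in temperature.
    """
    gain = [[(t_hot - t_cold) / (h - c) for h, c in zip(hr, cr)]
            for hr, cr in zip(bb_hot, bb_cold)]
    offset = [[t_cold - g * c for g, c in zip(gr, cr)]
              for gr, cr in zip(gain, bb_cold)]
    return gain, offset


def correct_frame(frame, gain, offset):
    """Map raw pixel counts to a uniform (temperature-like) response."""
    return [[g * p + o for g, p, o in zip(gr, fr, orow)]
            for gr, fr, orow in zip(gain, frame, offset)]
```

For example, a pixel reading 100 counts at 20 °C and 200 counts at 40 °C gets gain 0.2 and offset 0, so a raw value of 150 maps to 30. The third reference scene in the shutterless scheme additionally compensates for ambient (camera body) temperature drift, which this two-point sketch ignores.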

Fig. 5. Thermal image correction pipeline
As shown in Fig. 5, the image processing pipeline consists of three different image correction methods: gain correction, bad-pixel replacement, and temporal denoising. Further details of these methods are provided below.

1) Gain Correction Automatic Gain Control (AGC)
Thermal image detectors, based on flat panels, suffer from irregular gains due to the non-uniform amplifiers.
To correct the irregular gains, a common yet effective technique referred to as automatic gain control is applied. It is usually based on a gain map, which is built by averaging uniformly illuminated images without any objects. Increasing the number of images used for averaging improves the gain-correction performance, since the residual quantum noise in the gain map is reduced [1].
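A minimal sketch of this gain-map scheme follows; the helper names are hypothetical and plain Python lists stand in for image arrays:

```python
def build_gain_map(flat_frames):
    """Average uniformly illuminated frames, then normalize by the global mean.

    The result is each pixel's relative gain (1.0 = nominal response);
    averaging more frames reduces residual noise in the map.
    """
    n = len(flat_frames)
    h, w = len(flat_frames[0]), len(flat_frames[0][0])
    avg = [[sum(f[y][x] for f in flat_frames) / n for x in range(w)]
           for y in range(h)]
    mean = sum(sum(row) for row in avg) / (h * w)
    return [[p / mean for p in row] for row in avg]


def apply_gain_correction(frame, gain_map):
    """Divide each pixel by its relative gain to flatten the response."""
    return [[p / g for p, g in zip(frow, grow)]
            for frow, grow in zip(frame, gain_map)]
```

Applying the correction to a uniformly illuminated frame should yield a flat image at the global mean level, which is a convenient sanity check for the map.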

2) Bad Pixel Replacement (BPR)
This step uses the list of bad pixels estimated at the calibration stage. It tracks potential new bad pixels by examining each pixel's neighbourhood, also known as the nearest-neighbour method. Once a bad pixel is traced, it is replaced with a value derived from its good neighbours. Fig. 6 demonstrates one such example.
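The source does not specify the exact replacement statistic; the sketch below assumes the median of the good 8-neighbours, a common choice, with lists again standing in for image arrays:

```python
def replace_bad_pixels(frame, bad_pixels):
    """Replace each listed bad pixel with the median of its good 8-neighbours.

    `bad_pixels` is a set of (row, col) coordinates from the calibration
    stage; other bad pixels are excluded from the neighbourhood so a
    defective cluster cannot contaminate the replacement value.
    """
    h, w = len(frame), len(frame[0])
    out = [row[:] for row in frame]  # copy so the input stays untouched
    for y, x in bad_pixels:
        neigh = sorted(
            frame[ny][nx]
            for ny in range(max(0, y - 1), min(h, y + 2))
            for nx in range(max(0, x - 1), min(w, x + 2))
            if (ny, nx) != (y, x) and (ny, nx) not in bad_pixels
        )
        out[y][x] = neigh[len(neigh) // 2]
    return out
```

A dead or stuck pixel surrounded by valid readings is thus overwritten with a plausible local value, which is what the left/right comparison in Fig. 6 shows at image scale.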
Fig. 6. Bad pixel replacement algorithm output on a sample thermal frame: the left frame contains some bad pixels and the right frame is the processed result.

3) Temporal Denoising (TD)
The consistent reduction of image noise is a frequently recurring problem in digitized thermal imaging systems, especially with uncooled thermal imagers [34]. To mitigate these limitations, different hardware- as well as software-based image processing methods are used, such as temporal and spatial denoising algorithms. The temporal denoising method decreases the temporal noise between different frames of the video. In commercial solutions, it usually works by gathering multiple frames and averaging them to cancel out the random noise among the frames.
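The frame-averaging idea can be sketched as a running mean over a short window; the class name and window size below are illustrative choices, not taken from the camera pipeline:

```python
from collections import deque


class TemporalDenoiser:
    """Running average over the last `window` frames to suppress temporal noise.

    Uncorrelated noise averages toward zero while the (slowly changing)
    scene content is preserved; larger windows denoise more but blur motion.
    """

    def __init__(self, window=4):
        self.frames = deque(maxlen=window)

    def process(self, frame):
        self.frames.append(frame)
        n = len(self.frames)
        h, w = len(frame), len(frame[0])
        return [[sum(f[y][x] for f in self.frames) / n for x in range(w)]
                for y in range(h)]
```

The trade-off is the familiar one for moving cameras: a longer window cancels more random noise but introduces ghosting on fast-moving objects, which is why short windows are typical for automotive streams.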
In our data acquisition process, this method is applied after the shutterless algorithm. Fig. 7 shows sample thermal images after applying the shutterless algorithms and all the image processing-based correction methods shown in Fig. 5.

C. Data Collection Methods and Overall Dataset Attributes
This section will highlight the different data collection approaches adopted in this research work. The data is collected using two approaches. In the first approach (M-1), the data is gathered with the camera immobile, placed at a fixed location. The camera is mounted on a tripod at a fixed height of nearly 30 inches such that the roadside objects are covered in the video stream. The thermal video stream is recorded at 30 frames per second (FPS), in different weather and environmental conditions. Fig. 8 shows the M-1 data acquisition setup. In the second method (M-2), the thermal imaging system is mounted on a car and the data is acquired while mobile. The prime reason for collecting the data with two different methods is to introduce variation and collect distinctive local data in different environmental and weather conditions. For this, a specialized waterproof camera housing case was designed to hold the thermal camera at the correct position and angle to cover the entire roadside scene. The housing case is fixed on a suction-based tripod stand, allowing us to easily attach and remove the complete structure from the car bonnet. The housing case also contains a visible camera to capture reference visible images, allowing us to adjust both cameras to the proper angle and field of view. The acquired data comprises distinct stationary classes, such as road signs and poles, as well as moving object classes such as pedestrians, cars, buses, bikes, and bicycles.

IV. PROPOSED METHODOLOGY
This section will detail the proposed methodology and training outcomes for the various network variants tested in this study.

A. Network Training and Learning Perspectives
The overall training data comprises both locally acquired and publicly available datasets, divided 50%-50%: half of the training data is selected from locally acquired thermal frames, while the other half is drawn from public datasets. Six distinct types of roadside objects for driving assistance are included in the training and validation sets: bicycles, motorcycles, buses, cars, pedestrians or people, and static roadside objects such as poles or road signs, as shown in Fig. 12.

B. Data Annotation and Augmentation
The data annotations were performed manually using LabelImg [31], an open-source bounding box annotation tool, for all the thermal classes in our study. Annotations are stored in YOLO format as text files. During the training phase all the YOLO-V5 network variants, which include the small, medium, large, and x-large networks, were trained to detect and classify six different classes in different environmental conditions.
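The YOLO text format itself is well defined: one line per object, containing a class index and the box centre, width, and height, each normalized by the image dimensions. A small converter from pixel-coordinate boxes (as drawn in LabelImg) might look like this; the function name is illustrative:

```python
def to_yolo_line(cls_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-coordinate corner box to one YOLO label line.

    Output: "<class> <x_center> <y_center> <width> <height>",
    with all four geometry values normalized to [0, 1].
    """
    xc = (x_min + x_max) / 2 / img_w
    yc = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{cls_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"
```

For a 640 × 480 frame, a box covering the top-left quadrant (0, 0) to (320, 240) becomes `0 0.250000 0.250000 0.500000 0.500000` for class 0. Normalizing by image size is what lets the same labels serve frames rescaled during training.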
Large-scale datasets are a vital requirement for achieving optimal training results with deep learning architectures. Without the need to gather new data, data augmentation allows us to significantly increase the diversity of data available for training the DNN models. In the proposed study we have incorporated a variety of data augmentation techniques, involving cropping, flipping, rotation, shearing, translation, and mosaic transformation, for optimal training of all the network variants of the YOLO-V5 framework.
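Two of the simpler augmentations can be sketched as pure-Python primitives. Note that YOLO-V5's built-in augmentation pipeline also transforms the bounding-box labels; these toy helpers operate on the image only and leave label handling to the caller:

```python
def hflip(frame):
    """Horizontal flip of a row-major image.

    For YOLO labels the matching transform is x_center -> 1 - x_center.
    """
    return [row[::-1] for row in frame]


def crop(frame, top, left, height, width):
    """Rectangular crop; boxes falling outside the crop must be
    clipped or dropped by the caller."""
    return [row[left:left + width] for row in frame[top:top + height]]
```

Mosaic augmentation, by contrast, stitches four training images (and their labels) into one composite frame, exposing the network to objects at varied scales and contexts in a single sample.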

C. Training Results
As discussed in subsection A of section IV, all the networks are trained using a combination of public and locally gathered data. Training data is included from four public datasets: FLIR [7], OST [2], CVC [19], and KAIST [5]. Additionally, we have used thermal frames acquired from the locally gathered video sets using both the M-1 and M-2 methods. The training process is performed on a server-grade machine with a XEON E5-1650 v4 3.60 GHz processor, 64 GB of RAM, and a GEFORCE RTX 2080 Ti graphics processing unit with 12 GB of dedicated graphics memory, a memory bandwidth of 616 GB/s, and 4352 CUDA cores. During the training phase the batch size is fixed at 32, and both the stochastic gradient descent (SGD) and ADAM optimizers were evaluated. However, since we were unable to achieve satisfactory training results using the ADAM optimizer compared to SGD, the SGD optimizer was selected for training.
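A single SGD-with-momentum update, the optimizer ultimately selected here, can be written out explicitly. The 0.937 default momentum mirrors YOLO-V5's stock hyperparameters; treat the exact values, and the flat-list parameter representation, as illustrative assumptions:

```python
def sgd_momentum_step(weights, grads, velocity, lr=0.01, momentum=0.937):
    """One SGD-with-momentum update: v <- mu * v + g ;  w <- w - lr * v.

    `velocity` carries the running gradient average between steps,
    smoothing noisy per-batch gradients.
    """
    new_v = [momentum * v + g for v, g in zip(velocity, grads)]
    new_w = [w - lr * v for w, v in zip(weights, new_v)]
    return new_w, new_v
```

With momentum 0.9, learning rate 0.1, a weight of 1.0 and a gradient of 0.5, the first step moves the weight to 0.95; subsequent steps accelerate in directions where gradients agree, which is one reason SGD with momentum often generalizes well on detection tasks.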

V. VALIDATION RESULTS ON GPU AND EDGE DEVICES
This section will demonstrate the object detection validation results on GPU as well as on two different embedded boards.

A. Testing Methodology and Overall Test Data
In this research study, we have used three different testing approaches: the conventional test-time method with no augmentation (NA), test-time augmentation (TTA), and test-time with model ensembling (ME). TTA is an extensive application of data augmentation to the test dataset. It works by creating multiple augmented copies of each image in the test set, having the model make a prediction for each, and then returning an ensemble of those predictions. However, since the test dataset is enlarged with a new set of augmented images, the overall inference time also increases compared to NA, which is one of the downsides of this approach. ME, or ensemble learning, refers to using multiple trained networks in parallel to produce one optimal predictive inference model [35]. In this study, we have tested the performance of individually trained variants of the YOLO-V5 framework and selected the best combination of models, which in turn helps achieve better validation results.
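The TTA idea can be reduced to a toy sketch: run the model on the original and an augmented copy, then average the outputs. The `model` callable and the single-score output below are placeholders for the real detector, which returns boxes as well as scores:

```python
def tta_predict(model, frame):
    """Average per-class confidence scores over the original frame
    and a horizontally flipped copy.

    For box outputs a real implementation must also un-flip the
    predicted coordinates before merging; this sketch handles
    score vectors only.
    """
    scores = model(frame)
    flipped_scores = model([row[::-1] for row in frame])
    return [(a + b) / 2 for a, b in zip(scores, flipped_scores)]
```

The same averaging pattern underlies ME, except the multiple predictions come from different trained networks on the same input rather than one network on augmented inputs.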
After training all the network variants of YOLO-V5, the performance of each model is cross-validated on a comprehensive set of test data selected from the public as well as the locally gathered thermal data.

B. Inference Results Using YOLO Network Variants
In the first phase, we have run rigorous inference tests on GPU as well as edge-GPU platforms on our test data using the newly trained network variants of the YOLO framework. The overall test data consists of approximately 31,000 thermal frames. Fig. 16 shows the inference results on nine different thermal frames selected from both public and locally acquired data. These frames contain data complications such as multiple object classes, occlusion, overlapping classes, scale variation, and varying environmental conditions. The complete inference results are available in our local repository (https://bit.ly/3lfvxhd).
In the second phase, we have run combinations of different models in parallel using the model ensembling approach to output one optimal predictive engine, which is then used to run the inference test on the validation set. The different combinations of these models are shown in Table V.
With the model ensembling method, the small and large model combination (A1) turns out to be the best in terms of achieving the highest mAP and recall with relatively little inference time per frame, thus producing optimal validation results. These results are examined in further parts of this section. Fig. 17 shows the inference results using the A1 model ensembling engine on three different thermal frames selected from the test data.

C. Quantitative Validation Results on GPU
The third part of the testing phase presents the quantitative numerical results of all the trained models on the GPU. To better analyse and validate the overall performance of all the trained models, a relatively smaller set of test images has been selected from the overall test set: a subset of 402 thermal frames used to compute all the evaluation metrics. The selected images contain different roadside objects such as pedestrians, cars, and buses under different illumination and environmental conditions, times of day, and distances from the camera. The objects are either far-field (11-18 meters), mid-field (7-10 meters), or near-field (3-6 meters) from the camera. Fig. 18 shows selected views from the test data for quick reference. The performance of each model is evaluated using four different metrics: recall, precision, mean average precision (mAP), and frames per second (FPS). Table VI shows all the quantitative validation results on the GPU. During the testing phase the batch size is fixed at 8, and three different testing configurations are selected, each with its own confidence threshold and intersection-over-union (IoU) threshold. The confidence threshold defines the minimum confidence score above which a prediction is considered true; predictions below the threshold are discarded. The last row of Table VI shows the best ME results using the A1 configuration from Table V with a confidence threshold of 0.2 and an IoU threshold of 0.4.
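The IoU threshold compares a predicted box against ground truth as intersection area over union area; a minimal implementation for axis-aligned `(x1, y1, x2, y2)` boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-union of two axis-aligned (x1, y1, x2, y2) boxes.

    Returns a value in [0, 1]; a prediction counts as a true positive
    when its IoU with a ground-truth box exceeds the chosen threshold.
    """
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    ix1, iy1 = max(ax1, bx1), max(ay1, by1)
    ix2, iy2 = min(ax2, bx2), min(ay2, by2)
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union else 0.0
```

Precision and recall then follow from counting matches at a given IoU threshold, and mAP averages precision over recall levels (and, typically, over classes), which is how the values in Table VI are computed.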

E. Real-time Hardware Feasibility Testing
While running these tests we closely monitored the temperature readings of the different hardware peripherals on both edge-GPU platforms. This is done to prevent overheating, which can damage the onboard processor or affect the overall operational capability of the system. In the case of the Nvidia Jetson Nano, a cooling fan was mounted on top of the processor heatsink to reduce overheating, as shown in Fig. 19. As can be seen in Fig. 20, mounting an external cooling fan reduced the temperature readings of various onboard peripherals on the Jetson Nano by nearly 30%, allowing us to operate the board at its maximum capacity for rigorous model testing. Fig. 21 shows the Nvidia Jetson Nano running at full pace (with an external fan), with all four cores at their maximum limit (100% capacity) while running the quantitative and inference tests with the smaller network variant of the YOLO-V5 framework.

VI. MODEL PERFORMANCE OPTIMIZATION(S)
This section will mainly focus on further model optimization using the TensorRT [33] inference accelerator tool. The prime reason is to further increase the FPS rate for real-time evaluation and onboard feasibility testing on edge devices. Secondly, it helps reduce the onboard memory footprint on the target device by performing various optimization methods.
TensorRT [33] performs five optimization steps to increase the throughput of deep neural networks. In the first step, it maximizes throughput by quantizing models to 8-bit integer or FP16 precision while preserving model accuracy; this significantly reduces the model size, since it is transformed from the original FP32 representation to FP16. In the second step, it uses layer and tensor fusion techniques to further optimize the usage of onboard GPU memory. The third step is kernel auto-tuning, the most important step, in which the TensorRT engine selects the best network layers and optimal batch size for the target GPU hardware. In the fourth step, it minimizes memory footprints and re-uses memory by allocating memory to a tensor only for the duration of its usage. In the final step, it processes multiple input streams in parallel and optimizes neural networks periodically with dynamically generated kernels [33].
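The FP32-to-FP16 step can be illustrated with a half-precision round trip, which halves the storage per weight at the cost of precision. This sketch uses Python's IEEE-754 half-float packing to expose the effect, not TensorRT itself:

```python
import struct


def fp16_roundtrip(x):
    """Quantize a float to IEEE 754 half precision (2 bytes) and back.

    Values exactly representable in FP16 (e.g. 0.5) survive unchanged;
    others pick up a small rounding error, the accuracy cost that
    FP16 inference trades for halved weight storage and faster math.
    """
    return struct.unpack('<e', struct.pack('<e', x))[0]
```

With roughly 11 bits of mantissa, FP16 keeps about three significant decimal digits, which is typically enough for inference-time weights even though it would be too coarse for training gradients.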
In the proposed research work we have deployed the smaller variant of YOLO-V5 using the TensorRT inference accelerator on both edge platforms, the Nvidia Jetson Nano and Nvidia Jetson Xavier NX development boards, to further improve the performance of the trained model. It produces faster inference times, increasing the FPS on thermal data, which in turn helps us build an effective real-time forward sensing system for ADAS embedded applications. The Xavier NX is computationally more powerful than the Jetson Nano. The key observations are summarized below:
• Due to memory limitations and the lower computational power of the Jetson Nano, only the small network variant was evaluated on it, whereas both the small and medium network variants were evaluated on the Jetson Xavier NX.
• Throughout the testing phase, it was important to keep a close eye on the operational temperature readings of the different onboard thermal sensors to avoid overheating, which might damage the onboard components or affect the system's typical operational performance. Active cooling fans were used on both boards during testing, and both ran close to their rated temperature limits.
• Model optimization with the TensorRT [33] inference accelerator leads to an approximate 3.5-times increase in FPS compared to the non-optimized smaller variant of YOLO-V5 on the Nvidia Jetson Nano and Nvidia Jetson Xavier NX devices.
• After model optimization, the Nvidia Jetson Nano produced 11 FPS and the Nvidia Jetson Xavier NX achieved 60 FPS on test data.

CONCLUSION
Thermal imaging provides superior and effective results in challenging environments, such as low-lighting scenarios, and has greater immunity to visual limitations, making it an optimal solution for intelligent and safer vehicular systems. In this study, we presented a new benchmark thermal dataset comprising over 35K distinct frames recorded, analyzed, and open-sourced in challenging weather and environmental conditions using a low-cost yet reliable uncooled LWIR thermal camera. All the YOLO-V5 network variants were trained using the locally gathered data as well as four different publicly available datasets. The performance of the trained networks was analysed on both GPU and ARM processor-based edge devices for onboard automotive sensor suite feasibility testing. On the edge devices, the small and medium network variants of YOLO were deployed and tested, owing to the memory limitations and lower computational power of these boards. Lastly, we further optimized the smaller network variant using the TensorRT inference accelerator to explicitly increase the FPS on edge devices. This allowed the system to achieve 11 frames per second on the Jetson Nano, while the Nvidia Jetson Xavier NX delivered a significantly higher performance of 60 frames per second. These results validate the potential of thermal imaging as a core component of ADAS for intelligent vehicles.
As future directions, the system's performance can be further enhanced by porting the trained networks to more advanced and powerful edge devices, tailoring it for real-time onboard deployments. Moreover, the current system focuses on object recognition, but it can be extended to incorporate image segmentation, road and lane detection, traffic signal and road sign classification, and object tracking to provide comprehensive driver assistance.
Peter Corcoran (Fellow, IEEE) holds a Personal Chair in Electronic Engineering at the College of Science and Engineering, National University of Ireland Galway (NUIG). He was a co-founder of several start-up companies, notably FotoNation, now the Imaging Division of Xperi Corporation. He has more than 600 cited technical publications and patents, including more than 120 peer-reviewed journal articles and 160 international conference papers, and is a co-inventor on more than 300 granted U.S. patents. He is an IEEE Fellow recognized for his contributions to digital camera technologies, notably in-camera red-eye correction and facial detection. He has been a member of the IEEE Consumer Technology Society for more than 25 years and is the Founding Editor of IEEE Consumer Electronics Magazine.

Fig. 2. Thermal camera calibration: a) blackbody source used for LWIR thermal camera calibration, b) uniform scene with temperature set to 40.01 degrees centigrade.

Fig. 7. High-quality thermal frames after applying the shutterless calibration algorithm and image correction methods.

Fig. 8. Data acquisition setup with the camera placed at a fixed location: a) camera mounted on a tripod stand, b) complete daytime roadside view, c) video recording setup at 30 fps, d) evening-time alleyway view.

Fig. 9 shows the camera housing case along with the initial data acquisition setup, whereas

Fig. 9. Data acquisition setup through the car: a) camera housing case holding the thermal and visible cameras, b) initial data acquisition testing phase.

Fig. 10 shows the housing case fixed on the tripod structure and the complete M-2 acquisition setup mounted on the car. The overall dataset was acquired in County Galway, Ireland. The data was collected in the form of short video clips, and more than 35,000 unique thermal frames were extracted from the recorded clips. The data was recorded in the daytime, evening, and night-time, distributed in the ratio of 44.61%,

Fig. 10. Complete data acquisition setup mounted on the car: a) camera housing fixed on a suction tripod stand, b) data acquisition kit from the front view, c) data acquisition kit from the side view.

TABLE II NEW THERMAL DATASET ATTRIBUTES

Fig. 11 shows six distinct samples of thermal frames captured in different environmental and weather conditions using the M1 and M2 methods. These samples show objects of different classes, such as buses, bicycles, poles, persons, and cars. Most of these objects are commonly found on the roadside, thus providing the driver with a comprehensive video analysis of the car's surroundings.

Fig. 11. Six different thermal samples acquired using the LWIR 640x480 prototype thermal camera, showing various class objects.

Fig. 12. Block diagram depicting the steps taken to evaluate the performance of YOLO-V5 on the local and public datasets.

Fig. 13 shows the class-wise data distribution. In the training phase of the YOLO-V5 framework, a total of 59,150 class-wise data samples were utilized, along with their corresponding class labels. Fig. 14 shows the graph results of the YOLO-V5 large model, visualizing the obtained PR curve, box loss, objectness loss, and classification loss. During the training process, the x-large model consumed the largest amount of hardware resources and had the longest training time of all the network variants, with an overall GPU usage of 9.78 GB and a total training time of 14 hours. Fig. 15 shows the overall GPU memory usage, the GPU power required in percentage terms, and the GPU temperature on the centigrade scale while training the largest x-large network variant of the YOLO-V5 model.
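The PR curve and mAP scores discussed here are built on the intersection-over-union (IoU) overlap criterion between predicted and ground-truth boxes. As a framework-free illustration of that criterion (corner-format `(x1, y1, x2, y2)` boxes are an assumption of this sketch, not a detail stated in the paper):

```python
def iou(box_a, box_b):
    """Intersection-over-Union of two axis-aligned boxes.

    Boxes are given in corner format (x1, y1, x2, y2). A detection is
    typically counted as a true positive when its IoU with a
    ground-truth box exceeds a threshold such as 0.5.
    """
    # Corners of the intersection rectangle.
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

Sweeping the detector's confidence threshold and re-counting matches at a fixed IoU cut-off is what traces out the PR curve summarized by mAP.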

Fig. 15. GPU resource utilization during the training process of the x-large network: a) 85% (9.78 GB) of GPU memory utilized, b) 90% (585 watts) of GPU power required, and c) GPU temperature of 68 °C against a maximum rating of 89 °C.

Fig. 18. Test data samples with the object at varying distances from the camera: a) near-field distance, b) mid-field distance, c) far-field distance.

Fig. 19. External 5-volt fan unit mounted on the Nvidia Jetson Nano heatsink to avoid onboard overheating while running the inference tests.

The temperature ratings of the various hardware peripherals are monitored using eight different on-die thermal sensors and one on-die thermal diode. These temperature monitors are referred to as CPU-Thermal, GPU-Thermal, Memory-Thermal, and PLL-Thermal (part thermal zone). External fans help us in
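On Jetson boards, the on-die sensors mentioned above are exposed through the standard Linux sysfs thermal-zone interface, which reports temperatures in millidegrees Celsius. A minimal reader sketch follows; the sysfs path layout is standard Linux, but the exact zone names (CPU-therm, GPU-therm, and so on) vary by board and are not taken from the paper:

```python
from pathlib import Path

def read_thermal_zones(base="/sys/class/thermal"):
    """Return {zone_name: temperature_in_C} for every readable zone.

    Each thermal_zone* directory exposes a `type` file (the sensor
    name) and a `temp` file (millidegrees Celsius), so a reading of
    48500 corresponds to 48.5 degrees C.
    """
    temps = {}
    for zone in sorted(Path(base).glob("thermal_zone*")):
        try:
            name = (zone / "type").read_text().strip()
            millic = int((zone / "temp").read_text().strip())
        except (OSError, ValueError):
            continue  # zone missing or unreadable on this platform
        temps[name] = millic / 1000.0
    return temps
```

Polling this function during an inference run is one simple way to log the sensor curves plotted in the paper's temperature figures.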

Fig. 21. Nvidia Jetson Nano running in MAXN power mode with all cores at maximum capacity while running the inference test and quantitative validation test.

Fig. 22 shows the temperature rating differences of the onboard thermal sensors while running the smaller version of the model on the Nvidia Jetson Xavier NX board, whereas Fig. 23 shows the CPU and GPU usage while running the smaller variant of

Fig. 24. Overall block diagram of deploying and running the TensorRT inference accelerator on two different embedded platforms.

Table IX shows the overall inference time along with the FPS rate on the thermal test data using the TensorRT run-time engine. By analyzing the results in Table IX, we can deduce that the TensorRT API boosts the overall FPS rate on ARM-based embedded platforms by nearly 3.5 times compared to the FPS rate achieved by running the non-optimized smaller variant on the Nvidia Jetson Nano and Nvidia Jetson Xavier NX boards. The same is demonstrated via the graphical chart results in Fig. 25.

TABLE IX TENSORRT INFERENCE ACCELERATOR RESULTS
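The ~3.5x figure is simply the per-device ratio of optimized to baseline throughput. A sketch of that bookkeeping follows; the device names and FPS numbers below are illustrative placeholders, not measurements from the paper's tables:

```python
def fps_speedup(baseline_fps, optimized_fps):
    """Per-device speedup of an optimized engine over a baseline.

    Both arguments map a device name to its measured frames per
    second; the result maps each device to its gain factor.
    """
    return {
        device: round(optimized_fps[device] / baseline_fps[device], 2)
        for device in baseline_fps
    }

# Illustrative placeholder measurements (not values from the paper):
gains = fps_speedup({"nano": 4.0, "xavier_nx": 20.0},
                    {"nano": 14.0, "xavier_nx": 70.0})
# gains -> {"nano": 3.5, "xavier_nx": 3.5}
```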

Fig. 25. FPS increment rate of nearly 3.5 times on the Jetson Nano and Jetson Xavier NX embedded boards using the TensorRT-built optimized inference engine.

Fig. 26 shows the thermal object detection inference results produced through the neural accelerator on six different thermal frames from the public as well as the locally acquired test data.

Fig. 26. Inference results using the TensorRT neural accelerator: a) object detection results on public data, b) object detection results on locally acquired thermal frames.

DISCUSSION/ANALYSIS

This section reviews the training and testing performance of all YOLO-V5 framework model variants.
• During the training phase, the large YOLO-V5 network outperforms the other network variants, scoring the highest precision of 82.29% and a mean average precision (mAP) score of 71.8%.
• Although the large network variant performed significantly better during the training phase, the small network variant

Table III shows the performance evaluation of all the trained models. By analysing Table III, it can be observed that the large model performed significantly better than the other models, with an overall precision of 82.29%, a recall rate of 68.67%, and a mean average precision of 71.8% mAP.
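Precision and recall figures such as those in Table III derive from counts of true positives, false positives, and false negatives over the test set. A minimal sketch of the two base metrics (the counts in the test values are hypothetical, not the paper's):

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from detection counts.

    tp: correct detections (IoU above threshold, right class),
    fp: spurious or duplicate detections,
    fn: ground-truth objects that were missed.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Averaging the precision over recall levels per class, then over classes, yields the mAP score reported for each network variant.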
Table IV provides the numeric data distribution of the overall validation set.
Table V, respectively, where 1 indicates that the model is in an active state and 0 indicates that it is in a non-active state.

TABLE VI QUANTITATIVE RESULTS ON GPU