A Survey on Approximate Edge AI for Energy Efficient Autonomous Driving Services

Autonomous driving services rely heavily on sensors such as cameras, LiDAR, radar, and communication modules. A common practice for processing the sensed data is to use a high-performance computing unit placed inside the vehicle, which deploys AI models and algorithms to act as the brain or administrator of the vehicle. An average hour of driving can generate up to 20 terabytes of vehicular data, depending on the data rate and specification of the sensors. Given the scale and fast growth of services for autonomous driving, it is essential to improve the overall energy and environmental efficiency, especially given the trend towards vehicular electrification (e.g., battery-powered vehicles). Although these areas have seen significant advancements in sensor technologies, wireless communications, computing, and AI/ML algorithms, the challenge remains of how to apply and integrate these technological innovations to achieve energy efficiency. This survey reviews and compares connected vehicular applications, vehicular communications, approximation techniques, and Edge AI. The focus is on energy efficiency, covering newly proposed approximation and enabling frameworks. To the best of our knowledge, this survey is the first to review the latest approximate Edge AI frameworks and publicly available datasets in energy-efficient autonomous driving. The insights and vision from this survey can benefit collaborative driving service development on low-power and memory-constrained systems, as well as the energy optimization of autonomous vehicles.


I. INTRODUCTION
The use of sensors, advanced driver assistance systems (ADAS), and safety features in vehicles shows a rising trend. The latest progression is towards integrating these sensors with state-of-the-art deep learning architectures based on the sense, think, and act model, which can assist the driver or replace the driver by offering the highest level of autonomy [158]. The highest level of autonomy is described as the execution of driving processes that provide self-driving functionality from a source point to a destination point without any human input or control. Full automation can be achieved by integrating multiple sensors, such as cameras, LiDAR, global navigation satellite systems, radar, and communication modules, with software-level solutions, thus providing automated driving features or the advanced driver assistance system [140], [16]. The automotive industry has been using several simple and complex ADAS features for a long time, which has also improved the overall driver experience with the ultimate objective of providing better road safety [342], [56]. Braking assistance, lane departure warning, adaptive cruise control, and global positioning system (GPS) based navigation are some of the features that have been in use since their introduction between 1990 and 2000 [266]. The current trend is to incorporate deep learning and machine learning approaches within autonomous vehicles to provide maximum precision and human-level accuracy. The principle behind these statistically based learning algorithms is to interpret the driver's surroundings when provided with impartial or neutral data. Based on the characteristics of the provided input, these algorithms classify or predict an output. Some machine learning approaches have already replaced traditionally used algorithms in applications such as collision-warning systems [155], [131], vision-based detection [152], [133], path planning, lane change systems [214], [104], and the recognition of multiple objects and their further classification into traffic signs, cars, bicyclists, and pedestrians, to name a few [195], [124]. Although deep neural networks perform well on the above-mentioned problems, their deployment on embedded and edge devices and the related computational factors cannot be neglected. Therefore, this survey reviews the AI algorithms for connected vehicle applications, Edge AI approaches, and vehicular frameworks. Among these topics, this survey focuses explicitly on energy-efficient mechanisms and approximate techniques. Figure 1 presents the taxonomy and topics covered in this paper. The outline of the sections in this survey paper is as follows: 1) Motivation and Background: This section illustrates the motivation and research questions targeted in this survey. It also discusses similar surveys and introduces background context for autonomous driving, approximate techniques, Edge AI, and vehicle communication.
2) AI and Autonomous Driving: In this section, fundamentals such as machine learning and deep learning approaches are described. Autonomous driving services such as perception, localization, path planning, simultaneous localization and mapping (SLAM), and vehicle-to-everything (V2X) communication are reviewed and compared based on state-of-the-art architectures and methodologies.
3) Edge AI with Autonomous Driving: This section discusses edge computing and the Edge Intelligence paradigm. It reviews articles published on cooperative driving, communication-efficient approaches, federated learning, Edge AI inference, and Edge AI optimization methods.
4) Enabling Frameworks: This section covers the deep learning frameworks for autonomous driving and the Edge AI frameworks for computation, communication, and offloading.

Autonomous vehicles and related technologies have seen phenomenal growth. However, they are still far from being categorized as fully connected and autonomous systems. Current vehicular technologies need up-scaling and development in efficient communication, computation, reduced carbon emission, collaborative intelligence, and paramount safety. The primary focus and key research areas within the automotive domain have been improving performance parameters and developing baseline models and frameworks in object detection, SLAM, and vehicular communication, respectively.
To show the research trends in the autonomous driving domain, a graph is generated using data collected from the Scopus database. While collecting information from the Scopus database, the search is refined using popular keywords in the automotive domain, publication area (e.g., science, mathematics, information systems, and engineering), year range, and type of publication (e.g., conference paper, journal, book, chapter). The trend in the past decade, as shown in Figure 2, indicates that the primary focus was in the area of perception, specifically on object detection and segmentation, owing to the advancements in neural networks and datasets. SLAM and vehicular communication are becoming popular topics, with the latter catching up because of the recent developments in 5G, next-generation cellular, and hybrid communication technologies. Energy-efficient approaches show a gradual increase, although the number of publications on energy-efficient methods is relatively small compared to other subjects.
The remainder of this section introduces and classifies the background topics: autonomous driving, approximate techniques, Edge AI, and communications in autonomous vehicles.

A. Autonomous Driving
Autonomous vehicles and on-board sensors generate a large amount of raw data that needs to be processed on-board by the vehicle computing unit, using DNN architectures and intelligent algorithms to enable driving services and applications. Figure 3 shows the approximate data rate of individual sensors in an autonomous car. The data rate may vary based on the sensor's specification (e.g., generation, bit rate) and the data quality. Examples of vehicular applications using AI models are adaptive cruise control, object classification and obstacle detection, and SLAM. Studies from [299], [145] suggest that energy consumption in fully connected autonomous vehicles can be separated into three categories: 1) Consumption by the autonomous car itself (on-board sensors and computing devices).
2) Energy consumption caused by infrastructure sensors involved in vehicular communication and networking. 3) Energy consumption at the backend, for example, edge servers and the central server maintaining legacy data and the global DNN model. Studies [185] show that on-board power consumption can exceed a thousand watts, and the overall energy consumption of a single conditionally automated driving vehicle, combining all three categories, could be around 2500 Wh per 100 km of driving [145]. High on-board energy consumption is due to the use of compute-intensive algorithms and processing devices such as graphics processors, which are essential for perception and visual applications.
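As a rough back-of-the-envelope illustration of the figures above (the compute draw and average speed below are assumed values for illustration, not measurements from the cited studies), the share of on-board computing in total consumption can be sketched as:

```python
# Rough estimate of the on-board compute share of vehicle energy use,
# based on the survey's cited ~2500 Wh per 100 km total consumption.
# The compute draw and average speed are hypothetical assumptions.

total_wh_per_100km = 2500.0    # cited overall consumption per 100 km
compute_power_w = 400.0        # assumed on-board compute + sensor draw
avg_speed_kmh = 40.0           # assumed average driving speed

hours_per_100km = 100.0 / avg_speed_kmh                    # 2.5 h
compute_wh_per_100km = compute_power_w * hours_per_100km   # energy used by compute
compute_share = compute_wh_per_100km / total_wh_per_100km

print(f"compute energy per 100 km: {compute_wh_per_100km:.0f} Wh")
print(f"share of total:            {compute_share:.0%}")
```

Under these assumptions the on-board compute load alone would account for roughly 40% of the total, which illustrates why approximate and energy-efficient techniques target the perception workload first.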
Advanced driver assistance systems (ADAS) and features have become prevalent in the past decade. As shown in Figure 2, the research trend in the past decade has primarily been in the area of perception, specifically on object detection and segmentation, due to the advancement in convolutional neural networks and the releases of autonomous driving datasets. SLAM and vehicular communication are also popular topics, with the latter catching up because of the recent developments in 5G, next-generation cellular, and hybrid communication technologies. Energy-efficient approaches also show an increasing research trend; however, the number of publications on energy-efficient methods is relatively small compared to other directions. The on-board computation approaches leading to high power consumption [38] demand the design of applications and energy-efficient Edge AI systems for automated driving services. Therefore, this survey paper focuses on identifying currently practiced AI algorithms and computation approaches that lead to high energy consumption. Further, it comprises a review of the design and implementation of edge computing approaches for autonomous driving services (for example, perception, HD maps, SLAM), datasets, edge-assisted techniques, and vehicle-edge frameworks. Lastly, based on the gathered requirements and research gaps, an Edge AI processing pipeline is proposed, which contains a higher-level abstraction of the components involved in service implementation across vehicle-edge settings. In this survey, the levels of autonomy are adopted from SAE International (Society of Automotive Engineers), which defines six levels of automation in driving, as follows: 1) Level 0 - No Automation: All driving tasks are carried out by the driver. 2) Level 1 - Driver Assistance: Driving tasks are carried out by the driver with little input from the vehicle sensors; this level introduces driving assist features. 3) Level 2 - Partial Automation: Driving tasks can be carried out by the computing unit placed in the car with sensed input from the vehicle's surroundings; the features include adaptive cruise control and autonomous emergency braking. However, this level still requires the driver to maintain control of driving tasks and regularly monitor the vehicle's surroundings. 4) Level 3 - Conditional Automation: Some tasks (sensing, actuation, and control) are carried out by the sensors and the computing unit placed in the car; however, the driver must be able to take control of the vehicle on demand and depending on the situation. 5) Level 4 - High Automation: The vehicle is capable of performing all driving tasks by initiating communication with other vehicles under certain conditions, but the driver has the option to take control of the vehicle. 6) Level 5 - Full Automation: The vehicle is capable of performing all driving tasks by communicating with other vehicles and infrastructure sensors under all conditions, but the driver may have the option to control the vehicle.

B. Approximate Techniques
DNN applications such as 3D object detection and classification or SLAM are usually computationally intensive, memory-consuming, and energy-consuming tasks. The computing complexity increases for real-time applications when these large-weight DNNs are implemented on embedded systems with limited memory and computing power. For example, the currently deployed Level 3 autonomous vehicles [148], [242], [18] primarily depend on vision sensors and systems and consume significant resources in terms of memory and energy. The scalability of these embedded systems to fully connected cooperative autonomous vehicles incorporating full ADAS features is yet to be determined. When these features are integrated into resource-constrained [99], [393], [287], [338] and energy-constrained [359], [376], [175], [384] real-time autonomous systems, the following challenges will be encountered: A) When a large volume of sensor data is processed by DNN algorithms for autonomous driving services, it will directly affect the computing efficiency of embedded systems with limited memory, which makes it essential to implement compression techniques [159], [228], [274], [285], thus approximating the algorithms and embedded device usage while simultaneously optimizing for better energy efficiency. B) The computing complexity and low-latency requirements of applications such as SLAM make it necessary to process the sensed data at the on-board computing unit rather than at the edge server. Software and architectural approximation techniques, such as data aggregation and early-exit neural networks, can help improve on-board latency and enable fast inference. This is comprehensively covered in Section IV.
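The early-exit idea mentioned above can be illustrated with a minimal sketch; the stages and confidence values below are toy stand-ins for intermediate classifier heads in a real DNN, not an actual network:

```python
# Minimal sketch of the early-exit idea: inference stops at the first
# intermediate classifier whose prediction is confident enough,
# saving the cost of evaluating the remaining stages.

def early_exit_inference(x, stages, threshold=0.9):
    """Run stages in order; return (label, exit_index, stages_evaluated).

    Each stage maps an input to (features, class_probs)."""
    cost = 0
    for i, stage in enumerate(stages):
        x, probs = stage(x)
        cost += 1
        confidence = max(probs)
        if confidence >= threshold:          # confident enough: exit early
            return probs.index(confidence), i, cost
    # No exit fired: fall back to the final stage's prediction.
    return probs.index(max(probs)), len(stages) - 1, cost

# Toy stages: confidence grows as more "layers" are evaluated.
stages = [
    lambda x: (x, [0.6, 0.4]),    # shallow exit: not confident
    lambda x: (x, [0.95, 0.05]),  # deeper exit: confident, stop here
    lambda x: (x, [0.99, 0.01]),  # never reached for this input
]

label, exit_idx, cost = early_exit_inference([1.0], stages)
print(label, exit_idx, cost)   # exits at stage 1 after evaluating 2 of 3 stages
```

In a real deployment, each stage would be a block of network layers with an attached classifier head; "easy" inputs exit at shallow stages, reducing average latency and energy on the on-board unit.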

C. Edge AI
Edge AI, or Edge Intelligence, can be described as the combination of edge computing and artificial intelligence [400]. It has emerged from the requirements of connected ecosystems, developed for applications that require processing algorithms locally on the device or in the nearest available data center or server. The algorithms [281] utilize the data generated by the devices and make independent decisions for real-time applications without needing to connect to a centralized server or cloud for the decision-making process. A fully connected Level 5 autonomous car will result from collaboration between edge-server communications and computing systems.
The current Level 1 to Level 3 autonomous vehicles rely heavily on the graphics processing unit (GPU) for their applications, and the GPU alone can consume up to 300-350 Wh [38], [18], [145] of energy per 100 km of driving, depending upon the data rate and quality of the sensors. As shown in Table II, the number of sensors increases for the fully connected autonomous vehicle compared to the current scenario. The information shown in the table is an approximate estimate. Based on OEMs and fleets [3], [18], [148], the sensor distribution may vary according to the sensor kit and the software technologies used in conjunction with it. The estimated power consumption of each vehicle can range from hundreds to thousands of watts, depending on the type of operation the vehicle is involved in. As per the reports [272], the amount of data transmitted between the vehicle and the cloud can reach 10 exabytes in the future, which is excessive with respect to current practices, and present cloud servers are not capable of handling and processing this real-time data quickly. Therefore, AI at the edge can be implemented to process the data for time-sensitive tasks in the autonomous driving ecosystem and to offload it.
For latency-tolerant applications, such as HD map updates or traffic incident sharing, Edge AI approaches can process the data locally at the edge server and later transmit the model or the analysis result to the cloud or remote server, following energy-efficient mechanisms. In a fully automated driving environment, Edge AI implementation can help achieve better end-to-end accuracy while bringing down the current power and energy consumption. The Edge AI deployment process involves sensing, re-training, decision making, and collaborative learning through communication with other edge devices and servers in the environment.

D. Communications in Autonomous Vehicles
Communication in the vehicular ecosystem is key to deploying cooperative and collaborative autonomous driving applications. An example of connected vehicles, base stations, road-side units, edge servers, infrastructure, and remote cloud is shown in Figure 4. Several use cases presented within the context of vehicle communication [217], [251], [22], [94], [210], [54] directly benefit perception, planning, and control, and subsequently impact the energy consumed by the vehicle. Communications in vehicles can be further categorized as inter-vehicle communication [47], [19], [50] and intra-vehicle communication [240], [225].
The most important factor behind the wide use of BLE technology is its relatively low power consumption [182], [327]; it also has a large installed base, guaranteed latency, and a stable specification. Automobile components and modules, normally connected by electrical signal wires, are increasingly being replaced by wireless signals. A reduction of 50% in the number of signal wires is the goal of the automotive industry. Typically, an automobile contains about five kilometers of wiring, so many connections are candidates for wireless replacement. A hybrid practice that uses both wired clusters of automobile components and wireless inter-cluster connections is becoming more common; the infotainment panel at the vehicle dashboard is one such example. For inter-vehicle communication, present human-driven or semi-autonomous vehicles are equipped with communication and radio modules, which receive information and signals mostly related to infotainment. The communication technology has evolved from AM, FM, and DAB to HD Radio, in which the transmission method, media size, and quality of service have significantly improved. Since fully connected autonomous driving has wider communication and real-time processing requirements, as the high-performance computing unit makes the decisions, researchers have proposed relevant technologies such as DSRC, V2V/V2I, WiMAX, and 5G-NR-V2X or C-V2X for local and long-range communication.

E. Taxonomy of Edge AI Technologies for CAV
This subsection introduces the taxonomy used in the remainder of this survey paper. First, AI methods used in autonomous driving are described. Second, Edge AI computing applications for autonomous vehicles are explained. Third, approximation approaches and compression strategies are defined. Finally, energy-efficient mechanisms and requirements in the vehicular ecosystem are discussed. For reference, the topics can be seen in Figure 1.

1) AI Models & Autonomous Vehicles: An autonomous
vehicle is an independent system capable of routing from source to destination by perceiving its surroundings using sensors and processing the sensed data with intelligent algorithms. Advancements in autonomous vehicles and related technology can be associated with the progress of vehicle sensor suites and intelligent algorithms/models. These models have enhanced connectivity, infotainment systems, electrification, and automation. Perception sensors (camera, LiDAR, radar), positioning sensors (GPS, GNSS), and communication modules are used to replace or imitate human driving behaviour using AI models. a) Basic Model: AI models proposed to automate/assist driving tasks can be divided as follows: i) Machine Learning: Supervised, unsupervised, and reinforcement learning are the popular techniques explored within autonomous driving. ii) Deep Learning: A subset of machine learning that consists of several types of neural networks trained on datasets to learn complex features from unstructured or structured data. b) Model Requirements: AI models have specific requirements and guidelines depending on the driving tasks. For example, localization, emergency braking, and detecting an obstacle or traffic sign should be highly accurate. Within the scope of this survey paper, the discussed model requirements are: i) Accuracy: The principle behind using AI models is to eliminate human error while driving and achieve an expected level of accuracy for the driving tasks. It is measured as the score of correct predictions/estimations with respect to the total predictions by a model. ii) Latency: Each driving task has varied execution requirements. For example, detection and localization have strict requirements of a few milliseconds (ms). For AI models, latency (in time) is used to characterize the performance of a model for a specific application. iii) Energy: Desiring the highest level of accuracy for an AI model and fulfilling strict latency requirements for specific tasks generally leads to the use of high-performance computing units, which in turn leads to energy consumption. Energy (in joules) can be estimated by capturing the AI model's power consumption (in watts). c) AI model compression: These techniques enable processing large-volume data or AI models, such as dense and deep neural networks, on resource-constrained devices with limited computational resources. For vehicular applications, lossless and lossy compression has been explored for models and data. Popular compression approaches include: • Parameter reduction: Reducing parameters from the AI model can compress the model, which may enable deployment on resource-constrained devices. For example, pruning non-contributing weights/layers results in parameter reduction, which generally leads to model compression. • Layer/node reduction: To reduce the compute and memory requirements of neural networks, a layer/node reduction approach is adopted. It is generally practised as a structured approach to reduce computational demand while balancing model accuracy; minimal matrix operations and parameter sharing are some examples [87], [149], [305]. Software and model compression approaches proposed for frameworks and AI models in connected autonomous vehicles can be categorized as approximate models or approximation techniques. However, this generalization does not address energy efficiency (one of the three dimensions in approximate computing) from the viewpoint of computation and communication.
i) Quantization: Vehicular applications depend on intelligent algorithms, which generally use 32-bit floating-point precision for training the model and estimating gradients. The elements can be approximated using quantization to fewer bits, reducing the model size and decreasing the bandwidth load. The approach is inspired by the human nervous system, where information is stored in discrete form [308]. ii) Sparsification: In this approach, a vector is represented by an approximate form in which the non-zero components are equal to the corresponding components of the original vector. It is a compression technique often implemented in collaborative and distributed learning approaches such as federated learning, which require frequent communication between devices, in this case between the vehicle and the edge or cloud. iii) Low-rank approximation: Another technique to reduce the computation of AI models is low-rank approximation. 2) AI Tasks: Driving tasks implemented using AI models can be categorized as perception, SLAM, HD maps, path/motion planning, and communication. The AI model and the respective driving tasks can be further differentiated on the basis of data processing, feature extraction mechanisms, and the hardware used.
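The quantization idea described above can be sketched as symmetric uniform quantization of a weight vector; the values are illustrative, and real pipelines typically use per-tensor or per-channel scales with calibrated ranges:

```python
# Minimal sketch of symmetric uniform quantization: map 32-bit float
# weights to 8-bit integers and back, trading a small approximation
# error for a 4x reduction in storage and transmission bandwidth.

def quantize_int8(weights):
    """Return (int8-range values, scale) for a list of float weights."""
    scale = max(abs(w) for w in weights) / 127.0 or 1.0  # guard all-zero input
    q = [round(w / scale) for w in weights]              # values in [-127, 127]
    return q, scale

def dequantize(q, scale):
    """Approximate reconstruction of the original floats."""
    return [qi * scale for qi in q]

weights = [0.42, -1.27, 0.003, 0.9]
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
print(q)                                       # [42, -127, 0, 90]
print(f"max reconstruction error: {max_err:.4f}")
```

Note that the smallest weight (0.003) is rounded to zero, which is the per-element approximation error that quantization trades against model size and bandwidth.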
a) Perception applications provide scene understanding and are performed using vision sensors such as a camera or LiDAR, at the vehicle's on-board computing unit or at the sensor units present within the ecosystem (e.g., CCTV cameras with a computing unit). These applications are performed using CNN or DNN models deployed on the GPU. As the models largely consist of dense layers, the computational demand and energy cost for deployment are relatively high. b) SLAM applications enable vehicles to localize in their surroundings using sensor data. AI models enabling SLAM applications are also memory- and compute-intensive. The complexity further increases because of the low-latency inference requirements of these algorithms. c) HD maps, sometimes also referred to as 3D maps, are an evolving driving service/feature which provides visuals of the vehicle's surroundings, replacing the currently used 2D maps. They are expected to be used with detection and localization tasks. d) Communication in the vehicular environment is dynamic and heterogeneous. It exists in three forms: in the vehicle, between vehicles, and within the infrastructure. As it evolves, vehicular communication depends on the generation of hardware/software and sensory technologies. High-level autonomy is highly dependent on connected vehicles and smart infrastructure sharing raw data, weights, and algorithms. Similar to on-board computation, the complexity in vehicular communication arises from the large volume of data and the additional load on bandwidth. e) Path/motion planning is a crucial AI task that enables the vehicle to navigate from source to destination by avoiding obstacles. A traditionally used algorithm is A-star. However, recent approaches involve using AI models with vision sensors, thus combining motion planning and path prediction while avoiding obstacles.
3) Edge AI and CAV: Initially, cloud computing was proposed to facilitate computation and decision-making for connected vehicles [73], [303], [115]. However, the cloud computing approach faced several challenges: transmitting the high volume, or flood, of data from the vehicle to the cloud; data privacy and leakage; and adversarial and poisoning attacks on the ground-truth data and algorithms present in the cloud [115]. Therefore, edge computing has been proposed as an approach that brings computation near the data source to tackle surplus data transmission to the cloud.

F. Motivation and Methodology of Choosing Literature
In past years, detailed surveys on emerging autonomous driving technologies [360], [187], common practices [374], deep learning techniques [85], and communication-efficient approaches [281] have been published. However, little to no attention has been given to energy-efficient approaches and related software approximation techniques for connected autonomous vehicles. In [360], an overview of current and emerging autonomous driving technologies following a case-study approach is presented. While discussing emerging technologies, the authors also briefly described future research opportunities in connected autonomous vehicles. Comprehensive studies of edge computing systems and edge computing opportunities for autonomous driving are presented in [183], [42] and [187], respectively. These reviews give attention to computing architecture, software frameworks, privacy, and security in vehicular communication. In a similar context, [379] presented a review of mobile edge intelligence techniques for vehicles and discussed edge-assisted perception, mapping, and open issues. Articles [374], [85] covered recent state-of-the-art autonomous driving AI models and techniques in detail. Key topics discussed were machine/deep learning models, driving safety features, system components, and architecture. The review conducted in [125] covers energy-aware approaches for hardware and software layers in the edge computing domain, focusing on the framework layer. By focusing on key communication challenges, the authors in [281] presented a comprehensive review of communication-efficient techniques for edge computing systems. In [315], the authors reviewed cloud-edge computing and popular frameworks, focusing on application and optimization techniques and benchmark runtimes.
To highlight the value of this survey, a comparison with related surveys is shown in Table III. This comparison is based on coverage of the following topics: deep learning practices (perception), data- and compute-intensive tasks (SLAM, communication, high-definition maps), datasets, applications of Edge Intelligence, and related energy-efficient approaches. The review procedure used in preparing this literature survey is based on the systematic literature review (SLR) approach adapted from Kitchenham and Charters [313], also shown in Figure 5. This approach demands defining research questions and objectives first, followed by identifying the search strategies. While searching for relevant and related content, a connected-paper search approach is followed; the inclusion and exclusion criteria are applied with the keywords and terms to refine the articles based on the scope and objectives. In the last two stages of the SLR approach, the collected articles are categorically divided based on their contribution toward approximation techniques, autonomous driving applications, and Edge Intelligence. Some approximation techniques overlap across multiple research questions; therefore, a combined approach is used for their review. Machine learning approaches can be categorized into supervised, unsupervised, and semi-supervised learning [116], [25]. In supervised learning, a machine learning model is trained with a labelled dataset, while in unsupervised learning a model is trained with an unlabeled dataset, with the common purpose of prediction or classification. In semi-supervised learning, a model is trained with both labeled and unlabeled datasets. This approach is proposed to save training time and computational resources [122], [64].
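As a minimal, purely illustrative example of the supervised setting described above (synthetic 2-D points and hypothetical class names, not a driving dataset), a nearest-centroid classifier trained on labelled data:

```python
# Toy illustration of supervised learning: fit a nearest-centroid
# classifier on labelled 2-D points, then predict labels for new
# points. The data and class names are synthetic and illustrative.

def fit_centroids(points, labels):
    """Compute one centroid per class from labelled training data."""
    sums, counts = {}, {}
    for (x, y), lab in zip(points, labels):
        sx, sy = sums.get(lab, (0.0, 0.0))
        sums[lab] = (sx + x, sy + y)
        counts[lab] = counts.get(lab, 0) + 1
    return {lab: (sx / counts[lab], sy / counts[lab])
            for lab, (sx, sy) in sums.items()}

def predict(centroids, point):
    """Assign the class whose centroid is closest to the point."""
    px, py = point
    return min(centroids,
               key=lambda lab: (px - centroids[lab][0]) ** 2
                             + (py - centroids[lab][1]) ** 2)

# Two toy classes: "vehicle" clustered near (1, 1), "pedestrian" near (5, 5).
train_x = [(0.9, 1.1), (1.2, 0.8), (4.8, 5.2), (5.1, 4.9)]
train_y = ["vehicle", "vehicle", "pedestrian", "pedestrian"]
centroids = fit_centroids(train_x, train_y)
print(predict(centroids, (1.0, 1.0)))   # -> vehicle
print(predict(centroids, (6.0, 5.0)))   # -> pedestrian
```

An unsupervised variant would discover the two clusters without the label list, and a semi-supervised variant would use both the labelled points and additional unlabelled ones.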

A. Perception
Autonomous vehicles driven using sensory technologies and AI algorithms can be seen in the form of taxis from Waymo, Zoox, Cruise, etc. [3], [81]. These vehicles are mostly dependent on perception-related tasks: segmentation, object classification-detection, and localization. These three tasks are currently considered crucial elements for enabling autonomous driving. The object detection task can be further divided into 2D and 3D detection, which mainly rely on line-of-sight sensors such as high-definition cameras [193], [390] and LiDAR [170]. The 2D object detection task is generally carried out using convolutional neural network and recurrent neural network architectures, which involve feature detection and the estimation of rectangular or square bounding boxes (x, y) around the detected objects in an image or video frame, whereas 3D detection involves estimating a cuboid, three-dimensional bounding box around an object by estimating the position of the object in the 3D plane (x, y, z).
Deep learning has been widely accepted as attractive or prominent technique for image and vision related applications because of development of the state-of-the-art neural network architectures [9], [147], [173], and their delivered accuracy's.The object detector are classified into one-stage and twostage detectors depending upon the backbone of training and inference method used.Table IV VIII), hardware implementation, detection methods, and speed (FPS) which is crucial for real-time deployment.For the 3D detection the initial approach and technique involves pre-processing of the 3D point clouds data and adopting them into the data structure required for the existing deep learning algorithms, thus providing an output based on the algorithm.Recent researches have proposed to process the LiDAR point clouds directly on deep neural network without converting them to any representations.For example [245], [244] proposed different form of deep neural net architectures, called as Pointnets and Frustum Pointnets respectively.These deep learning architectures have shown higher performance and have proved as benchmark for 3D perception based detection such as object classification and semantic segmentation.Pointnets++ architecture [246] proposed by Qi et [245] and ResNet [298] architecture for estimating the 3D frustum and object classification.1) 2D object Detection: 2D object detection in an autonomous vehicles are primarily based on the single or multiple cameras connected to sense the environment or surrounding of the car.The 2D object detection architecture or algorithm requires the raw image as an input, and outputs the bounding box with the class or label of the detected object.In 2D object detection the bounding box is an axisaligned rectangle, which is precisely estimated on the position of the multiple objects or classes in that image, here the bounding box can be parameterized as (x min , x max , y min , y max ) where (x min , y min ) are the pixel coordinates of 
the bottom-left bounding box corner and (x_max, y_max) are the pixel coordinates of the top-right corner. An example of an unannotated captured image and point cloud from the KITTI dataset [203] is shown in Figure 10; the image shows the front camera view and the generated LiDAR point cloud.
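The (x_min, y_min, x_max, y_max) parameterization above lends itself to a standard intersection-over-union (IoU) computation, the usual metric for matching predicted and ground-truth boxes. A minimal sketch (the function name is illustrative, not from any cited detector):

```python
def iou_2d(box_a, box_b):
    """Intersection-over-union of two axis-aligned 2D boxes.

    Boxes use the (x_min, y_min, x_max, y_max) parameterization
    described above (pixel coordinates).
    """
    ax0, ay0, ax1, ay1 = box_a
    bx0, by0, bx1, by1 = box_b
    # Overlap rectangle (may be empty).
    ix0, iy0 = max(ax0, bx0), max(ay0, by0)
    ix1, iy1 = min(ax1, bx1), min(ay1, by1)
    inter = max(0.0, ix1 - ix0) * max(0.0, iy1 - iy0)
    area_a = (ax1 - ax0) * (ay1 - ay0)
    area_b = (bx1 - bx0) * (by1 - by0)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0
```

For example, two unit-overlap boxes such as (0, 0, 2, 2) and (1, 1, 3, 3) give an IoU of 1/7; detectors typically accept a match above a threshold such as 0.5.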
Some of the benchmark 2D object detectors for real-time applications using camera frames are [114], [336], [254], [188]. These architectures process the image through filters and layers of a convolutional neural network, extracting a feature map of the entire image. Selected object regions are passed over the extracted feature map and mapped onto a region feature vector, which, based on the class scores, predicts the type of object and proposes the bounding box for it.
2) 3D Object Detection: 3D object detection depends on sensors such as an RGB-D camera, 3D radar, LiDAR, or their combined sensed values, as they can represent the vehicle's surroundings in a 3D setting. For inference, the raw sensed values are processed by a deep learning algorithm, which requires as input an image with length, width, and depth information, or a LiDAR point cloud in sparse or dense format. The output of these deep learning algorithms is as follows: first, they detect and classify the objects present in the scene, and second, they predict a 3D bounding box for the detected objects in the line of sight. In the 3D object detection pipeline, the backbone of the architecture uses a neural network with convolutional layers. The convolutional layers are responsible for feature extraction from the scene into local and global feature maps. The next stage comprises a deconvolution layer. The parameter weights obtained after the deconvolution layer are used for two processes: first, they are fused using a probabilistic approach to generate and aggregate a score for the detected feature, and second, they are processed by a pooling layer and fused further to obtain the detected object and the predicted bounding box. A 3D bounding box can be parameterized as (x, y, z, l, w, h, θ), where (x, y, z) are the 3D coordinates of the bounding box center, (l, w, h) are its length, width, and height, respectively, and θ is the yaw angle of the bounding box. Two different approaches to 3D object detection, based on images and LiDAR point clouds, are shown in Figures 6 and 7, where object detection uses fusion of the LiDAR point cloud and the corresponding camera image.
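To make the (x, y, z, l, w, h, θ) parameterization concrete, the small sketch below recovers the eight box corners from these seven parameters, assuming (as is common, though corner-ordering conventions vary by dataset) that the yaw θ rotates about the vertical axis:

```python
import numpy as np

def box3d_corners(x, y, z, l, w, h, theta):
    """Return the 8 corners of a 3D bounding box parameterized as
    (x, y, z, l, w, h, theta): center, dimensions, and yaw angle, as
    described above. Yaw is assumed to rotate about the vertical axis."""
    # Corners in the box's local frame, centered at the origin.
    dx, dy, dz = l / 2.0, w / 2.0, h / 2.0
    local = np.array([[sx * dx, sy * dy, sz * dz]
                      for sx in (-1, 1) for sy in (-1, 1) for sz in (-1, 1)])
    # Yaw rotation about the z axis, then translation to the center.
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return local @ rot.T + np.array([x, y, z])
```

With θ = 0 the corners span [x - l/2, x + l/2] along the first axis, which is a quick sanity check when debugging a detection pipeline.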
Most statistical or deep-learning algorithms for near-real-time 3D object detection and semantic segmentation [132], [133] are based on PointNet [245]; the models proposed there are trained and evaluated on the KITTI dataset, which contains images and LiDAR point clouds collected from a forward-facing stereo camera and a Velodyne LiDAR. Recent point-cloud-based architectures such as [152], [48], [365], [151], [278] have made it easier to directly use the raw point cloud for efficient detection on hardware. As reviewed in this section, research in the perception category has mainly focused on improving the accuracy of the DNN model, multi-object detection and tracking, and implementation on embedded devices. The challenges and opportunities for energy efficiency identified in this section are high computational demand, data fusion, and collaborative learning models.
Takeaways
1) Computational efficiency: Existing models consist of sequential convolution and fully connected layers, with the primary objective of achieving high accuracy on a driving dataset. Deployment of such models is strictly dependent on high-performance devices, which increases the on-board computing and energy costs. Processing such neural networks on a resource-constrained embedded device while maintaining benchmark accuracy remains an open challenge.
2) Data fusion: Camera and LiDAR sensor data are used independently or in combination to detect an object in the vehicle's surroundings. However, current practice is to process data in individual pipelines and perform fusion at the last stage. This leads to excessive use of computational resources for the same operation.
3) Domain gap: Remarkable progress in object detection can be credited to intelligent algorithms trained on automotive datasets. The sensory technologies used for data collection frequently change between generations (e.g., LiDAR and improvements in resolution). However, little attention has been given to domain adaptation of these algorithms for the next generation of datasets.
Figure 7: Pipeline for the fusion of feature maps. This approach has proved essential for LiDAR- and image-based 3D object detection.
B. HD Maps
Maps are essential components for level-4 and beyond autonomous driving. Previously, maps were used as a driver-assistance feature [290] to guide navigation from source to destination. Google and Apple were among the first organizations to collect street, city, and highway data, which later enabled flexible transportation and mobility through GPS devices or map-based applications on regular smartphones. With advancements in technology and algorithms, 3D maps of cities such as New York and Washington were created. HD maps for autonomous driving are the result of advancements in sensors and driving use-cases [290], [127].
Current HD maps lack specifications about data types or standard guidelines, such as what annotated information should be stored while creating them. The Automotive Edge Computing Consortium (AECC) has proposed a version of an HD map consisting of four layers. This map version is based on the Local Dynamic Map initially proposed by the European Telecommunications Standards Institute (ETSI) [36]. It includes two static and two dynamic layers, which are further classified based on the timelines and changes expected within the vehicular ecosystem (Figure 8). Current use-cases include creating an HD map from raw sensor data and updating an existing map using crowdsourced data from vehicles and infrastructure sensors in the vehicle-edge-cloud setting. The layers proposed in the AECC version include the following:
• Permanent static layer: serves as the foundation by providing a static map of the surroundings. This layer consists of road maps, buildings, and roadside infrastructure.
• Highly dynamic layer: changes frequently, within a few seconds to a few minutes, and thus contains information about moving objects such as other vehicles, pedestrians, and motorcyclists.
Information requiring updates at sub-second intervals is not included in an HD map. Relevant work on HD maps using deep neural networks includes HDNet, VectorNet, and exploiting sparse semantic HD maps [127], [248], [162], [249]. A machine-learning-based approach and workflow for the creation of a high-definition semantic map is presented in [127]. The authors discuss the steps from data capture using sensors, through annotation, to map generation. Use-cases such as pose estimation, traffic sign and line mapping, and lane/road marking are also discussed. In a similar context, a complete HD map framework for autonomous driving is presented in [249]. The authors comprehensively present the HD map application by describing pre-built maps, cloud storage, locally built maps, and updates to the global map based on changes in static semantic conditions. The framework is divided into on-vehicle mapping, user-end localization, and on-cloud mapping.
For on-vehicle mapping, traditional semantic methods, pose estimation, perspective transformation, and local mapping have been used [249], [307], [215]. On-cloud mapping is responsible for merging and aggregating map data from multiple vehicles. Functions merge local data in a timely manner so that the global map stays up-to-date. As the size and volume of data are not fixed, a function to compress map data is also implemented in on-cloud mapping. Lastly, user-end localization refers to vehicles requesting map information from the cloud. When a vehicle receives the map, an algorithm decompresses the map data, which is then processed through a semantic localization pipeline.
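As a rough sketch of the four-layer organization described above, the structure below uses the ETSI Local Dynamic Map layer names; the update intervals and example contents are illustrative assumptions, not AECC specifications:

```python
from dataclasses import dataclass, field

@dataclass
class MapLayer:
    contents: str              # what kind of objects the layer stores
    update_interval_s: float   # how often the layer is refreshed (illustrative)
    objects: list = field(default_factory=list)

def make_hd_map():
    """Four-layer HD map skeleton following the ETSI LDM convention:
    two static and two dynamic layers, ordered from slowest- to
    fastest-changing."""
    return {
        "permanent_static": MapLayer("road maps, buildings", 30 * 24 * 3600.0),
        "transient_static": MapLayer("roadside infrastructure", 24 * 3600.0),
        "transient_dynamic": MapLayer("weather, congestion", 60.0),
        "highly_dynamic": MapLayer("vehicles, pedestrians, motorcyclists", 1.0),
    }
```

Keeping the refresh interval per layer is what lets an edge server push only the highly dynamic layer frequently while the static base map is fetched once.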
Researchers have also predicted that around 10% of roads or static conditions change every year because of construction and related scenarios. Therefore, crowd-sourcing-based HD map updates have been proposed to update the global map using individual vehicles [177], [235], [381], [162], [100]. In [381], the authors propose using sensors such as GNSS, IMU, and camera to detect changes in the HD map, using the BiSeNet architecture as a semantic baseline and visual SLAM for localization and mapping. For the experiment, the authors used an arrow sign from the surroundings as an example and detected the change in the existing map data using a vectorization and matching approach. A similar approach to updating the HD map using edge servers is proposed in [162]. The authors discuss the issues of diminishing marginal utility and premature convergence of map data from individual vehicles. To this end, a task distribution mechanism using adaptive time-period division is proposed. In experiments using edge devices and a computing unit, the effectiveness is verified in terms of coverage, cost, and efficiency.
A crowd-sourcing-based approach to create an HD map using graph-SLAM [23] is proposed in [177]. The authors process GNSS, odometry, point cloud, and landmark data using a graph-SLAM algorithm, with pose estimation, a smoothing filter, and trajectory alignment for the landmarks. Road model inference and lane geometry are used to create functions for lane boundary lines, connections, and point observations. To evaluate the approach, an experiment with ground-truth data was conducted. Deep learning methods for crowd-sourced HD map updates are proposed in [235], [100]. In [235], the authors propose a change detection algorithm using a boosted particle filter; the particle filters are applied during localization along with a classification algorithm. In [100], the authors propose a framework that maps the sensed camera image/frame to probabilities of HD map change. As HD map data consist of geometric information and lane markings, a deep learning metric is used to reduce the domain gap. In experiments, the authors implemented an object detector with pixel-level change detection from the sensed input image, evaluated on a city-scale dataset.
Other interesting techniques that can be explored for HD map creation and development are neural radiance fields [206], [301] and mean-field games [110], [109]. Instead of a three-coordinate system (x, y, z), a neural radiance field [206] uses a five-coordinate system (x, y, z, α, φ), where the last two coordinates are the viewing direction. The authors use a fully connected neural network to generate 3D scenes and frames from trained 2D images. For a comparative study, techniques such as neural volumes, scene representation networks, and local light field fusion are used to directly predict a multiplane image from the input. The approach is very useful for 3D models of objects captured by camera. A similar approach is proposed in Block-NeRF [301] to represent surroundings at large scale. In [301], architecture layers are modified using pose refinement and generative latent optimization to adapt image appearance embeddings, as different images may be captured under different environmental conditions. For experiments and evaluation, the authors reconstructed 3D scenes using 2.8 million camera images. Interesting work using mean-field games is proposed in [110], [109]. In [110], the authors propose a computational framework that categorizes the scenario into microscopic and macroscopic perspectives to control vehicle velocity and develop traffic flow for autonomous vehicles. A comprehensive study characterizes equilibrium solutions in both continuous MFGs and discrete differential games; a similar approach could be applied to HD map creation and update, which requires strategic interaction between connected autonomous vehicles.
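The five-coordinate input (x, y, z, α, φ) described above is, in NeRF, first expanded by a sinusoidal positional encoding before entering the fully connected network. A minimal sketch of that encoding (the number of frequency bands is an illustrative choice, not the paper's setting):

```python
import numpy as np

def positional_encoding(p, num_freqs=4):
    """NeRF-style positional encoding of a 5D input
    (x, y, z, alpha, phi): 3D position plus viewing direction.
    Each coordinate is mapped to sin/cos features at geometrically
    increasing frequencies before being fed to the MLP."""
    p = np.asarray(p, dtype=float)
    feats = []
    for k in range(num_freqs):
        feats.append(np.sin((2.0 ** k) * np.pi * p))
        feats.append(np.cos((2.0 ** k) * np.pi * p))
    return np.concatenate(feats)
```

A 5D input with 4 frequency bands thus becomes a 40-dimensional feature vector; this expansion is what lets a plain fully connected network represent high-frequency scene detail.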
The challenges and opportunities for energy-efficient approaches with HD map applications are as follows:
1) Data collection and processing: An hour of driving corresponds to approximately 1.5 TB of data from a car. Processing and interpreting the collected data requires efficient algorithms and high-end computational resources.
2) Map storage and sharing: A primary challenge is the design of a common energy-efficient framework for edge servers that can store and share the HD map with autonomous vehicles through local wireless (802.11p), cellular, or hybrid communication approaches.
3) HD map update: Approximately 10-15% of surrounding or street scenes are expected to change because of infrastructure development. Therefore, an energy-efficient approach and scheme is needed to update the existing HD map, rather than updating the database periodically.
4) Intelligent driving: The amount of information perceived by sensors differs between city and highway driving; intelligent algorithms developed for edge-server-assisted HD map updates can help identify the sensory information needed for mapping and updating.

Takeaways
HD maps are essential and an emerging technique in autonomous driving. Present HD maps are available from semantic and geometric perspectives. HD maps can be created locally each time using the vehicular computing unit, but this tends to be compute-intensive. NeRF, MFG, and deep learning techniques can be explored for data generation, map creation, and global HD map updates. Crowd-sourced map update is a promising approach; however, data merging, scheduling, and aggregation approaches should be regularly optimized.

C. SLAM
Simultaneous Localization and Mapping, often abbreviated as SLAM, has been widely researched in robotics and autonomous systems, including indoor applications such as warehouses and manufacturing units. In an autonomous vehicle, SLAM is a process utilizing algorithms to estimate the real-time position of the vehicle by continuously perceiving and sensing the environment using on-board sensors. The goal of using SLAM is to create a virtual environment for the vehicle by identifying obstacles and infrastructure, thus assisting in creating a path for safe navigation.
In [134], [112], the authors propose using maps [235], [345], [392], also referred to as 3D maps, in combination with SLAM for efficient and precise localization. SLAM techniques mostly depend on algorithmic approaches such as the probabilistic roadmap (PRM), rapidly-exploring random graph (RRG), rapidly-exploring random tree (RRT), and parti-game directed RRTs (PDRRTs). These algorithms are designed to search a subset of Euclidean space over high-dimensional geometry by randomly building a space-filling tree (RRT). SLAM applications demand low latency (5 ms or less) and high computational resources, thus consuming a significant amount of energy from the on-board computing unit. Recent SLAM approaches have been proposed without the use of a Global Positioning System (GPS) and can be separated into two categories: filter-based techniques and optimization-based techniques. The filter-based category is primarily built on Bayes' theorem, utilizing probabilistic estimation with Bayesian filters.
Some commonly used approaches are the Kalman Filter, Extended Kalman Filter (EKF), and Unscented Kalman Filter (UKF). Other techniques in the same category are particle filters such as FastSLAM, Rao-Blackwellized particle filters, and Monte Carlo filters, commonly practiced as learning algorithms for dynamic Bayesian networks. Table V lists popular SLAM approaches based on line-of-sight sensors, radar, and their fusion. Recently, visual and 3D SLAM approaches have become popular methods to localize the vehicle within the environment. The table categorizes SLAM techniques as 2D SLAM (camera) or 3D SLAM (RGB-D camera and LiDAR). Depending on the input data, a grid, voxel, or point cloud map is used for projection or visualization of SLAM methods. The optimization-based category for SLAM is primarily based on Graph SLAM, which is also motivated by Bayes' theorem and is essentially a graphical representation of it, using a matrix form that relates the state of the vehicle to the environment. The matrix consists of values or information related to the vehicle pose, which can be used to solve the localization problem.
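The Bayesian-filter category above can be illustrated with one predict/update cycle of the linear Kalman filter; the EKF and UKF generalize this same cycle to nonlinear motion and observation models. A minimal sketch, not tied to any particular SLAM system:

```python
import numpy as np

def kf_step(x, P, z, F, H, Q, R):
    """One predict + update cycle of the linear Kalman filter.
    x: state estimate, P: covariance, z: measurement;
    F, H, Q, R: transition, observation, process-noise, and
    measurement-noise matrices."""
    # Predict: propagate the state and covariance through the motion model.
    x = F @ x
    P = F @ P @ F.T + Q
    # Update: Bayes-style correction with the new measurement.
    y = z - H @ x                      # innovation
    S = H @ P @ H.T + R                # innovation covariance
    K = P @ H.T @ np.linalg.inv(S)     # Kalman gain
    x = x + K @ y
    P = (np.eye(len(x)) - K @ H) @ P
    return x, P
```

With a constant-velocity model (F = [[1, 1], [0, 1]]) and a position-only measurement (H = [[1, 0]]), the filter blends the predicted position with each noisy observation in proportion to their covariances.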
Techniques utilizing Graph SLAM include Oriented FAST and Rotated BRIEF SLAM (ORB-SLAM) and Large-Scale Direct Monocular SLAM (LSD-SLAM). Other commonly used techniques are based on deep learning, such as CNN-SLAM, DeepFusion, DeepFactors, Structured-SLAM, and DRM-SLAM. These practices are promising based on their evaluation and performance on driving datasets such as KITTI; however, they still pose a challenge for the efficient and fast computation required in non-identical, practical driving situations.
Compared to SLAM approaches involving point clouds, visual SLAM is preferred in terms of cost, as it uses significantly less expensive cameras rather than LiDARs. Visual SLAM may not be as precise and accurate as point-cloud-based SLAM approaches, but it is significantly faster on standard computing devices [328]. Another disadvantage of visual SLAM is its sensitivity to changes in scenes, illumination, and appearance. The accuracy and precision of proposed SLAM approaches may degrade in dynamic or bad weather conditions. In terms of advantages, visual SLAM has better graphic coverage than point clouds unless multiple LiDARs are used. Deployment of SLAM in an Edge AI environment brings several challenges and opportunities; the key points are highlighted below.
Takeaways
1) Computation: In general, SLAM demands high computational cost even for smaller maps, and several problems with processing and accuracy can be encountered under non-ideal conditions and with large captured data. At present, powerful GPU devices are required for processing, which raises the overall cost of vehicles.
2) Latency: For real-time execution, latency must be lower than 5 ms if SLAM is incorporated using edge or cloud computing.
3) Algorithm: DNN approaches used for SLAM make it suitable for operation in familiar environments. However, changes in location, weather, and daylight conditions add complexity, as the sensed output becomes inconsistent and the DNN model may not be able to process it.
4) Execution: Future connected vehicles are expected to execute services in a distributed manner (at the vehicle, edge server, or cloud). Given current DNN algorithms and their computational, latency, and network bandwidth requirements, it is more realistic to process and execute SLAM on the vehicle's on-board computing unit.

D. Vehicular Communication
Communication within the vehicular environment plays a key role in self-driving functionality [286]. V2X, or vehicle-to-everything communication, is another key factor in the self-driving vehicle ecosystem that enables communication between vehicles and any relevant entity in the environment, for example pedestrians, traffic lights, and data centres. V2X comprises several sub-components and standards, such as V2V (vehicle-to-vehicle), V2I (vehicle-to-infrastructure), V2P (vehicle-to-pedestrian), V2N (vehicle-to-network), and V2G (vehicle-to-grid), the last included considering electric vehicles, charging stations, and their involvement in the infrastructure. An ideal V2X communication system comprises pairs of radio devices, often called on-board units (OBUs) and road-side units (RSUs). OBUs are placed in the car, sharing car-related information with the RSU and receiving traffic- or surrounding-related information from it. Some popular modules include [314], [297], released in the past four years. Hybrid communication approaches combining cellular technology (C-V2X) [247], dedicated short-range communication (DSRC) modules [136], [106], [74], and LTE-based systems and 5G [1], [289], [294], [213], [217] have also been proposed. In [117], the authors explored reliable connected-vehicle services using wireless local area networks, ad-hoc networks, or hybrid communication architectures with cellular connectivity. To estimate the time duration for connection establishment, a probabilistic model implementing a single-hop communication link in vehicular networks [135] is explored. To further ensure reliable communication in the vehicular ecosystem, a reliable emergency message dissemination scheme (REMD) [26] has been presented. Results from the REMD scheme show high reliability, around 99% in each hop, with low overhead, delivering messages for time-critical applications while meeting the low-latency requirements of sensitive applications. The authors also employ zero-correlated unipolar orthogonal codes to combat the hidden-terminal problem. In this approach, periodic beacons are exploited to precisely estimate the reception quality of the 802.11p wireless link in each cell; this information is then used to determine the optimal number of broadcast repetitions in each hop. In addition, to ensure reliability over multiple hops, cooperative communication within the network is also enabled. Simulation results show that REMD outperforms well-known existing schemes for reliable communication.
The initial vehicular communication was developed around local wireless networks such as dedicated short-range communication or Wi-Fi (802.11p), an amendment to the 802.11 standard to enable wireless access in vehicular environments. However, for scalability, other versions such as C-V2X [247] were proposed, which operate in both the 5.9 GHz spectrum and the cellular spectrum, providing channels for long-range communication between vehicles and their surroundings. Table VI shows some popular long-range communication technologies. Solutions based on the proposed combinations can provide low latency, high reliability, and high throughput [106]. To further overcome these challenges, approaches such as next-generation V2X (NG V2X) or New Radio V2X (NR V2X) [217] have been proposed; reported results show that these approaches achieve better network performance. Key communication technologies proposed for vehicular communication are discussed below.
DSRC: One of the initial technologies proposed for medium-range vehicular communication is dedicated short-range communication (DSRC). It can be used in autonomous vehicles to deploy applications within a transmission range of 25-100 meters. It is a sub-protocol within vehicle-to-everything (V2X) that enables vehicle-to-vehicle (V2V) communication. V2V supports automated message propagation and the exchange of vehicle information (e.g., velocity, acceleration, separation distance, direction of travel) with nearby vehicles. The purpose of exchanging these messages is to improve traffic conditions and implement safety applications such as collision avoidance and safe overtaking [76]. With increased message transmission capability, recently proposed methods also include cooperative perception using V2V communication [103], [367]. Potential driving and safety-critical applications developed and tested with DSRC include collision warning systems and emergency braking [136], [106], [8]. However, with the evolution of next-generation vehicular communication technologies and use-cases requiring high-volume data transmission, the technology has not been widely adopted by automotive manufacturers and communication providers [258].
C-V2X: Cellular-V2X is based on the sidelink LTE radio interface, enabling point-to-point communication with nearby vehicles and devices. As described by 3GPP, C-V2X generally operates in one of two channel widths, 10 MHz or 20 MHz, and includes LTE-V2X and 5G-V2X [247]. C-V2X utilizes a time-frequency resource structure, where time is divided into 1 ms sub-frames and the frequency channel is divided into 180 kHz wide resource blocks. These resource blocks exist in the same sub-frame and can be further clustered into sub-channels [1]. Resource allocation schemes and optimization techniques were proposed in [1], [247] to improve network latency performance. Network performance measurements and a scenario-in-loop field-testing method for 5G-V2X were presented in [294], where the tested applications involved braking, obstacle detection, and tracking. A shortcoming of C-V2X compared to DSRC is that vehicles cannot process and exchange messages directly, as it depends on the LTE infrastructure. Another flaw of the current approach is its inability to work in remote locations with poor cellular/network coverage.
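The resource-grid figures above (1 ms sub-frames, 180 kHz resource blocks, 10 or 20 MHz channels) imply a straightforward count of resource blocks per channel. The sketch below applies the standard LTE ~90% spectrum occupancy (guard bands excluded), which is an LTE numerology convention rather than a C-V2X-specific value:

```python
def resource_blocks(channel_hz, rb_hz=180e3, occupancy=0.9):
    """Rough count of LTE-style 180 kHz resource blocks per channel.

    The 90% occupancy factor models guard bands: LTE carries 50
    resource blocks in a 10 MHz channel rather than the naive
    10 MHz / 180 kHz ~ 55.
    """
    return int(channel_hz * occupancy // rb_hz)
```

So a 10 MHz C-V2X channel yields 50 resource blocks per 1 ms sub-frame and a 20 MHz channel yields 100, which are the units that the sub-channel clustering and allocation schemes in [1] operate on.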
NR V2X: New Radio (NR) V2X is designed to complement applications that are not fully supported in C-V2X because of their varied latency, bandwidth, and throughput requirements [217]. NR V2X use-cases comprise the efficient and reliable delivery of aperiodic messages, which was not well supported in C-V2X [251], [22]. Compared to C-V2X, NR V2X also supports groupcast and broadcast transmission methods, which are specifically required for applications such as vehicle platooning [217], [22], [54]. Development in this category will bring several opportunities for urban and highway driving services, such as platooning, predictive planning, and real-time edge analytics involving traffic flow management and forecasting. Several challenges remain in vehicular communication in terms of latency, privacy, and reliability.
Lessons Learned
1) Latency: In an urban driving scenario, multiple vehicles may be in the same location, all communicating with the local edge server. This poses a challenge for real-time, low-latency applications such as SLAM, which require the transmission of large volumes of data from vehicle sensors to the edge server and vice versa.
2) Privacy: In vehicular communication, sensitive information such as the vehicle registration number, vehicle health, real-time status, sensor data, and statistical models is shared. Sharing this information exposes the system to threats of data poisoning, model-weight manipulation, and adversarial attacks.
3) Collaborative applications: A local edge server communicates with multiple vehicles, and each vehicle also communicates with peer vehicles in applications implementing collaborative driving. Collaborative driving applications require data aggregation methods and processing practices at the edge server to combine similar data from multiple sensor sources into a common prediction.

E. Energy Efficient Approaches in Autonomous Driving
Autonomous systems such as robots and unmanned aerial vehicles are mostly powered by a fixed battery source. The same assumption can be made for future vehicles, depending on fuel availability and the planning of future sustainable transportation systems. For currently deployed autonomous vehicles, it is important to consider the energy required by sensors, automotive embedded processors, and embedded devices such as GPUs, TPUs, and CPUs while sensing the surrounding data and processing algorithms. The energy consumed by processors and devices can be derived by sampling the power consumption during training of the deep neural network model or architecture [77]. Another brute-force method is to attach power measurement devices to the embedded devices during inference and log the power consumption over the course of the algorithm. However, these approaches are not very effective, as the autonomous driving ecosystem consists of heterogeneous devices, some of which may not be equipped with a TPU or GPU. It is therefore important to use a neutral method to calculate power usage, in which the power consumption of each device or node is categorically calculated [270] based on the processor type. To further estimate the total power consumption of heterogeneous devices in a distributed-learning setting, the summation over the total training time on each device can be used. However, this approach may not work for federated learning implementations, as training time can vary significantly between participating devices, and federated learning fundamentally depends on the communication rounds between devices and the ultimate convergence rate.
Based on resource or computational ability, only certain available devices are chosen for training during each communication round, as participating devices may not offer equal computational capability [2]. Another factor in distributed training is the total time needed to train the model, which highly depends on the communication efficiency between the participating devices and the server. Note that, in addition to on-board energy consumption, these approaches also account for the energy consumed by communication between devices, network stations, and the server [145].
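The per-device accounting described above (sampled average power times training time, summed over heterogeneous nodes, plus communication energy) can be sketched as follows; all field names are illustrative, not taken from [270] or [145]:

```python
def training_energy_wh(devices):
    """Total training energy in watt-hours across heterogeneous nodes.

    Each device contributes its average sampled power (W) times its
    training time (s), plus any logged communication energy (J).
    """
    total_j = 0.0
    for d in devices:
        compute_j = d["avg_power_w"] * d["train_time_s"]
        comm_j = d.get("comm_energy_j", 0.0)  # device-to-server transfers
        total_j += compute_j + comm_j
    return total_j / 3600.0  # joules -> watt-hours
```

For example, a node averaging 30 W over one hour of training with 3600 J of communication overhead contributes 31 Wh; summing such terms per device is the "neutral" method that still works when some nodes lack a GPU or TPU.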
Figure 9 uses a compare-and-contrast approach to merge the content and show the overlap of the energy-efficient methods covered in this survey. The topics are divided into machine-learning-based applications for autonomous driving services, edge-computing-based methods for autonomous driving, and vehicular communication. As these approaches have varied system demands in terms of latency, memory, and computation, we attempt to show the overlapping area where software approximation can be applied. The emerging areas are TinyML, which promotes deep learning in compressed form on embedded processors, and distributed machine learning and FL, which implement collaborative training and inference among several embedded and edge devices. Mobile edge computing has also emerged as a popular topic that allows data processing and decision making close to the edge, thus overcoming latency and memory drawbacks. The rest of this section discusses computing-efficiency and compression methods.
1) Computing Efficiency: DNN-based vision-oriented systems such as object classification, 3D object detection, and SLAM are usually computationally intensive, high-resource, energy-consuming tasks. The computational complexity increases further for real-time applications when these heavyweight DNNs are implemented on embedded systems with limited memory [150]. For example, the currently deployed level-3 autonomous vehicles [148], [242] mostly depend on vision sensor systems and consume significant memory and energy resources. The scalability of these applications on embedded systems in fully connected cooperative autonomous vehicles incorporating full ADAS features is yet to be seen. With the implementation of fully connected autonomous driving, the common assumption is that complex calculations and the use of deep/dense neural networks will increase calculation time, making some real-time applications difficult to process within the required latency, while the large weights of the neural networks will also challenge embedded systems with limited memory [305], [149], [87], [252], [317]. Therefore, there is a need to develop lightweight, compressed neural networks for efficient, low-latency calculations.
2) Compression: Compression is an approximation technique that can be applied to both the model and the data to enable real-time inference on resource-constrained devices. Popular compression techniques in deep learning include pruning, low-rank approximation, quantization, knowledge distillation, and sketching. Deep Compression [92], proposed by Han et al., combines pruning, quantization, and Huffman coding on state-of-the-art deep neural networks such as AlexNet and VGG-16 while maintaining their accuracy. In federated learning practices, along with deep learning approximation techniques, compression is also applied in communication algorithms through sparsification of gradients. This section discusses these compression approaches and also mentions some popular inference methods for resource-constrained embedded devices.
Low-rank Approximation: A direct mathematical approach to compressing a dense neural network is low-rank approximation. As traditional neural networks are built from filters and layers comprising several matrices, factorization [293], [260] and decomposition [61], [118], [88], [333], [13], [349], [116], [156] of these matrices have helped reduce the number of parameters in the network; popular approaches include singular value decomposition [116], [88], Tucker decomposition [139], and canonical polyadic decomposition [21]. Decomposition can target parameter reduction for overall dimension reduction, or target a channel by decomposing the relevant filter. In [88] the authors proposed a method in which low-rank convolutional filters are decomposed into several depth-wise and point-wise filters. With this approach, large-scale models are compressed and can easily be deployed on mobile and edge devices; however, the accuracy loss is higher, as a few high-rank filters may still be decomposed based on the assumption from a neighboring low-rank filter. Another approach to prevent accuracy loss is applying sparse regularization [209], [13] in a hierarchical manner, as this can enhance network learning by grouping filters that can be decomposed based on magnitude. Other techniques [116], [156] involve finding kernels or filters with low magnitude during training to enhance model learning (accuracy) and later applying singular value decomposition to achieve a better compression ratio.
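As an illustration of the general idea (not any one cited method), the following NumPy sketch compresses a dense weight matrix with a truncated singular value decomposition; the matrix dimensions and rank are arbitrary choices.

```python
import numpy as np

def truncated_svd_compress(W, rank):
    """Approximate W by a rank-r factorization U_r @ V_r.

    Storage drops from m*n parameters to rank*(m+n).
    """
    U, S, Vt = np.linalg.svd(W, full_matrices=False)
    U_r = U[:, :rank] * S[:rank]   # absorb singular values into the left factor
    V_r = Vt[:rank, :]
    return U_r, V_r

rng = np.random.default_rng(0)
W = rng.standard_normal((256, 512))          # a hypothetical dense layer
U_r, V_r = truncated_svd_compress(W, rank=32)
W_hat = U_r @ V_r                            # low-rank reconstruction
params_before = W.size                       # 256 * 512 = 131072
params_after = U_r.size + V_r.size           # 32 * (256 + 512) = 24576
```

At inference time the dense layer `x @ W` is replaced by two cheaper products, `(x @ U_r) @ V_r`, trading a small reconstruction error for roughly a 5x parameter reduction in this setting.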
Pruning: Pruning is originally a technique from agriculture and horticulture for removing parts of a tree or plant (branches, leaves, stubs) that do not contribute effectively. Inspired by this idea, researchers have applied pruning to convolutional and deep neural networks to compress them and reduce their overall parameter count, easing deployment on resource-constrained embedded devices for real-time applications, which require small models with fast computation. In current practice there are two popular pruning approaches: removal of weights [159], [126], [198], [97], [98], [96], [179] and removal of neurons [300], [370], [163], [194], [161], [191], [218]. Removing weights does not affect model accuracy, as only weights with magnitudes close to zero are removed. Since weight removal relies on sparse matrix computation, in some cases it requires dedicated processors [198], [159].
For these methods, the authors have also proposed Structured Sparsity Learning (SSL) framework designs for hardware (e.g., mobile computing, FPGA frameworks) [159]. The approach in [97], [96] covers soft-filter pruning, where filters are pruned iteratively while training a DNN model, after each epoch, based on a magnitude score; the scoring methodology is based on l1 or l2 normalization. Once the model is pruned, the hyper-parameters and dimensions of the network change, so it is important to adjust them by reconstructing the pruned filters using forward and backward propagation. The second approach, which removes neurons, is based on heuristic methods and directly impacts the accuracy and overall performance of the neural network; however, performance can be recovered with fine-tuning [344], [340] or model retraining.
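A minimal sketch of magnitude-based filter pruning with l1 scoring, in the spirit of the methods above; the layer shape and keep ratio are illustrative assumptions, and the reconstruction/fine-tuning step that follows pruning is omitted.

```python
import numpy as np

def l1_filter_scores(conv_weights):
    """Score each output filter by its l1 norm.

    conv_weights has shape (out_channels, in_channels, k, k).
    """
    return np.abs(conv_weights).sum(axis=(1, 2, 3))

def prune_filters(conv_weights, keep_ratio=0.5):
    """Keep only the highest-scoring fraction of filters."""
    scores = l1_filter_scores(conv_weights)
    n_keep = max(1, int(len(scores) * keep_ratio))
    keep = np.sort(np.argsort(scores)[-n_keep:])  # indices of surviving filters
    return conv_weights[keep], keep

rng = np.random.default_rng(1)
W = rng.standard_normal((64, 32, 3, 3))     # a hypothetical conv layer
W_pruned, kept = prune_filters(W, keep_ratio=0.25)
```

After pruning, the next layer's input-channel dimension must be shrunk to match `kept`, which is exactly the hyper-parameter adjustment the text describes.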
Quantization: Uniform and non-uniform quantization are popular techniques for compressing an AI model. In uniform quantization [72], [51], [396], a linear approach distributes the quantized values uniformly over the space, while in non-uniform quantization a logarithmic or exponential approach distributes them non-uniformly. Methods to quantize deep neural networks non-uniformly are presented in [129], [355], [176], [120], based on quantization interval learning.
Here the quantization intervals are parameterized, and the obtained function is applied to the weights and activations of the deep neural network to achieve model compression. Quantization has also helped reduce the overall weight and size of CNNs, which consist of many convolutional layers. Layer-wise quantization has been proposed in [5], [404], [82], [234] using a statistical parameter or scaling factor per layer. This granular approach can significantly reduce model size; however, it also results in a relative loss of accuracy, as a kernel or filter containing an important feature can lose its weights because of another kernel or filter with no feature present in the same layer. A better approach to counter this problem is group quantization [373], [275], [372], where kernels or filters with no features or weights are grouped together and removed. This approach maintains accuracy but requires an additional scaling parameter for each layer. A recently used granularity is channel-wise quantization [111], in which the lengths of activations and weights are scaled per channel to reduce the overall weight [167], [395] of each convolutional filter during training. The scaling factor is applied to the input and output feature maps of the channels, as they have different lengths, which reduces parameters without loss of accuracy. Some applications require modifying or rearranging the parameters of a convolutional or deep neural network after training; these approaches are termed quantization-aware training and post-training quantization. Quantization-aware training involves retraining the model with methods such as the straight-through estimator [70], [405], [364], target propagation [224], [154], [57], and regularization [219], [264].
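A hedged sketch of the simplest of these schemes, symmetric per-tensor uniform post-training quantization; the bit width and the tensor being quantized are arbitrary illustrative choices.

```python
import numpy as np

def quantize_uniform(x, num_bits=8):
    """Symmetric uniform quantization to signed num_bits integers."""
    qmax = 2 ** (num_bits - 1) - 1          # 127 for 8 bits
    scale = np.max(np.abs(x)) / qmax        # one scale for the whole tensor
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int8)
    return q, scale

def dequantize(q, scale):
    """Recover an approximation of the original floats."""
    return q.astype(np.float32) * scale

rng = np.random.default_rng(2)
w = rng.standard_normal(1000).astype(np.float32)   # hypothetical weights
q, s = quantize_uniform(w, num_bits=8)
w_hat = dequantize(q, s)
max_err = float(np.max(np.abs(w - w_hat)))         # bounded by the step size
```

Per-layer, per-group, and per-channel variants differ only in how many independent `scale` values are kept, which is exactly the granularity trade-off discussed above.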
Knowledge Distillation: Another efficient approach to deploying large neural networks on edge devices is knowledge distillation. This technique [6], [312], [363], [237], [208], [128], [52], [263], [184] consists of two processes: first, a large model is trained over the complete dataset on high-performance devices, producing output feature-map predictions. Second, a compressed version of the large model is trained over the dataset (sampled form plus ground truth), and its output feature maps are combined with those of the larger model, thus transferring (distilled) knowledge from the larger model to the compressed one while maintaining accuracy and net loss. Some approaches [263] involve direct correspondence between layers of the large and small models, sometimes referred to as utilising the soft probabilities of the larger network to train the smaller network rather than the ground truth; this information contains not only the output feature maps but also the activation maps, making the smaller network learn faster. This approach has shown potential for transferring large models from high-performance devices to edge devices or embedded processors, but achieving a high compression ratio with soft probabilities or direct correspondence remains a challenge, whereas other approaches such as pruning and quantization can balance the trade-off between accuracy and compression ratio. Some works [138], [226], [172] also combine multiple compression techniques (knowledge distillation, pruning, and quantization) to achieve better accuracy and compression ratio.
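The two-step scheme can be illustrated with the standard distillation objective: temperature-softened teacher probabilities blended with the hard-label cross-entropy. The temperature `T` and weight `alpha` below are illustrative hyper-parameters, not values taken from the cited works.

```python
import numpy as np

def softmax(z, T=1.0):
    """Row-wise softmax with temperature T (T > 1 softens the distribution)."""
    z = z / T
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Blend of soft-target cross-entropy (vs. teacher) and hard-label loss."""
    p_teacher = softmax(teacher_logits, T)
    log_p_student_T = np.log(softmax(student_logits, T) + 1e-12)
    # T**2 rescales gradients of the softened term (standard practice)
    soft_loss = -(p_teacher * log_p_student_T).sum(axis=-1).mean() * (T ** 2)
    log_p_student = np.log(softmax(student_logits) + 1e-12)
    hard_loss = -log_p_student[np.arange(len(labels)), labels].mean()
    return alpha * soft_loss + (1 - alpha) * hard_loss

rng = np.random.default_rng(3)
teacher = rng.standard_normal((8, 10))     # hypothetical teacher logits
student = rng.standard_normal((8, 10))     # hypothetical student logits
labels = rng.integers(0, 10, size=8)
loss = distillation_loss(student, teacher, labels)
```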
3) Role of Edge AI: This section discusses the influence of edge computing and related applications on autonomous driving. As the volume of data keeps growing with the number of sensors, one research direction focuses on processing data near the sensing device. Cloud computing with centralized intelligence [192], [273] was initially proposed as a solution for fully connected autonomous driving; however, the latency requirements of time-sensitive applications and the expected bandwidth for data transmission (Table VII compares edge and cloud intelligence) became a challenge. To address this challenge, edge intelligence has been proposed as a suitable solution, allowing data to be processed closer to the edge device rather than in a centralized cloud. In [400] the authors present in detail the motivation and benefits of edge intelligence; the primary concept, which can be linked with autonomous vehicles, is that the volume of data generated by vehicle sensors at the edge device needs machine and deep learning approaches for processing and decision making, hence AI at the edge. The concept is proposed in several stages, where the primary focus is on transmission of sensed data to the server or cloud for processing and decision making. The first stage contains the parameters of cloud intelligence shown in Table VII, with training and inference via a centralized cloud. The second stage comprises edge-server joint training and inference: depending on the requirements and processing ability, the model can be jointly trained at the edge and server, or at the server with inference occurring at both, using distributed learning and computing methods.
The last stage of edge intelligence allows training and inference to occur on or near the device (edge) through data offloading and real-time compressed sensing approaches [385]. For autonomous driving applications, Pi-Edge [303] and AVe [73] are the two initial proposed frameworks, consisting of driving services with data offloading and resource allocation techniques. A later edge AI framework for autonomous driving [302], also influenced by Pi-Edge, proposed a data offloading and resource allocation scheme that allows edge-server joint inference using a hybrid communication architecture. However, this framework lacks an energy-saving mechanism and does not consider the trade-off that data offloading and compression impose on the end-to-end accuracy of the model. In [261], [262] the authors propose an intelligent edge architecture for autonomous driving vehicles with OpenStack and the ETSI open-source MANO. Using this architecture, the allocated resources of edge devices can be visualized at the server or cloud, and multi-access edge and mobile computing can be managed, freeing edge-device memory of raw data through offloading.
In [115] the authors propose an edge architecture with low-latency communication and a resource allocation scheme for compute-intensive tasks. Using the reference architecture, the authors designed an advanced autonomous driving communication protocol to enhance and facilitate communication between edge devices, servers, data centers, and the centralized cloud. Here the cloud contains legacy or ground-truth data contributed by the vehicle sensors, servers, infrastructure sensors, and the vehicular surroundings. For decision making, a deep reinforcement learning approach is used for training and inference. Edge frameworks, offloading schemes, and approximations are comprehensively covered in Sections IV and V.

F. An overview of Dataset for Autonomous Driving
An important requirement for developing machine/deep learning based autonomous driving services or tasks is a suitable dataset. Several datasets have been made available by university research groups and automotive companies in the last decade. In this article, an attempt has been made to categorise these datasets on the basis of sensors and the driving applications that can be derived from them. Based on convolutional neural networks, one of the most researched topics is object detection, covering several classes such as pedestrians, traffic signs, lanes, and vehicles (cars, trucks, ambulances, school buses). Advances in recognising fine-grained features from image or video frames have also resulted in applications such as vehicle model detection, license plate classification, and other cooperative applications. Some of the commonly used datasets are KITTI [203], Cityscapes [55], and PASCAL VOC [318]. After 2017, high-quality multi-sensor data, primarily camera and LiDAR, has been collected and released for the development of advanced applications targeting level 5 autonomy [105], [292], as also shown in Figure 10.
To prevent developing biased AI models, traffic scenes and data were also combined from multiple continents, countries, and cities. The EU Long-term dataset [352] was collected at several locations within Europe, while nuScenes [39], collected in Singapore and the USA, comprises a multi-sensor suite. The Argoverse [43] dataset, collected by Ford-backed Argo AI, is unique in also providing functionality to test high-definition map applications based on LiDAR and camera sensors. As sensor/data fusion approaches are being researched for low-powered embedded devices, driving tasks such as adaptive cruise control, path planning, and SLAM have involved using radar sensor values together with LiDAR point clouds and camera frames. RadarScenes [269], Astyx HiRes [205], Ford Multi-AV [4], Neolix [323], and PixSet [63] are some datasets that provide annotations based on these three sensors. Similarly, another high-quality dataset comprising HD map annotations has been made publicly available by DeepRoute.ai, targeting level 4+ full-stack self-driving systems. Table VIII lists open-sourced datasets available for AI model development and testing.
Lessons Learned. 1) Adversity: Popular datasets do not include unexpected or undesirable uncertainties, as it is difficult to estimate ground truth for them. Similarly, different weather and light conditions are under-represented in training and testing datasets; an AI model trained/validated on such a dataset might not generalise. 2) Biases: The majority of datasets are collected under urban driving conditions. This does improve the accuracy and development of AI models for urban driving scenarios, but it brings significant challenges to a model's adaptability to diverse and dynamic conditions such as highway driving or severe weather. 3) Disparity: A form of bias can be inherited by AI models due to the disparity of annotated classes. Popular driving datasets generally report the number of scenes, annotations, and bounding boxes covered for training and testing; however, they lack a discussion of diversity and class distribution. For example, annotations of vehicles and traffic signs are far more numerous than those of cyclists, motorcyclists, or pedestrians. 4) Data fusion and collection format: Statistical models are developed and adapted to the format of the datasets. Current datasets vary in their logging approaches, which complicates model or cross-data transformation and can also bias the developed AI algorithm.

IV. EDGE AI WITH AUTONOMOUS DRIVING
Edge computing systems have already been used and tested in IoT use-cases and applications that require relatively little computation and power [171], [387]. Hardware manufacturers such as Nvidia, IBM, Intel, Qualcomm, and NXP have developed and released edge computing hardware for dedicated tasks such as speech recognition and vision-based applications. For autonomous driving, edge intelligence demands a data processing pipeline capable of data management, analysis, and storage. Popularly used vehicular edge computing devices include Nvidia's Jetson and Xavier platforms. These devices are largely used in combination with on-board sensors such as cameras, LiDAR, radar, IMU, and GNSS, and a V2X module or router for communication with other devices and servers. As per the current description, the subsystems required to enable a fully connected autonomous vehicle comprise the autonomous vehicle with cellular or edge connectivity, the roadside units connected with the infrastructure, the edge server, the micro data centers, and lastly the cloud or main server having connectivity with all the mentioned subsystems and the autonomous vehicle; a description and the layers are shown in Figure 11. It is important to note that the introduction of vehicular edge computing and intelligence [371] has further strengthened the scope and area of vehicle-to-everything (V2X) communication [1], [210].
(Figure 10 caption: images are from the Lyft, KITTI, nuScenes, ApolloScape, and ONCE datasets [105], [203], [39], [112], [200], respectively.)
The key components enabling edge artificial intelligence for autonomous driving include edge training, inference, caching, optimization, and communication. Vehicular communication has already been covered in the previous section; distributed approaches such as federated learning, however, remain. Therefore, this section first discusses edge training and inference, then edge computing based applications for autonomous driving, recently proposed federated learning approaches, and cooperative and collaborative autonomous driving.

A. Edge Computing and Intelligence
The future of vehicle autonomy was previously proposed with a centralized cloud [273] and machine/deep learning algorithms deployed in the cloud [192]; however, transmitting large volumes of data from the vehicle to the cloud and receiving model weights back introduces latency issues for time-critical applications such as SLAM. This technical challenge leads to bringing artificial intelligence closer to the edge using distributed learning, involving in this context the edge device (present in the vehicle) and the edge server (present in the vehicle's surroundings); the corresponding abstraction of the edge AI layer is shown in Figure 11. Some of the proposed collaborative applications and approaches include perception [46], SLAM [101], [346], [10], HD maps [381], collision warning systems [60], [80], and path planning [310].
Among cooperative perception applications at the edge, F-Cooper [46] provides collaborative object detection using high-level fusion of LiDAR point clouds from multiple vehicles. For object detection, the authors used voxel feature fusion (as shown in Figure 6) and spatial feature fusion. The object detection methods were lightweight and allow transmission and sharing over dedicated short-range communication; the approach was deployed on an edge device and tested using real-world data. A similar approach is presented in [20], where the authors propose an early fusion scheme for detecting objects and a late fusion scheme for proposing bounding boxes on the detected objects. For testing, the authors used a synthetic dataset over T-junction and roundabout vehicle environments, and evaluated the schemes by comparing precision, communication cost, and on-board computational latency. An approach based on value-anticipating networking is proposed in [103], where a vehicle decides, based on previous learning, whether to transmit sensed information to other vehicles. Another cooperative perception approach [17] uses deep reinforcement learning for connected autonomous vehicles; the proposed model selects sensed data for transmission among the connected vehicles, and the authors further develop a cooperative vehicle simulation platform for object detection and communication.
Similar to perception, collaborative SLAM using an edge server [346] has been proposed for highly automated vehicles. As mentioned previously, SLAM suffers from high computational demand and low latency requirements. To reduce the computational burden, cloud-based SLAM has been proposed [268]; however, drawbacks of the centralized approach are the extremely low latency requirement and the current uplink bandwidth. Edge-assisted SLAM approaches [101], [346], [10] include efficient computation, task scheduling algorithms, data offloading, and sharing strategies. The backbones used in [346], [10] are ORB-SLAM [215] and ORB-SLAM2 [216], which provide centimeter-level localization accuracy. The approach distributes the SLAM blocks of ORB-SLAM2 across the edge device and server, thus overcoming the edge device's (on-board) computational limits by processing at the edge server. To further improve the results and precision, approaches involving crowd-sourced semantic mapping or fusing the results with HD maps [177] can be proposed.

B. Edge Training and Optimization
In a collaborative learning setting for autonomous driving, training or retraining a model will be common practice, as the edge devices present in vehicles collaborate to train a deep neural network model with the help of a server acting as the medium for parameter or weight updates. For autonomous driving, the edge training and optimization pipeline should comprise the model to be trained, training acceleration methods, optimization parameters, and model uncertainty estimation. Accordingly, the edge training and optimization process consists of a training dataset, present either as raw sensed values or as legacy data, and the tunable parameters. For edge devices, training can be organized for an individual edge device or for a group of edge devices [386].
While training a model on a single edge device, no inputs or parameters are exchanged; in group training, however, the participating edge devices communicate and share model weights and parameters at the set iterations.
The computational demand and memory requirements of individual training are much higher; therefore, using distributed and collaborative learning approaches, attention has been given to group training [361] to address the computational demand. In group training, attention is also given to communication-efficient approaches to improve energy efficiency, reduce communication rounds, and decrease training time. In [306], the authors proposed a stochastic gradient descent method for improving convolutional neural network training on edge devices. The approach uses sparse methods to improve the convergence rate and overall performance of the model. To implement compression, gradient sparsification methods are used, which reduce communication cost by identifying the gradients that need to be shared. To counter the degraded convergence rate that frequent sparse updates can cause, a momentum residual is proposed. For evaluation, model training on the MNIST dataset was implemented.
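A minimal sketch of gradient sparsification with residual accumulation in the spirit of the scheme above (the shapes and the value of k are illustrative, and this is not the exact algorithm of [306]): only the k largest-magnitude entries are communicated each round, and the unsent entries accumulate locally so no update is permanently lost.

```python
import numpy as np

class TopKSparsifier:
    """Top-k gradient sparsification with error feedback.

    Entries not selected for transmission are kept in a local
    residual and added back into the next round's gradient.
    """
    def __init__(self, shape, k):
        self.residual = np.zeros(shape)
        self.k = k

    def compress(self, grad):
        acc = grad + self.residual                      # fold in unsent mass
        idx = np.argpartition(np.abs(acc).ravel(), -self.k)[-self.k:]
        sparse = np.zeros(acc.size)
        sparse[idx] = acc.ravel()[idx]                  # k largest magnitudes
        sparse = sparse.reshape(acc.shape)
        self.residual = acc - sparse                    # carry the rest forward
        return sparse

sp = TopKSparsifier(shape=(10,), k=3)
g = np.arange(10, dtype=float)                          # toy gradient
sent = sp.compress(g)                                   # only 3 nonzeros sent
```

Each round, only `k` values (plus indices) cross the wireless channel instead of the full gradient, which is the communication saving the text describes.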

C. Edge Inference
Edge inference is the process of converting raw sensed data into decision-making tasks by processing it with the AI models deployed on an edge device. As mentioned previously, the approach is already used for perception, SLAM, HD map, and video analytics applications. The data flow and process of edge inference are shown in Figure 12. As covered in Section II, most of the existing AI models for perception and SLAM are developed on powerful machines with high-end graphics processing units and abundant memory. Therefore, to make AI model deployment possible on resource-constrained embedded/edge devices [326], compression and software approximation approaches are applied to the pre-trained models [309]. Current edge inference practices for autonomous vehicles can be classified into three categories: local inference on the edge device (vehicle), inference at the server, and joint inference at the vehicle and server [400]. In local inference, the sensing and decision-making process is performed on-board; this approach is currently in practice and requires large memory and expensive computation devices [115]. Local inference is very useful for lightweight applications such as on-board speech recognition; for heavy computational tasks, however, it suffers from computational complexity, data storage, and energy consumption problems. In server-based inference, sensing takes place on the vehicle or infrastructure sensors, and the data is uploaded to the server via wireless communication. The server is deployed with heterogeneous computing devices and processes the received data with the deep learning models responsible for decision making [113]. Examples of analytics-oriented applications are presented in [385], [67], which describe an edge framework deploying edge intelligence in a hierarchical manner. The approach is very useful for reducing on-board computational cost and energy consumption;
however, this practice brings challenges of latency for time-critical applications, and of privacy and security of the data and model shared over a wireless channel. Also, communication delay can arise if the corresponding server must process data from too many vehicles at the same time. Edge-server joint inference for connected vehicular applications is proposed in [302], [324]. In these approaches, sensing takes place on-board, and based on the available on-board computational resources, part of the computation and decision-making process takes place on-board with a lightweight or compressed AI model, while the remainder takes place at the server with the global or dense model. After the model weights are generated individually, they are combined using an aggregation approach and the decision process takes place. Edge-assisted SLAM, perception, and HD map updates are some practiced and proposed methods; frameworks in this category include [45], [44]. In these approaches, the common practice is to split and partition the deep neural network among the participating devices and the server. Resource allocation schemes [223], [196], communication-efficient algorithms [121], [281], task schedulers [212], [369], early-exit models [149], [317], and heterogeneity-aware layers [186], [380] have been proposed in edge-server joint inference to exploit on-board and server resources for energy-efficient operation. For further optimization of joint inference, hardware acceleration approaches such as parallel computation on heterogeneous-architecture devices [376], [190] have been proposed. In a similar category, software acceleration approaches [270], [223] involve resource management, edge AI pipeline design, approximating compilers, and model compression.
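An illustrative (not source-specified) sketch of the early-exit decision used in joint inference: the on-board head answers locally when its softmax confidence clears a threshold, and otherwise flags the sample for offloading to the server-side dense model. The threshold and logits are arbitrary example values.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def early_exit_or_offload(local_logits, threshold=0.9):
    """Return (prediction, offload_flag).

    Decide on-board when the lightweight model is confident enough;
    otherwise defer the sample to the server-side dense model.
    """
    p = softmax(local_logits)
    if p.max() >= threshold:
        return int(np.argmax(p)), False     # local decision, no offload
    return None, True                       # offload to the server

confident = np.array([8.0, 0.1, 0.2])       # sharply peaked -> exit locally
ambiguous = np.array([1.0, 0.9, 1.1])       # nearly flat -> offload
```

Such a policy saves uplink bandwidth and server load for easy inputs, while reserving the dense model for the hard ones.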
Lessons Learned from Subsections A, B and C:
Resource constrained: Edge device-server joint inference and optimization [339], [329], [276] involve the edge device's computation capability and the associated local model accuracy at minimum cost. The resources in this context are computation, power capability, and the communication overhead between edge device and server. Joint optimization is prioritized using vehicle parameters such as position and velocity to ensure a round of communication and parameter updates with the local edge server. The system in [339] comprises connected autonomous vehicles in an urban driving scenario, where the edge device handles the initial computations requiring fewer resources and offloads the heavy computational tasks to distributed edge servers, with local model training, selective model aggregation [361], computation complexity, and weight transmission as primary metrics. For computation optimization, a self-adaptive global best harmony search (SGHS) algorithm is used. For on-device resource allocation, a combination of SGHS and an on-board computing and transmission power optimization algorithm is used to enhance the local model accuracy.
Heterogeneity aware: In collaborative driving, the data obtained from multiple sources such as infrastructure sensors, legacy data available at the server, or other vehicles' sensors is heterogeneous [121]. This requirement makes heterogeneity-aware distributed learning a primary criterion for fully connected autonomous driving. Federated learning with edge-device selection is addressed in [223], [309], [45] to counter limited computational capability and communication bandwidth. In this approach, the edge server randomly chooses clients for model aggregation and requests their currently available communication and computation resources; based on the received information, the server distributes the model parameters to the edge devices with high available resources for model aggregation, and uses a batch normalization approach for updating the global model. Another distributed approach is studied in [65], where heterogeneous data is combined into subsets to minimize the aggregation loss from edge devices and improve convergence; a combination of these approaches is also followed in [168], where low-latency communication is ensured through quadratic convex functions.
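A hedged sketch of the resource-aware selection plus weighted aggregation pattern described above; both the selection rule (pick the clients reporting the most free resources) and the FedAvg-style dataset-size weighting are generic stand-ins, not the exact schemes of the cited works.

```python
import numpy as np

def select_clients(reported_resources, k):
    """Pick the k clients reporting the most available compute."""
    order = sorted(range(len(reported_resources)),
                   key=lambda i: reported_resources[i], reverse=True)
    return order[:k]

def weighted_aggregate(client_weights, client_sizes):
    """FedAvg-style aggregation: average model weights by local dataset size."""
    total = float(sum(client_sizes))
    return sum(w * (n / total) for w, n in zip(client_weights, client_sizes))

resources = [0.2, 0.9, 0.5, 0.7]            # fraction of free compute per client
chosen = select_clients(resources, k=2)      # best-resourced clients first
weights = [np.full(4, 1.0), np.full(4, 3.0)] # toy local model weight vectors
agg = weighted_aggregate(weights, [100, 300])
```

In a real deployment the server would also weigh communication bandwidth and battery state into the selection, which is the heterogeneity-awareness the cited works add.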
Communication efficient: A semi-supervised federated learning (SSFL) scheme is proposed in [64] to alternately train the statistical model at the edge server with unlabeled data using the semi-supervised FixMatch [122], [388] and MixMatch [28] learning methods. For acceleration and better convergence of the local model, a static batch normalization technique is used, an adaptation of batch normalization [122] and group normalization [388]. In alternate training, the local model at the edge server is aggregated by retraining with the ground truth or legacy data to enhance model accuracy at each round of training; in the next round of communication between node and server, the aggregated model weights are transmitted to update the global model and legacy data. A similar joint learning method is proposed in [45], [44], where the local model is retrained on the edge devices and transmitted over the cellular network to the base stations for global model aggregation. To minimize the model learning loss and to use the communication bandwidth collectively, the base station selects edge devices using a greedy approach, with a resource allocation scheme at the base station and a power allocation scheme at the edge devices. The power allocation scheme at the edge devices considers two primary criteria: retraining of the local model and the power needed for model or weight transmission. Other proposed methods include sparsification of data and gradients and quantization for minimizing communication bandwidth, as discussed below.
1) Sparsification: In collaborative or federated learning, the common sparsification approaches compress the gradients and/or the data. Edge computing, or processing near the edge, is being adopted as a popular approach for autonomous vehicles: instead of transmitting the raw data, the model weights processed at the edge are transmitted to the devices participating in communication. Reducing the transmission time [14] or using an efficient delivery scheme such as REMD has also been proposed as a communication-efficient approach in the FL setting [117]. Another approach [332], [398] proposed for the FL use-case uses a lower-limit value: gradients whose magnitude exceeds the predefined lower limit are sent from the edge to the server, and the remaining gradients are not used for weight or model aggregation. With this approach, compression can be applied on both the uplink and downlink communication. However, the challenge is choosing a favorable lower-limit value: similar to soft-filter pruning, selecting the wrong lower limit can directly impact the overall model aggregation, yielding a reduced model size but decreased accuracy.
To overcome this challenge, stochastic gradient descent with k-sparsification is proposed in [288], which reduces the data and model size while improving convergence through error compensation for the transmission between edge and server. A similar approach is used in [7], which fixes the sparsity rate: only the fraction of gradients with the highest magnitudes is transmitted, while the unused gradients are retained in a container. The sparsity rate used by the authors is p = 0.001, and this approach has relatively little impact on the learned model's overall accuracy and performance. To further close this performance gap, the authors in [181] propose modifications to the existing approach through deep gradient compression, which uses techniques such as momentum correction and local gradient clipping for convolutional and recurrent neural networks. Results show that gradients are compressed by a ratio of 270-660x following a hierarchical approach, without slowing down model convergence. Sparsification methods were initially proposed to improve and promote distributed and parallel training across clouds and data centers. However, these methods did not target model convergence and aggregation, which are currently among the most essential metrics for federated and distributed machine learning. Similarly, attention should be given to the number of edge devices participating in transmission and the servers participating in collaborative training: as the study in [181] shows, the communication between edge and server will not be compressed if the number of devices participating in training is smaller than the chosen sparsity value.
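The fixed-rate top-k scheme with error compensation can be sketched as follows. This is a sketch in the spirit of [288] and [7], not their exact algorithms; a rate of 0.01 is used here instead of p = 0.001 so the toy example keeps more than one entry.

```python
import numpy as np

def topk_with_error_feedback(grad, residual, rate=0.01):
    """Top-k sparsification with error compensation: the withheld
    gradient mass is kept in a local residual (the "container")
    and added back before the next round's selection."""
    acc = grad + residual                       # compensate earlier error
    k = max(1, int(rate * acc.size))            # fixed sparsity rate
    idx = np.argpartition(np.abs(acc), -k)[-k:] # indices of largest magnitudes
    sparse = np.zeros_like(acc)
    sparse[idx] = acc[idx]                      # only these entries are sent
    return sparse, acc - sparse                 # new residual stays on device

rng = np.random.default_rng(0)
grad = rng.normal(size=1000)
sparse, residual = topk_with_error_feedback(grad, np.zeros_like(grad))
```

No gradient information is lost: the transmitted part and the residual always sum back to the compensated gradient.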
2) Quantization: Besides its use for compressing deep neural networks, quantization is also used in communication-efficient algorithms, with the goal of minimizing the communication bandwidth between edge devices and the server. In a federated learning setting, quantization can approximate the weight updates on edge devices by restricting each update to a small set of values. One approach implemented on independent and identically distributed data is signSGD [27]. The authors quantize each gradient update to its binary sign, reducing each 32-bit entry to a single bit. Notably, signSGD also compresses at the server by approximating the gradients received from edge devices, and the work includes a theoretical analysis of the algorithm in a distributed machine-learning setting. The participating devices transmit the signs of their gradients to the local server, which transmits the updated, aggregated gradient signs back to the devices for local model aggregation. The analysis shows that this approach achieves a variance score similar to other contemporary methods and a better convergence rate to a stationary point of a general non-convex function. Similar scalar quantization approaches based on stochastic methods are proposed in PowerSGD [319], ATOMO [322], TernGrad [334], and QSGD [11], [12].
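The device-to-server-and-back flow of signSGD can be sketched as below. This is a minimal illustration of the sign-compression and majority-vote aggregation described in [27]; the example gradients are arbitrary.

```python
import numpy as np

def sign_compress(grad):
    """Device side: each gradient entry is reduced to its sign,
    i.e. one bit of information instead of a 32-bit float."""
    return np.sign(grad).astype(np.int8)

def majority_vote(sign_updates):
    """Server side: elementwise majority vote over the sign vectors
    received from the participating devices; the result (again just
    signs) is broadcast back to the devices."""
    return np.sign(np.sum(sign_updates, axis=0)).astype(np.int8)

votes = [sign_compress(np.array([0.3, -0.2, 0.1])),
         sign_compress(np.array([0.1, 0.4, -0.5])),
         sign_compress(np.array([0.2, -0.1, -0.3]))]
agg = majority_vote(votes)
```

Both directions of communication carry only signs, which is why signSGD compresses the downlink as well as the uplink.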
ATOMO [322] and QSGD [11] quantize the gradients with a better convergence rate, allowing faster distributed training of neural networks, which makes them well suited for enabling collaborative learning in the vehicle-edge environment. However, performance analysis in the vehicle-edge setting should consider the accuracy-efficiency-reliability trade-off for safety-critical, real-time applications and the accuracy-energy trade-off for latency-tolerant applications. When deploying such methods, attention should also be paid to the compression ratio and convergence rate, since communication and federated learning within autonomous vehicles require compression in both uplink and downlink transmission. In [11], [12] the authors theoretically analyze quantized stochastic gradient descent to balance the trade-off between the federated learning parameters of convergence and communication cost. In this approach, the edge devices may adjust the number of bits transmitted in each communication iteration according to the variance. As shown in [11], a device in a federated setting can transmit around 2.8n + 32 bits in one communication round (where n is the number of model parameters), leading to approximately 5x bandwidth savings. Similarly, to speed up training among participating devices, [271] performs gradient quantization with one bit, which can make distributed training up to 10x faster. For evaluation, the authors of [271] used a neural network for speech recognition, a highly anticipated use-case in autonomous driving [302], [303].
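A QSGD-style quantizer can be sketched as follows. This is a simplified, dense version of the scheme in [11] (no sparse encoding or Elias coding of the levels): each magnitude is scaled by the vector norm and rounded up or down at random, so the estimate is unbiased and its per-entry error is bounded by norm/s. Function names and the choice of s = 8 levels are illustrative.

```python
import numpy as np

def qsgd_quantize(v, s, rng):
    """Stochastic uniform quantization to s levels (QSGD-style sketch):
    transmit one float (the norm), one sign bit and one small integer
    level per entry instead of a 32-bit float per entry."""
    norm = np.linalg.norm(v)
    scaled = np.abs(v) / norm * s
    lower = np.floor(scaled)
    # round up with probability equal to the fractional part -> unbiased
    levels = lower + (rng.random(v.shape) < (scaled - lower))
    return norm, np.sign(v).astype(np.int8), levels.astype(np.int32)

def qsgd_dequantize(norm, signs, levels, s):
    return norm * signs * levels / s

rng = np.random.default_rng(1)
v = rng.normal(size=16)
norm, signs, levels = qsgd_quantize(v, s=8, rng=rng)
vhat = qsgd_dequantize(norm, signs, levels, s=8)
```

Increasing s trades bandwidth for lower variance, which is the bits-versus-variance adjustment described above.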
Dedicated uplink compression is explored in [282] using quantization theory. In this work, the authors study the transmission of the trained model by identifying the available channel bandwidth through a quantization scheme. They further propose an encoding-decoding approach consisting of partitioning, dithering, quantization, and entropy coding at the encoder, and entropy decoding, dither subtraction, collection, and model recovery at the decoder. The proposed quantization system is evaluated through a numerical study, which shows that the error is mitigated through federated averaging and that high federated learning performance gains are achieved. In contrast to scalar quantization methods, vector quantization has been proposed for uplink and downlink compression [283]. Compared to scalar methods, vector quantization offers dimensionality reduction along with the quantization scheme in a federated learning setting. In [283], numerical studies similar to [282] were conducted; the method comprises an encoding strategy similar to [282] and an analysis using probabilistic quantization, but applies a different decoding step of dither subtraction to reduce distortion and minimize error. The approach also uses a lossless source coding scheme in entropy coding and decoding to exploit the nonuniform distribution of the quantized outputs.
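The dither-then-subtract step at the heart of these encoders can be sketched as below. This is a generic subtractive-dithered uniform quantizer, not the full pipeline of [282], [283] (partitioning and entropy coding are omitted); in practice the shared dither would be generated from a common pseudo-random seed at both ends, and the step size here is an arbitrary choice.

```python
import numpy as np

def dithered_encode(x, step, dither):
    """Encoder: add the shared dither signal, then quantize uniformly;
    only the integer indices would be entropy-coded and transmitted."""
    return np.round((x + dither) / step).astype(np.int64)

def dithered_decode(q, step, dither):
    """Decoder: reconstruct and subtract the same dither, which keeps
    the quantization error bounded and independent of the input."""
    return q * step - dither

rng = np.random.default_rng(2)
x = rng.normal(size=32)                 # model weights to be uplinked
step = 0.05
dither = rng.uniform(-step / 2, step / 2, size=x.shape)  # shared dither
xhat = dithered_decode(dithered_encode(x, step, dither), step, dither)
```

After dither subtraction, the reconstruction error per entry never exceeds half the step size.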
Lessons Learned: 1) FL using Edge: Collaborative or joint-learning techniques such as edge computing and Edge AI complement federated learning. Using these techniques in conjunction reduces the communication bandwidth to the cloud and also promotes privacy by not sharing or transmitting sensitive data. 2) Compression: It is extremely challenging to implement traditional federated learning techniques on conventional edge devices. Model compression approaches have been explored to accelerate training and inference by reducing the computational complexity and resource requirements. 3) Re-Training: AI models deployed for connected vehicle applications often encounter unseen data. The ability of FL to retrain the model and update the weights through convergence benefits use-cases such as HD map updates. 4) Communication Reduction: Current federated learning approaches focus on reducing the communication overhead through compression while overlooking the exploration of lightweight protocols.

3) Overcoming Communication Overhead: An open challenge for autonomous vehicles in a federated or distributed learning environment is overcoming the computational complexity and communication overhead. Federated averaging [202] reduces the communication frequency, and thereby the communication delay, by not initiating communication between device and server after every iteration. Instead, the federated averaging method computes the weights on every participating device over multiple iterations of stochastic gradient descent. Implemented on convolutional and recurrent neural networks, the analysis shows that communication between participating devices can be delayed up to 100 iterations while still maintaining the convergence rate. A key requirement for this convergence rate is that the data be independent and identically distributed across the participating devices. The communication rounds can be spaced out even further, but as a trade-off this increases the computational cost on the participating devices. As shown in the subsections above, work on overcoming communication overhead combines sparsification and gradient quantization [34], [309], [168]; these methods, however, do not achieve a better convergence rate.
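The local-steps-then-average loop of federated averaging can be sketched on a toy least-squares problem. This is a minimal sketch of the idea in [202], not its implementation: the model, learning rate, step counts, and synthetic data are all illustrative assumptions.

```python
import numpy as np

def local_update(w, X, y, lr=0.05, local_steps=20):
    """Device side: several local SGD steps on a least-squares loss
    before any communication (the core idea of federated averaging)."""
    for _ in range(local_steps):
        w = w - lr * 2 * X.T @ (X @ w - y) / len(y)
    return w

def fed_avg(local_weights, sizes):
    """Server side: dataset-size-weighted average of the device models."""
    total = sum(sizes)
    return sum(w * (n / total) for w, n in zip(local_weights, sizes))

rng = np.random.default_rng(3)
w_true = np.array([1.0, -2.0])
devices = [(X, X @ w_true) for X in (rng.normal(size=(50, 2)) for _ in range(3))]

w = np.zeros(2)
for _ in range(10):                      # 10 communication rounds, not 200
    updates = [local_update(w, X, y) for X, y in devices]
    w = fed_avg(updates, [len(y) for _, y in devices])
```

With 20 local steps per round, the devices communicate 10 times instead of 200, which is exactly the communication-frequency reduction described above.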
A ternary quantization-based federated learning approach is proposed in [276] to overcome the communication overhead in uplink and downlink communication. The quantization method is implemented on both the participating devices and the server, enabling local training and global model updates through the transmitted weights. This approach also reduces model complexity for the edge and server devices. For evaluation, the authors performed simulations of a battery-powered vehicle with connected autonomous driving capability, achieving fast inference and low communication overhead and thus making inference possible on resource-constrained embedded and edge devices [64], [196].
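A ternary quantizer in the spirit of TernGrad [334] (which [276] builds on) can be sketched as follows. This is an illustrative sketch only: each entry is mapped stochastically to {-1, 0, +1} times a shared scale, so roughly two bits per entry plus one float need to be transmitted instead of 32 bits.

```python
import numpy as np

def ternary_quantize(grad, rng):
    """Map each gradient entry stochastically to {-1, 0, +1} * scale;
    keeping an entry with probability |g|/scale makes the estimate
    unbiased in expectation (TernGrad-style sketch)."""
    scale = np.max(np.abs(grad))
    keep = rng.random(grad.shape) < np.abs(grad) / scale
    return scale, (np.sign(grad) * keep).astype(np.int8)

rng = np.random.default_rng(4)
grad = rng.normal(size=100)
scale, tern = ternary_quantize(grad, rng)
```

Applying the same ternarization on both device and server compresses the uplink and the downlink, as in the approach of [276].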

Challenges for vehicular services:
Distributed learning has been a popular approach to tackle computation and communication challenges. Federated learning provides alternative methods to retrain and deploy AI models with low communication and computation cost in dynamically distributed, heterogeneous settings. Deploying connected autonomous driving services (e.g., OTA updates, traffic monitoring, and forecasting) with federated learning approaches enhances the privacy of the training data and can also prevent attacks on the AI model. However, for real-time applications such as vehicle localization and mapping, challenges remain in terms of computational resource requirements, latency, and communication bandwidth. A typical SLAM application in the vehicular setting is deployed using large volumes of sensed data from cameras, LiDAR, and radar. The data size is on the order of gigabytes and should be processed by the AI model in less than 5 ms, which also makes it challenging to transmit to the nearest participating device for computation. Deploying FL using Edge AI for vehicles can be considered an optimization problem whose complexity further increases when energy efficiency is a direct parameter, since the problem has to be tackled by considering both communication and computing costs. A major remaining challenge for optimizing such applications in the FL context is the unavailability of real-world large-scale datasets.

V. ENABLING FRAMEWORKS FOR AUTONOMOUS DRIVING SERVICES
Due to the limited computation, storage, and communication resources of edge nodes, as well as the privacy, security, low-latency, and reliability requirements of AI applications, a variety of autonomous driving oriented edge AI system architectures have been proposed and investigated for efficient training and inference. This section gives a comprehensive survey of different Edge AI frameworks and related architectures. It starts with a general discussion of the different architectures and a categorical comparison.

A. Autonomous Driving Framework
Since the development of deep neural networks supporting perception and SLAM applications, researchers have focused on the design and development of simulators and software, often referred to as frameworks. Nvidia Drive [33], Waymo [3], and ApolloAuto [18] are some commercially released driving frameworks supporting vehicular applications. Autoware [134], based on ROS and released in 2018, is developed for embedded platforms. OpenCDA [348] is one of the most recently released and most complete open-source driving frameworks, consisting of communication modules, real-time feedback, and a simulation environment, thus providing a platform for cooperative driving applications. The following sections detail the architecture and components of these frameworks.
1) Autoware: Autoware [134] is a ROS [250] based framework. It is built on the concept of the sense-think-act model, shown in Figure 13, and is primarily designed for vehicles driving in urban areas. Autoware relies on perception-oriented sensor suites such as cameras and LiDAR to enable object detection, tracking, and localization using deep neural networks. The sensed information from both sensors is fused to create 3D maps around the vehicle, which aids precise localization when combined with SLAM algorithms and sensors such as GNSS and IMU. The other major components are planning and control, which are based on probabilistic robotics utilizing deep neural networks. The software can be installed on an autonomous embedded platform running the Ubuntu operating system by using ROS packages and dependencies to enable self-driving functionality in urban scenarios. Additional software modules and sensor integrations, such as radar, which are required for highway and related driving scenarios, are under development.
2) Apollo Software Platform: The Apollo software platform has seen multiple revisions since its release; the currently available version integrates the processing components localization, perception, prediction, planning, control, and communication (V2X). At present, the platform incorporates deep learning models to perform the major tasks on a dedicated computing unit comprising a CPU and GPU. One of the unique components of this platform is the HD Map, which can also be tracked on a generic display monitor to perform and visualize accurate localization. The platform can be easily integrated with autonomous embedded platforms running UNIX operating systems; however, one important step is to calibrate it with respect to the sensors and computing hardware installed on-board. The components of the Apollo framework [18] are as follows. Perception: The perception module focuses mainly on obstacles, traffic lights, and lanes. It mostly performs 3D object detection and is implemented using a deep neural network focusing on the region of interest on the high-precision map. The output of the object detection module comprises 3D bounding boxes around each object, based on the class, height, width, and probability of the detected object. In the background, a detection-to-track algorithm is used to identify individual objects with respect to timestamps; these timestamps are logged in the system and later serve as feedback to improve the accuracy for similar detected objects. The perception module uses a data fusion strategy based on the Kalman filter.

Figure 13: Sense-Think-Act model, which has been used as a backbone for autonomous driving frameworks [134], [348].
Localization: The platform uses multi-sensor fusion localization based on GPS, IMU, LiDAR, radar, and HD maps. The localization module follows a Kalman filter fusion approach with a two-step prediction-update cycle. It comprises two major blocks: GNSS localization, which provides position and velocity information, and LiDAR localization, which provides position and heading information. The inertial navigation solution is used for the prediction step of the Kalman filter, while the GNSS and LiDAR localization results are used in the measurement update step.
HD Map: The high-definition map [100] component in Apollo comprises legacy data collected by sensors, containing information on road definitions, intersections, lanes, and traffic signals. It is used to reduce the computational demand on the hardware by integrating the existing information about the street or lane the vehicle is currently driving on. In the Apollo platform it also serves as a safety feature, providing centimetre-level accuracy in localizing the vehicle. The steps involved in the development and publication of HD Maps include sensor data sourcing, processing, object detection, and manual verification. In the case of road or lane changes, the platform updates the HD maps in data centres through crowd sourcing on the intelligent map production platform, which can involve data collected by other autonomous vehicles, smartphones, and other sensors.
Simulation: Along with on-device implementation, the Apollo platform also provides the ability to virtually create driving scenarios by choosing among the above-mentioned modules and dedicated deep neural networks, and to test driving scenarios, validate, and optimize the existing models. The simulation results of a driving scenario can be logged and further utilized as feedback for algorithm development and for tackling false-positive scenarios.
3) OpenCDA: OpenCDA [348] is a driving framework designed for cooperative driving with simulation and prototyping capability. It contains three major components: the cooperative driving system, co-simulation tools, and the scenario manager. The cooperative driving system is also based on the sense-think-act model and comprises perception, communication, planning, and control as the fundamental blocks enabling individual as well as cooperative driving. An application layer on top is responsible for enabling cooperative perception, cooperative localization, platooning, and cooperative merging. For the simulation component, the framework uses CARLA [66] for autonomous driving simulation and SUMO [146] for traffic simulation; with these two combined, traffic scenes and simulations such as vehicle platooning and traffic merging can be created. The simulation tools exchange sensor and processed data with the system, continuously provide HD map data, and receive control commands. The third component, the scenario manager, exchanges information with the simulation tools and the cooperative driving system to evaluate the cooperative driving states, trigger special events, and provide them to the simulation tools. The framework is developed in Python and is scalable on 64-bit UNIX systems.
4) Openpilot: This is another framework in the category of conditional or partial automation. Developed by comma.ai (http://comma.ai/) [30] and released in 2017, with revisions and new features added from 2017 to 2021, it relies primarily on vision sensors and assists the driver with driving services such as adaptive cruise control (ACC), forward collision warning (FCW), lane departure warning (LDW), and automated lane centring. The framework is composed of services or components that can be divided into: sensors and actuators, neural network runners, localization and calibration, control, and system logging & miscellaneous services. Versions of the framework can be integrated into embedded devices supporting the Android or UNIX operating system.
5) Autopilot: Autopilot [148] assists the driver by sensing the environment around the vehicle through high-definition automotive cameras and ultrasonic sensors. The software stack comprises assistance and safety features such as automatic emergency braking, collision warning (front, rear, and side), and obstacle detection, and also includes a smart navigation system providing actuation and control. On the backend, the framework uses a deep neural network performing object detection, semantic segmentation, and depth estimation, whose output feeds the motion and path planning algorithm that suggests an optimal route and actuates the vehicle according to the destination set in the navigation. The software framework was initially designed to support the driver in highway driving scenarios and is also being tested in urban driving conditions.
6) CARMA: This framework [37] falls in the category of cooperative driving by enabling connected vehicles. The software stack is programmed in C++ and is configured using the ROS environment for the Ubuntu operating system. The framework utilizes Autoware [134] to enable level 3 automation capability and additionally contains a communication module in the sensing layer that includes DSRC, V2X, and cellular connectivity, thus enabling communication and exchange of information with other vehicles, infrastructure, and the cloud. The cooperative feature of this platform consists of four levels of planning for the vehicle: route planning, maneuver planning, trajectory planning, and command planning.
7) AutoC2X: AutoC2X [310] is a cooperative driving framework that combines two software stacks developed for cooperative driving applications: Autoware [134] and OpenC2X. OpenC2X is open-source cooperative intelligent transport system software useful for prototyping solutions such as traffic management and platooning. An AutoC2X setup comprises a pair of devices, a computing unit and a router, installed with the AutoC2X-AW and AutoC2X-OC software at the car and the infrastructure respectively. Information can flow from car to infrastructure or from infrastructure to car. For their test experiment, the authors enabled cooperative driving services such as perception, coordinate transformation, localization, and path planning through proxy cooperative awareness V2X messages. The results from the experiment show that cooperative perception messages using AutoC2X were delivered within 100 ms.

Lessons Learned:
1) Stack: The discussed autonomous driving frameworks incorporate popular deep-learning algorithms to perform perception, localization, mapping, and path-planning tasks. 2) Resource: These frameworks require an onboard high-performance computing device with extensive memory capacity to process large-volume data and deploy intelligent algorithms such as CNNs, DNNs, or RNNs. 3) Energy: The extensive resources and computing devices result in high on-board energy consumption, which has been overlooked. 4) Communication: The initially proposed driving frameworks lacked a communication unit/module, which is highly important for enabling collaborative driving and fully autonomous vehicles.

B. Application oriented Frameworks
Beyond full autonomous driving frameworks, the other proposed approaches are task-oriented and strongly influenced by distributed or collaborative learning. Popular research directions for an energy-efficient edge in this category are data partitioning, model partitioning, offloading, and communication. In the data partitioning method [273], collaborative compressed sensing approaches are used, which distribute the data among participating devices and thus relieve the repetitive computational load on any individual device. Model partitioning approaches [291] utilize resource allocation schemes [385] based on the availability of computing resources at the participating devices: a large DNN model is split into smaller parts for collaborative training and inference. Using the server as the central or primary mode of communication in edge-server joint inference applications, computation offloading-based edge inference systems [211], [391], [108] have been proposed. The approach involves offloading data, part of the inference load, or the entire task to an edge server in the surroundings. In this context, communication- and resource-aware techniques are also implemented, which choose a server among those available based on latency.
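The latency-based server choice mentioned above can be sketched as a simple cost comparison. This is a toy model, not the scheme of any cited work: the server fields (`bw_bps`, `ops_per_s`) and all numbers are illustrative assumptions, and real systems would also account for queueing and channel variability.

```python
def estimated_latency(task_bits, task_ops, server):
    """Estimated end-to-end latency for one candidate server:
    transmission time plus compute time (illustrative model)."""
    return task_bits / server["bw_bps"] + task_ops / server["ops_per_s"]

def choose_server(task_bits, task_ops, servers):
    """Communication- and resource-aware selection: pick the server
    with the lowest estimated completion time."""
    return min(servers, key=lambda s: estimated_latency(task_bits, task_ops, s))

servers = [
    {"name": "rsu",   "bw_bps": 50e6, "ops_per_s": 1e11},  # nearby roadside unit
    {"name": "cloud", "bw_bps": 10e6, "ops_per_s": 1e13},  # fast but far away
]
# 1 MB of features, 10 GOPs of inference work (assumed values)
best = choose_server(task_bits=8e6, task_ops=1e10, servers=servers)
```

For this workload the nearby roadside unit wins despite its slower compute, because the uplink time dominates: the trade-off that motivates edge offloading in the first place.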
Lessons Learned: 1) The approaches proposed in these application-oriented frameworks for connected vehicles consider either data reduction or model reduction, which can yield energy savings from either the computation-delay or the communication perspective. However, for energy-efficient connected vehicles, both metrics need joint optimization and acceleration.
2) The communication approach proposed in these use-cases generally assumes ideal communication conditions. However, communication in the vehicular ecosystem is often dynamic and heterogeneous, consisting of several low- and mid-range protocols with minor differences in range. Therefore, another limitation of these frameworks is their inability to work under dynamic network conditions. 3) Similar shortcomings can be seen in computation as well. The edge in the vehicular ecosystem is constructed from heterogeneous devices with different computing abilities. The AI models proposed in these application-oriented frameworks do not account for computing heterogeneity, which may lead to additional cost.
C. Energy-Efficient Edge Frameworks
1) OpenVDAP: The open vehicular data analytic platform (OpenVDAP) [382] is a data analysis framework developed for connected autonomous vehicles (CAV) with edge computing design requirements. The services included in OpenVDAP are real-time diagnostics, advanced driver assistance systems, infotainment, and other quality-of-experience services. The platform is designed to handle low-latency applications in autonomous driving by collaborating with the other edge nodes (other vehicles), base stations, local servers, and the cloud in the driving environment. For this purpose, the platform consists of on-board heterogeneous computing, a communication unit, an edge-based vehicle operating system (EdgeOSv), a driving data integrator, and edge-computing-aware libraries for vehicular data processing. The primary purpose of these components is to intelligently allocate the on-board computing resources to the data processing algorithms, implement the data offloading strategies, and enable communication between the vehicle and the infrastructure.
2) CAVBench: The benchmark suite [330] was proposed to evaluate the performance of edge computing frameworks and software for connected autonomous driving services. The applications included in CAVBench are object detection, tracking, SLAM, battery diagnostics, edge video analytics, and speech recognition, which mirror the components included in OpenVDAP [382]. The services and their associated deep learning algorithms are evaluated on latency (on-device processing) and power consumption, metrics that can guide the development of an end-to-end autonomous driving application. For the evaluation, state-of-the-art algorithms such as SSD [188] and ORB-SLAM [215] were implemented. The observations show that priority must be given to real-time applications according to their latency demands; for instance, the demand of localization and its processing is greater than that of tracking. The system therefore requires a processing layer or container that executes driving data and tasks hierarchically. The observations also show that end-to-end deep learning applications can decrease the processing latency on computing units with heterogeneous structures; therefore, distributed algorithms can outperform the baseline for some autonomous driving services.
3) π-Edge: To enable computationally intensive tasks to run simultaneously on resource-constrained embedded systems, π-Edge [303] is proposed, enabling edge intelligence on low-powered embedded devices through the operating system π-OS. As present embedded devices contain heterogeneous computing structures [387], [259], the authors propose a heterogeneity-aware runtime and scheduling layer that executes tasks while targeting on-board energy efficiency. The framework also contains a component that enables communication between the edge node and the server and performs data offloading to save on-board power. For the offloading experiments, the authors used applications and data from object detection and speech recognition, since their latency tolerance (approximately 100 ms) is larger than that of SLAM applications (which should complete within 4-5 ms). The offloading algorithm is implemented through collaboration between the edge node (vehicle) and the server: it searches for an edge node to which data can be offloaded and estimates the time and computational resources required for the application. If the server cannot execute the offloaded task, the information is shared over the network so that the offloading task is executed on the next available local server. The results were demonstrated by integrating the framework on Nvidia Jetson devices, which consume 11 W of power.
4) MobileEdge: As connected autonomous vehicles process and integrate multiple driving services at the same time, the vehicle computing unit can face significant load due to the computational complexity. To address these issues, several distributed computing approaches for the vehicular ecosystem have been proposed. MobileEdge [324] is one such edge computing framework that utilizes the main vehicle computing unit and other resource-constrained edge nodes or devices, such as a Raspberry Pi or HiKey970, present in the vehicular ecosystem. The architecture of MobileEdge consists of two processes: one executed on the vehicle computing unit and a second running on a remote edge node. The vehicle computing unit comprises a management system and device resource monitor, the on-board task scheduler, and the task execution process, while the edge node comprises a resource monitor, task receiver, and task execution process. The communication between the vehicle computing unit and the edge node is initiated over the local wireless network. The resource monitor on both devices is responsible for tracking system usage and remaining aware of the power consumed. The task scheduler manages the incoming raw sensor data and either passes it for local execution or offloads it to free resources. The task executor processes the associated driving services, such as video analytics or speech recognition. The task receiver module on the edge node receives offloaded data from the vehicle computing unit and passes it to the edge node's task execution module, thereby implementing the distributed computing application.
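The scheduler's local-versus-offload decision can be illustrated with a toy rule. This is not MobileEdge's actual policy: the threshold, the load metric, and the function names are all assumptions made purely for illustration.

```python
def schedule_task(task, local_load, load_threshold=0.8, edge_available=True):
    """Toy scheduling rule: run on the vehicle computing unit while
    the monitored load is below a threshold, otherwise offload the
    task to a free edge node (all parameters are illustrative)."""
    if local_load < load_threshold or not edge_available:
        return "local"
    return "offload"

# e.g. at 30% load video analytics runs on-board; at 95% it is offloaded
decisions = [schedule_task("video_analytics", load) for load in (0.3, 0.95)]
```

The resource monitors on both devices supply the load readings that drive such a decision.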
5) LoPECS: LoPECS [302] is another low-power edge computing system for real-time autonomous driving. It addresses the challenges of running computationally intensive tasks on resource-constrained embedded devices and can be considered an extension of π-Edge, replacing π-OS with a real-time OS that is lightweight compared to the traditionally used ROS. The architecture of LoPECS contains four major layers: service classification, runtime, heterogeneity-aware layer, and edge-server coordinator. The service classification layer identifies the tasks and features that need real-time execution and their associated power consumption. The runtime layer contains the real-time OS, an architecture-aware scheduler, and an API. The architecture-aware scheduler can be further divided into an inter-core scheduler and an inner-core scheduler; it processes the incoming data and acts as a data pipeline to the system's GPU, CPU, and video and audio accelerators. The last layer, the edge-server coordinator, performs the data and algorithm management strategies by enabling communication in the vehicular environment; this layer is also responsible for implementing the data offloading strategies. For evaluation, the framework, combining SLAM, object detection, and speech recognition, was implemented on an Nvidia Jetson TX1 (15 W capacity), with these tasks consuming 3.5 W on the GPU and 4.2 W on the CPU, still leaving resources and memory for other driving tasks.
6) AC4AV: The AC4AV [383] framework is designed for connected autonomous vehicles and proposes access control techniques for the autonomous vehicle. The framework also utilizes a data processing and abstraction method in which the sensed data from the sensors is identified and applied to access-related applications. The primary purpose is to protect the sensed data from phishing attacks or from being maligned within the vehicle environment. The architecture of AC4AV comprises three layers that protect the raw sensor data from unauthorized access: the access control engine, action control, and a logger database. The access control engine provides dynamic authentication for data access and incorporates a data processing layer that identifies the type of data and its relevant use in the autonomous driving services, since the vehicle senses through several sensors and the same data can be used by multiple algorithms. The action control service layer is responsible for two tasks: action capturing and responding. The last layer, the logger database, captures and records the actions; its information can serve as an audit trail for future actions, helping to improve latency for targeted applications. The implementation is based on publish-subscribe, a classic approach for messaging and communication within an embedded environment. A similar framework, autonomous vehicular edge [73], is based on ant colony optimization and includes offloading and task scheduling strategies with a decentralized approach. Its task scheduling strategies use a generalized assignment problem and are categorized according to driving priority and latency demand. The computational complexity of a greedy algorithm and of ant colony optimization was analyzed, measuring computational power along with latency; ant colony optimization achieved a latency of less than 1 ms.

Key Takeaways and Lessons Learned:
1) OS: Traditional autonomous driving frameworks used ROS or similar open-source systems integrated with Unix to deploy CAVs. In contrast to the on-board vehicular frameworks, the discussed edge frameworks are integrated with a custom lightweight OS to reduce computational delay for computing-intensive applications on resource-constrained devices.
2) Scheduler: Vehicular services are hierarchy-oriented and require execution within a short timeframe. These edge frameworks therefore focus on integrating a scheduling algorithm, sometimes referred to as the runtime layer, to optimize data processing for vehicular services.
3) Communication: The discussed edge frameworks mostly used a combination of OBU and RSU to exchange vehicle data and model weights. A few frameworks also used local wireless networks (802.11b) custom-installed at road intersections to initiate communication. However, the frameworks lacked testing of communication heterogeneity using a combination of edges such as base stations, RSUs, cellular stations, and embedded devices integrated with wireless modules. The communication approach proposed in these use-cases generally assumes ideal communication conditions; in reality, communication in the vehicular ecosystem is highly dynamic and heterogeneous, consisting of several low- and mid-range protocols with minor differences in range. Therefore, another limitation of these frameworks is their inability to work under dynamic network conditions.
4) Data: A shortcoming of the edge frameworks is the inability to handle high-volume data from the vehicle sensors in the case of collaborative inference between multiple vehicles. These frameworks do not propose any modules to offload or aggregate the sensed data at the edge, which may result in a flood of data at the edge and repetitive computation on redundant data.

VI. RESEARCH OUTLOOK AND OPEN PROBLEMS
This survey provides a comprehensive and categorized review of approximation techniques and energy-efficient methods for autonomous driving services. The selection of topics is based on previously and recently proposed AI and edge computing approaches for driving services, considering model size and real-time deployment on low-powered embedded devices; the common conclusive factor across these approaches is their heavy computational complexity, which results in high energy consumption on embedded devices. The main question asked in this survey is: what are the current approaches and trends that can promote the concept of Level 5 self-driving by enabling artificial intelligence at edge devices in an energy-efficient manner? In the process, secondary questions related to model development, optimization, and inference approaches such as federated learning were explored. However, there remain research gaps and open problems that need to be considered, such as: data management and processing techniques on edge devices, categorization of autonomous driving use-cases for real-time operation, hierarchical categorization of autonomous driving tasks, and their energy implications. These topics are covered in the following subsections.
A. Connected Vehicle Service and Case-Study 1) HD-Map: Vehicle drivers have long been using 2D maps (for example, Google Maps or Apple Maps) together with cellular technologies to achieve precise, short-duration travel within or between cities. For self-driving vehicles, these are replaced by high-definition (HD) maps, or 3D maps, which result from mapping roads and infrastructure with high-definition cameras and LiDAR sensors to localize the vehicle precisely in the 3D environment, with the information stored in data centers or cloud services. The average road or dynamic scene in a developed country changes only 5%-13% [100] over a year, due to construction or other dynamic events. Therefore, an approach can be implemented alongside a SLAM technique to update the previously captured HD map in the cloud based on changes in the scenario. Recently, research approaches [381] have been proposed that use a DNN model to update the HD map data available in the cloud from crowd-sourced data.
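Since only a small fraction of road scenes changes per year, a change-driven update scheme can avoid re-uploading unchanged map data. The sketch below fingerprints observed map tiles and uploads only those whose hash differs from the cloud's copy; the tile IDs, payloads, and function names are made up for illustration and are not tied to any particular HD-map system.

```python
import hashlib

# Illustrative change-detection sketch for HD-map updates: hash each
# observed map tile and select only tiles whose fingerprint differs from
# the cloud index. Tile IDs and contents are synthetic.

def fingerprint(tile_bytes):
    return hashlib.sha256(tile_bytes).hexdigest()

def changed_tiles(cloud_index, observed):
    """Return the tile IDs whose observed content differs from the cloud."""
    return [tile_id for tile_id, data in observed.items()
            if cloud_index.get(tile_id) != fingerprint(data)]

cloud_index = {"tile-a": fingerprint(b"road geometry v1"),
               "tile-b": fingerprint(b"intersection v1")}
observed = {"tile-a": b"road geometry v1",                    # unchanged
            "tile-b": b"intersection v2 (construction)"}      # changed

to_upload = changed_tiles(cloud_index, observed)
```

Uploading only the changed tiles keeps crowd-sourced map maintenance proportional to the 5%-13% of the scene that actually changed, rather than to the full map size.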
2) Vehicular Networks and Communication: Edge-assisted autonomous driving requires a cooperative learning approach for collaborative decision making. Federated learning has been proposed as a potential solution to this problem; however, open directions remain on topics including a common framework and deployment for heterogeneous vehicular networks, resource allocation using federated learning, communication, computing, and caching strategies for FL, data privacy and model security, and collaborative intelligence.
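The core of the federated approach mentioned above is weight aggregation: each vehicle trains locally and ships only model weights to an aggregator, which averages them weighted by local sample counts (FedAvg). The two-weight "model" below is a toy stand-in for a real DNN, used only to show the aggregation rule.

```python
# Minimal federated-averaging (FedAvg) sketch for the vehicular setting:
# vehicles exchange weight vectors, not raw sensor data, and the edge
# aggregator forms a sample-count-weighted mean.

def fed_avg(updates):
    """updates: list of (weights, n_samples); returns the weighted mean."""
    total = sum(n for _, n in updates)
    dim = len(updates[0][0])
    return [sum(w[i] * n for w, n in updates) / total for i in range(dim)]

# Two vehicles report local models trained on differing data volumes.
vehicle_a = ([1.0, 3.0], 100)   # 100 local samples
vehicle_b = ([3.0, 1.0], 300)   # 300 local samples

global_model = fed_avg([vehicle_a, vehicle_b])
# Weighted toward vehicle_b: [2.5, 1.5]
```

Only the aggregated vector returns to the vehicles, which is what makes the scheme attractive for the privacy and communication constraints of vehicular networks, though the open problems listed above (heterogeneity, resource allocation, model security) apply directly to this aggregation step.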

B. Enablers for Edge Application in Autonomous Driving
1) Data Management for Edge-Assisted Services: Current autonomous driving practice involves the individual implementation of tasks such as classification, detection, or localization. One of the reasons for this individual processing is the non-availability of data management techniques and practices for edge devices. If data management techniques were proposed, a heterogeneity-aware layer could be integrated to serve as a data flow between the sensors and the DNN algorithm. Data management techniques for edge devices can simultaneously enhance collaborative driving functionality and improve the offloading strategy, thus enabling each vehicle to make independent decisions while also sharing its output for cooperative driving use-cases. Another need is real-time compression of streaming data (from IoT devices or cameras) to be stored on the edge for tracking or monitoring.
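The real-time compression of streaming data mentioned above can be done incrementally, so frames are compressed as they arrive rather than buffering the whole stream. The sketch below uses Python's standard zlib incremental API with synthetic camera frames; it is a minimal illustration, not a production sensor pipeline.

```python
import zlib

# Sketch of real-time compression of a sensor stream before edge storage,
# using zlib's incremental compressor so each frame is processed on
# arrival. Frame contents are synthetic and highly redundant.

def compress_stream(frames, level=6):
    comp = zlib.compressobj(level)
    chunks = [comp.compress(frame) for frame in frames]
    chunks.append(comp.flush())          # emit any buffered remainder
    return b"".join(chunks)

frames = [b"camera-frame:" + bytes(64) for _ in range(100)]
blob = compress_stream(frames)

raw_size = sum(len(f) for f in frames)   # 7700 bytes uncompressed
restored = zlib.decompress(blob)         # lossless round-trip
```

Because the compressor carries its dictionary across frames, redundancy between consecutive frames is exploited, which is exactly the property that makes streaming sensor data (largely static scenes) compress well at the edge.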
2) Collaborative Edge Intelligence: The limited data bandwidth over wireless communication may lead to failures in the decision-making process of an autonomous car, since in cooperative driving the autonomous vehicle must continuously transmit data between the vehicle and the cloud. Implementing AI at the edge on a large scale can enable autonomous cars to efficiently process data while also enabling communication between vehicles. To overcome network- and communication-related issues, distributed edge computing and federated learning approaches can be implemented, which bring data processing and computation close to the vehicle, in contrast to cloud computing approaches where processing takes place in the centralized cloud. With computation occurring close to the vehicle, challenges and critical requirements of autonomous vehicles such as accuracy, low latency, reliability, power, and energy consumption [180] can be addressed. However, bringing services near the vehicles' network, where the connectivity of cars and their data is increasing at a tremendous rate, often becomes highly challenging due to scalability issues in terms of functionality, administration, and load. Moreover, the connectivity among a large number of devices results in a flood of data production that can hinder the edge node from performing analytics on such large-scale data while meeting the strict latency requirements of autonomous cars. Adequate consideration must be given to resolving these edge-related issues to enable the successful deployment of autonomous cars.
3) Training and Inference at the Edge: As covered in this survey, the volume and quality of data from the sensors change and grow rapidly depending on changes in the dynamic layer. To ensure the adaptability of an Edge AI algorithm to new or different data from the autonomous driving environment, it becomes necessary to perform AI model training and inference at the edge. This ensures a real-time update of the legacy or ground-truth data available near the edge and also ensures a timely update of the global model by exchanging binary weights with the backend cloud. The training and inference approach at the edge device can counter two major challenges: first, the inference latency incurred when the model is trained on another device or system (for example, the cloud); and second, privacy, as on-device training prevents the data from being shared over the cloud.
4) Common Edge Framework: The implementation of approaches such as federated learning in autonomous driving demands a common Edge AI framework across the entities involved. A common edge framework spanning vehicles, edge servers, infrastructure sensors, and the centralized cloud needs to be deployed to increase the efficiency and accuracy of applications. Such a framework can bring the performance of individual devices to an optimum level through need-based collaboration between vehicles and infrastructure sensors, and it is also important for privacy and security features.

C. Energy Efficiency Evaluation of DNN Implementation on Embedded Devices
Resource-Constrained Devices: Deep neural networks have delivered competitive accuracy for detection, segmentation, mapping, and localization-related tasks in autonomous driving, and with advancements in libraries and frameworks, they have also been deployed on resource-constrained devices such as smartphones and FPGAs. However, there are several drawbacks that cannot be overlooked. The best-in-class accuracy of state-of-the-art DNNs is delivered at extreme computational cost during training and inference [230], which significantly increases the overall energy consumption in the autonomous driving ecosystem. The literature covered in this survey shows several methods proposed to improve the accuracy and speed of DNN processing by optimizing the metrics involved, for example by optimizing binary weights and the operations involved in complex layers such as convolutional layers and Fire modules. These approaches do not necessarily bring significant improvement for embedded device deployment and applications. Therefore, there is an open requirement to propose an efficient DNN model for autonomous driving training and inference that simultaneously tackles the problem of low-latency applications while overcoming the challenges of data volume and energy consumption.
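Two of the software approximation techniques this survey covers, magnitude pruning and uniform quantization, can be shown on a toy weight vector. Real deployments would use framework tooling (e.g., compiler or runtime quantizers); the plain-Python sketch below, with made-up weights and a hypothetical threshold, only demonstrates the arithmetic behind each technique.

```python
# Toy sketch of magnitude pruning (zero out small weights) and uniform
# signed 8-bit quantization. All weight values are illustrative.

def prune(weights, threshold):
    """Zero every weight whose magnitude falls below the threshold."""
    return [w if abs(w) >= threshold else 0.0 for w in weights]

def quantize(weights, bits=8):
    """Uniformly map weights to signed integers of the given width."""
    scale = max(abs(w) for w in weights) / (2 ** (bits - 1) - 1)
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    return [v * scale for v in q]

w = [0.9, -0.02, 0.5, 0.01, -0.7]
pruned = prune(w, threshold=0.05)   # small weights removed -> sparsity
q, scale = quantize(pruned)         # 8-bit integer representation
approx = dequantize(q, scale)       # lossy but close reconstruction
```

Pruning yields sparsity that skips multiplications, and quantization shrinks both memory footprint and arithmetic cost, which is where the on-device energy savings discussed above come from; the price is the small reconstruction error visible in `approx`.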
Real-time applications such as SLAM or vision-related tasks require low latency and high precision from embedded devices. The relevant literature covered in this survey mostly exploits high-end GPUs, which are cost-intensive for large-scale deployment. To enable these tasks on edge embedded devices, combined software and hardware acceleration approaches can be proposed that integrate data offloading strategies and energy- or power-saving techniques while simultaneously enhancing the accuracy and performance of these resource-constrained devices.

D. Outlook of Edge AI Pipeline
Takeaways and lessons learned from this survey highlight the need for an Edge AI processing pipeline that can process large volumes of data to carry out decision-making processes. Figure 14 shows an overview of the Edge AI processing pipeline envisioned for future connected autonomous driving services, where the design of this pipeline corresponds to the joint processing of data at the vehicle's on-board computing unit and at the edge server. In the proposed scenario, the AI processing pipeline consists of four major components. The first component comprises the sensing unit present in the vehicle (camera, LiDAR, radar, GPS) and the communication unit (on-board unit plus cellular connectivity), which capture data from the vehicle's surroundings.
The second component consists of the computation and decision-making process: an edge device placed in the vehicle processes the data through a deep neural network, thus enabling driving services such as perception, SLAM, and communications. The computation and decision-making process is a complex task when incorporating energy-efficient autonomous driving services through edge intelligence; it is therefore necessary to highlight the processes that consume a significant amount of on-board energy. The computation and decision-making process is further divided into a data processing pipeline and computing. The data processing pipeline is assigned tasks such as offloading, labelling, real-time compression, legacy data update, and sharing the refined data with other entities in the surroundings, such as other vehicles or edge servers. The processes carried out in the data processing pipeline can address the primary concerns of memory and power for resource-constrained edge embedded devices. The computing part involves processing the refined data through a deep neural network to generate the weights for driving applications. To optimize deep neural networks further, acceleration and approximation techniques such as DNN model compression, data fusion, or approaches such as early-exit deep neural networks can be used. It is important to note that tasks such as SLAM, object tracking, and obstacle detection have low-latency and high-bandwidth requirements, which makes it necessary and practical to process sensed data for these tasks at the vehicle's on-board computing unit instead of at the edge or remote cloud. Therefore, one of the inputs from the vehicle sensors bypasses the data processing pipeline and is used directly for computation.
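The early-exit idea mentioned above can be shown schematically: a network exposes intermediate classifiers, and if one is already confident enough, the later (more expensive) stages are skipped, saving on-board energy. The stage behaviour below is hard-coded purely for illustration; a real early-exit DNN would attach trained exit branches to intermediate layers.

```python
# Schematic early-exit inference: run stages in order and stop as soon as
# an intermediate classifier's confidence clears the threshold. Stage
# outputs are mocked; only the control flow is the point.

def early_exit_infer(x, stages, threshold=0.9):
    """Run stages in order; return (label, confidence, stages_used)."""
    for i, stage in enumerate(stages, start=1):
        label, confidence = stage(x)
        if confidence >= threshold:
            return label, confidence, i
    return label, confidence, len(stages)

# Three mock stages of increasing cost and confidence.
stages = [
    lambda x: ("car", 0.6),    # cheap early classifier: unsure
    lambda x: ("car", 0.95),   # mid-network exit: confident, stop here
    lambda x: ("car", 0.99),   # full network, never reached for this input
]

label, conf, used = early_exit_infer("frame", stages)
```

For "easy" inputs the expensive tail of the network is never evaluated, so average latency and energy scale with input difficulty rather than with worst-case model depth, which is what makes early exits attractive for the on-board computing unit.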
The third component of the proposed Edge AI processing pipeline consists of an edge server that is responsible for processing large-volume data and enabling communication in the vehicular ecosystem. The communication here can be categorized as: vehicle to edge server (for sharing raw data), edge server to vehicle (for sharing DNN model weights and refined or processed data), edge server to infrastructure, and lastly edge server to backend cloud. To reduce the extensive on-board energy consumption of an autonomous vehicle, it is important to process the computationally intensive tasks on the edge server, which implements lossless compression, optimization, and software approximation approaches that help achieve overall end-to-end energy efficiency.
The fourth component consists of the roadside infrastructure, which includes a sensor suite (CCTV, traffic lights, LiDAR, communication unit, GPS) similar to the vehicle's and assists tasks and applications such as smart traffic flow, traffic monitoring, and map updates. As illustrated in Figure 14, this component also comprises a similar data processing pipeline executing tasks such as offloading, labelling, real-time data compression, and data or model sharing over wired communication with the edge server and backend cloud. The backend cloud communicates with the vehicle, server, and infrastructure sensors for DNN model updates or legacy data updates. To improve accuracy and enable collaborative driving, the model weights and data updates should be shared between the backend cloud, vehicle, and edge server over wireless and wired networks respectively.

VII. CONCLUSION
This paper has explored and reviewed autonomous driving applications of perception, SLAM, HD maps, vehicular communications, and inference approaches deployed on autonomous embedded platforms and edge devices. Attention has been given to exploring the currently available datasets and autonomous driving frameworks. Focusing on the impact of computational complexity and energy efficiency on resource-constrained devices, we highlight communication-efficient approaches and software approximation techniques, including low-rank approximation, pruning, quantization, and sparsification, which aim at reducing the statistical model parameters for inference. In addition, we also covered the energy-efficient deployment of AI applications on resource-constrained devices using allocation schemes, heterogeneity-aware mechanisms, and federated learning. Our purpose is to provide a dedicated review of energy-efficient approaches for connected autonomous driving, ranging from vehicular communication, edge computing, and approximation techniques to novel software-hardware frameworks. Besides identifying research gaps, we highlight the existing challenges and open problems that deserve further research investigation from the community. Finally, based on the identified gaps, we envision an Edge AI processing pipeline to share our outlook on the potential development of energy-efficient applications for level 4 and beyond edge-assisted autonomous driving.

Figure 1: Classification of Topics Covered in This Survey

Figure 4: Communications in vehicular ecosystem across vehicles, infrastructure, and road-side networks.

Figure 6: DNN pipeline to show 3D object detection using video frames, bird-eye-view, and LiDAR point clouds.

Figure 8: HD-map layers representation in the ecosystem

Figure 9: Overlap of ML-Driving Services, Communication and Edge Computing

Figure 11: Edge AI layers for connected vehicles

Figure 14: Edge-assisted autonomous driving. The pipeline consists of on-board vehicle sensors in the car, the computation and decision-making process, the edge server, infrastructure sensors & devices, and the remote cloud.

Table II: Number of sensors present in an autonomous car and an approximate count of sensors for level 4 and level 5.

Table III: Coverage and comparison of previously published surveys

covers popular and recently published object & lane detection approaches for autonomous driving. Table IV is formulated on AI model performance over the popular driving datasets (covered in Table
al. is capable of both classification and semantic segmentation of 3D point clouds by learning the local and global feature vector from the raw point clouds. Zhou et al. presented VoxelNet [399], a deep learning architecture detecting 3D bounding boxes from readings of LiDAR point clouds, where the point clouds were divided into equally spaced 3D voxels. The architecture successfully detects and delivers high performance for cars, cyclists, and pedestrians. The most prominent 3D object detector, Frustum-PointNet [244], is presented by Qi et al., and predicts the bounding box of an object based on instance segmentation and bounding box estimation. A similar method, PointFusion [343], is proposed by Xu et al., which utilizes the PointNet

Table IV: State-of-the-art DNN architectures benchmarked over the KITTI and COCO datasets. The table is arranged according to the timeline, data and method used for computation, and on-board inference speed.

Table V: Deep learning models proposed for vehicular SLAM applications, including approaches proposed for indoor environments that are scalable to outdoor scenes.

Table VI: Long-range communication technologies for autonomous driving

Table VII: Comparison of edge intelligence and cloud intelligence parameters for self-driving vehicles

Table VIII: Publicly available datasets for autonomous driving. The table is arranged according to the timeline of release; URLs were last accessed on 15 February 2023.