On the Potential of V2X Message Compression for Vehicular Networks

The emergence of connected automated vehicles and advanced V2X applications and services can challenge the scalability of vehicular networks in the future. This challenge requires solutions to reduce and control the communication channel load beyond the traditional congestion control protocols proposed to date. In this paper, we propose and evaluate the use of V2X message compression to reduce the channel load and improve the scalability and reliability of future vehicular networks. Data compression has the potential to reduce the channel load consumed by each vehicle without reducing the amount of information transmitted. To analyze its potential, this paper evaluates the compression gain of three compression algorithms using standardized V2X messages for basic awareness (CAMs), cooperative perception (CPMs) and maneuver coordination (MCMs) extracted from standard-compliant prototypes. We demonstrate through network simulations that V2X message compression can reduce the channel load. In particular, the tested compression algorithms can reduce the channel load by up to 27% without reducing the amount of information transmitted. Reducing the channel load and the consequent interference significantly improves the reliability of V2X communications. However, this study also emphasizes the need for high-speed compression and decompression modules capable of compressing and decompressing V2X messages in real time, especially under highly loaded scenarios.


I. INTRODUCTION
Vehicular networks enable the continuous exchange of information between vehicles and other nodes using V2X (Vehicle-to-Everything) communications. The emergence of connected automated vehicles will increase the demand for advanced V2X applications and services and therefore the information exchanged. This can challenge the scalability of V2X networks in the coming years. To address this challenge, it is necessary to reduce and control the communication channel load and the interference present in the network. Data compression can reduce the number of bits per message without affecting the information effectively transmitted per vehicle or modifying the communication parameters. Data compression is widely used in communication systems to improve bandwidth utilization. For example, HTTP compression can be applied in web servers and web clients (browsers) to improve transfer speed and bandwidth utilization [9]. HTTP compression for text files can achieve compression gains of around 75% [10]. However, the use of data compression in vehicular networks has been underexplored. In this paper, we propose and evaluate the potential of data compression to reduce the channel load and improve the scalability of vehicular networks without modifying the information transmitted or the message rate. To make it agnostic to the radio access technology, we propose that the V2X messages generated by the upper layers of the protocol stack are compressed before they are sent down to the Transport & Network layer. A V2X message compression component could be implemented at the Facilities layer of the ETSI [11] or ISO [12] ITS Architectures (see Figure 1), or above the WSMP Transport Layer of the 1609/WAVE Architecture [13]. This component would be in charge of the message compression at the transmitter and the decompression at the receiver. It would also be independent of the radio access technology considered.
The use of data compression would not require any other significant modification to the protocol stack, which increases its potential for future standardization. The compression gain significantly depends on the size and statistical properties of the input data. Previous studies have shown that text files can be significantly compressed because they have repeated substrings (e.g. words) [14]. However, V2X messages have a relatively small size (hundreds of Bytes) and their properties and potential for data compression have not been studied yet. In this paper, we propose and evaluate the use of data compression using standardized V2X messages obtained from experimental prototypes. We consider three different compression algorithms and compare the compression they can achieve. In addition, we evaluate by means of network simulations the positive effects of the V2X message compression on the channel load and V2X communications reliability. To conduct this study, we apply data compression to different V2X messages standardized by ETSI: CAMs (Cooperative Awareness Messages), CPMs (Collective Perception Messages) and MCMs (Maneuver Coordination Messages). These messages are coded in prototypes that implement the ETSI ITS architecture and use ITS-G5 (adaptation of IEEE 802.11p) for the physical and MAC (Medium Access Control) layers. Experimental data to build and transmit the V2X messages has been obtained from field trials. The prototypes and field trials have been implemented in the H2020 TransAID project [15]. The use of standard-compliant prototypes and standardized V2X messages is important because the properties of the data to be compressed can influence the compression gain that can be achieved.
This paper is structured as follows. Section II presents the compression algorithms considered in this study. Section III describes the V2X messages that have been employed. Section IV presents the experimental testbeds that have been used for the collection of the V2X messages. Section V analyzes the statistical properties of the V2X messages collected, and evaluates the compression gain of the different algorithms. It also compares the compression gains achieved with the best algorithm to its theoretical compression limit, and measures the time needed to compress and decompress V2X messages. Section VI evaluates the impact of message compression on vehicular networks by means of simulation. Section VII presents the main conclusions achieved.

II. DATA COMPRESSION
Data compression is the process of encoding information to reduce the number of bits of the original representation. It is also known as source coding in data transmission. Data compression can be either lossy or lossless. Lossy compression reduces the number of bits by removing unnecessary or less important information, and is typically applied in audio or image compression. Lossless compression reduces bits by identifying and eliminating statistical redundancy. Lossless compression can perfectly reconstruct the original data from the compressed one without any loss. Considering the critical nature of V2X information, this work focuses on lossless compression and we study two types of compression methods: entropy compression and adaptive dictionary compression [16]. Adaptive dictionary compression is one of the most common methods. Different algorithms of this type have been shown to be computationally efficient and universal, i.e. they do not require prior knowledge of the data to be compressed. As a consequence, this method can be applied to any V2X message without any previous analysis or processing. Its main drawback for the compression of V2X messages is that it only becomes effective for long bit sequences and therefore its compression gain for relatively small messages needs to be studied. Entropy compression is also widely used in many different fields because of its simplicity and lack of patent coverage. In contrast to adaptive dictionary compression, entropy compression is not a universal compression method. Entropy compression thus requires prior knowledge of the data to be compressed. To apply it in vehicular networks, the statistical properties of the V2X messages need to be analyzed to build a static dictionary, agreed upon by all vehicular nodes. Its main advantage is that, once this dictionary is built, its compression gain does not depend on the message length.

A. ENTROPY COMPRESSION
The idea behind entropy compression (also known as probability compression) is to divide the message source in symbols of equal length. The alphabet is the set of possible symbols. Frequent symbols are encoded in fewer bits than infrequent symbols. Therefore, compression algorithms that make use of entropy compression take into account the probability that each symbol appears in a message. Using these probabilities, a table of codewords or dictionary is constructed. Codewords for symbols with low probabilities have more bits, and codewords for symbols with high probabilities have fewer bits so that the input data can be effectively compressed.
Different algorithms have been proposed in the literature to construct the table of codewords or dictionary. One of the most used algorithms is Shannon-Fano coding [18]. It is able to derive compact dictionaries, i.e. with a short average codeword length (close to, though not always equal to, the minimum) to represent the messages of a given source. Assuming that the probability of each symbol in the alphabet is known, the Shannon-Fano coding algorithm operates as follows:
1. Sort the list of symbols in decreasing order of probability, the most probable ones to the left and the least probable to the right.
2. Split the list into two parts with the total probability of both parts being as close as possible.
3. Assign bit 0 to the left part and bit 1 to the right part.
4. Repeat steps 2 and 3 for each part until all the parts are split into individual symbols.
This algorithm is used to build the dictionary of codewords. It only needs to be executed if the probabilities of the symbols of the alphabet change. Once the dictionary of codewords is obtained, it is used to compress the messages by replacing each symbol with its codeword. It is also used to decompress the messages by replacing each codeword with its corresponding symbol.
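The four steps above can be sketched in a few lines of Python. This is a minimal illustration and not the implementation used in this study: the function names (shannon_fano, encode, decode) and the example probabilities are hypothetical, and codewords are kept as strings of '0' and '1' for readability rather than packed into bits.

```python
def shannon_fano(probs):
    """Build a {symbol: codeword} dictionary from {symbol: probability}."""
    # Step 1: sort symbols in decreasing order of probability.
    items = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) < 2:
            return  # a single symbol cannot be split further
        total = sum(p for _, p in group)
        # Step 2: find the cut that makes both parts as equiprobable as possible.
        running, best_cut, best_diff = 0.0, 1, float("inf")
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_cut, best_diff = i, diff
        left, right = group[:best_cut], group[best_cut:]
        # Step 3: append bit 0 to the left part and bit 1 to the right part.
        for sym, _ in left:
            codes[sym] += "0"
        for sym, _ in right:
            codes[sym] += "1"
        # Step 4: repeat on each part.
        split(left)
        split(right)

    split(items)
    return codes


def encode(symbols, codes):
    """Compress by replacing each symbol with its codeword."""
    return "".join(codes[s] for s in symbols)


def decode(bits, codes):
    """Decompress by matching codewords (the code is prefix-free)."""
    inverse = {cw: sym for sym, cw in codes.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in inverse:
            out.append(inverse[current])
            current = ""
    return out


# Hypothetical 4-symbol source, for illustration only.
example_codes = shannon_fano({"a": 0.5, "b": 0.25, "c": 0.125, "d": 0.125})
```

With these example probabilities, the most probable symbol receives a 1-bit codeword and the two least probable symbols receive 3-bit codewords. Because no codeword is a prefix of another, the decoder needs no separators between codewords.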
To compute the probability of each symbol, two different approaches could be envisaged for V2X messages: Approach 1: Compute the probabilities of the different symbols for each V2X message individually before it is transmitted. This approach implies a different dictionary of codewords for each V2X message. It ensures that the dictionary constructed for each message is the optimal one. However, the receiver needs the dictionary for decompressing the message and the transmitter must send it to the receiver together with the message. This can reduce the gains achieved with message compression since V2X messages are generally of small size.
Approach 2: Compute the probabilities for a set of V2X messages of certain type (e.g. CAM, CPM or MCM) to construct one dictionary per V2X message type. The dictionary does not need to be transmitted with each message, if it is fixed and known by the transmitter and receiver. The main drawback of this approach is that the dictionary is the optimum for the set of messages analyzed, but might not be optimal for each individual message. However, we adopt this approach in this study since the need to append the dictionary to each message in the first approach significantly reduces the gains of compression.

B. ADAPTIVE DICTIONARY COMPRESSION
The adaptive dictionary compression method does not need to parse the data before compressing in order to calculate the symbols' probabilities. It compresses messages by looking for repeated substrings in the input data. At the start, compression algorithms that use this method have no dictionary or use a default baseline one. As compression proceeds, the algorithms add new symbols to the dictionary following certain rules. The algorithms read the input data and search for groups of symbols that appear in the dictionary. If a string match is found, a pointer or index into the dictionary is sent to the output instead of the symbols. The compression ratio improves with longer matches. Algorithms based on adaptive dictionary compression have become the de facto standard for general-purpose data compression due to their high-performance compression combined with reasonable memory requirements [16]. One of the most widely used algorithms based on adaptive dictionary compression is the so-called Lempel-Ziv algorithm. It is a universal algorithm that has been shown to be computationally efficient. It can therefore be applied to any V2X message without any previous analysis, but its compression gain could be limited when the amount of data to be compressed is small. In this study, we have used two open-source tools that implement this algorithm: Compress [19] and Gzip [20].
Gzip is based on the LZ77 algorithm [21], which is the original algorithm proposed by Lempel and Ziv. LZ77 makes use of a sliding window buffer to look for repeated substrings in the input data. The sliding window buffer is divided in two parts: a search buffer and a lookahead buffer. The search buffer contains the data that has already been compressed. The lookahead buffer contains the data that has not been compressed yet. As data is compressed, the oldest compressed data is removed from the search buffer and new uncompressed data is added to the lookahead buffer. Data is compressed when a substring in the lookahead buffer is found in the search buffer. When this happens, the substring is replaced by a pointer that contains its position and length within the search buffer. The longer the substrings, the higher the compression gain. The Gzip output format is described in RFC 1952 [22]. It contains the compressed data together with a series of additional headers. These headers occupy at least 18 bytes and contain a CRC-32 checksum and the length of the original uncompressed data, among other information.
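The fixed cost of the Gzip framing can be observed with a short Python sketch based on the standard zlib module. It compresses the same payload as a raw DEFLATE stream and as a gzip-framed stream (RFC 1952) and compares the sizes. The payload is a hypothetical stand-in for a V2X message, and the helper name deflate is an assumption made for this illustration.

```python
import struct
import zlib


def deflate(data: bytes, wbits: int) -> bytes:
    """Compress data; wbits=-15 gives raw DEFLATE, wbits=31 adds gzip framing."""
    compressor = zlib.compressobj(9, zlib.DEFLATED, wbits)
    return compressor.compress(data) + compressor.flush()


# Hypothetical 200-byte payload standing in for a V2X message.
message = bytes(range(200))

raw = deflate(message, -15)    # compressed data only
framed = deflate(message, 31)  # gzip header + compressed data + trailer

# Bytes added by the gzip header and trailer.
overhead = len(framed) - len(raw)

# The trailer holds the CRC-32 and the original length (little endian).
crc32, original_length = struct.unpack("<II", framed[-8:])
```

In this sketch the framing adds 18 bytes (a 10-byte header and an 8-byte trailer), which is a sizeable share of a message that is only a few hundred bytes long.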
Compress is based on an evolution of the Lempel-Ziv algorithm that is known as LZW [23]. LZW uses the input data to construct a dictionary. The algorithm is normally initialized with 256 entries, each of them one-byte long. The input data is then parsed looking for substrings that appear in the dictionary. When a substring S of the input data is found in the dictionary, it is substituted by its index and a new entry is added to the dictionary. This new entry contains S and the next symbol in the input data. A new entry is therefore added if the dictionary contains a prefix one byte shorter (e.g. ''car'' is only added to the dictionary if ''ca'' has appeared in the data).
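The dictionary construction described above can be sketched as follows. This is a simplified illustration of LZW under stated assumptions, not the Compress tool itself: it emits dictionary indices as a Python list of integers instead of a packed bit stream, and the function names are hypothetical.

```python
def lzw_compress(data: bytes) -> list[int]:
    """Return the list of dictionary indices that encodes data."""
    # The dictionary starts with the 256 one-byte entries.
    dictionary = {bytes([i]): i for i in range(256)}
    current = b""
    output = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in dictionary:
            current = candidate  # keep extending the match
        else:
            output.append(dictionary[current])       # emit index of substring S
            dictionary[candidate] = len(dictionary)  # add S + next symbol
            current = bytes([byte])
    if current:
        output.append(dictionary[current])
    return output


def lzw_decompress(codes: list[int]) -> bytes:
    """Rebuild the data; the dictionary is reconstructed on the fly."""
    if not codes:
        return b""
    dictionary = {i: bytes([i]) for i in range(256)}
    previous = dictionary[codes[0]]
    output = [previous]
    for code in codes[1:]:
        if code in dictionary:
            entry = dictionary[code]
        elif code == len(dictionary):
            # Index refers to the entry that is about to be created.
            entry = previous + previous[:1]
        else:
            raise ValueError("invalid LZW code")
        output.append(entry)
        dictionary[len(dictionary)] = previous + entry[:1]
        previous = entry
    return b"".join(output)
```

For the input b"ABABABA" the sketch emits four indices instead of seven bytes, because "AB" and "ABA" are added to the dictionary as the data is parsed. The receiver does not need the dictionary in advance, since it rebuilds the same entries while decompressing.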
In general, the size of the input data and the distribution of common substrings significantly affect the compression gain of compression algorithms based on adaptive dictionary like LZ77 (Gzip) or LZW (Compress). Using text as input data, LZ77 is able to produce a compression gain of 60-70% and LZW around 50-60% [19], [20]. However, the compression gain of Gzip and Compress when applied to V2X messages like CAMs, CPMs and MCMs has not been studied and is unclear as it will depend on the V2X message size and content (e.g. the presence of repeated substrings).

III. V2X MESSAGES
The size and statistical properties of the data to be compressed can have a high impact on the performance of the compression process. For this reason, it is important that studies are based on real data and messages. This study uses standardized V2X messages defined by ETSI for basic cooperative awareness (CAMs, Cooperative Awareness Messages), cooperative or collective perception (CPMs, Collective Perception Messages) and cooperative maneuvering or driving (MCMs, Maneuver Coordination Messages). The V2X messages have been implemented in standard-compliant testbeds developed during the H2020 TransAID project [24] and data has been obtained from field trials. The authors would like to note that the statistical properties of these messages in large deployments can vary from those obtained in these testbeds. However, all the messages analyzed in this study have been generated following the ETSI standards and hence provide useful insights into the potential of compression for V2X communications.
CAM messages are basic broadcast messages that are used to transmit information about the transmitting vehicle [25]. They contain basic information (e.g. position, speed and status information) and have a small size. CAM messages are regularly broadcast so that vehicles improve their awareness of the driving environment. CAM messages are the basis of many services (e.g. intersection collision warning or slow vehicle indication). Automated vehicles will also broadcast CPM messages for cooperative or collective perception in connected automated driving [26]. The idea is that connected automated vehicles share information about objects detected by their on-board sensors. Using this shared information, vehicles can extend their awareness even beyond their sensors' field of view. MCM messages are being designed to implement cooperative maneuvering or driving [27]. The goal is that vehicles share their planned and desired trajectories to coordinate their driving maneuvers safely and efficiently. The planned trajectories are used by vehicles to improve the prediction of future locations of nearby vehicles and to detect conflicts, and are always included in the transmitted MCMs. The desired trajectories are used to request coordination between vehicles. CPM and MCM messages are larger than CAMs and will also be continuously broadcast. They could hence consume a high proportion of the channel bandwidth. This section summarizes the structure and format of the different messages defined (or under definition) by ETSI.

A. CAM
A CAM is composed of one ITS PDU header and multiple containers [25] (see Figure 2). The ITS PDU header is common to multiple messages. It includes data elements like the type of message and the ID of the transmitting vehicle or RSU (Road Side Unit). CAMs transmitted by vehicles must also include one Basic Container and one High Frequency Container. The Basic Container includes basic information of the vehicle that transmits the CAM, such as the vehicle type or its latitude and longitude. The High Frequency Container contains dynamic information of the transmitting vehicle, such as its heading or speed. The CAM can also contain optional containers, such as the Low Frequency Container and one or more Special Containers. The Low Frequency Container includes data elements such as the vehicle role, the path history, or the status of the exterior lights. The Special Vehicle container includes information related to the vehicle role, and was designed for public transport vehicles or emergency vehicles, among others. All containers have optional and mandatory data elements. Therefore, the CAM size depends on the number of optional containers and optional data elements considered. The CAM size can change significantly depending on the driving conditions [28].

B. CPM
The CPM includes an ITS PDU header and 5 types of containers [26] (see Figure 3): a Management Container, a Station Data Container, a Sensor Information Container, a Perceived Object Container and a Free Space Addendum Container. In addition, it also contains a data element that specifies the current number of detected objects. This number does not necessarily match the number of objects included in the CPM because not all detected objects are included in every CPM.

The Management Container and the Station Data Container include information about the transmitter. The Management Container is mandatory and includes its position and type (e.g. vehicle or RSU). This container also includes an optional data element to notify if the transmitted CPM is part of a larger CPM that has been segmented due to message size constraints. The Station Data Container is optional and includes additional information about the transmitting vehicle or RSU. For vehicles, this container includes information about the vehicle dynamics, such as heading, speed, angle, and its size. For RSUs, it includes information such as the Intersection Reference ID or Road Segment ID.
The Sensor Information Container, the Perceived Object Container and the Free Space Addendum Container are used to exchange information about the on-board sensors and the perceived environment (detected objects and free space). The Sensor Information Container is optional and informs about the on-board sensors used by the transmitter (up to 128 sensors). For each sensor, this container includes the sensor ID, sensor type (e.g. camera, lidar or radar) and its detection area. The receiver can use this information to estimate the areas covered by the sensors of the transmitter. The Perceived Object Container is also optional and includes information about the detected objects (up to 128 detected objects). For each object, it includes information such as the object ID, position, speed, acceleration, and size of the object, among other fields. Finally, the Free Space Addendum Container is optional and describes the free space areas within the sensor detection areas. The receiver can use this information to better estimate the free space areas around the transmitting vehicle.

C. MCM
The standardization of the MCM has not yet concluded [27] but the H2020 TransAID project has proposed a format for MCM messages [24], [29] that follows current discussions at ETSI. We use this format in this study since it is the only concrete proposal to date. The MCM includes the ITS PDU header and the containers illustrated in the accompanying figure. Vehicles transmit the Vehicle Maneuver Container, which must always include the planned trajectory of the vehicle. In addition, this container can include its desired trajectory. Each trajectory contains a variable number of trajectory points, each of them with its coordinates relative to the vehicle position (deltaXCm and deltaYCm), the remaining time to reach the point (deltaTimeMs), and the vehicle heading and speed when it reaches the point (headingValue and absSpeed). The container also includes information such as the heading, speed or acceleration.
The RSU Suggested Maneuver Container is used by RSUs to support maneuver coordination. This container includes a list of driving advice or suggestions sent to vehicles to help them coordinate the maneuvers (i.e. the current proposal envisions the road infrastructure to support and not control the driving). Four types of advice have been defined: car following advice, lane advice, Transition of Control advice, and safe spot advice [24].

IV. EXPERIMENTAL TESTBEDS FOR THE COLLECTION OF V2X MESSAGES
The V2X messages used in this study have been obtained from field trials carried out using standard-compliant testbeds developed during the H2020 TransAID project [24]. In TransAID, different CAV and RSU prototypes have been implemented (Figure 5) and used to demonstrate how infrastructure-assisted traffic management solutions can reduce safety risks and traffic disruptions. The developed CAV and RSU prototypes are capable of exchanging basic awareness information (CAMs), information about detected objects (CPMs) and information to manage and coordinate maneuvers (MCMs) using V2X communications. The CAVs and RSUs are equipped with a V2X module that enables V2V and V2I communications. The V2X module is implemented using a Cohda Wireless MK5 (see Figure 5b). The MK5 is compliant with the latest ETSI standards for V2X communications at the different layers of the ETSI ITS Architecture [11], including the ITS-G5 radio access technology.
CAMs and MCMs were collected in field trials that showcased maneuver coordination using V2V when CAVs perform a lane merge in a highway entrance scenario. In this scenario, one CAV was driving on the highway while another CAV entered through an on-ramp with a certain risk of collision. These two CAVs periodically exchanged CAMs and MCMs to detect each other and coordinate their maneuvers to perform the lane merge maneuver efficiently and safely. The CAVs combine the information collected from the V2X messages with the information perceived by their on-board sensors. The combined information is used by the Autonomous Driving Software (AD SW) module to plan and execute the CAV's autonomous maneuvers. In addition, the AD SW module provides information to the V2X module to create the V2X messages to be transmitted (e.g. the planned and desired trajectories).
CPMs were generated by an RSU installed at the Tostmannplatz intersection in Braunschweig. This intersection has four approaches which are controlled by traffic lights. There are two lanes per direction on the main road and additional lanes for left-turning. The RSU uses a hemispheric camera of type Samsung PNM-9020V [30] to periodically generate CPMs that contain information about the detected objects. This information was also used as input to a Traffic Monitoring module to derive the infrastructure-assisted traffic management measures based on context conditions.

V. COMPRESSION OF V2X MESSAGES
This section analyzes the compression gain that can be achieved for the three V2X message types (CAMs, CPMs and MCMs). To this aim, we first analyze the statistical properties of the V2X messages collected in the field trials using the experimental platforms described in the previous section. We then compare the compression gain that we achieve with the three compression algorithms. We identify the best compression algorithm for the V2X messages, and we compare the compression gains achieved by this algorithm to its theoretical compression limit. Finally, this section quantifies the time needed to compress and decompress the considered V2X messages.
MCMs were transmitted by 2 vehicles in a merging highway scenario. All MCMs included the planned trajectory. When a maneuver coordination was required, they included both the planned and the desired trajectories. As a consequence, the collected MCMs have 2 different sizes (329 Bytes or 608 Bytes) depending on whether they contain the desired trajectory or not. Figure 6b shows that it was more common for vehicles to broadcast MCMs without a desired trajectory during the trials.
• K = 256. Each symbol is represented with 8 bits without compression. The alphabet can also be represented as the set of all pairs of possible hexadecimal symbols (00, 01, 02, . . . , FE and FF) or as integers (from 0 to 255).
• K = 4096. Each symbol is represented with 12 bits without compression. The alphabet can also be represented as the set of all groups of 3 possible hexadecimal symbols (000, 001, 002, . . . , FFE and FFF) or as integers (from 0 to 4095).
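The mapping from a message to symbols of a given alphabet can be sketched as follows. The helper names (to_symbols, symbol_probabilities) are hypothetical, and the zero padding applied when the message length is not a multiple of the symbol length is an assumption made for this sketch, since the handling of this case is not described here.

```python
from collections import Counter


def to_symbols(message: bytes, k: int) -> list[int]:
    """Split a message into integer symbols of log2(K) bits each."""
    if k not in (16, 256, 4096):
        raise ValueError("K must be 16, 256 or 4096")
    hex_per_symbol = (k.bit_length() - 1) // 4  # 1, 2 or 3 hex digits
    hex_string = message.hex()
    # Assumption: pad with zeros when the length is not a multiple
    # of the symbol length (only possible for K = 4096).
    remainder = len(hex_string) % hex_per_symbol
    if remainder:
        hex_string += "0" * (hex_per_symbol - remainder)
    return [
        int(hex_string[i:i + hex_per_symbol], 16)
        for i in range(0, len(hex_string), hex_per_symbol)
    ]


def symbol_probabilities(messages, k: int) -> dict[int, float]:
    """Estimate the probability of each symbol over a set of messages."""
    counts = Counter()
    for message in messages:
        counts.update(to_symbols(message, k))
    total = sum(counts.values())
    return {symbol: count / total for symbol, count in counts.items()}
```

For instance, the two bytes 0x12 0xAB yield the symbols 1, 2, 10, 11 for K = 16 and the symbols 18, 171 for K = 256. Probabilities estimated this way over all messages of one type are what a static dictionary per message type would be built from.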

VOLUME 8, 2020
We have analyzed all collected messages to calculate the probability of each symbol per V2X message type and alphabet. Figures 7, 8 and 9 show the PDF of the symbols for the collected CAMs, CPMs and MCMs. The case of K = 4096 is not shown due to visibility reasons as it has too many symbols. Figures 7, 8 and 9 show that some symbols of the alphabet have higher probability than others, which increases the potential to achieve large compression gains.

B. COMPRESSION GAINS
This section compares the compression gain that can be achieved for CAMs, CPMs and MCMs using the three compression algorithms considered. We have implemented our own source code to construct the dictionaries of codewords in Shannon-Fano. We then use existing search and replace libraries to replace message symbols with their corresponding codewords. To this aim, we use as input the PDF of the symbols (Figures 7, 8 and 9). The Shannon-Fano algorithm assigns a higher number of bits to those symbols with lower probability. As an example, Table 1 presents the dictionaries of codewords for CAMs, CPMs and MCMs when applying the Shannon-Fano algorithm for K = 16 symbols. Without compression, 4 bits are necessary to represent each symbol. This number of bits is reduced with compression for the symbols with higher probability (e.g. symbol ''0''). The same procedure has been followed for K = 256 and K = 4096, but their dictionaries are not shown due to their size.
Figure 10 shows the compression gain achieved with the different algorithms for CAMs, CPMs and MCMs. The bars represent the average values and the vertical lines the 5th and 95th percentiles. Positive compression gains mean that it was possible to compress messages while negative ones indicate that the message size actually increased after compression.
The message size can impact the compression gain. Figure 11 depicts the output size (S_o), after compression, as a function of the input size (S_i), before compression, for CAMs and MCMs (see footnote 1). The dashed line represents the case where S_o = S_i. All values below the dashed lines represent a successful compression (i.e. positive compression gain). All values above the dashed lines represent scenarios where the compression algorithm increased rather than decreased the message size. The solid lines represent the linear models that best fit the output-input values for each compression algorithm and message. These models are linear functions of S_i with parameters α and β, whose values are presented in Table 2. Parameter β is equal to zero for CPMs because the same compression was achieved for all collected CPM messages (see footnote 1). The community can use the linear models plotted in Figure 11 to consider the impact of compression as a function of the original message without having to implement the compression algorithms.
We use Figure 11 to derive Figure 12, which depicts the compression gain as a function of the size of the input messages. The figure shows that the Shannon-Fano algorithm always achieves a positive compression gain, but the gain slightly decreases as the size of the uncompressed message (input) increases. This trend is observed for both CAMs and MCMs, and is due to the fact that most of the messages have a small size (see Figure 6). As a consequence, the constructed codewords are better suited to the symbol probabilities of the small messages, and therefore provide better compression for these messages. Figure 12 shows that Gzip and Compress experience the same trend for MCM messages and also achieve a positive compression gain. In general, the Shannon-Fano algorithm outperforms Gzip and Compress except when the size K of the alphabet is very low. Gzip and Compress show the opposite trend with CAMs: their compression gain improves with the input size. This is the case because the probability of finding repeated substrings is higher when the input size is larger. However, the compression gain always remains negative and the Shannon-Fano algorithm clearly outperforms Gzip and Compress. This is the case because Gzip and Compress add headers that reduce the compression gain, and also because they are designed to achieve good compression gains when messages are larger than CAMs. This is already partly visible in Figure 12b when considering MCMs, which are larger than CAMs.
Footnote 1: All CPMs had the same size in our experiments. This is why Figure 11 does not represent the case for CPMs. The size of CPMs depends on the number of sensors and the number of connected automated vehicles in the scenario. Since this number was small in our testbed, there is no variability for CPMs. However, future CPMs in scenarios with many more automated vehicles will be more variable and hence higher compression gains than those observed in our experiments could be achieved.
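The sign of the compression gain for small and large inputs can be illustrated with a short Python sketch. It assumes that the gain is computed as the relative size reduction in percent, which is consistent with the positive and negative gains discussed above but is not a formula quoted from this section. The two payloads are synthetic and do not reproduce the collected V2X messages.

```python
import gzip


def compression_gain(size_in: int, size_out: int) -> float:
    """Relative size reduction in percent (assumed definition of the gain)."""
    return (1 - size_out / size_in) * 100


# Synthetic payloads, for illustration only.
small = bytes(range(50))  # 50 bytes without any repeated substring
large = b"abc" * 200      # 600 bytes with many repeated substrings

gain_small = compression_gain(len(small), len(gzip.compress(small)))
gain_large = compression_gain(len(large), len(gzip.compress(large)))
```

In this sketch the small payload yields a negative gain, since the Gzip headers outweigh any saving, while the larger and more repetitive payload yields a positive gain.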

C. COMPRESSION LIMIT
The previous section has demonstrated that the Shannon-Fano algorithm achieves the highest compression gain. We then derive for this algorithm its theoretical compression limit to check how close our implementation is to this limit.
The compression limit for entropy compression is established by Shannon's source coding theorem. This theorem shows that it is impossible to compress the data such that the average number of bits per symbol is less than the Shannon entropy of the source that generates the symbols [17]. In other words, n independent and identically-distributed random variables each with entropy H(X) can be compressed into more than n·H(X) bits with negligible risk of information loss [17]. We use this limit in this study to estimate how close the tested algorithms are to the best achievable compression.
To calculate the data compression limit, we must first define the Shannon entropy of a source H(X). The entropy of a memoryless source can be defined as the average amount of information acquired by observation of a single symbol on the source output [18]. The amount of information acquired by observation of a given symbol a_i, denoted here as I(a_i), is given by the formula:
$$I(a_i) = \log_2 \frac{1}{P(a_i)} = -\log_2 P(a_i) \qquad (2)$$
where P(a_i) is the probability that the source generates the symbol a_i. In our study, this probability is equivalent to the probability that the symbol a_i appears in a V2X message. We consider that the source has an alphabet of K possible symbols, X = {a_1, . . . , a_K}, and each of them is generated with probability P(a_1), . . . , P(a_K). Therefore, the Shannon entropy of a given source can be calculated as:
$$H(X) = -\sum_{i=1}^{K} P(a_i) \log_2 P(a_i) \qquad (3)$$
Without compression, each symbol is represented by log_2(K) bits, and therefore the number of bits needed to represent a V2X message with n symbols is:
$$B_{nc} = n \cdot \log_2(K) \qquad (4)$$
For example, if the alphabet contains K = 16 symbols, each symbol can be represented by log_2(16) = 4 bits, from 0000 to 1111. As a result, a V2X message with n = 200 symbols can be represented by B_nc = 800 bits without compression. Given Shannon's source coding theorem, the number of bits of the message after compression, denoted here as B_c, is bounded by:
$$B_c \geq n \cdot H(X) \qquad (5)$$
The compression gain limit of a V2X message with n symbols, denoted here as G_max, can then be expressed as a percentage:
$$G_{\max} = \left(1 - \frac{n \cdot H(X)}{n \cdot \log_2(K)}\right) \cdot 100 = \left(1 - \frac{H(X)}{\log_2(K)}\right) \cdot 100 \qquad (6)$$
This limit does not depend on the length of the V2X message. It depends on the number of symbols in the alphabet (K) and the entropy (H(X)), which in turn depends on the probabilities of the different symbols.
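The entropy and the compression gain limit described above can be computed with a few lines of Python. This is a minimal sketch: the function names are hypothetical, and the 4-symbol distribution is an illustrative example, not one of the measured symbol PDFs.

```python
import math


def entropy(probabilities) -> float:
    """Shannon entropy in bits per symbol; zero-probability symbols add nothing."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)


def uncompressed_bits(n: int, k: int) -> float:
    """Bits needed for n symbols from an alphabet of K symbols, uncompressed."""
    return n * math.log2(k)


def gain_limit(probabilities, k: int) -> float:
    """Upper bound on the compression gain, in percent."""
    return (1 - entropy(probabilities) / math.log2(k)) * 100


# Illustrative source with K = 4 symbols (not taken from the measurements).
example = [0.5, 0.25, 0.125, 0.125]
example_entropy = entropy(example)     # 1.75 bits per symbol
example_limit = gain_limit(example, 4)  # 12.5 percent
```

A uniform distribution gives an entropy of log2(K) bits and therefore a limit of 0%, so the achievable gain comes entirely from how unevenly the symbols are distributed. The message length n cancels out of the limit, as noted above.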
To derive the compression limits, we compute the entropy H(X) for each V2X message type and value of K using equation (3) and the PDF of the symbols (Figures 7, 8 and 9). Then, we compute with equation (6) the theoretical limit of the compression gain achievable with entropy compression, i.e. with the Shannon-Fano algorithm. Table 3 reports this limit per V2X message type and size K of the alphabet. The table also reports the computed entropy values. The compression gain limit increases with the number of symbols in the alphabet. For K = 4096, compression gain limits of up to approximately 50% could be achieved, which highlights again the potential of entropy compression to reduce the channel load and improve the reliability of V2X communications. Larger alphabets would be possible and would provide higher compression gains at the expense of additional processing. It is important to highlight that the Shannon-Fano algorithm implemented for this study obtains average compression gains (Figure 10) that perfectly match the theoretical compression limits shown in Table 3. This further validates our implementation and highlights again the potential of V2X message compression.
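For readers who want to compare an entropy code against the limit discussed above, the following is a minimal sketch of the textbook Shannon-Fano construction. It is not the implementation used in this study, and the symbol frequencies are hypothetical. The function names `shannon_fano` and `average_code_length` are chosen for the illustration.

```python
def shannon_fano(freqs):
    """Return {symbol: bitstring} built with the textbook Shannon-Fano split."""
    # Sort symbols by decreasing frequency (stable for equal frequencies).
    items = sorted(freqs.items(), key=lambda kv: kv[1], reverse=True)
    codes = {sym: "" for sym, _ in items}

    def split(group):
        if len(group) < 2:
            return
        total = sum(w for _, w in group)
        running, best_i, best_diff = 0, 1, float("inf")
        # Find the cut that makes the two halves as balanced as possible.
        for i in range(1, len(group)):
            running += group[i - 1][1]
            diff = abs(total - 2 * running)
            if diff < best_diff:
                best_i, best_diff = i, diff
        left, right = group[:best_i], group[best_i:]
        for sym, _ in left:
            codes[sym] += "0"
        for sym, _ in right:
            codes[sym] += "1"
        split(left)
        split(right)

    split(items)
    if len(items) == 1:
        # A single-symbol alphabet still needs one bit per symbol.
        codes[items[0][0]] = "0"
    return codes

def average_code_length(freqs, codes):
    """Average number of bits per symbol under the given code."""
    total = sum(freqs.values())
    return sum(freqs[s] * len(codes[s]) for s in freqs) / total

# Hypothetical symbol frequencies used only for illustration.
freqs = {"a": 15, "b": 7, "c": 6, "d": 6, "e": 5}
codes = shannon_fano(freqs)
print(codes)
print(f"average length: {average_code_length(freqs, codes):.3f} bits/symbol")
```

The average code length returned by `average_code_length` can be compared with the entropy of the same frequencies to see how close the code is to the Shannon limit for a given alphabet.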

D. COMPRESSION AND DECOMPRESSION TIME
The previous sections have shown that compression can provide significant gains to V2X communications by decreasing the size of the transmitted messages and consequently reducing the channel load. However, compression and decompression require some processing time that must be small given the low latency requirements of vehicular communications. This section evaluates the time needed to compress and decompress each V2X message type with the different algorithms considered. For this evaluation, we have used an off-the-shelf Intel Xeon Gold 6130 CPU with a 2.10 GHz base frequency, 22 MB of L3 cache and a maximum memory speed of 2666 MHz. We have used existing open source libraries for Gzip and Compress, but have implemented our own source code to evaluate the Shannon-Fano algorithm. Figure 13 plots the compression and decompression times obtained with the different algorithms. The times are depicted separately for CAMs, CPMs and MCMs. The bars in the figure represent the average values and the vertical lines represent the 5th and 95th percentiles. Figure 13 shows that Gzip and Compress achieve the lowest compression and decompression times (between 3 ms and 8 ms approximately) independently of the message type. The Shannon-Fano algorithm can achieve compression and decompression times similar to those observed with Gzip and Compress when K = 16 symbols. However, the time needed to compress and decompress with the Shannon-Fano algorithm increases significantly when the number of symbols K increases because of the larger number of codewords. The results in Figure 10 and Figure 13 clearly show the existing trade-off between computing time and compression gain, since larger compression gains are achieved with higher values of K.
It is also interesting to highlight in Figure 13 that each algorithm achieves compression and decompression times that are of the same order of magnitude, independently of the message type. However, the number of messages that a vehicle needs to compress at the transmitter side is significantly lower than the number of messages that it receives from all neighboring vehicles and has to decompress at the receiver side. At the transmitter side, a delay of a few milliseconds (e.g. below 10 ms) created by message compression could be tolerated given that the time between V2X message transmissions is typically equal to or higher than 100 ms [31]. However, at the receiver side, a decompression time below 1 ms could be needed to decompress all V2X messages in real time under highly loaded scenarios. This is because the number of V2X messages that can be successfully received per second using IEEE 802.11p in a 10-MHz channel is around 1200 for a channel load corresponding to a Channel Busy Ratio (CBR) of 60%² [32]. The CBR is defined as the percentage of time that the radio channel is sensed as busy. To decompress 1200 messages per second in real time, the average decompression time of a message should be lower than or equal to 1/1200 s ≈ 0.83 ms. The results presented in this study show the potential of V2X message compression but also reveal that further work is needed for its real-time implementation, e.g. through dedicated hardware and software implementations that reduce the decompression time.

FIGURE 13. Compression and decompression time.
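The receiver-side budget derived above can be reproduced with a few lines of code. The sketch below also times `zlib.decompress` from the Python standard library on a hypothetical 512-byte payload. zlib uses the same DEFLATE algorithm as Gzip, but the timing obtained this way depends on the machine and on the payload, so it is not comparable with the measurements reported in Figure 13.

```python
import time
import zlib

def per_message_budget_ms(messages_per_second):
    """Maximum average decompression time (ms) to keep up with the receive rate."""
    return 1000.0 / messages_per_second

def mean_decompress_time_ms(blob, repetitions=1000):
    """Average wall-clock time (ms) of zlib.decompress over several repetitions."""
    start = time.perf_counter()
    for _ in range(repetitions):
        zlib.decompress(blob)
    return (time.perf_counter() - start) * 1000.0 / repetitions

# 1200 received messages per second, as discussed in the text.
budget = per_message_budget_ms(1200)

# Hypothetical payload standing in for a V2X message; not real data.
payload = bytes(range(256)) * 2
blob = zlib.compress(payload)
measured = mean_decompress_time_ms(blob)
print(f"budget: {budget:.2f} ms, measured: {measured:.4f} ms")
```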

VI. IMPACT OF MESSAGE COMPRESSION ON V2X COMMUNICATIONS
This section analyses the impact of V2X message compression on the reliability of V2X communications. To this aim, we have conducted a simulation study using the network simulator ns-3. All vehicles are equipped with an ITS-G5 transceiver based on IEEE 802.11p and operate in the same channel. All vehicles generate and transmit CAMs, CPMs and MCMs. CAMs are generated following the ETSI generation rules [25]. These rules specify that CAMs should be generated every 100 ms to 1 s. A vehicle should generate a new CAM if any of the following triggering conditions is satisfied [25]:
• The distance between its current position and the position included in its previous CAM exceeds 4 m.
• The absolute difference between its current speed and the speed included in its previous CAM exceeds 0.5 m/s.
• The absolute difference between its current heading and the heading included in its previous CAM exceeds 4°.
• The time elapsed since the last CAM was generated is equal to or higher than 1 s.
The size of each CAM is randomly selected following the PDF of Figure 6.
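The CAM triggering conditions listed above can be sketched as a simple predicate. This is an illustration of the rules as summarized in this section, not the simulator code used in the study. The `VehicleState` type, its field names and the planar position model are assumptions introduced for the example, and the 100 ms lower bound is applied as a minimum interval between CAMs.

```python
import math
from dataclasses import dataclass

@dataclass
class VehicleState:
    x: float        # position, m
    y: float        # position, m
    speed: float    # m/s
    heading: float  # degrees
    time: float     # s

def heading_diff_deg(a, b):
    """Smallest absolute difference between two headings, with wrap-around."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def should_generate_cam(current, last_cam):
    """True if a new CAM should be generated given the state sent in the last CAM."""
    elapsed = current.time - last_cam.time
    if elapsed < 0.1:  # CAMs are generated at most every 100 ms
        return False
    moved = math.hypot(current.x - last_cam.x, current.y - last_cam.y) > 4.0
    speed_changed = abs(current.speed - last_cam.speed) > 0.5
    heading_changed = heading_diff_deg(current.heading, last_cam.heading) > 4.0
    timed_out = elapsed >= 1.0
    return moved or speed_changed or heading_changed or timed_out

last = VehicleState(x=0.0, y=0.0, speed=10.0, heading=0.0, time=0.0)
now = VehicleState(x=5.0, y=0.0, speed=10.0, heading=0.0, time=0.5)
print(should_generate_cam(now, last))  # position changed by 5 m
```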
CPMs are also generated following the ETSI generation rules [26]. A vehicle generates a new CPM if it has detected a new vehicle, or if any previously detected vehicle satisfies any of the following conditions:
• its absolute position has changed by more than 4 m since the last time its data was included in a CPM;
• its absolute speed has changed by more than 0.5 m/s since the last time its data was included in a CPM;
• the direction of its absolute velocity has changed by more than 4° since the last time its data was included in a CPM;
• the last time it was included in a CPM was 1 (or more) seconds ago.
In a new CPM, the vehicle includes all newly detected vehicles and those previously detected vehicles that satisfy at least one of the previous conditions. The on-board sensor used in this study for the detection of vehicles is a 360° sensor with 150 m range [26]. In the simulation, the size of each CPM depends on the number of detected objects and the generation rules. The MCM generation rules are still under definition. We have therefore considered that all vehicles transmit MCMs at a fixed rate of 5 Hz. The size of each MCM is randomly selected following the PDF of Figure 6.

² We consider a 60% target since congestion control protocols, like DCC in ITS-G5, control and maintain the load around this target value.
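The CPM object inclusion rules described above can be sketched as a selection over the currently detected vehicles. This is only an illustration under stated assumptions: the tuple layouts, the function names, and the interpretation of the third condition as a change in the direction of the velocity are introduced for the example and are not taken from the simulator used in the study.

```python
import math

def angle_diff_deg(a, b):
    """Smallest absolute difference between two directions, with wrap-around."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)

def select_objects_for_cpm(detected, last_reported, now):
    """Return the ids of detected vehicles to include in a new CPM.

    detected:      {id: (x, y, speed, direction_deg)} current sensor view
    last_reported: {id: (x, y, speed, direction_deg, t)} state at last inclusion
    A CPM is generated only if the returned list is not empty.
    """
    selected = []
    for obj_id, (x, y, speed, direction) in detected.items():
        prev = last_reported.get(obj_id)
        if prev is None:  # newly detected vehicle
            selected.append(obj_id)
            continue
        px, py, pspeed, pdirection, pt = prev
        if (math.hypot(x - px, y - py) > 4.0
                or abs(speed - pspeed) > 0.5
                or angle_diff_deg(direction, pdirection) > 4.0
                or now - pt >= 1.0):
            selected.append(obj_id)
    return selected

# Hypothetical sensor data used only for illustration.
detected = {1: (0.0, 0.0, 10.0, 0.0), 2: (10.0, 0.0, 10.0, 0.0), 3: (20.0, 0.0, 10.0, 0.0)}
last_reported = {2: (9.0, 0.0, 10.0, 0.0, 0.8), 3: (14.0, 0.0, 10.0, 0.0, 0.8)}
print(select_objects_for_cpm(detected, last_reported, now=1.0))
```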
When V2X message compression is enabled, the size of the compressed message is calculated as a function of the size of the original uncompressed message using the models presented in Figure 11 and Table 2.
The traffic scenario is a six-lane highway (three lanes per direction) with a length of 5 km and a lane width of 4 m. To avoid boundary effects, statistics are only taken from the vehicles located in the 2 km around the center of the simulation scenario. We simulate three different traffic densities: 60 veh/km, 120 veh/km and 180 veh/km. The configuration of the scenario is summarized in Table 4. The propagation effects are modelled using the Winner+ B1 propagation model following the 3GPP guidelines in [33]. The communication parameters are summarized in Table 5.

The impact of V2X message compression on V2X networks is first analyzed by means of the channel load. The channel load is measured using the CBR. Table 6 shows the average CBR obtained for the three traffic densities considered and all compression algorithms evaluated. The table shows in parentheses the relative CBR difference with respect to the scenario without compression. Table 6 shows that the highest reduction of the channel load at the network level is achieved with sf4096. This is in line with the results presented in the previous section. sf4096 can reduce the CBR by up to around 27% in the best scenario. This is a non-negligible result given that data compression does not reduce the amount of information transmitted or the number of messages transmitted.

It is interesting to note that the average reduction of the CBR is not equal to the average compression gain achieved. For example, the average compression gain with sf4096 was between 27% and 52% approximately (Figure 10) but the CBR is reduced by between 18% and 27%. This difference arises because the compression is applied at the Facilities layer, and the headers added at the Transport & Network, Access and PHY layers are not compressed, to minimize the impact on existing standards and facilitate its practical implementation.

It is also worth noting that the relative reduction of the CBR decreases if the traffic density increases. This effect is related to packet collisions. When the traffic density and the CBR increase, the number of packet collisions also increases. When two or more packets overlap in time, their contribution to the CBR is reduced. When no compression is applied, more packet collisions occur, especially in the high traffic density scenario, which lowers the CBR measured without compression. This explains why the CBR improvement decreases with the traffic density, although it is clear that in this case a lower measured CBR is not a positive effect.
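The gap between the compression gain and the CBR reduction discussed above can be illustrated with a simple byte-count model. The sketch assumes that only the Facilities-layer payload is compressed and that a fixed number of uncompressed overhead bytes is added by the lower layers. The payload size, gain and overhead used here are hypothetical values, not the ones of the simulation, and the model ignores fixed-duration PHY preambles and packet collisions.

```python
def on_air_reduction_percent(payload_bytes, gain_percent, overhead_bytes):
    """Relative reduction (percent) of the transmitted bytes when only the payload is compressed."""
    original = payload_bytes + overhead_bytes
    compressed = payload_bytes * (1.0 - gain_percent / 100.0) + overhead_bytes
    return (1.0 - compressed / original) * 100.0

# Hypothetical numbers: 300-byte message, 40% compression gain,
# 100 bytes of uncompressed lower-layer overhead.
reduction = on_air_reduction_percent(300, 40.0, 100)
print(f"on-air reduction: {reduction:.1f} %")
```

With these illustrative values, a 40% compression gain at the Facilities layer translates into a smaller reduction of the bytes on the air, because the uncompressed overhead dilutes the gain.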
The reduction of the channel load has a positive effect on the reliability of V2X communications. When the channel load decreases thanks to message compression, the number of packets that are lost due to packet collisions and interference also decreases. The improvement in reliability is observed when analyzing the PDR (Packet Delivery Ratio), which is defined as the probability of correctly receiving a V2X message at a certain distance from the transmitter. Figure 14 plots the PDR experienced in the low, medium and high traffic density scenarios when using the different compression algorithms. The PDR is shown as a function of the distance between transmitter and receiver. The compression algorithms with the highest compression gains (and thus the highest CBR reductions) achieve the highest improvement of PDR compared to the scenario without compression. This improvement increases the distance at which a connected vehicle can be detected with CAMs, the distance at which an object can be detected with CPMs, or the distance at which a maneuver coordination can safely take place with MCMs.
A common metric to compare the reliability of different solutions is the distance at which a given PDR threshold is achieved. This distance is shown in Figure 15 for a PDR threshold of 0.7 to more clearly show the gain achieved with data compression. This figure shows, for example, that in the high-density scenario the distance at which a PDR of 0.7 is obtained more than doubles with sf4096 compared with the scenario without compression. Figures 16 and 17 plot the communications range improvement that would be achieved with different PDR thresholds thanks to V2X message compression. The communications range improvement is computed in Figure 16 as the difference between the distances at which the corresponding PDR threshold is achieved with and without compression. It is therefore expressed in meters. The results obtained show that the PDR threshold does not significantly influence the communications range improvement when measured in absolute values. Figure 17 plots the relative communications range improvement as a percentage. As can be observed, the relative improvement increases as the PDR threshold increases. This is because the communications range decreases as the PDR threshold increases, so a similar absolute improvement represents a larger fraction of the range. The obtained results demonstrate that the use of V2X message compression significantly increases the distance at which a given PDR is achieved. It should be recalled that this is achieved without modifying the communication parameters (hence, without reducing the communications range) or the message rate.
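The range metrics described above can be sketched as follows. The code finds, by linear interpolation, the distance at which a PDR curve drops below a threshold, and then computes the absolute and relative range improvement. The PDR curves are synthetic values chosen for illustration and do not correspond to the simulation results. The function name `range_at_pdr` and the assumption that the PDR does not increase with distance are introduced for the example.

```python
def range_at_pdr(distances, pdr, threshold):
    """Distance (m) at which the PDR falls below the threshold.

    Assumes pdr is non-increasing with distance and uses linear
    interpolation between the two samples around the threshold.
    """
    if pdr[0] < threshold:
        return 0.0
    for i in range(1, len(distances)):
        if pdr[i] < threshold:
            d0, d1 = distances[i - 1], distances[i]
            p0, p1 = pdr[i - 1], pdr[i]
            return d0 + (p0 - threshold) * (d1 - d0) / (p0 - p1)
    return float(distances[-1])  # threshold never crossed in the sampled range

# Synthetic PDR curves used only for illustration.
distances = [0, 100, 200, 300, 400]
pdr_without = [1.0, 0.9, 0.6, 0.3, 0.1]
pdr_with = [1.0, 0.95, 0.8, 0.5, 0.2]

r_without = range_at_pdr(distances, pdr_without, 0.7)
r_with = range_at_pdr(distances, pdr_with, 0.7)
absolute_gain_m = r_with - r_without
relative_gain_percent = 100.0 * absolute_gain_m / r_without
print(f"{r_without:.1f} m -> {r_with:.1f} m "
      f"(+{absolute_gain_m:.1f} m, +{relative_gain_percent:.1f} %)")
```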

VII. CONCLUSION
This paper proposes the compression of V2X messages to reduce the communication channel load and improve the scalability and reliability of vehicular networks. Contrary to conventional congestion control protocols, V2X message compression can reduce the channel load without modifying the communication parameters (and hence the range) or the message rate. This has a positive effect on the execution of applications compared to current congestion control protocols that are based on packet dropping [7]. The obtained results have shown that V2X messages (CAM, CPM and MCM) can be compressed by up to around 40-50%. This compression is performed at the upper layers and is therefore agnostic to the underlying radio access technology. We have also demonstrated that this compression can reduce the channel load and hence improve the packet delivery ratio and thus the communications range. The gains achieved by compression depend on the compression algorithm utilized and on the V2X message type and size. The selected algorithm also impacts the compression and decompression times, and a balance between compression gains and compression times is necessary to support low-latency V2X communications. Although the compression algorithms are independent of the radio access technology, 3GPP-based technologies such as LTE-V2X or 5G NR V2X organize the radio resources and the access to the wireless medium differently. It would therefore be interesting for future studies to quantify the gains (channel load and reliability among others) that compression can bring to LTE-V2X and 5G NR V2X.