A New Affinity Propagation Clustering Algorithm for V2V-Supported VANETs

Clustering is an efficient method for improving the communication performance of Vehicular Ad hoc NETworks (VANETs) that adopt Vehicle to Vehicle (V2V) communications. However, maximizing cluster stability under the high mobility of vehicles remains a challenging problem. In this paper, we first reconstruct the similarity function of the Affinity Propagation (AP) clustering algorithm by introducing communication-related parameters, so that vehicles with low relative mobility and good communication performance are more easily selected as cluster heads. Then, by formally defining three scaling functions, we design a weighted mechanism that quantitatively assesses the effect on cluster stability when a vehicle joins a cluster. Based on these, a new AP clustering algorithm covering the whole clustering process is proposed from the perspective of global balance. To ensure the validity of the simulations, we use vehicular mobility data generated on the realistic map of Cologne, Germany, and evaluate eleven metrics commonly adopted in similar works. The results show that the proposed algorithm outperforms the other algorithms in terms of cluster stability, and that it also effectively improves throughput and reduces the packet loss rate of VANETs compared with the classical APROVE algorithm and the NMDP-APC algorithm.


$CR$: One-hop communication radius of vehicles
$ER$: One-hop effective range of a vehicle
$d_{i,j}$: Distance between $V_i$ and $V_j$
$s(i, j)$: Similarity function between $V_i$ and $V_j$
$r(i, j)$: Responsibility sent from $V_i$ to $V_j$
$R_i$: Responsibility list sent from $V_i$ to its one-hop neighbor vehicles
$a(i, j)$: Availability sent from $V_j$ to $V_i$
$A_j$: Availability list sent from $V_j$ to its one-hop neighbor vehicles
$CH_i$: Cluster head of $V_i$
$ID(CH_i)$: ID of $V_i$'s cluster head
$CCHL_i$: Candidate cluster head list of $V_i$
$C_i$: Cluster $i$ whose cluster head is $V_i$
$CMN_{C_i}$: Number of cluster members of $C_i$
$CMN_{max}$: Maximum number of cluster members of a cluster
$CML_i$: Cluster member list of $V_i$ whose state is Cluster Head (CH)
$\bar{v}_{C_i}$: Average velocity of $C_i$
$(\bar{x}_{C_i}, \bar{y}_{C_i})$: Central position of $C_i$
$VSF_{i,C_k}$: Velocity scaling function between $V_i$ and $C_k$
$PSF_{i,C_k}$: Position scaling function between $V_i$ and $C_k$
$CRSF_{i,C_k}$: Communication rate scaling function between $V_i$ and $C_k$
$CAF_{i,C_k}$: Compound assessment function between $V_i$ and $C_k$
$CHFF_i$: Cluster head fitness function of a CH vehicle $V_i$

VOLUME 8, 2020. This work is licensed under a Creative Commons Attribution 4.0 License. For more information, see https://creativecommons.org/licenses/by/4.0/

I. INTRODUCTION
With the rapid increase in vehicle ownership, driving safety, traffic congestion, and environmental pollution have become the three main problems in the transportation field. Intelligent Transportation Systems (ITSs) provide an effective way to solve these problems by addressing the complex relations among objects such as vehicles, roads, and people.
In this process, because they can deliver accurate status data for all participants in real time, Vehicular Ad hoc NETworks (VANETs) have played an important role in ITSs, especially with the rise of self-driving vehicles and 5G communications [1]. Clustering has been introduced into networks to overcome the scalability problem [2]-[6]. VANETs, which are inherently large-scale networks, are derived from Mobile Ad hoc NETworks (MANETs) but have their own dynamic and uncertain characteristics due to the high mobility of vehicles [7]. Consequently, clustering algorithms originally proposed for MANETs appear unable to cope with VANETs and degrade link duration, packet delivery ratio, routing overhead, and so on [8]-[12]. Therefore, it is necessary and urgent to develop new clustering algorithms tailored to VANETs.
Owing to the high mobility of vehicles and the road topology, vehicles in a cluster will inevitably lose their connections. Thus, cluster stability has become the main metric for evaluating clustering algorithms for VANETs, and achieving higher cluster stability requires a better clustering algorithm [13]. To this end, we propose the Global Affinity Propagation Clustering (GAPC) algorithm based on mobility-related and communication-related parameters, together with an optimized cluster maintenance method designed with respect to global balance to achieve better cluster stability. The contributions of this paper are as follows:
1) First, we reconstruct the similarity function, which is the foundation of the Affinity Propagation (AP) clustering algorithm, by introducing communication-related parameters into its original form. Therefore, vehicles that have low relative mobility and good communication performance are more easily selected as cluster heads. These robust cluster heads help improve cluster stability in the cluster formation phase.
2) Second, based on the formal definitions of three scaling functions, we design a weighted mechanism that quantitatively assesses the effect on cluster stability when a vehicle joins a cluster. This helps a vehicle select the optimal cluster from multiple candidate clusters.
3) Finally, we define three states for vehicles in a hybrid distributed system model of VANETs and design a complete transition process among them. On this basis, we propose the GAPC algorithm, which covers four different phases of the clustering process. Its distinguishing feature is that all vehicles in a cluster are taken into account when a vehicle selects a cluster to join.
The algorithm achieves not only better cluster stability but also higher throughput and a lower packet loss rate of VANETs than other clustering algorithms, and extensive simulations validate its effectiveness.
The rest of the paper is organized as follows. Section II reviews VANET clustering algorithms and the AP clustering algorithm. Section III describes the proposed GAPC algorithm with respect to cluster head selection, cluster formation, and cluster maintenance. Section IV presents our simulation scenarios and analyzes the GAPC algorithm in terms of multiple performance metrics. Section V concludes the paper and briefly describes future work.

II. RELATED WORKS
As an important problem in the study of VANETs, clustering has attracted considerable interest from researchers in this field. Some representative works are reviewed in this section. In addition, for the convenience of the following discussion, the core of the original AP clustering algorithm is presented.

A. VANETs CLUSTERING ALGORITHMS
To address data interaction between fast-moving vehicles on motorways, Santos et al. [14] first applied the clustering idea of MANETs to vehicles and opened up the study of VANET clustering algorithms. However, their CBLR clustering algorithm performed poorly in terms of cluster stability and cluster scale because it selected cluster heads only by the number of cluster members. Gunter et al. [15] introduced mobility-related parameters into clustering and proposed a mechanism that selects cluster heads by weighting the average differences in velocity and distance relative to neighbor vehicles, which effectively avoided large-scale clusters. Aiming at the effect of urban intersections on cluster stability, Hadded et al. [16] proposed the AWCP clustering algorithm by considering the vehicles with the same highway ID and the same direction as their neighbors during cluster formation and maintenance, and then used a multi-objective genetic algorithm to optimize the parameters of the algorithm.
Since the above clustering algorithms only took into account indicator values at a single point in time, their cluster stability was susceptible to the randomness and transience of vehicle motion. Considering that aggregate mobility could improve cluster stability, Souza et al. [17] proposed the Aggregate Local Mobility (ALM) clustering algorithm. The algorithm replaced the received signal strength with the relative distance between vehicles and used the ALM, computed from two consecutive relative distances, to determine the state transitions of cluster heads. It improved on the MOBIC clustering algorithm proposed by Basu et al. [18], which adopted a relative mobility metric for MANETs. Aiming at the destructive effect of fast-moving vehicles on cluster stability, Avcil and Soyturk [19] proposed the ReSCUE clustering algorithm. The algorithm excluded fast-moving vehicles according to the standard deviation of vehicle velocities over a time period t and selected cluster heads according to a defined coherence indicator value.
The ultimate purpose of clustering is to achieve efficient communication between vehicles in VANETs. Although an effective clustering structure for vehicles in a region can be realized with mobility-related parameters that characterize the mobility and distribution of vehicles, the desired communication efficiency cannot be guaranteed. The key reason is that the communication-related parameters of vehicles are not involved in clustering. To solve this problem, communication-related parameters have been introduced into VANET clustering as a common indicator for cluster head selection. For a heterogeneous 5G-VANET, Duan et al. [20] grouped vehicles in a region according to the angle of arrival, the received signal strength, and the transmission range, and then used the Signal-to-Noise Ratio (SNR) and the velocity as constraints to optimize cluster head selection, yielding an adaptive clustering algorithm that supports dual cluster heads. Under the assumption that clusters had already been formed, Chai et al. [21] defined a utility function based on vehicle densities, the average differences of vehicle velocities, and the available communication resources, and selected the vehicles with the maximum utility value as cluster heads. Furthermore, considering both the perspective of the vehicle itself and that of the cluster to be joined, they presented a cluster switching mechanism that satisfied delay-sensitive and throughput-sensitive QoS requirements simultaneously. Bali et al. [22] identified initial leaders according to the ranking of the number of neighbor vehicles and defined vehicle connectivity as the average number of messages successfully received over a period of time. Finally, cluster heads were selected from these leaders by comparing their connectivity with a given threshold.
The above clustering algorithms essentially adopted a unidirectional and exclusive mechanism. In other words, a vehicle or a base station acted as a selector that calculated the relative indicator values with respect to other neighbor vehicles. On this basis, clustering and cluster head selection could be realized in a centralized or distributed way. To further improve cluster stability and shorten cluster formation time so as to better suit the high mobility of VANETs, Shea et al. [23] and Hassanabadi et al. [24] first introduced the AP clustering algorithm [25], which clusters data without specifying the number of clusters in advance, into VANET clustering research and proposed the APROVE clustering algorithm. By means of responsibility and availability messages, the algorithm calculated the relative indicator values from the perspectives of both the selector and the selectee, while simultaneously taking into account the relative indicator values with other neighbor vehicles. Clustering and cluster head selection were then realized in a competitive and distributed way. The simulation results showed that the algorithm performed well on the major cluster stability indicators, such as cluster head duration, cluster member duration, and cluster head change rate. However, it ignored the cluster scale, which might lead to large-scale clusters and network congestion in some scenarios, and it only used the position parameters of vehicles to construct the similarity function, which could not fully reflect the motion characteristics of vehicles in VANETs. For urban environments, Liu et al. [26] proposed the DC-TDCA algorithm, which improves cluster stability by adding lane and destination factors to the similarity function of the APROVE algorithm.
The DC-TDCA algorithm segmented the given road according to the one-hop communication range of vehicles and regarded the segments as initial clusters, and the scale of clusters was controlled by a preset maximum number of cluster members. Although the destinations of vehicles were involved, the DC-TDCA algorithm ignored the instantaneous motion characteristics of vehicles. Moreover, it used parameters with different units in the similarity function, which might affect the accurate evaluation of the similarity between two vehicles and reduce cluster stability. Koshimizu et al. [27] proposed the NMDP-APC algorithm by reconstructing the similarity functions of the APROVE and DC-TDCA algorithms in two respects. First, the future position factor was replaced by a velocity factor. Second, all parameters were normalized. Although the simulation results showed that a better clustering structure could be achieved, the algorithm ignored the instability of the clustering structure caused by irregularly moving vehicles in realistic traffic environments. In addition, it did not take communication-related parameters into account in the similarity function, and thus could not ensure satisfactory communication performance for VANETs.

B. AFFINITY PROPAGATION CLUSTERING ALGORITHM
The AP algorithm is a clustering algorithm based on message passing between nodes. It has the following advantages:
1) There is no need to specify the number of clusters in advance.
2) The selected cluster heads are the actual nodes in the data set.
3) A mutual evaluation mechanism is used to improve the cluster stability.
The input of the AP clustering algorithm is a similarity matrix over all nodes. The matrix element $s(i, j)$, which can be calculated by a specified method, denotes the similarity between node $i$ and node $j$. Essentially, it indicates how suitable node $j$ is to be the cluster head of node $i$. In particular, for any node $j$, the self-similarity $s(j, j)$ indicates the preference of node $j$ to be selected as a cluster head rather than its similarity to itself. The larger $s(j, j)$ is, the more likely node $j$ is to be selected as a cluster head. The algorithm initially treats all nodes as potential cluster heads and sets their preferences to the same value.
During the clustering process, two kinds of messages, responsibility messages and availability messages, are exchanged between nodes to select appropriate cluster heads. The responsibility $r(i, j)$, which is sent from node $i$ to node $j$, indicates how suitable node $j$ is to be the cluster head of node $i$ from the view of node $i$. It is defined as follows:
$$r(i, j) = s(i, j) - \max_{j' \neq j} \left\{ a(i, j') + s(i, j') \right\} \tag{1}$$
The availability $a(i, j)$, which is sent in the reverse direction from node $j$ back to node $i$, indicates how suitable node $j$ is to be the cluster head of node $i$ from the view of node $j$. The availability $a(i, j)$ and the self-availability $a(j, j)$ are defined as follows:
$$a(i, j) = \min \left\{ 0,\; r(j, j) + \sum_{i' \notin \{i, j\}} \max\{0, r(i', j)\} \right\}, \quad i \neq j \tag{2}$$
$$a(j, j) = \sum_{i' \neq j} \max\{0, r(i', j)\} \tag{3}$$
The AP clustering algorithm runs iteratively under the influence of these two kinds of messages. At the beginning of the clustering process, for each node, the self-availability and the availabilities related to other nodes are set to zero, and the self-similarity is set to the median or minimum value of the similarities between nodes. So in the first iteration, each node can calculate the self-responsibility and the responsibilities related to other nodes by Formula (1). Because all availabilities are zero at this point, the responsibility $r(i, j)$ equals the difference between the similarity $s(i, j)$ and the maximum similarity of node $i$ to nodes other than node $j$. This means that node $j$ is less likely to be the cluster head of node $i$ if there are other, more suitable nodes.
All responsibilities, including the self-responsibilities, are exchanged between nodes by messages. In subsequent iterations, each node calculates the self-responsibility and the responsibilities related to other nodes by Formula (1), the self-availability by Formula (3), and the availabilities related to other nodes by Formula (2). For node $i$ and node $j$, the availability $a(i, j)$ depends on the total of the self-responsibility $r(j, j)$ and the sum of the positive responsibilities sent to node $j$ by nodes other than node $i$ and node $j$. If this total is positive, $a(i, j)$ is set to zero; otherwise, it is equal to the total. This means that the more other nodes select node $j$ as their cluster head, the more likely node $j$ is to be the cluster head of node $i$. All responsibilities and availabilities, including the self-responsibilities and self-availabilities, are exchanged between nodes by messages.
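As a minimal sketch of Formulas (1)-(3), the following pure-Python functions compute one round of responsibilities and availabilities over a dense similarity matrix. The function names and the three-node similarity values, including the preference of $-2$ on the diagonal, are hypothetical and chosen only for illustration; in the GAPC setting each vehicle would evaluate the same expressions only over its normal neighbor list rather than over a global matrix.

```python
from typing import List

Matrix = List[List[float]]


def update_responsibility(s: Matrix, a: Matrix) -> Matrix:
    """Formula (1): r(i, j) = s(i, j) - max over j' != j of [a(i, j') + s(i, j')]."""
    n = len(s)
    r = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            best_other = max(a[i][k] + s[i][k] for k in range(n) if k != j)
            r[i][j] = s[i][j] - best_other
    return r


def update_availability(r: Matrix) -> Matrix:
    """Formulas (2) and (3): availabilities computed from the current responsibilities."""
    n = len(r)
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == j:
                # Formula (3): sum of positive responsibilities sent to j by all other nodes
                a[j][j] = sum(max(0.0, r[k][j]) for k in range(n) if k != j)
            else:
                # Formula (2): r(j, j) plus positive support from nodes other than i and j, capped at 0
                total = r[j][j] + sum(max(0.0, r[k][j]) for k in range(n) if k not in (i, j))
                a[i][j] = min(0.0, total)
    return a


# Hypothetical similarities for three nodes; the diagonal holds the preference s(j, j).
s = [[-2.0, -1.0, -4.0],
     [-1.0, -2.0, -3.0],
     [-4.0, -3.0, -2.0]]
a0 = [[0.0] * 3 for _ in range(3)]  # all availabilities start at zero

r = update_responsibility(s, a0)  # first iteration: only similarities matter
a = update_availability(r)
print(r)
print(a)
```

With these values, node 2 already has a positive self-responsibility $r(2, 2) = 1$ after the first iteration, while nodes 0 and 1 still have negative self-responsibilities.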
In addition, to avoid numerical oscillation, the AP clustering algorithm adopts a damped message update mechanism. The update formula is as follows:
$$\text{message} = \lambda \cdot \text{message}_{old} + (1 - \lambda) \cdot \text{message}_{new} \tag{4}$$
where $\lambda \in (0, 1)$ is a damping factor, $\text{message}_{old}$ denotes the message of the previous moment, and $\text{message}_{new}$ denotes the message of the current moment.
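Formula (4) is a convex combination of the previous and the freshly computed message. The sketch below applies it to a single value and element-wise to a message matrix; the function names are hypothetical, and the damping factors 0.5 and 0.75 are example values only, since the text requires just $\lambda \in (0, 1)$.

```python
from typing import List

Matrix = List[List[float]]


def damp(old: float, new: float, lam: float) -> float:
    """Formula (4): keep a fraction lam of the previous message value."""
    if not 0.0 < lam < 1.0:
        raise ValueError("the damping factor must lie in (0, 1)")
    return lam * old + (1.0 - lam) * new


def damp_matrix(old: Matrix, new: Matrix, lam: float) -> Matrix:
    """Apply Formula (4) element-wise to a responsibility or availability matrix."""
    return [[damp(o, n, lam) for o, n in zip(row_old, row_new)]
            for row_old, row_new in zip(old, new)]


# A responsibility that would jump from 0 to 4 only moves halfway with lam = 0.5.
print(damp(0.0, 4.0, 0.5))  # 2.0
print(damp_matrix([[0.0, 4.0]], [[2.0, 0.0]], 0.75))  # [[0.5, 3.0]]
```

A larger $\lambda$ makes each message move more slowly toward its new value, which suppresses oscillation at the cost of more iterations.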
Based on the two kinds of messages, the AP clustering algorithm selects cluster heads in two ways. During cluster head selection, any node $i$ with $r(i, i) + a(i, i) > 0$ is selected as a cluster head. During cluster formation, the cluster head of node $i$ is selected as follows:
$$CH_i = \arg\max_{j} \left\{ a(i, j) + r(i, j) \right\} \tag{5}$$
where $CH_i$ denotes the cluster head of node $i$ and $j$ denotes a cluster head node. Once the selected cluster heads no longer change, the AP clustering algorithm converges.
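The two selection rules, $r(i, i) + a(i, i) > 0$ for cluster heads and Formula (5) for cluster formation, can be sketched as follows. The function names and message values are hypothetical, hand-picked so that two cluster heads emerge; assigning each cluster head to itself follows the usual AP convention and is an assumption of this sketch.

```python
from typing import Dict, List

Matrix = List[List[float]]


def select_cluster_heads(r: Matrix, a: Matrix) -> List[int]:
    """Cluster head selection: node i is a cluster head when r(i, i) + a(i, i) > 0."""
    return [i for i in range(len(r)) if r[i][i] + a[i][i] > 0]


def assign_cluster_heads(r: Matrix, a: Matrix) -> Dict[int, int]:
    """Formula (5): CH_i = argmax over cluster heads j of a(i, j) + r(i, j)."""
    heads = select_cluster_heads(r, a)
    if not heads:
        return {}  # no cluster head has emerged yet; keep iterating
    assignment = {}
    for i in range(len(r)):
        if i in heads:
            assignment[i] = i  # a cluster head belongs to its own cluster
        else:
            assignment[i] = max(heads, key=lambda j: a[i][j] + r[i][j])
    return assignment


# Hypothetical converged messages for four nodes.
r = [[1.0, -2.0, -3.0, -1.0],
     [0.5, -1.0, -2.0, -4.0],
     [-3.0, -2.0, 2.0, 0.0],
     [-4.0, -3.0, 1.0, -1.0]]
a = [[0.5, -1.0, -1.0, -1.0],
     [-0.5, 0.0, -1.0, -1.0],
     [-1.0, -1.0, 1.0, -1.0],
     [-1.0, -1.0, -0.5, 0.0]]

print(select_cluster_heads(r, a))  # [0, 2]
print(assign_cluster_heads(r, a))  # {0: 0, 1: 0, 2: 2, 3: 2}
```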

III. PROPOSED GLOBAL AFFINITY PROPAGATION CLUSTERING ALGORITHM
Although AP-based algorithms for VANETs have significant advantages over classical algorithms, two aspects remain to be improved. First, communication-related parameters are not involved in the clustering process, which may degrade the communication performance of otherwise stable clusters in VANETs. Second, the effect on cluster stability is ignored when a vehicle selects a cluster to join, which may incur extra cluster maintenance costs. To this end, this section proposes the GAPC algorithm, a new AP clustering algorithm designed from a global perspective. It introduces communication-related parameters into the original similarity function and, when a vehicle selects a cluster to join, takes all vehicles in the cluster into account instead of focusing only on the CH vehicle.

A. SYSTEM MODEL
As the foundation of the clustering research, a hybrid distributed system model of VANETs that adopts the LTE-V communication protocol is shown in Fig. 1. Vehicles in the model have three states: UnDefinition (UD), Cluster Head (CH), and Cluster Member (CM). Vehicles transition between these states under the GAPC algorithm running on each vehicle. The UD state, shown in red in Fig. 1, is the initial state of every vehicle and a specific state for some vehicles during the clustering process; a UD vehicle does not belong to any cluster. The CH state, shown in yellow in Fig. 1, is a specific state for some vehicles during the clustering process. A CH vehicle belongs to exactly one cluster, and each cluster has exactly one CH vehicle. A CH vehicle can communicate with every vehicle in its cluster and with neighboring CH vehicles using the LTE-V-Direct link, and with a base station using the LTE-V-Cell link. The CM state, shown in blue in Fig. 1, is the state of most vehicles during the clustering process. A CM vehicle belongs to exactly one cluster and can communicate with other CM vehicles and the CH vehicle in the same cluster using the LTE-V-Direct link.
In realistic traffic environments, there is a class of abnormal vehicles whose motion characteristics differ significantly from those of other vehicles on the road. Although few in number, they have a destructive effect on the cluster stability of VANETs and bring redundant overhead for cluster maintenance. Therefore, such vehicles need to be identified. From an analysis of vehicle behavior, two factors should be taken into account. The first is the direction of travel: vehicles moving in different directions are abnormal neighbors. The second is the relative velocity: two vehicles whose absolute velocity difference reaches or exceeds a certain threshold are also abnormal neighbors. The formal definition of an abnormal neighbor is as follows.
Definition (Abnormal Neighbor): Suppose that $\mathcal{V} = \{V_1, V_2, \ldots, V_n\}$ is the set of vehicles on the road, and $az$ and $v$ denote the azimuth angle and the velocity of a vehicle, respectively. For $V_i, V_j \in \mathcal{V}$ with $V_j$ within the one-hop distance of $V_i$, $V_j$ is an abnormal neighbor of $V_i$ if
$$|az_i - az_j| \geq Th_{az} \quad \text{or} \quad |v_i - v_j| \geq Th_v$$
All vehicles that are within the one-hop distance of $V_i$ and are not abnormal neighbors of $V_i$ constitute the normal neighbor list of $V_i$ (denoted as $NNL_i$).
Here, $Th_{az}$ is a threshold for the difference between two azimuth angles and usually takes a value of 90° in order to identify vehicles that are turning or traveling in the opposite lanes.
$Th_v$ is a threshold for the difference between two velocities and is determined by the speed limit of the road.
In our VANET system, each vehicle is assumed to be equipped with a Global Positioning System (GPS) receiver that provides its motion information, including velocity, location, and moving direction. To be closer to reality, different vehicles are assumed to have different communication capabilities. In addition, each vehicle is assumed to be equipped with one transmitting antenna and one receiving antenna and to operate in Single Input Single Output (SISO) mode. There are two kinds of messages in the system: beacon messages and data messages. In any case, a vehicle can exchange beacon messages with vehicles within its one-hop distance by broadcasting. Data messages are exchanged as follows. A CM vehicle can exchange them with other CM vehicles and the CH vehicle in the same cluster by V2V communications. A CH vehicle can exchange them with other CH vehicles within its one-hop distance by V2V communications; when there are no CH vehicles within its one-hop distance, it can instead exchange them with a base station by Vehicle to Infrastructure (V2I) communications. Driven by the received beacon messages and the states of vehicles, the clustering process corresponding to the GAPC algorithm is performed periodically in each vehicle. As shown in Fig. 2, a communication cycle including the clustering process for a vehicle is divided into three parts: $T_{col}$, $T_{clu}$, and $T_{data}$. In $T_{col}$, under the control of the selected communication protocol, a vehicle reports its own clustering parameters by broadcasting a beacon message and obtains the clustering parameters of its neighbors from the beacon messages it receives.
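The abnormal-neighbor test and the construction of $NNL_i$ defined earlier in this subsection can be sketched as follows. Assumptions, all hypothetical: the `Vehicle` fields, a one-hop range of 300 m, and $Th_v = 10$ m/s (the text only says $Th_v$ depends on the speed limit). The azimuth difference is also taken on the circle, so that 350° and 10° differ by 20°; this wrap-around step is an assumption that the definition does not state explicitly.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Vehicle:
    vid: int
    x: float   # position in metres
    y: float
    az: float  # azimuth angle in degrees
    v: float   # velocity in m/s


def azimuth_gap(az_i: float, az_j: float) -> float:
    """Smallest angle between two headings, in [0, 180] degrees (assumed wrap-around)."""
    d = abs(az_i - az_j) % 360.0
    return min(d, 360.0 - d)


def is_abnormal(vi: Vehicle, vj: Vehicle, th_az: float = 90.0, th_v: float = 10.0) -> bool:
    """V_j is an abnormal neighbor of V_i if the heading or velocity gap reaches its threshold."""
    return azimuth_gap(vi.az, vj.az) >= th_az or abs(vi.v - vj.v) >= th_v


def normal_neighbor_list(vi: Vehicle, vehicles: List[Vehicle], one_hop_range: float,
                         th_az: float = 90.0, th_v: float = 10.0) -> List[int]:
    """IDs of vehicles within one hop of V_i that are not abnormal neighbors (NNL_i)."""
    nnl = []
    for vj in vehicles:
        if vj.vid == vi.vid:
            continue
        if math.hypot(vi.x - vj.x, vi.y - vj.y) > one_hop_range:
            continue  # outside the one-hop distance
        if not is_abnormal(vi, vj, th_az, th_v):
            nnl.append(vj.vid)
    return nnl


ego = Vehicle(1, 0.0, 0.0, 10.0, 25.0)
others = [
    Vehicle(2, 100.0, 0.0, 350.0, 22.0),  # heading gap 20 deg, speed gap 3 m/s -> normal
    Vehicle(3, 150.0, 0.0, 190.0, 25.0),  # opposite direction -> abnormal
    Vehicle(4, 80.0, 0.0, 12.0, 40.0),    # speed gap 15 m/s -> abnormal
    Vehicle(5, 600.0, 0.0, 10.0, 25.0),   # beyond one hop -> ignored
]
print(normal_neighbor_list(ego, [ego] + others, one_hop_range=300.0))  # [2]
```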
In $T_{clu}$, according to its current state, a vehicle executes one of three phases (cluster head selection, cluster formation, or cluster maintenance) to obtain clustering results. The complete transition process among the three states (UD, CH, and CM) driven by these phases is shown in Fig. 3. Note that $T_{col}$ and $T_{clu}$ together are also regarded as a clustering cycle in this paper. In $T_{data}$, a vehicle can communicate with a target vehicle directly or indirectly through data messages when necessary. Furthermore, considering clustering efficiency and data communication efficiency jointly, the duration of $T_{col}$ should be slightly longer than that of $T_{clu}$ and much shorter than that of $T_{data}$. In this way, a vehicle can receive beacon messages from its neighbor vehicles as completely and accurately as possible while ensuring data communication efficiency.
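The data-message rules of the system model described above can be summarized as a small decision function. This is a sketch under stated assumptions: the function names and the identification of a cluster by its CH vehicle's ID are hypothetical, and UD vehicles are treated as having no data links because the text does not specify any for them.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    UD = "UnDefinition"
    CH = "Cluster Head"
    CM = "Cluster Member"


def v2v_data_allowed(s_state: State, s_cluster: Optional[int],
                     r_state: State, r_cluster: Optional[int],
                     within_one_hop: bool) -> bool:
    """Whether two vehicles may exchange data messages over a V2V (LTE-V-Direct) link."""
    if s_state is State.UD or r_state is State.UD:
        return False  # UD vehicles belong to no cluster; no data rule is given for them
    if s_state is State.CH and r_state is State.CH:
        return within_one_hop  # CH <-> CH only within one hop
    return s_cluster == r_cluster  # CM <-> CM or CM <-> CH inside the same cluster


def ch_data_route(ch_neighbors_in_one_hop: int) -> str:
    """A CH uses V2V with neighboring CHs and falls back to V2I when none is in range."""
    return "V2V" if ch_neighbors_in_one_hop > 0 else "V2I"


# Clusters are identified here by the ID of their CH vehicle (an assumption).
print(v2v_data_allowed(State.CM, 7, State.CM, 7, within_one_hop=True))  # True
print(v2v_data_allowed(State.CM, 7, State.CH, 9, within_one_hop=True))  # False
print(v2v_data_allowed(State.CH, 7, State.CH, 9, within_one_hop=True))  # True
print(ch_data_route(0))  # V2I
```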
Next, the phases of the clustering process in the GAPC algorithm are presented in detail.

B. COMMUNICATION AND ANALYSIS OF BEACON MESSAGES
Information characterizing neighbor vehicles is the foundation of the clustering process and of the GAPC algorithm. It comes from the broadcast and received beacon messages. Therefore, the communication of beacon messages is the initial and essential phase of the clustering process in each clustering cycle, and it is described first. To satisfy the needs of the subsequent phases, the beacon message of a vehicle should include sufficient parameters. The contents of the beacon message are shown in Table 1, which lists eleven vehicle parameters such as identity number, azimuth angle, position, and velocity.
As shown in the dashed box at the top of Fig. 3, for a vehicle $V_i$, a timer for beacon message collection is started when a clustering cycle begins. Before the timer expires, $V_i$ retrieves the similarity related to each vehicle in its normal neighbor list. Then, $V_i$ calculates the new responsibilities and availabilities according to Formulas (1)-(4) and composes its beacon message with the latest parameters. After that, $V_i$ broadcasts its own beacon message and receives beacon messages from its neighbor vehicles under the control of the adopted communication protocol. Once $V_i$ receives a beacon message from $V_j$, it judges whether $V_j$ is an abnormal neighbor according to the above definition. If not, it calculates the new similarity related to $V_j$ according to the reconstructed similarity function (presented in Section III-C) and extracts the responsibility and availability from the received beacon message. Furthermore, if $V_j$ is already in the normal neighbor list of $V_i$, $V_i$ updates the record related to $V_j$; otherwise, $V_i$ adds a new record with the relevant parameters to its normal neighbor list. The contents of a record related to $V_j$ in the normal neighbor list of $V_i$ (denoted as $NNL_{i,j}$) are shown in Table 1. The communication and analysis of beacon messages for $V_i$ are described in Algorithm 1.
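Steps 15-20 of Algorithm 1 (filtering each received beacon, computing the similarity, extracting $r(j, i)$ and $a(i, j)$, and updating or adding $NNL_{i,j}$) can be sketched as follows. The beacon and record fields are assumptions, since Table 1 is not reproduced here. Formula (6) is not reproduced either, so the similarity is passed in as a function, and the negative-distance placeholder in the example is hypothetical, not the GAPC similarity. Missing entries in a neighbor's $R_j$ or $A_j$ default to 0, which is also an assumption.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable


@dataclass
class Beacon:
    vid: int
    az: float  # azimuth angle in degrees
    v: float   # velocity in m/s
    x: float
    y: float
    R: Dict[int, float] = field(default_factory=dict)  # r(j, k) for each one-hop neighbor k
    A: Dict[int, float] = field(default_factory=dict)  # a(k, j) for each one-hop neighbor k


@dataclass
class NeighborRecord:
    s_ij: float  # similarity s(i, j)
    r_ji: float  # responsibility sent from V_j to V_i
    a_ij: float  # availability sent from V_j to V_i


def process_beacons(own: Beacon, beacons: Iterable[Beacon], nnl: Dict[int, NeighborRecord],
                    similarity: Callable[[Beacon, Beacon], float],
                    th_az: float = 90.0, th_v: float = 10.0) -> Dict[int, NeighborRecord]:
    """Update V_i's normal neighbor list from received beacons (steps 15-20 of Algorithm 1)."""
    for b in beacons:
        if abs(own.az - b.az) < th_az and abs(own.v - b.v) < th_v:  # step 16: normal neighbor
            nnl[b.vid] = NeighborRecord(     # steps 19-20: update, or add if absent
                s_ij=similarity(own, b),     # step 17
                r_ji=b.R.get(own.vid, 0.0),  # step 18 (missing entry -> 0, assumed)
                a_ij=b.A.get(own.vid, 0.0),
            )
    return nnl


def placeholder_similarity(p: Beacon, q: Beacon) -> float:
    """Hypothetical stand-in for Formula (6): negative Euclidean distance."""
    return -math.hypot(p.x - q.x, p.y - q.y)


own = Beacon(1, 10.0, 25.0, 0.0, 0.0)
received = [
    Beacon(2, 15.0, 23.0, 50.0, 0.0, R={1: 0.4}, A={1: -0.2}),
    Beacon(3, 190.0, 25.0, 30.0, 0.0, R={1: 0.1}, A={1: 0.0}),  # opposite direction
]
nnl = process_beacons(own, received, {}, placeholder_similarity)
print(nnl)  # only V_2 is recorded
```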
Algorithm 1 Communication and Analysis of Beacon
...
9: Calculate self-responsibility $r(i, i)$ by Formula (1);
10: Add self-responsibility $r(i, i)$ to responsibility list $R_i$;
11: Calculate self-availability $a(i, i)$ by Formula (3);
12: Add self-availability $a(i, i)$ to availability list $A_i$;
13: Broadcast a beacon to one-hop neighbor vehicles;
14: Receive beacons from one-hop neighbor vehicles;
15: for a beacon from each neighbor $V_j$ do
16:   if $|az_i - az_j| < Th_{az}$ and $|v_i - v_j| < Th_v$ then
17:     Calculate similarity $s(i, j)$ by Formula (6);
18:     Extract responsibility $r(j, i)$ and availability $a(i, j)$ from the beacon;
19:     if $V_j \in NNL_i$ then
20:       Update $NNL_{i,j}$;
...

C. CLUSTER HEAD SELECTION
After the communication and analysis phase, each vehicle in any of the three states (UD, CH, and CM) has acquired various parameters from its neighbor vehicles and established its normal neighbor list. The next phase is to determine the CH vehicles in VANETs, namely cluster head selection, which is shown on the right side of the dashed box at the top of Fig. 3. If the normal neighbor list is empty, the vehicle remains in or changes to the UD state, depending on its current state. Otherwise, the vehicle calculates the responsibility and availability related to each vehicle in its normal neighbor list. If the sum of the self-responsibility and the self-availability is greater than zero, the vehicle remains in or changes to the CH state, depending on its current state.
Because the CH vehicles in the above system model communicate not only with their CM vehicles but also with other CH vehicles or base stations to exchange data messages, CH vehicles should have excellent communication capability. For this reason, the GAPC algorithm reconstructs the similarity function of the original AP algorithm by introducing communication-related parameters, so that vehicles with low relative mobility and good communication performance are more likely to be selected as cluster heads. For vehicles $V_i$ and $V_j$, the similarity function is defined by Formula (6), where $v_i$ and $v_j$ denote the velocities of $V_i$ and $V_j$, $v_{max}$ denotes the maximum velocity limit of the current road, $(x_i, y_i)$ and $(x_j, y_j)$ denote the positions of $V_i$ and $V_j$, $ER$ denotes the one-hop effective range of $V_i$, and $OCR_i$ and $OCR_j$ denote the owned communication rates of $V_i$ and $V_j$. The parameter $ER$ is determined by the communication range and the current electromagnetic environment of vehicles and can be calculated by Formula (7), where $\theta \in [0, 1]$ is a coefficient representing the quality of the electromagnetic environment. The more serious the electromagnetic interference is, the larger the value of $\theta$ is, and vice versa; $\theta$ can be obtained with dedicated hardware. $CR$ denotes the one-hop communication radius of vehicles. The parameter $OCR$ indicates the data transmission capability of a vehicle and can be represented by the maximum amount of data transmitted per second under given channel conditions. In this paper, it is regarded as a known parameter equal to the transmission rate of the hardware communication module equipped on the vehicle. Note that, to be closer to reality, each vehicle has its own $OCR$ value.
In the original AP algorithm, the similarity function has two levels of characteristics. The superficial one describes the similarity between two vehicles. The implicit one describes how suitable one of the two vehicles is to be selected as the cluster head of the other. Clearly, the additional communication factor in our similarity function should not destroy these characteristics. To verify this, three cases with respect to the communication factor $OCR$ are considered:
Case 1: $V_i$ and $V_j$ have the same $OCR$ value.
Case 2: The $OCR$ value of $V_i$ is greater than that of $V_j$.
Case 3: The $OCR$ value of $V_i$ is less than that of $V_j$.
For ease of analysis, suppose that $OCR_i = OCR_j = 1$ in case 1, $OCR_i = 2$ and $OCR_j = 1$ in case 2, and $OCR_i = 1$ and $OCR_j = 2$ in case 3. Because the velocity factor $v$ and the position factor $(x, y)$ have the same effect in all three cases, these two factors can be ignored and only the effect of the communication factor $OCR$ needs to be considered. According to Formula (6), the similarities in case 1, case 2, and case 3 are $-1$, $-2$, and $-1/2$, respectively. Since $-1/2 > -1 > -2$, the two vehicles in case 3 are more similar than in the other cases, so the new similarity function with the additional communication factor satisfies the superficial characteristic. Meanwhile, because $V_j$ has the highest relative $OCR$ value in case 3, $V_j$ is more suitable as the cluster head of $V_i$ in case 3 than in the other cases. Therefore, the new similarity function also satisfies the implicit characteristic. This completes the verification.
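The three OCR cases can be checked numerically. Formula (6) itself is not reproduced in this copy, so the sketch assumes only what the reported values $-1$, $-2$, and $-1/2$ imply: with the mobility factors held fixed, the communication factor reduces to $-OCR_i / OCR_j$. The function name is hypothetical, and exact fractions are used so that the comparison is free of rounding.

```python
from fractions import Fraction


def communication_factor(ocr_i: int, ocr_j: int) -> Fraction:
    """Assumed reduced form of Formula (6) with mobility terms fixed: -OCR_i / OCR_j."""
    return -Fraction(ocr_i, ocr_j)


cases = {"case 1": (1, 1), "case 2": (2, 1), "case 3": (1, 2)}
values = {name: communication_factor(*ocr) for name, ocr in cases.items()}
ranking = sorted(values, key=values.get, reverse=True)  # most similar first

print(values)
print(ranking)  # ['case 3', 'case 1', 'case 2']
```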
For the responsibility, the availability, and the cluster head selection strategy, the GAPC algorithm is the same as the original AP algorithm; since they are described in Section II-B, they are not repeated here.
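The cluster head selection phase described in this subsection reduces to a small decision rule: an empty normal neighbor list leads to UD, and $r(i, i) + a(i, i) > 0$ leads to CH. In the sketch below, the `State` enum and the function name are hypothetical, and returning `None` when neither condition holds is an assumption: the text leaves such vehicles to the later cluster formation and maintenance phases.

```python
from enum import Enum
from typing import Optional


class State(Enum):
    UD = "UnDefinition"
    CH = "Cluster Head"
    CM = "Cluster Member"


def cluster_head_selection(nnl_size: int, r_ii: float, a_ii: float) -> Optional[State]:
    """State decided by the cluster head selection phase, or None if this phase decides nothing."""
    if nnl_size == 0:
        return State.UD  # empty normal neighbor list: remain in or change to UD
    if r_ii + a_ii > 0:
        return State.CH  # r(i, i) + a(i, i) > 0: remain in or change to CH
    return None          # left to cluster formation / cluster maintenance


print(cluster_head_selection(0, 0.7, 0.2))   # State.UD
print(cluster_head_selection(4, 0.7, 0.2))   # State.CH
print(cluster_head_selection(4, -0.5, 0.2))  # None
```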

In summary, the analysis of the cluster head selection phase of the GAPC algorithm shows that a vehicle with low relative mobility and high $OCR$ is easily selected as a cluster head. Moreover, the cluster formed around it has good stability and communication performance. The details of the cluster head selection phase of the GAPC algorithm for any vehicle $V_i$ are described in Algorithm 2.

Algorithm 2 Cluster Head Selection and Cluster Formation
...
4: else
5:   for each $V_j \in NNL_i$ do
6:     if $OCR_j > RCR_i + \sum_{V_k \in CML_j} RCR_k$ and $CH_j == V_j$ then
7:       Add $V_j$ to $CCHL_i$;
8:       Calculate $CAF_{i,C_j}$;
9:     end if
10:  end for
11:  if $CCHL_i \neq \emptyset$ then
12:    if $length(CCHL_i) == 1$ then
13:      $CH_i \leftarrow CCHL_i[0]$;
14:    else
...
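Steps 5-13 of Algorithm 2 can be sketched as follows. $RCR$ is not defined in this excerpt; the sketch only assumes it is a per-vehicle rate demand in the same units as $OCR$. The content of step 15 is not recoverable, so the choice among several candidates is delegated to a caller-supplied `choose` function, and picking the smallest $CAF$ in the usage example is an assumption, not the paper's rule. All names and values are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Neighbor:
    vid: int
    ch_id: Optional[int]  # ID(CH_j); equals vid when V_j is itself a cluster head
    ocr: float            # owned communication rate OCR_j
    member_rcr: List[float] = field(default_factory=list)  # RCR_k for each V_k in CML_j


def candidate_cluster_heads(rcr_i: float, nnl: List[Neighbor]) -> List[int]:
    """Steps 5-10: keep CH neighbors whose OCR exceeds the rate demand after V_i joins."""
    return [nb.vid for nb in nnl
            if nb.ch_id == nb.vid and nb.ocr > rcr_i + sum(nb.member_rcr)]


def pick_cluster_head(rcr_i: float, nnl: List[Neighbor],
                      choose: Callable[[List[int]], int]) -> Optional[int]:
    """Steps 11-14: no candidate -> None; one candidate -> it; otherwise defer to `choose`."""
    cchl = candidate_cluster_heads(rcr_i, nnl)
    if not cchl:
        return None
    if len(cchl) == 1:
        return cchl[0]
    return choose(cchl)


nnl = [
    Neighbor(2, 2, 10.0, [2.0, 3.0]),  # CH: 10 > 1 + 5 -> candidate
    Neighbor(3, 2, 8.0),               # CM of V_2, not a CH
    Neighbor(4, 4, 5.0, [2.0, 2.5]),   # CH but overloaded: 5 <= 1 + 4.5
    Neighbor(5, 5, 9.0, [1.0]),        # CH: 9 > 1 + 1 -> candidate
]
caf = {2: 0.3, 5: 0.6}  # hypothetical CAF_{i,C_j} values (step 8)
print(candidate_cluster_heads(1.0, nnl))                            # [2, 5]
print(pick_cluster_head(1.0, nnl, lambda c: min(c, key=caf.get)))  # 2
```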

D. CLUSTER FORMATION
After the cluster head selection phase of the GAPC algorithm in a clustering cycle, one or more CH vehicles appear in VANETs. In the next clustering cycle, a UD vehicle selects a CH vehicle as its cluster head and becomes a member of the cluster managed by that CH vehicle. Once a UD vehicle joins a cluster, it changes to the CM state. This phase is called cluster formation, and it corresponds to the state transition from UD to CM in Fig. 3.
Specifically, a UD vehicle extracts all CH vehicles that satisfy certain conditions from its normal neighbor list to establish a candidate cluster head list. It then selects an optimal CH vehicle from this list as its cluster head using a selection mechanism. From Formula (5), the mechanism of the original AP algorithm selects the CH vehicle with the maximum sum of relative responsibility and relative availability. Although this selects an optimal CH vehicle from the viewpoint of mutual assessment between two vehicles, it ignores the effect on the stability of the cluster and on the communication load capacity of the CH vehicle when a UD vehicle joins the cluster, and therefore does little to improve cluster stability. To this end, the GAPC algorithm adopts a new weighted mechanism based on a global assessment of all vehicles in a cluster. The core of this mechanism is a compound assessment function, defined to quantitatively assess the effect of a joining vehicle on both the stability of the cluster and the communication load capacity of the cluster head.
Next, the three scaling functions that constitute the compound assessment function are defined. The velocity scaling function quantitatively assesses the effect of a joining vehicle on cluster stability in terms of velocity. For the kth cluster C_k, when a vehicle V_i joins it, the new cluster is denoted as C'_k, namely C'_k − C_k = {V_i}. The velocity scaling function of V_i related to C_k, VSF_{i,C_k}, is defined by Formula (8), where CMN_{C_k} denotes the number of cluster members of C_k and v_{C_k} denotes the average velocity of C_k, calculated by Formula (9). Formula (8) shows that the smaller the value of the velocity scaling function for a vehicle, the smaller its effect on cluster stability.
The position scaling function quantitatively assesses the effect of a joining vehicle on cluster stability in terms of position. The position scaling function of V_i related to C_k, PSF_{i,C_k}, is defined by Formula (10), where (x_{C_k}, y_{C_k}) denotes the central position of C_k, calculated by Formula (11). Formula (10) shows that the smaller the value of the position scaling function for a vehicle, the smaller its effect on cluster stability.
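Formulas (8) to (11) are not reproduced in this section, so the following is only an illustrative sketch. It assumes that the average velocity and the central position of a cluster are plain arithmetic means over its vehicles, and it uses hypothetical stand-ins for the two scaling functions: how far adding V_i shifts the cluster's average velocity (relative to the old average) and its central position (relative to a hypothetical normalizing range `r_eff`). Both stand-ins share the stated property that a smaller value means a smaller effect on the cluster.

```python
import math


def cluster_means(velocities, positions):
    """Average velocity and central position, assumed to be plain arithmetic means."""
    n = len(velocities)
    v_avg = sum(velocities) / n
    x_c = sum(x for x, _ in positions) / n
    y_c = sum(y for _, y in positions) / n
    return v_avg, (x_c, y_c)


def hypothetical_vsf_psf(velocities, positions, v_i, pos_i, r_eff):
    """Hypothetical stand-ins for VSF_{i,C_k} and PSF_{i,C_k} (not Formulas (8) and (10)).

    VSF: relative change of the average velocity when V_i joins C_k.
    PSF: displacement of the central position divided by r_eff.
    """
    v_old, c_old = cluster_means(velocities, positions)
    v_new, c_new = cluster_means(velocities + [v_i], positions + [pos_i])
    vsf = abs(v_new - v_old) / v_old
    psf = math.dist(c_old, c_new) / r_eff
    return vsf, psf


members_v = [20.0, 22.0]                  # m/s, vehicles already in C_k
members_xy = [(0.0, 0.0), (40.0, 0.0)]   # m
# A vehicle matching the cluster's mean state leaves it unchanged.
print(hypothetical_vsf_psf(members_v, members_xy, 21.0, (20.0, 0.0), 200.0))  # prints (0.0, 0.0)
# A faster vehicle near the cluster boundary shifts both means.
print(hypothetical_vsf_psf(members_v, members_xy, 30.0, (100.0, 0.0), 200.0))
```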
The communication rate scaling function quantitatively assesses the effect of a joining vehicle on the communication load capacity of the cluster. The communication rate scaling function of V_i related to C_k, CRSF_{i,C_k}, is calculated by Formula (12), where RCR_i and RCR_j denote the required communication rates of V_i and V_j, respectively. The parameter RCR indicates a vehicle's requirement for data transmission via its CH vehicle and can be represented by the transmission rate that guarantees the data transmission completes before its deadline. For a vehicle, RCR equals the amount of data transmitted via its CH vehicle divided by the valid time of the data. Note that, for data transmission to succeed, the maximum value of RCR should be less than or equal to the minimum value of OCR. Formula (12) and Constraint (13) show that the value of the communication rate scaling function lies between 0 and 1, and the smaller the value for a vehicle, the smaller its effect on the communication load capacity of a cluster.
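The RCR definition above is simple arithmetic, and a hedged sketch of a load-ratio scaling function follows from it. Formula (12) is not reproduced here, so `hypothetical_crsf` assumes that the scaling function is the share of the cluster head's OCR consumed by RCR_i plus the RCRs of the current members; this is one form that stays in [0, 1] whenever the head can carry the load, not necessarily the paper's exact definition. Both function names are hypothetical.

```python
def required_rate(data_bits, valid_time_s):
    """RCR of a vehicle: data to send via its CH divided by the data's valid time (bit/s)."""
    return data_bits / valid_time_s


def hypothetical_crsf(rcr_i, member_rcrs, ocr_head):
    """Hypothetical CRSF_{i,C_k}: fraction of the head's OCR used once V_i joins.

    Assumes the load is RCR_i plus the RCRs of the current members; the
    result lies in [0, 1] whenever the head can carry that load.
    """
    load = rcr_i + sum(member_rcrs)
    if load > ocr_head:
        raise ValueError("cluster head cannot carry V_i's traffic")
    return load / ocr_head


rcr_i = required_rate(2e6, 0.5)           # 2 Mbit valid for 0.5 s -> 4 Mbit/s
print(hypothetical_crsf(rcr_i, [1e6, 3e6], 12e6))
```

With members requiring 1 and 3 Mbit/s and a head owning 12 Mbit/s, V_i's arrival would use two thirds of the head's capacity.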
The compound assessment function, the linear weighted sum of the three scaling functions above, quantitatively assesses the overall effect of a joining vehicle on the cluster. It is defined as
$$CAF_{i,C_k} = w_1 VSF_{i,C_k} + w_2 PSF_{i,C_k} + w_3 CRSF_{i,C_k} \tag{14}$$
where w_1, w_2 and w_3 are weighting factors satisfying w_1, w_2, w_3 ∈ [0, 1] and w_1 + w_2 + w_3 = 1. Considering that in general traffic scenarios the vehicle velocity embodies abundant information (e.g., weather, traffic and road conditions) and has a major effect on cluster performance [19], [20], w_1, which weights the velocity scaling function, should be much greater than w_2 and w_3. The values of w_2 and w_3 depend on the communication performance of the vehicles. If the signal coverage area is narrow, w_2 should be greater than w_3; the increased influence of the position scaling function then excludes boundary vehicles that may affect cluster stability. If the communication rate is low, w_3 should be greater than w_2; the increased influence of the communication rate scaling function then excludes vehicles with heavy communication loads that may degrade the communication performance of clusters. In other cases, w_2 and w_3 can take equal values. In particular, in sparse traffic scenarios, the major contributor to cluster stability is vehicle velocity, while the communication load has little effect, so the weighting factors satisfy w_1 > w_2 > w_3. In dense traffic scenarios, the major contributor to cluster stability is vehicle position, while vehicle velocity has little effect, so the weighting factors satisfy w_2 > w_3 > w_1.
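The weighted sum can be sketched directly, with the weight constraints checked explicitly. The function name and the two presets are hypothetical: the preset values were chosen only to respect the orderings described above (w_1 > w_2 > w_3 for sparse traffic, w_2 > w_3 > w_1 for dense traffic) and are not the values used in the paper's simulations.

```python
import math


def compound_assessment(vsf, psf, crsf, w1, w2, w3):
    """CAF_{i,C_k} as the linear weighted sum of the three scaling functions."""
    weights = (w1, w2, w3)
    if any(not 0.0 <= w <= 1.0 for w in weights):
        raise ValueError("each weighting factor must lie in [0, 1]")
    if not math.isclose(sum(weights), 1.0):
        raise ValueError("weighting factors must sum to 1")
    return w1 * vsf + w2 * psf + w3 * crsf


# Hypothetical presets that only respect the orderings discussed above.
SPARSE = (0.6, 0.3, 0.1)   # w1 > w2 > w3
DENSE = (0.1, 0.6, 0.3)    # w2 > w3 > w1

# The same candidate scores differently depending on the traffic regime.
print(compound_assessment(0.1, 0.2, 0.5, *SPARSE))
print(compound_assessment(0.1, 0.2, 0.5, *DENSE))
```

Checking the sum with `math.isclose` rather than `==` avoids rejecting valid weights because of floating-point rounding (for example, 0.6 + 0.3 + 0.1 is not exactly 1.0 in binary floating point).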
Overall, Formula (14) shows that the smaller the value of the compound assessment function for a vehicle, the smaller its effect on the cluster. As shown in Fig. 4, a UD vehicle V_i may have multiple cluster heads satisfying Constraint (13) within its one-hop distance. To achieve better cluster stability from the viewpoint of a global assessment of all vehicles in a cluster, the GAPC algorithm selects as the cluster head of V_i the candidate whose cluster yields the smallest compound assessment function value:
$$CH_i = \arg\min_{V_k \in CCHL_i} CAF_{i,C_k} \tag{15}$$
The details of the cluster formation phase of the GAPC algorithm, for a UD vehicle V_i, are also described in Algorithm 2.

E. CLUSTER MAINTENANCE
Normally, after the three previous phases of the GAPC algorithm, a stable clustering structure has formed in VANETs. However, owing to the complexity of wireless networks and the dynamics of vehicles, some exceptions inevitably occur. To handle them, the GAPC algorithm provides a cluster maintenance phase, which includes three types of processes for different exceptions.

1) DETECTING A BETTER CLUSTER
For a CM vehicle, when there are multiple CH vehicles within its one-hop distance that satisfy Constraint (13), the cluster maintenance phase of the GAPC algorithm selects the optimal one among them according to Formula (15), as in the cluster formation phase. If the optimal CH vehicle is not its current cluster head, the CM vehicle switches to the new cluster by updating its cluster head and remains in the CM state. This process corresponds to the state transition from CM to CM in Fig. 3.

2) COMPETITION OF CLUSTER HEADS
For a CH vehicle, when one or more other CH vehicles are within its one-hop distance and at least one of them is better than it, the CH vehicle should give way to one of the better clusters and change to the UD state, which reduces the number of clusters and improves the efficiency of VANETs.
To assess which of two CH vehicles is better, the GAPC algorithm introduces the cluster head fitness function. For a CH vehicle V_i, this function, CHFF_i, is defined by Formula (16), where OCR_i denotes the owned communication rate of V_i, C_i denotes the cluster whose cluster head is V_i, V_j denotes any CM vehicle in C_i, RCR_j denotes the required communication rate of V_j, CMN_{C_i} denotes the number of cluster members of C_i, and CMN_max denotes the maximum number of cluster members of a cluster in VANETs. CMN_max is a preset constant determined by the performance of the communication hardware with which vehicles are equipped. In Formula (16), the former part indicates the balanced communication capability of V_i, taking into account the communication requirements of its CM vehicles; if a cluster head with a small value of this part is retained, communication congestion is likely when new vehicles join its cluster. The latter part indicates the scale capability of V_i; if a cluster head with a large value of this part changes state, its CM vehicles lose their cluster head and have to join other suitable clusters or change to the UD state, which seriously affects cluster stability. Therefore, the cluster head with the larger cluster head fitness function value should be retained.
In addition, because of the switching process described above or unforeseeable behavior of its CM vehicles, the cluster member list of a CH vehicle may become empty. In this case, the CH vehicle likewise changes to the UD state, so that it can find a cluster that meets the constraints and join it in the next clustering cycle. This process corresponds to the state transition from CH to UD at the bottom of Fig. 3.
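The competition and empty-cluster rules can be sketched as a small decision function. Formula (16) is not reproduced in this section, so `hypothetical_chff` is an assumed stand-in: it adds the spare share of OCR_i left after the members' RCRs (the "balanced communication capability" part) to CMN_{C_i}/CMN_max (the "scale capability" part). The way the two parts are combined is an assumption, and both function names are hypothetical.

```python
def hypothetical_chff(ocr_i, member_rcrs, cmn_max):
    """Hypothetical CHFF_i (not Formula (16)): spare-capacity share plus relative size."""
    spare = (ocr_i - sum(member_rcrs)) / ocr_i   # balanced communication capability
    scale = len(member_rcrs) / cmn_max           # scale capability, CMN_{C_i} / CMN_max
    return spare + scale


def next_head_state(own_chff, neighbor_head_chffs, member_count):
    """State of a CH vehicle after one maintenance check: 'CH' or 'UD'."""
    if member_count == 0:
        return "UD"   # empty cluster member list
    if any(c > own_chff for c in neighbor_head_chffs):
        return "UD"   # give way to a better cluster head in one-hop range
    return "CH"


a = hypothetical_chff(10.0, [2.0, 3.0], 10)   # 0.5 + 0.2
b = hypothetical_chff(8.0, [1.0], 10)         # 0.875 + 0.1
print(next_head_state(a, [b], 2), next_head_state(b, [a], 1))  # prints UD CH
```

Head A yields to head B because B's (hypothetical) fitness value is larger, while B keeps its role.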

3) LOSS OF CLUSTER HEADS
For a CM vehicle, when its cluster head has changed state for some reason, the CM vehicle detects in the next clustering cycle a conflict between the cluster head's real state (CM or UD) reported in its beacon message and the state (CH) saved in the normal neighbor list. In addition, when a CH vehicle suffers a communication hardware failure or leaves the road, its CM vehicles receive no beacon message from it and detect that the cluster head is no longer in the normal neighbor list. In either case, those CM vehicles have lost their cluster head. If no other CH vehicle within one-hop distance satisfies Constraint (13), they change to the UD state and can then join a new cluster through the cluster formation phase of the GAPC algorithm in the next clustering cycle. This process corresponds to the state transition from CM to UD in the middle of Fig. 3.
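The two member-side maintenance processes (detecting a better cluster and losing the cluster head) can be combined into one per-cycle decision, sketched below. The function name, the dictionary of beacon states and the precomputed `eligible_caf` mapping (CH vehicles that satisfy Constraint (13), with their CAF values) are hypothetical; the choice among candidates follows the smallest-CAF rule of Formula (15).

```python
def maintain_member(current_head, beacon_states, eligible_caf):
    """One clustering cycle of maintenance for a CM vehicle.

    current_head: ID of the saved cluster head.
    beacon_states: normal neighbor list as ID -> state ('CH', 'CM', 'UD')
        taken from the latest beacons.
    eligible_caf: hypothetical map from CH IDs satisfying Constraint (13)
        to their CAF values (should include the current head if still valid).
    Returns (new_state, new_head).
    """
    # Lost head: it reports a non-CH state or is missing from the neighbor list.
    head_lost = beacon_states.get(current_head) != "CH"
    candidates = {
        h: v for h, v in eligible_caf.items() if beacon_states.get(h) == "CH"
    }
    if not candidates:
        return ("UD", None) if head_lost else ("CM", current_head)
    # Better cluster (or replacement head): smallest compound assessment value.
    return ("CM", min(candidates, key=candidates.get))


states = {"A": "CH", "B": "CH", "C": "CM"}
print(maintain_member("A", states, {"A": 0.3, "B": 0.2}))  # prints ('CM', 'B')
print(maintain_member("C", states, {}))                    # prints ('UD', None)
```

In the first call the member switches to the better cluster B and stays CM; in the second its saved head C now reports the CM state and no eligible head exists, so it becomes UD.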

IV. SIMULATION RESULTS
In this section, the overall performance of the proposed GAPC algorithm is evaluated from multiple perspectives. First, the VANET scenarios are described, including the selected urban area, the vehicular mobility data generated on it, and the settings of the main simulation parameters. Second, a series of performance metrics concerning the stability and the communication of clusters is defined. Finally, the GAPC algorithm, the APROVE algorithm and the NMDP-APC algorithm are simulated in MATLAB, and a detailed comparative analysis across different traffic density scenarios is provided based on the simulation results.

A. VANETS SCENARIOS
To ensure the validity of the simulations, we use the vehicular mobility dataset TAPASCologne [28] to build the simulation scenarios. This dataset covers a region of 400 square kilometers in Cologne, Germany, over a period of 24 hours on a typical working day, and comprises more than 700,000 individual car trips. As shown in Fig. 5, for ease of simulation, we select an urban area of 1 square kilometer from the whole region as the simulation area. To be consistent with existing research [19], [29], we build seven simulation scenarios with 40, 80, 120, 140, 160, 180 and 200 vehicles based on the vehicular mobility data of the selected area. Note that the maximum speed limit of the road is constant in all scenarios. Table 2 shows the settings of the main simulation parameters, such as the size of the simulation area, the transmission range of vehicles, the threshold values for identifying abnormal vehicles, and the values of the weighting factors of the compound assessment function.

B. PERFORMANCE METRICS
To evaluate the performance of the proposed clustering algorithm comprehensively and objectively, eleven performance metrics commonly adopted in VANET clustering research are used, grouped into stability metrics and communication metrics as follows:

1) PERFORMANCE METRICS OF STABILITY
The cluster stability consists of the stability of cluster heads and the stability of cluster members, and it can be evaluated from two aspects: the duration of clusters and the scale of clusters. Based on this, seven performance metrics of stability are adopted here.
• The average duration of cluster heads is a performance metric of stability in terms of duration that represents the average survival time of all CH vehicles in VANETs before they change to other states. It is calculated as the sum of the survival times of all CH vehicles divided by the number of CH vehicles.
• The average duration of cluster members is a performance metric of stability in terms of duration that represents the average survival time of all CM vehicles in VANETs before they change to other states. It is calculated as the sum of the survival times of all CM vehicles divided by the number of CM vehicles.
• The change rate of cluster heads is a performance metric of stability in terms of duration that represents the number of state changes of all CH vehicles per unit time during the clustering process. It is calculated as the total number of state changes of all CH vehicles divided by the total time of the clustering process.
• The number of cluster heads is a performance metric of stability in terms of scale to represent the total number of all CH vehicles. Since there is only one CH vehicle in a cluster, it is equal to the number of clusters.
• The number of cluster members is a performance metric of stability in terms of scale to represent the total number of all CM vehicles.
• The number of isolated vehicles is a performance metric of stability in terms of scale to represent the total number of all UD vehicles.
• The clustering efficiency is a performance metric of stability in terms of scale that represents the proportion of vehicles in an effective state (CH or CM) during the clustering process. It is calculated as the percentage of CH and CM vehicles among all vehicles in VANETs.
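Several of these stability metrics can be computed from a sampled state trace, as in the sketch below. Its assumptions are labeled in the code: the trace is a hypothetical list of {vehicle: state} snapshots taken every `dt` seconds with every vehicle present in every snapshot, each uninterrupted CH period counts as one CH survival time, and a "state change of a CH vehicle" is counted as a transition from CH to another state. The average duration of cluster members follows the same pattern with "CM" in place of "CH".

```python
from itertools import groupby


def ch_durations(trace, dt):
    """Durations of uninterrupted CH periods in a trace of {vehicle: state} snapshots."""
    durations = []
    for v in trace[0]:                       # assumes every vehicle is in every snapshot
        states = [snap[v] for snap in trace]
        for state, run in groupby(states):   # consecutive runs of the same state
            if state == "CH":
                durations.append(len(list(run)) * dt)
    return durations


def average_ch_duration(trace, dt):
    d = ch_durations(trace, dt)
    return sum(d) / len(d) if d else 0.0


def ch_change_rate(trace, dt):
    """CH state changes (CH -> other state) per unit time over the whole trace."""
    changes = sum(
        1
        for prev, cur in zip(trace, trace[1:])
        for v in prev
        if prev[v] == "CH" and cur[v] != "CH"
    )
    return changes / (len(trace) * dt)


def clustering_efficiency(snapshot):
    """Percentage of vehicles in the CH or CM state."""
    clustered = sum(1 for s in snapshot.values() if s in ("CH", "CM"))
    return 100.0 * clustered / len(snapshot)


trace = [
    {"A": "CH", "B": "CM", "C": "UD"},
    {"A": "CH", "B": "CM", "C": "CM"},
    {"A": "UD", "B": "CH", "C": "CM"},
    {"A": "UD", "B": "CH", "C": "CM"},
]
print(average_ch_duration(trace, 1.0))   # prints 2.0
print(ch_change_rate(trace, 1.0))        # prints 0.25
```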

2) PERFORMANCE METRICS OF COMMUNICATION
Since throughput, packet loss and packet delay are the three main communication concerns in VANETs, four performance metrics of communication are adopted here.
• The average throughput of cluster heads is a performance metric of communication that represents the average throughput of all CH vehicles within the specified time. It is calculated as the sum of the intra-cluster and inter-cluster throughput of all CH vehicles within the simulation time divided by the number of CH vehicles.
• The average throughput of clusters is a performance metric of communication that represents the average throughput of all clusters within the specified time. It is calculated as the sum of the intra-cluster throughput of all CM vehicles in clusters within the simulation time divided by the number of clusters.
• The average packet loss rate of clusters is a performance metric of communication that represents the average packet loss rate of all clusters within the specified time. It is calculated as the sum of the packet loss rates of all vehicles in clusters divided by the number of those vehicles, where the packet loss rate of a vehicle is the number of failed data packets divided by the total number of data packets within the simulation time.
• The average packet delay of clusters is a performance metric of communication that represents the average packet delay of all clusters within the specified time. It is calculated as the sum of the packet delays of all vehicles in clusters divided by the number of those vehicles, where the packet delay of a vehicle is the sum of the intervals between sending and receiving its data packets divided by the total number of data packets within the simulation time.
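The communication metrics are straightforward averages, and a minimal sketch of them follows. The function names and the input layout (per-vehicle packet counts, per-packet delays in seconds, per-head throughput values) are hypothetical. The per-vehicle delay divides by the total number of data packets, as in the definition above.

```python
def vehicle_loss_rate(failed, total):
    """Packet loss rate of one vehicle: failed data packets / all data packets."""
    return failed / total


def vehicle_delay(delays, total):
    """Packet delay of one vehicle: summed send-to-receive intervals / all data packets."""
    return sum(delays) / total


def cluster_average(values):
    """Average of a per-vehicle metric over all vehicles in clusters."""
    return sum(values) / len(values)


def average_ch_throughput(intra, inter):
    """Sum of intra- and inter-cluster throughput of all CHs / number of CHs."""
    return sum(a + b for a, b in zip(intra, inter)) / len(intra)


loss = [vehicle_loss_rate(5, 100), vehicle_loss_rate(0, 50)]
print(cluster_average(loss))                           # about 0.025
print(vehicle_delay([0.01, 0.03], 2))                  # about 0.02 s
print(average_ch_throughput([2.0, 4.0], [1.0, 1.0]))   # prints 4.0
```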

C. PERFORMANCE ANALYSIS
To validate the performance of the proposed algorithm, a series of simulations of the GAPC algorithm, together with the APROVE algorithm and the NMDP-APC algorithm (two classical AP-based clustering algorithms for VANETs), has been performed for the above seven scenarios in MATLAB. To reduce the effect of random simulation errors on the analysis, each simulation has been repeated 100 times. Based on the results, a detailed comparative analysis of the two categories of performance metrics is given below.

1) STABILITY PERFORMANCE ANALYSIS
Fig. 6 compares the duration stability of the three algorithms in the seven scenarios. Specifically, the average duration of cluster heads of the GAPC algorithm is 15% and 20% longer than those of the APROVE algorithm and the NMDP-APC algorithm, respectively. The average duration of cluster members of the GAPC algorithm is close to those of the other two algorithms. The change rate of cluster heads of the GAPC algorithm is 83% and 82% lower than those of the APROVE algorithm and the NMDP-APC algorithm, respectively, and remains almost unchanged as traffic density varies. Fig. 7 compares the scale stability of the three algorithms in the seven scenarios. The number of cluster heads of the GAPC algorithm is close to that of the APROVE algorithm and 40% less than that of the NMDP-APC algorithm. The number of cluster members of the GAPC algorithm is 33% and 20% larger than those of the APROVE algorithm and the NMDP-APC algorithm, respectively. The number of isolated vehicles of the GAPC algorithm is close to that of the NMDP-APC algorithm and 21% less than that of the APROVE algorithm. Moreover, the clustering efficiency of the GAPC algorithm is close to that of the NMDP-APC algorithm and 20% greater than that of the APROVE algorithm.
In summary, as for cluster stability, the GAPC algorithm outperforms the APROVE algorithm and the NMDP-APC algorithm in terms of both the metric values and their consistency across the seven scenarios. The reasons are as follows:
• The GAPC algorithm eliminates the interference of abnormal neighbors on cluster stability, whereas the APROVE algorithm and the NMDP-APC algorithm do not take this into account.
• The GAPC algorithm takes into account the one-hop effective range of a vehicle, so it can effectively select vehicles with low mobility relative to other vehicles as cluster heads.
• In the cluster formation phase, the GAPC algorithm reduces the effect of joining vehicles on cluster stability by selecting the cluster with the smallest compound assessment function value. In contrast, the APROVE algorithm and the NMDP-APC algorithm consider only the effect of a vehicle on the cluster head, not on the cluster as a whole.

2) COMMUNICATION PERFORMANCE ANALYSIS
Fig. 8 compares the communication performance of the three algorithms in the seven scenarios. Specifically, the average throughput of cluster heads of the GAPC algorithm is 57% and 64% higher than those of the APROVE algorithm and the NMDP-APC algorithm, respectively. The average throughput of clusters of the GAPC algorithm is 26% and 102% higher than those of the APROVE algorithm and the NMDP-APC algorithm, respectively. The average packet loss rate of clusters of the GAPC algorithm is 82% and 74% lower than those of the APROVE algorithm and the NMDP-APC algorithm, respectively. Moreover, the average packet delay of clusters of the GAPC algorithm is 39% greater than that of the APROVE algorithm and 14% less than that of the NMDP-APC algorithm.
In summary, as for the communication performance of clusters, the GAPC algorithm also outperforms the APROVE algorithm and the NMDP-APC algorithm in terms of the metric values (except for the average packet delay of clusters) and the consistency of all metric values across the seven scenarios. The reasons are as follows:
• The excellent duration stability of the GAPC algorithm gives it better communication performance than the APROVE algorithm and the NMDP-APC algorithm.
• The relatively good scale stability of the GAPC algorithm also contributes to its better communication performance compared with the APROVE algorithm and the NMDP-APC algorithm.
• The GAPC algorithm takes into account the OCR parameters of cluster heads and, through its similarity function, selects the vehicles with the highest OCR values as cluster heads. In contrast, the APROVE algorithm and the NMDP-APC algorithm ignore communication-related parameters.
• Fig. 7(a) shows that the number of clusters of the GAPC algorithm is close to that of the APROVE algorithm, whereas Fig. 7(c) shows that the APROVE algorithm leaves more isolated vehicles than the GAPC algorithm. Hence, APROVE clusters contain fewer vehicles on average than GAPC clusters, so less traffic is likely to queue at each APROVE cluster head. This explains why the GAPC algorithm has a higher average packet delay of clusters than the APROVE algorithm in Fig. 8(d).

3) COMPREHENSIVE PERFORMANCE ANALYSIS IN THE SCENARIO OF 200 VEHICLES
Considering the effect of extreme traffic density on VANET communication, the scenario of 200 vehicles is extracted to compare the three algorithms more comprehensively. For each of the eleven performance metrics, four summary statistics (mean, minimum, maximum and median) are adopted. Fig. 9 and Table 3 show the comprehensive performance comparisons of the three algorithms in the scenario of 200 vehicles in two different forms. From these comparisons, the following conclusions can be drawn:
• The duration of cluster heads using the GAPC algorithm is greater than those using the APROVE algorithm and the NMDP-APC algorithm.
• The duration of cluster members using the GAPC algorithm is close to that using the APROVE algorithm and is greater than that using the NMDP-APC algorithm.
• The change rate of cluster heads using the GAPC algorithm is much less than those using the APROVE algorithm and the NMDP-APC algorithm.
• The number of cluster heads using the GAPC algorithm is greater than that using the APROVE algorithm and is much less than that using the NMDP-APC algorithm.
• The number of cluster members using the GAPC algorithm is greater than those using the APROVE algorithm and the NMDP-APC algorithm.
• The number of isolated vehicles using the GAPC algorithm is greater than that using the NMDP-APC algorithm and is less than that using the APROVE algorithm.
• The clustering efficiency using the GAPC algorithm is greater than that using the APROVE algorithm and is less than that using the NMDP-APC algorithm.
• The throughput of cluster heads and the throughput of clusters using the GAPC algorithm are greater than those using the APROVE algorithm and the NMDP-APC algorithm.
• The packet loss rate of clusters using the GAPC algorithm is less than those using the APROVE algorithm and the NMDP-APC algorithm.
• The packet delay of clusters using the GAPC algorithm is greater than that using the APROVE algorithm and is less than that using the NMDP-APC algorithm.
In general, the GAPC algorithm has better comprehensive performance than the other two algorithms under high traffic density.

4) PERFORMANCE ANALYSIS OF THE PROPOSED GAPC ALGORITHM UNDER DIFFERENT WEIGHT COMBINATIONS
In the cluster formation phase, the GAPC algorithm adopts a weighted mechanism to select an optimal cluster head. To evaluate the effect of different weight combinations on the performance of the algorithm, we perform additional simulations with several weight combinations in the seven traffic scenarios. Formulas (14) and (15) indicate that the compound assessment function with its weighting factors plays a role only in the cluster head selection of UD vehicles and CM vehicles. Therefore, the weight combination directly affects only the duration of cluster members, and only this performance metric (the average duration of cluster members) needs to be considered in the simulations. The detailed simulation results are shown in Fig. 10, from which it can be concluded that the weight combinations have different effects on the performance of the GAPC algorithm under different traffic densities. Overall, however, as long as the three weighting factors (w_1, w_2 and w_3) follow the numerical relationships presented in Section III-D and maintain the same numerical proportionality, the performance of the GAPC algorithm remains almost constant under different weight combinations. The results also verify the analysis of the numerical relationship of the three weighting factors in extreme traffic scenarios.

V. CONCLUSION
In this paper, to address the effect of the high mobility of vehicles on V2V-supported VANETs, we propose the GAPC algorithm, which achieves a clustering structure with better stability and communication performance than traditional clustering algorithms. On the one hand, the GAPC algorithm uses beacon messages to identify abnormal vehicles within one-hop distance and establishes a normal neighbor list for each vehicle. On the other hand, the GAPC algorithm introduces communication-related parameters into the similarity function of the original AP clustering algorithm. Based on this, the GAPC algorithm uses a weighted mechanism to quantitatively assess the effect on cluster stability when a vehicle joins a cluster, and improves the cluster formation of the original AP algorithm by selecting the cluster with the lowest compound assessment value. The simulation results under the seven scenarios show that the GAPC algorithm is superior to the APROVE algorithm and the NMDP-APC algorithm not only in the stability and communication performance of clusters but also in algorithm robustness.
In future work, to further improve the precision of the GAPC algorithm, the weights in the compound assessment function could be determined with popular machine-learning algorithms and adjusted according to application requirements. In addition, to further improve cluster stability, a joint determination mechanism could replace the self-determination mechanism that the GAPC algorithm uses to select a cluster head.