A Simple and Robust Clustering Scheme for Large-Scale and Dynamic VANETs

Clustering is a promising technique for managing network resources efficiently. In vehicular communications, it is used to group vehicles with similar characteristics under a selected vehicle called a Cluster Head (CH). Because of the highly dynamic topology of vehicular networks, CH selection is a challenging task. This paper therefore presents a new clustering scheme, the Efficient Cluster Head Selection (ECHS) scheme, to select the most suitable CHs. The ECHS scheme introduces important conditions on how clusters are constructed before CH selection begins. For instance, under the ECHS rules the ideal CH is the vehicle at the center of the cluster, because it remains connected to its neighbors for as long as possible. The ECHS scheme also ensures a proper distribution of clusters in the network by carefully adjusting the distance between two consecutive clusters. These conditions allow the ECHS scheme to cluster vehicles on the road effectively and to outperform its counterparts. Simulation experiments were conducted to examine the performance of the ECHS scheme, and the results show that it achieves its design objectives in terms of CH lifetime, Cluster Member Lifetime (CML), Packet Loss Ratio (PLR), Overhead for Clustering (OC), Average Packet Delay (APD), and Cluster Number (CN).


I. INTRODUCTION
Nowadays, the shortcomings of traditional transportation systems are being significantly reduced by intelligent Vehicular Ad hoc Networks (VANETs). Owing to the rapid development of wireless sensors and the Internet of Vehicles (IoV) [1], VANETs can be integrated with other technologies such as Cloud and Fog computing [2]-[4]. This integration makes VANETs easier to deploy and able to support more traffic management applications. Communication in VANETs can be divided into two categories, depending on the running applications and requested services. The first is Vehicle-to-Vehicle (V2V) communication, which is commonly used when vehicles share local traffic information with each other without any infrastructure [5]-[7]. The second is Vehicle-to-Everything (V2X) communication, which covers any type of communication between vehicles and infrastructure nodes such as a Road-Side Unit (RSU), a Fog Node, a Base Station, and a Cloud Center. V2X communication helps vehicles collect information about different zones in a city to detect traffic congestion and discover congestion-free roads [8], [9]. It is also used to control and manage virtual and physical traffic light systems [10], [11].
The associate editor coordinating the review of this manuscript and approving it for publication was Yan Huo.
In contrast to V2X communication, end-to-end connectivity among vehicles in V2V communication cannot always be guaranteed, because vehicles move at high speeds and in different directions, which leads to frequent network partitioning. Discontinued connections result in poor network performance and dropped packets, and they disrupt the exchange of traffic safety information. Since the V2V model is a decentralized, self-organizing network, bandwidth and channel contention are not managed by any central vehicle. Moreover, because a single vehicle has limited transmission coverage, it cannot maintain global knowledge of a large-scale, dynamic network such as a VANET. Clustering has therefore been proposed, in which V2V communication is managed by vehicles selected according to certain criteria [12]-[17].
The basic principle of clustering is to divide vehicles into virtual groups. Each group forms a cluster containing vehicles with similar characteristics, such as vehicle density, velocity, and geographical location [18]. Before the cluster is created, vehicles moving in roughly the same direction and at similar speeds agree on a single CH, which serves as the local coordinator of its members and collects and disseminates traffic information. Communication between clusters is established via Cluster Gateway (CGW) vehicles that belong to two clusters at the same time. In general, an established cluster should remain stable for as long as possible while keeping network overhead low [19].
In this paper, we present a robust clustering scheme, called the Efficient Cluster Head Selection (ECHS) scheme, to construct reliable and stable clusters in VANETs. The ECHS scheme applies the following rules to meet its requirements:
• The CH is preferably positioned in the middle of the cluster to avoid the cluster gap problem.
• Each CH determines its Cluster Gateway Candidates (CGCs), to guide the selection of the next CH and to distribute clusters adaptively among vehicles.
• Coverage overlap between adjacent clusters is reduced to eliminate unnecessary CGWs.
• Redundant retransmissions from CGWs that provide the same additional coverage area are eliminated.
The rest of the paper is organized as follows. Section II discusses related work. Section III presents the system model of the proposed scheme. Section IV describes the simulation environment and the obtained results. Section V concludes the paper and outlines directions for future work.

II. RELATED WORK
In VANETs, unpredictable vehicular mobility, in terms of both varying vehicle speeds and movement in different directions, imposes extra constraints on the design of cluster formation schemes. Therefore, several mobility-based clustering techniques have been proposed to mitigate the challenges of creating clusters. Since most of them adopt similar techniques, we focus on the most important ones; interested readers can refer to [20]-[23].
A new dynamic mobility-based clustering scheme for urban environments is proposed in [24]. It builds clusters based on several important metrics, such as vehicles' relative velocity and link lifetime estimation. In [25], a clustering protocol named Clustering Formation for Inter-Vehicle Communication (CF-IVC) groups vehicles by speed interval, so that each vehicle joins a cluster of similar velocity. The Affinity PROpagation for VEhicular networks (APROVE) scheme proposed in [26] employs a similarity function to create stable clusters; the function combines vehicular position (current and future) and mobility. However, APROVE is a distance-based clustering algorithm, and such algorithms often suffer from frequent re-clustering when vehicle speeds change dramatically. Moreover, APROVE incurs extra delay in cluster formation because it needs several iterations. A lane-based clustering algorithm is presented in [27], in which CHs are selected based on the majority traffic flow and the mobility information of vehicles. Vehicles that will turn left or right are not allowed to become CHs; only vehicles that continue in the same direction are eligible. Another direction-based clustering approach is presented in [28], where CHs are selected based on vehicles' travel directions. Vehicles moving in the opposite direction are not allowed to be elected as CHs because of the short communication period between such a CH and its CMs, which helps reduce the cluster reconfiguration cost. Similarly, a novel algorithm to form stable clusters in a highway environment was presented in [13]. Clusters are formed by vehicles traveling in the same direction and at the same speed level, and each vehicle classifies its neighbors as stable or non-stable based on their relative speed.
The authors in [29] presented the concept of two CHs within the same cluster: a primary cluster head (PCH) and a secondary cluster head (SeCH). The PCH has the highest weight value, aggregated from different metrics such as mean speed and the distance of a vehicle to its neighbors. The SeCH acts as a backup to improve cluster stability and takes over the PCH's responsibilities when the PCH leaves the cluster.
The Double Head Clustering (DHC) method for VANETs, introduced in [30], uses new metrics for CH selection to increase cluster stability and efficiency. Besides the vehicle's speed, direction, and position, it considers link quality and the link expiration time (LET).
Furthermore, a considerable number of clustering techniques have been proposed for efficient routing and data dissemination. In [31], CHs are selected based on direction and distance metrics and are then used for route discovery and data packet delivery. The authors in [32] proposed a novel Real Time Vehicular Communication (RTVC) scheme for VANETs. Clusters are formed among vehicles based on average speed and direction, and a multicast routing protocol routes data packets from a source to a destination vehicle in two phases: in the first phase, the CH establishes the route from the source vehicle; in the second, the CH forms the route to the destination via intermediate CHs. The Cluster-based On-demand DElay tolerant routing (CODE) algorithm for highway scenarios is proposed in [33]; CODE elects CHs based on vehicle direction and relative speed to establish routes between source and destination pairs. Similarly, in [34] clusters are formed based on vehicle direction and location to disseminate data adaptively and optimize network bandwidth. A clustering scheme based on vehicles' driving directions is presented in [35]. In the formed cluster, each CM employs a non-deterministic approach, based on the number of received packets, to disseminate data between vehicles, and the CH forwards received packets in the transmission direction.
The benefits of clustering also extend to the efficient dissemination of safety messages in VANETs [36]. A Novel Segment-based Safety message broadcasting in Cluster (NSSC) scheme has been proposed with three main functions: cluster formation, collision avoidance, and safety message broadcasting. NSSC uses a Variant-based Clustering (VbC) scheme based on the Chaotic Crow Search (CCS) algorithm to elect a CH according to two metrics: mobility and connectivity.
Most of the schemes discussed above handle the clustering problem in VANETs by exploiting vehicle mobility characteristics. However, some critical factors in creating clusters have been neglected or have not been explicitly defined and modeled, such as the distribution pattern of clusters among vehicles. In addition, CHs have been selected through simple comparisons of speed, direction, and distance between vehicles, without considering the CH's location inside its cluster. Furthermore, in these works a CGW is nominated whenever it belongs to two clusters concurrently, but reducing duplicate retransmissions from neighboring CGWs has not been addressed. To fill these gaps, this paper presents the ECHS scheme, which aims to provide an optimized clustering solution for VANETs.

III. SYSTEM MODEL
A. MODELING OF EFFICIENT CLUSTERS
The VANET topology can be modeled as an undirected graph $G(V, E)$, where $V$ is the set of vertices representing the vehicles in the network and $E$ is the set of edges representing the communication links among them. A direct communication link $(i, j) \in E$ exists if and only if vehicles $i$ and $j$ are within each other's transmission range:
$$(i, j) \in E \iff dis(i, j) \le \min(Tr_i, Tr_j)$$
where $Tr_i$ and $Tr_j$ are the maximum transmission ranges of vehicles $i$ and $j$, respectively, and $dis(i, j)$ is the distance between them. The set $N_i$ of one-hop neighbors of vehicle $i$ is then
$$N_i = \{\, j \in V \setminus \{i\} : (i, j) \in E \,\}$$
$N_{if}(v_i)$ and $N_{ib}(v_i)$ denote the cardinalities of the one-hop neighborhood of vehicle $i$ in the forward and backward directions, respectively. A vehicle determines the direction of each neighbor by calculating the direction angle from received messages. Let $(X_c, Y_c)$ and $(X_p, Y_p)$ be the coordinates of a vehicle's current and previous positions, respectively. The direction angle $\theta$ is then
$$\theta = \tan^{-1}\left(\frac{Y_c - Y_p}{X_c - X_p}\right)$$
The forward direction is then confined to a closed interval of angles, and the backward direction is defined similarly. Fig.1 shows the relative backward and forward directions for vehicle v3.
A CH should be selected carefully to reduce selection overhead and prolong its lifetime. In addition to the speed, direction, and density factors generally used in CH election, the following factor must also be considered. Each vehicle needs to know whether both of its directions are full of neighbors, both are empty, or only one is populated (semi-full), so that it can decide whether it is a potential CH. A vehicle with neighbors on only one side is likely to leave its cluster or join another one quickly. Therefore, a vehicle located at the tail or front of a cluster, with no one-hop neighbor in the backward or forward direction, should be excluded from becoming a CH.
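The neighbor model above can be illustrated with a short Python sketch. It assumes 2-D positions in meters, implements the link condition $dis(i, j) \le \min(Tr_i, Tr_j)$ and the direction angle θ (via `math.atan2`, the quadrant-aware form of the arctangent), and, as an assumption since the exact forward and backward intervals are not given here, treats a neighbor as forward when its displacement has a positive projection onto the vehicle's heading. The `Vehicle` class and all function names are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Vehicle:
    vid: str
    x: float            # current position X_c (m)
    y: float            # current position Y_c (m)
    prev_x: float       # previous position X_p (m)
    prev_y: float       # previous position Y_p (m)
    tr: float = 250.0   # transmission range Tr (m), hypothetical default


def distance(a: Vehicle, b: Vehicle) -> float:
    return math.hypot(a.x - b.x, a.y - b.y)


def linked(a: Vehicle, b: Vehicle) -> bool:
    # Edge (i, j) exists only if each vehicle is within the other's range.
    return distance(a, b) <= min(a.tr, b.tr)


def heading(v: Vehicle) -> float:
    # Direction angle theta (radians) from the previous to the current position.
    return math.atan2(v.y - v.prev_y, v.x - v.prev_x)


def split_neighbors(v: Vehicle, others: list[Vehicle]) -> tuple[list[str], list[str]]:
    """Return (forward, backward) one-hop neighbor ids of v.

    Assumption: a neighbor is forward if its displacement from v has a
    positive projection onto v's heading; otherwise it is backward.
    """
    theta = heading(v)
    ux, uy = math.cos(theta), math.sin(theta)
    forward: list[str] = []
    backward: list[str] = []
    for o in others:
        if o.vid == v.vid or not linked(v, o):
            continue
        proj = (o.x - v.x) * ux + (o.y - v.y) * uy
        (forward if proj > 0 else backward).append(o.vid)
    return forward, backward
```

For four vehicles moving along the x-axis at x = 0, 300, 500, and 650 m with a 250 m range, `split_neighbors` gives v3 one forward neighbor (v4) and one backward neighbor (v2), while v2 has a forward neighbor (v3) but no backward one, which marks it as a tail vehicle.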
From the forward and backward neighbor counts $N_{if}(v_i)$ and $N_{ib}(v_i)$, a vehicle $v_i$ is semi-full of neighbors if exactly one of its directions has neighbors:
$$\big(N_{if}(v_i) > 0 \wedge N_{ib}(v_i) = 0\big) \vee \big(N_{if}(v_i) = 0 \wedge N_{ib}(v_i) > 0\big)$$
It is full of neighbors if both directions have neighbors:
$$N_{if}(v_i) > 0 \wedge N_{ib}(v_i) > 0$$
and empty if neither direction has any:
$$N_{if}(v_i) = 0 \wedge N_{ib}(v_i) = 0$$
A CH should preferably be located approximately at the center of its cluster. To achieve this, a CH should satisfy the following condition:
$$N_{if}(v_i) \approx N_{ib}(v_i) \tag{11}$$
That is, a vehicle with an equal or approximately equal number of neighbors in both directions is more likely to become a CH. A centrally located vehicle is the best one to manage the cluster: its speed and movement are correlated with those of the surrounding vehicles, so it is likely to stay with its cluster for a long time.
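The neighbor-state and centrality conditions can be sketched from the two counts $N_{if}(v_i)$ and $N_{ib}(v_i)$ alone. The tolerance `delta`, used here to make "approximately equal" in (11) concrete, is a hypothetical parameter rather than a value from the scheme.

```python
from enum import Enum


class NeighborState(Enum):
    FULL = "full"            # neighbors in both directions
    SEMI_FULL = "semi-full"  # neighbors in exactly one direction (tail or front vehicle)
    EMPTY = "empty"          # no neighbors at all


def neighbor_state(n_forward: int, n_backward: int) -> NeighborState:
    """Classify a vehicle from its forward/backward one-hop neighbor counts."""
    if n_forward > 0 and n_backward > 0:
        return NeighborState.FULL
    if n_forward == 0 and n_backward == 0:
        return NeighborState.EMPTY
    return NeighborState.SEMI_FULL


def is_centered(n_forward: int, n_backward: int, delta: int = 1) -> bool:
    # Hypothetical reading of condition (11): the counts differ by at most delta.
    return abs(n_forward - n_backward) <= delta


def may_announce_ch(n_forward: int, n_backward: int, delta: int = 1) -> bool:
    """A vehicle may send a CHA only if it is FULL and approximately centered."""
    return (neighbor_state(n_forward, n_backward) is NeighborState.FULL
            and is_centered(n_forward, n_backward, delta))
```

For example, v3 in Fig.1 hears two forward neighbors and no backward ones, so `may_announce_ch(2, 0)` returns `False`; a vehicle with two neighbors on each side passes both checks.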
As shown in Fig.2, v2 is located at the tail of the cluster and has no neighbor knowledge from behind, since v1 is beyond its 250 m transmission range. Similarly, v6 is located at the front of the cluster and maintains neighbor information only from the vehicles behind it (i.e., v5, v4, v3, v2); it cannot receive information from vehicles in front because of the transmission gap between it and v7. Hence, v2 could slow down and join another cluster, and v6 could speed up and join the next cluster. This would cause frequent network fragmentation and repeat the CH election procedure several times. To handle this problem, tail and front vehicles should not be elected as CHs, and each vehicle is responsible for monitoring the directions of its neighbors.
If a vehicle does not receive any hello message from at least one direction, it is located at the tail or the front of the cluster with a transmission gap, and it is not allowed to send a Cluster Head Announcement (CHA) to its neighbors. As also shown in Fig.1, v3 cannot send a CHA: it receives hello messages only from the forward direction (v4 and v5), since v1 and v2 are outside its transmission range, and it receives no information from the backward direction. Hence, v3 is semi-full (it has neighbors in only one direction) and does not meet the CH requirements. Applying the same rules to the scenario of Fig.2, v4 and v9 are eligible to become CHs, since both of their directions are full of neighbors, while v2, v6, and v7 are not.

B. CLUSTER GATEWAY SELECTION
In some cases, as shown in Fig.3, several CMs of a given cluster have already declared themselves CGWs for two CHs, and the remaining CMs can be reached by the CGWs of both neighboring clusters. For instance, assume there are three CHs: CH1, CH2, and CH3. CH1 has CGW1, CGW2, and CGW3, and CH3 has CGW4 and CGW5, all of which are also CMs of CH2. When CH1 disseminates a message, its CGWs (CGW1, CGW2, and CGW3) are responsible for delivering it to CH2's cluster. Any transmission from CH2 should then be suppressed, because these CGWs already cover all of CH2's CMs. Hence, cluster 2, managed by CH2, serves no purpose and should not have been created in the first place. The following methods are used in this paper to prevent such unwanted clusters, such as the one headed by CH2 in Fig.3.
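The unwanted-cluster situation amounts to a coverage check: a cluster adds nothing if every one of its CMs is already within range of a CGW belonging to a surrounding cluster. The following 1-D sketch assumes positions in meters along the road and a common transmission range `tr`; the function names are hypothetical, and the check is an illustration of the problem rather than a step of the ECHS procedure.

```python
def covered_members(members: dict[str, float], foreign_cgws: dict[str, float],
                    tr: float) -> set[str]:
    """Return ids of CMs within range of at least one CGW of a surrounding cluster.

    members and foreign_cgws map vehicle id -> position along the road (m).
    """
    return {m for m, p in members.items()
            if any(abs(p - q) <= tr for q in foreign_cgws.values())}


def cluster_is_redundant(members: dict[str, float], foreign_cgws: dict[str, float],
                         tr: float) -> bool:
    # Redundant if the surrounding clusters' gateways already reach every member.
    return covered_members(members, foreign_cgws, tr) == set(members)
```

With members at 300 to 700 m and foreign CGWs at 320 m and 680 m, a 250 m range covers every member, so the cluster is redundant; with a 150 m range the member at 500 m is left uncovered and the cluster is justified.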
This problem arises when the CMs of a given CH can be covered by the CGWs of surrounding clusters. Once a CH is announced, it immediately identifies its Cluster Gateway Candidates (CGCs) according to the following conditions. CGCs must be located at the CH's border, as these vehicles are the most likely future CGWs. Each CM records its distance to its CH and shares it with its neighbors. If a CM receives no distance value larger than its own distance to the CH, it announces itself as a CGC. Before sending a CHA, a vehicle collects its neighbors' information and starts a backoff timer according to (12), where $Time_{max}$ is the maximum deferral time, $D_{CGC,v}$ is the distance between vehicle $v_i$ and the CGC, and $D_{max}$ is the maximum allowed transmission range. $T_{random}$ is added to avoid contention between two vehicles that are close to each other. A vehicle very close to a CGC initiates a long backoff timer, which decreases its chance of becoming a CH. This method minimizes the number of unwanted clusters and improves network performance. In Fig.4, suppose that CH1 has announced itself as a cluster head, and CGC1, CGC2, and CGC3 have been identified as CGCs since they are located at CH1's border. The CGCs broadcast their status to surrounding neighbors, including v1, v2, v3, and v4. Each vehicle initiates a backoff timer based on (12) before sending a CHA to its neighbors. Vehicles close to the CGCs, such as v1 and v2, wait a long time before sending a CHA, which reduces their chance of becoming CHs. Conversely, vehicles farthest from the CGCs, such as v3 and v4, set a short backoff timer and are likely to be elected as CHs. v1 is also not an appropriate CH candidate, because it covers approximately the same area as CH1.
Furthermore, if v1 became CH2, many CMs would hear CH announcements from both CH1 and v1, creating several unnecessary CGWs. To guarantee an efficient distribution and selection of CHs in this scenario, the rules in (11) and (12) are applied. Hence, v3 is the best CH, because it is a centrally located vehicle and keeps a reasonable distance from CH1.
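The CGC rule and the CHA backoff can be sketched together. The exact form of (12) is not reproduced in this text, so the sketch assumes a linear form, $T = Time_{max}(1 - D_{CGC,v}/D_{max}) + T_{random}$, which matches the described behavior (closer to a CGC means a longer wait). Positions are 1-D in meters, $D_{CGC,v}$ is taken as the distance to the nearest CGC, and all names are hypothetical.

```python
import random
from typing import Optional


def select_cgcs(ch_pos: float, members: dict[str, float], tr: float) -> list[str]:
    """Return CMs that announce themselves as CGCs (members: id -> position, m).

    Each CM advertises its distance to the CH; a CM that hears no neighbor
    advertising a larger distance than its own declares itself a CGC.
    """
    dist = {m: abs(p - ch_pos) for m, p in members.items()}
    cgcs = []
    for m, p in members.items():
        heard = [dist[n] for n, q in members.items() if n != m and abs(q - p) <= tr]
        if all(d <= dist[m] for d in heard):
            cgcs.append(m)
    return cgcs


def cha_backoff(d_to_cgc: float, time_max: float, d_max: float,
                jitter: float = 0.0, rng: Optional[random.Random] = None) -> float:
    """Hypothetical linear form of the CHA backoff (12), in seconds.

    Assumption: T = time_max * (1 - D/D_max) + T_random, so a vehicle next to a
    CGC waits about time_max, while one D_max away waits only T_random.
    """
    if rng is None:
        rng = random.Random()
    d = min(max(d_to_cgc, 0.0), d_max)      # clip D to [0, D_max]
    t_random = rng.uniform(0.0, jitter)     # T_random drawn from [0, jitter]
    return time_max * (1.0 - d / d_max) + t_random


def cha_order(candidates: dict[str, float], cgc_pos: list[float],
              time_max: float, d_max: float, jitter: float = 0.0) -> list[str]:
    """Order candidate vehicles (id -> position) by when their CHA timer fires."""
    rng = random.Random(42)                 # fixed seed for repeatable jitter
    timers = {v: cha_backoff(min(abs(p - c) for c in cgc_pos), time_max, d_max, jitter, rng)
              for v, p in candidates.items()}
    return sorted(timers, key=timers.get)
```

With a CH at 0 m, members at -240, -120, 80, 200, and 245 m, and a 250 m range, `select_cgcs` returns the two border members (-240 m and 245 m). For candidates at 265, 305, 445, and 475 m, `cha_order` fires v4 first and v1 last. Backoff alone therefore favors v4; in the text, v3 is preferred because the centrality rule (11) is applied as well, which this sketch leaves out.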

C. REDUCING THE NUMBER OF UNNECESSARY RETRANSMISSIONS
In most cases, a CGW is not a single vehicle, since several vehicles in the same cluster can be nominated as CGWs. As shown in Fig.5, three vehicles, CGW1, CGW2, and CGW3, are nominated as CGWs because they belong to both CH1 and CH2. When CH1 or CH2 propagates a message, only CGW1, CGW2, and CGW3 are eligible to forward it to the other CH. A single broadcast from any one of them covers the target area; without coordination, however, all of them would rebroadcast the message, adding no coverage. To handle this problem, each CGW initiates a backoff timer based on its distance from its CH, according to (13), where $Time_{max}$ is the maximum deferral time, $D_{CGW,CH}$ is the distance between a CGW and its CH, and $D_{max}$ is the maximum allowed transmission range. $T_{random}$ is added to avoid contention between two vehicles that are close to each other. CGW1 forwards the message received from CH1, because it is the farthest from CH1 and its timer expires first. When CGW2 and CGW3 hear the same message from CGW1 during their backoff time, they cancel their retransmissions immediately. If their backoff timers expire without hearing the message from CGW1, they forward it themselves.
VOLUME 8, 2020
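The suppression mechanism can be sketched as a small event-ordering simulation. As with (12), the form of (13) is assumed here to be linear, $T = Time_{max}(1 - D_{CGW,CH}/D_{max})$, with $T_{random}$ set to zero for determinism. All CGWs are assumed to receive the CH's message at the same instant, and the `hears` map is a hypothetical stand-in for radio reachability.

```python
from typing import Optional


def cgw_backoff(d_to_ch: float, time_max: float, d_max: float) -> float:
    # Hypothetical linear form of (13) with T_random = 0: the CGW farthest
    # from its CH gets the shortest timer.
    d = min(max(d_to_ch, 0.0), d_max)
    return time_max * (1.0 - d / d_max)


def forwarders(cgw_dist: dict[str, float], time_max: float, d_max: float,
               hears: Optional[dict[str, set[str]]] = None) -> list[str]:
    """Return the CGWs that actually rebroadcast a CH message, in firing order.

    cgw_dist maps CGW id -> distance to its CH (m). hears[g] is the set of CGWs
    within range of g's broadcast; by default every CGW hears every other one.
    """
    timers = {g: cgw_backoff(d, time_max, d_max) for g, d in cgw_dist.items()}
    cancelled: set[str] = set()
    fired: list[str] = []
    for g in sorted(timers, key=timers.get):   # earliest expiry first
        if g in cancelled:
            continue                           # already heard a copy: suppress
        fired.append(g)
        listeners = (set(cgw_dist) - {g}) if hears is None else hears.get(g, set())
        cancelled |= listeners                 # those CGWs cancel their pending rebroadcast
    return fired
```

With CGWs at 240, 200, and 150 m from CH1 and full mutual reachability, only CGW1 forwards. If CGW3 cannot hear CGW1, it is not cancelled and forwards when its own timer expires, which corresponds to the fallback case described above.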

D. CLUSTERING VEHICLES BASED ON SPEED LIMIT
In the RTVC scheme [32], vehicles are grouped into clusters based on a Cluster Speed Limit (CSL), where $(f_1, \ldots, f_n)$, $(m_1, \ldots, m_n)$, and $(s_1, \ldots, s_n)$ denote the cluster members of $C_f$, $C_m$, and $C_s$, respectively, moving in direction $\theta$.

E. CLUSTER FORMATION PROCEDURE
Initially, all vehicles declare themselves Normal Vehicles (NVs). Only NVs that meet the CH selection requirements above can send a CHA to their neighbors. These requirements are summarized as follows:
• Before sending a CHA, an NV checks that it satisfies condition (11), to make sure it is centered in the cluster.
• If the NV detects that it is a neighbor of a CGC, it starts the backoff timer described in (12).
• Each NV maintains a neighbor list with the $V_{avgspd}$ and CTV values of all neighboring vehicles. Only an NV whose average speed lies within CTV ± u can send a CHA [32].
Once an NV meets these requirements, it sends a CHA and announces itself as $CH_i$, provided it has not received a CHA from another vehicle before its CH timer expires. Any NV that receives a CHA from $CH_i$ tries to join the cluster by sending it a Cluster Member Request (CMR). If $CH_i$ accepts the request, the NV becomes a CM and checks whether it is a potential CGC. A CM or CGC is considered to have left the cluster if it receives no information from its CH within a given time interval; it then reverts to NV status. Schemes 1, 2, and 3 describe the CH election, cluster formation, and CM and CGC formation steps, respectively.
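The NV, CH, CM, and CGC transitions described above can be sketched as a small state machine. CTV and u are used as in the text, but their computation follows [32] and is not modeled; the `centered` flag stands in for condition (11), the `heard_cha` flag abstracts the backoff of (12) and the CH timer, and the timeout value, units, and all names are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Role(Enum):
    NV = "normal vehicle"
    CH = "cluster head"
    CM = "cluster member"
    CGC = "cluster gateway candidate"


@dataclass
class Node:
    vid: str
    avg_speed: float                 # V_avgspd (m/s, hypothetical unit)
    role: Role = Role.NV
    ch_id: Optional[str] = None      # id of the CH this node belongs to
    last_heard_ch: float = 0.0       # time of the last message from the CH (s)


def speed_ok(node: Node, ctv: float, u: float) -> bool:
    # Average speed must lie within CTV +/- u.
    return abs(node.avg_speed - ctv) <= u


def try_announce(node: Node, centered: bool, ctv: float, u: float,
                 heard_cha: bool) -> bool:
    """NV -> CH if centered, speed-compatible, and no other CHA arrived first."""
    if node.role is Role.NV and centered and speed_ok(node, ctv, u) and not heard_cha:
        node.role, node.ch_id = Role.CH, node.vid
        return True
    return False


def on_join_accepted(node: Node, ch_id: str, now: float, at_border: bool) -> None:
    # The CH accepted this NV's CMR: it becomes a CM, or a CGC if at the CH's border.
    node.role = Role.CGC if at_border else Role.CM
    node.ch_id, node.last_heard_ch = ch_id, now


def on_tick(node: Node, now: float, timeout: float) -> None:
    # A CM/CGC that hears nothing from its CH within `timeout` reverts to NV.
    if node.role in (Role.CM, Role.CGC) and now - node.last_heard_ch > timeout:
        node.role, node.ch_id = Role.NV, None
```

For instance, with CTV = 24 m/s and u = 2 m/s, a centered NV averaging 25 m/s can announce itself, whereas one averaging 30 m/s cannot. If the latter then joins at t = 1 s as a border member, it becomes a CGC, and with a 3 s timeout it reverts to NV at t = 5 s if it hears nothing further from its CH.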

Scheme 1 CH Election
NV v_i collects its neighborhood information.
If v_i is_neighbour of CGC then
    Set backoff timer for v_i as in equation (12)

Scheme 3 CM and CGC Formation
If NV v_i receives the confirmation message from CH_j then
    v_i becomes CM_j
Else
    Go to Scheme 1
End_IF
If CM_j is located at CH_j's border then
    CM_j becomes CGC_j
End_IF

IV. PERFORMANCE EVALUATION
The proposed ECHS scheme is compared with three well-known clustering schemes that adopt similar clustering techniques: the RTVC scheme [32], the CF-IVC scheme [25], and the APROVE scheme [26]. The experimental results are generated with Network Simulator 3 (NS3) version 3.21 [37]. Simulation of Urban MObility (SUMO), a microscopic traffic simulator, is used to generate realistic vehicle mobility traces at different densities [38]. For communication between vehicles, we use DSRC channels, which implement the WAVE module with the IEEE 802.11p standard for both the physical and MAC layers [39]. The simulated scenarios use a three-lane highway and a two-lane urban road per direction, with road lengths of 3 km and 10 km, respectively. Vehicle speeds range from 0 km/h, when vehicles stop at traffic-light intersections or pause in urban traffic congestion, to 120 km/h on highways. The total number of vehicles in each direction ranges from 50 to 150. The simulation parameters are summarized in Table 1.
To compare the efficiency of the ECHS scheme with its counterparts, the following six metrics are considered, since they are widely used in previous related works [13], [24]-[28] and show the capability of the proposed scheme to handle clustering problems in VANETs:
• Cluster Head Lifetime (CHL): the time interval from when a vehicle changes its state to CH until it becomes a non-CH.
• Cluster Member Lifetime (CML): the time interval from when a vehicle changes its state to CM (when it joins the cluster) until it changes its state again (when it leaves the cluster).
• Cluster Number (CN): the number of clusters created during the simulation time.
• Packet Loss Ratio (PLR): the ratio of the number of packets lost (sent but not received at the destination) to the number of packets sent, in a unit of time.
• Overhead for Clustering (OC): the additional signaling overhead required to form and maintain the cluster structures.
• Average Packet Delay (APD): the average delay between the time a data packet is transmitted by the source vehicle and the time it is received at the destination vehicle.
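The metrics above can be computed from simulation logs roughly as follows. The log formats (role intervals as `(vehicle, role, start, end)` tuples, packets as send and receive timestamps, control messages as type strings) are hypothetical, and the set of message types counted toward OC is an assumption.

```python
from statistics import mean
from typing import Optional


def avg_lifetime(intervals: list[tuple[str, str, float, float]], role: str) -> float:
    """Mean duration of (vehicle, role, start, end) intervals: 'CH' gives CHL, 'CM' gives CML."""
    durations = [end - start for _, r, start, end in intervals if r == role]
    return mean(durations) if durations else 0.0


def packet_stats(packets: dict[int, tuple[float, Optional[float]]]) -> tuple[float, float]:
    """Return (PLR, APD) for one measurement window.

    packets maps packet id -> (send_time, recv_time), with recv_time None if lost.
    """
    sent = len(packets)
    delays = [rx - tx for tx, rx in packets.values() if rx is not None]
    plr = (sent - len(delays)) / sent if sent else 0.0   # lost / sent
    apd = mean(delays) if delays else 0.0                # delivered packets only
    return plr, apd


def clustering_overhead(messages: list[str], kinds: tuple[str, ...] = ("CHA", "CMR")) -> int:
    # OC as a count of clustering control messages; which kinds to count is an assumption.
    return sum(1 for m in messages if m in kinds)
```

For CH intervals of 40 s and 20 s, `avg_lifetime` returns 30.0; for four packets of which one is never received, `packet_stats` reports a PLR of 0.25.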
A. CLUSTER HEAD LIFETIME (CHL)
Fig.6 shows the average CHL of the proposed scheme against the APROVE, RTVC, and CF-IVC schemes under different vehicle speeds and transmission ranges (100 m, 200 m, 300 m). The results show that as vehicle speed increases, the average CHL of all schemes decreases. This is because when vehicles move faster, the network topology changes rapidly, making it difficult for a CH to maintain long connectivity with its CMs. The results also show that the CHL increases with the transmission range: wider coverage allows a CH to stay connected to its CMs longer and reduces the number of times vehicles join and leave clusters. The ECHS scheme performs better than its counterparts because it imposes strict CH election conditions that keep the CH at the center of the vehicle group.

B. CLUSTER MEMBER LIFETIME (CML)
The average CML decreases as vehicle speed increases and increases as the transmission range increases. As Fig.7 shows, the ECHS scheme outperforms the APROVE, RTVC, and CF-IVC schemes in terms of CML.

C. CLUSTER NUMBER (CN)
The performance of the clustering schemes in terms of CN is illustrated in Fig.8. A lower CN generally indicates better network performance. The results show that increasing the transmission range reduces the number of clusters formed. This is because a wider transmission range enlarges the coverage area of each cluster, allowing each CH to serve more CMs. The proposed approach forms the fewest clusters at all tested transmission ranges.

D. AVERAGE PACKET OVERHEAD, LOSS RATIO AND DELAY
Fig.9 shows the average OC as the number of vehicles changes, with the transmission range set to 100 m and vehicle speed to 60 km/h. The results are as expected: increasing the number of vehicles normally increases the number of clusters created and the total number of messages exchanged. In particular, CHA and CMR messages increase with the number of vehicles, since more normal vehicles send CMRs when a cluster is formed. The overhead generated by the ECHS scheme is lower than that of the APROVE, RTVC, and CF-IVC schemes, because the ECHS scheme maintains long communication links with its members and thus avoids unnecessary re-clustering. As a result, the ECHS scheme reduces the total number of clustering-related messages, since vehicles do not need to switch clusters frequently.
PLR for all schemes under different vehicle densities is presented in Fig.10. The packet loss ratio increases with the number of vehicles, due to more frequent packet collisions and higher contention among vehicles on the same wireless channel. Since the ECHS scheme produces the lowest average CN with the highest CHL and CML among the compared schemes, it also incurs the lowest packet loss ratio. In general, cluster instability increases both the number of lost packets and the total overhead.
Fig.11 shows the average delay at maximum vehicle density for the compared clustering algorithms. The proposed ECHS scheme achieves lower delay than the reference schemes. High clustering stability and low communication overhead between clusters and cluster members are the main reasons for its low end-to-end delay. Another reason is the careful selection of CGWs using distance-adjusted backoff timers.

V. CONCLUSION AND FUTURE WORK
In this paper, we developed a new robust clustering scheme for large-scale and dynamic VANETs. Based on general mobility information and neighborhood knowledge, the ECHS scheme outperforms existing counterparts in several respects. For instance, it reduces unnecessary clusters by introducing new conditions for nominating CHs: vehicles located at either edge of a cluster are not allowed to become CHs, only those in the middle, since they stay connected to their neighbors longer. The ECHS scheme also groups vehicles with similar speeds and directions, which improves cluster stability and reliability. CGWs are selected carefully, and retransmission is assigned to the most appropriate CGW in the cluster, which reduces duplicate retransmissions and improves channel utilization.
The performance of the ECHS scheme was compared with the APROVE [26], RTVC [32], and CF-IVC [25] schemes in terms of important metrics such as Cluster Head Lifetime, Cluster Member Lifetime, and Cluster Number. The ECHS scheme showed superior performance over its counterparts, demonstrating that vehicle speed and direction alone are not always sufficient to build stable clusters in dynamic VANET topologies.
In future work, we will investigate the performance of the proposed scheme in more complex scenarios and address other VANET challenges. For instance, the ECHS scheme can be incorporated into routing protocols to enhance routing in highly dynamic environments. It can also be used for fast data dissemination in safety-related applications. The advantages of the ECHS scheme can thus be explored further in combination with our previous work, such as [40].