A Center-Based Stable Evolving Clustering Algorithm With Grid Partitioning and Extended Mobility Features for VANETs

VANET clustering is an emerging research topic that supports today's intelligent transportation systems. It aims to segment the vehicles moving in the road environment into sub-groups, named clusters, each with a cluster head, to enable effective and stable routing. Most VANET clustering approaches are based on distributed models, in which cluster creation decisions lack a global view of the vehicles' distribution and mobility in the environment. However, the availability of LTE and the long range of base stations have recently motivated researchers to propose center-based approaches. Unlike existing center-based VANET clustering approaches, this article applies a road segmentation phase, named grid partitioning, before providing summarized information to the clustering center. Furthermore, it presents an integrated approach that combines all the clustering tasks, including assigning, cluster head selection, removing, and merging. The evaluation shows that the proposed approach, named center-based evolving clustering based on grid partitioning (CEC-GP), is superior in terms of efficiency, stability, and consistency. The efficiency improvement of CEC-GP over the benchmarks center-based stable clustering (CBSC) and evolving data clustering algorithm (EDCA) is 65% and 394%, respectively.


I. INTRODUCTION
Over recent years, intelligent transportation systems (ITS) technology has developed significantly, and various models supporting it have been proposed and implemented. These range from network aspects, such as vehicular ad hoc network (VANET) routing [1], clustering [2], reliability analysis [3], congestion solutions [4], and the internet of vehicles [5], to road protocols for both vehicle-to-vehicle (V2V) and vehicle-to-infrastructure (V2I) communications [4]. Also, with the assistance of long term evolution (LTE) [6] and handover [7] technologies, it has become important to develop mature VANET models that solve various network performance issues in terms of quality of service (QoS) [8], throughput, stability [2], [9], scalability [10], security [11], [12], robustness, etc.
Clustering in VANETs is essential in various applications. Some researchers have used it to build data dissemination schemes that prevent network flooding based on various techniques. For example, [13] proposed probabilistic forwarding to handle message broadcasting without network flooding. Another application of clustering is in MAC scheduling, where it helps prevent collisions and use idle channels efficiently, as in the work of [14].
The associate editor coordinating the review of this manuscript and approving it for publication was Chun-Wei Tsai.
VANET clustering is regarded as an essential model for boosting the performance of the network as a whole, and it is associated with more than one aspect. For example, stable clustering is reflected in good MAC scheduling performance and efficient routing and, as a result, in a reliable and stable network [15].
Clustering in VANETs has numerous types and categories: in terms of topology, single-hop vs. multi-hop [16];
in terms of models, center-based [9] vs. distributed [2]; in terms of environment, highway [17] vs. urban [18]; in terms of density, dense vs. sparse; and in terms of speed, high speed vs. low speed. The information used for VANET clustering can be topology information; mobility information such as position, speed, and acceleration; or contextual information such as the vehicle type and the driver's travel intention. Such information can be gathered locally using V2V or globally using V2I, as shown in fig. 1 [19].
In recent years, the fast development and expansion of road infrastructure have motivated researchers to explore the potential of V2I for creating efficient, stable, and well-performing VANET clustering [9], [20]. This article tackles VANET clustering from this perspective. More specifically, this study exploits the center-based infrastructure by enhancing a mathematically based clustering model to solve challenging issues in VANET clustering, such as the highly evolving and dynamic nature of vehicle mobility in the environment. It proposes a novel evolving-aware VANET clustering approach that exploits higher moments of the mobility variables, e.g., acceleration in addition to velocity and position. The contributions of this article can be summarized as follows:
• This article proposes a novel V2I-based VANET clustering framework using a modified evolving clustering algorithm, adopting the concept of the grid in VANET clustering for the first time.
• It develops a novel traffic generator that includes a novel probabilistic lane change model in addition to driving behavior.
• It proposes grid partitioning of the road environment before clustering, which makes the approach suitable for high-density highways.
• It proposes an extended mobility feature that combines relative acceleration with the relative position and velocity of vehicles, which makes clustering more aware of the dynamics captured by the higher moments of the mobility variables.
• It provides the functionality of merging two clusters using a geometrical criterion, which is needed in VANETs to eliminate interference caused by adjacent clusters.
• It generates the simulation scenarios using different parameters, including the vehicle generation model, the vehicle mobility model, and driver behavior models.
• It extensively evaluates the proposed framework on the generated VANET scenarios using standard VANET clustering evaluation measures.
The remainder of the article is organized as follows. The literature survey is given in section II; the methodology is provided in section III; experimental results and analysis are given in section IV; and finally, the conclusion and future work are provided in section V.

II. LITERATURE SURVEY
Clustering in general is used in more than one networking sector, such as wireless sensor networks [21] and VANETs [22]. This literature survey focuses on VANET clustering. Some VANET clustering approaches use statistical clustering such as K-means [23], or other models that apply density metrics. In [24], a clustering approach based on a density metric is proposed. The approach adopts two density assumptions: points in low-density regions are assumed to have a density similar to that of their nearby points, and points in high-density regions are assumed to belong to the same class as their nearby points.
These assumptions were used in local density and density-adaptive metrics to develop a clustering algorithm. Such an approach is good for data clustering; however, it does not capture the dynamics of evolving clusters, which is what VANET clustering faces. The concept of local density was also used in [25] in the fuzzy neighborhood density peaks clustering algorithm, which uses a fuzzy neighborhood relation to define the local density in the FJP (fuzzy joint points) algorithm. This gives the algorithm more robustness; however, it still does not cover the evolving aspect. Evolving clustering of data points has also been proposed by some researchers [26]. From the perspective of the cluster builder, VANET clustering is divided into two main categories: center-based and distributed clustering schemes [27], [28].
In distributed approaches, the clustering model uses an election strategy to select the cluster head based on local information gathered by the vehicle and its surrounding vehicles. In the work of [10], neighbor sampling was performed to collect the relative distance and velocity from the area near each vehicle, and a cluster head was then elected through distributed arbitration. The link lifetime was also taken into consideration for clustering.
VOLUME 8, 2020
The authors in [10] organized the clustering approach within a framework that includes other cluster maintenance operations, such as merging and backup cluster head selection. This study does not exploit the existence of an eNodeB or any center-based unit, and therefore ignores global information. Naturally, a global view of the vehicles and their mobility patterns in the road environment leads to more stable clustering.
The authors extended the work of [29], which focused only on safety-message-based clustering in a reactive way. The literature shows that researchers' interests in clustering differ according to the issue that clustering is meant to resolve. Some methods have focused on overlapping and interference. In [30], a clustering algorithm was proposed that emphasizes the number of nodes within the cluster region more than the speed similarity of the cluster's nodes. Hence, a weighting algorithm was used to elect the cluster head based on the number of nodes, the difference between the node's speed and the average speed of its neighbors, and the standard deviation. The goal was to avoid overlapping clusters; however, such a criterion affects cluster stability. Other researchers have focused more on the reliability aspect of clustering. A multi-hop clustering algorithm was proposed in [16] that maintains the reliability and robustness of VANET clustering through a merging strategy that prevents interference between two overlapping clusters, and maintains stability through a priority-based neighbor-following strategy that keeps vehicles stable as cluster heads. This focus on reliability has led researchers to propose clustering approaches that minimize the number of clusters or increase cluster stability or cluster lifetime. Others have argued for clustering designs that increase the number of connections or make them as redundant as possible to ensure reliability; an example is the multi-homing clustering of [31].
Typically, reliability is related to emergency and safety message delivery. Hence, some researchers have performed clustering for specific data dissemination applications such as emergency or security. The work in [32] applied clustering to target tracking, classifying the cluster nodes into two levels: level 1 for nodes that can observe the target, and level 2 for nodes that cannot observe the target at the current time but are expected to observe it after some time.
Hence, the clustering technique considers safety or emergency criteria. This approach requires homogeneous sensing functionalities in the vehicles, which current vehicles do not provide.
The differences among clustering concepts and approaches appear clearly in other models [27]. Some researchers have involved a wider range of vehicle attributes in CH selection than the typical road environment information. For example, VANET clustering has been approached from the perspective of commercializing CH selection, by including the price offered by the CH as an additional variable alongside mobility and topology information [33]. Another example is the work of [34], where fuzzy logic was used for clustering control and game theory for CH competition; hence, a multi-criteria decision-making scheme was offered to a vehicle for joining a candidate cluster head. Apart from the orientation or goal of clustering, another categorization aspect is methodological: some researchers have proposed clustering based on meta-heuristic search, others have used game theory, and others have proposed geometrical and graph-based models.
In the work of [35], particle swarm optimization (PSO) based clustering for VANETs was proposed. The approach considers V2V and multi-hop clustering, and its fitness function combines two terms: the first is the standard deviation of the velocities, and the second is the number of hops. The two terms are placed in the denominator of a fraction, so that maximizing the fitness function minimizes both.
A constraint on the maximum number of hops is added to the algorithm. An obvious drawback is the need to tune the relative strength of the two terms to trade off cluster stability (governed by the first term) against delay (governed by the second term). Cluster head selection usually uses a multi-criteria approach based on delay, relative velocity, and other factors that have been used in other studies [36].
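As a rough illustration of the inverse-fraction fitness described for [35], the Python sketch below scores a candidate cluster by its velocity spread and hop count. The weights `w_vel` and `w_hop`, the use of the population standard deviation, and the zero score for constraint violations are assumptions of this sketch, not details taken from [35].

```python
import math
import statistics


def pso_fitness(velocities, hops, w_vel=1.0, w_hop=1.0, max_hops=None):
    """Inverse-fraction fitness: higher when velocities agree and hops are few.

    velocities: speeds of the vehicles in the candidate cluster.
    hops: number of hops in the candidate cluster.
    The weights and the constraint handling are illustrative assumptions.
    """
    if max_hops is not None and hops > max_hops:
        return 0.0  # candidate violates the maximum-hop constraint
    spread = statistics.pstdev(velocities)  # stability term
    denominator = w_vel * spread + w_hop * hops  # stability + delay terms
    return math.inf if denominator == 0 else 1.0 / denominator
```

Raising `w_vel` relative to `w_hop` favors stable clusters over short paths, which is exactly the tuning burden noted above.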
Another work that used a meta-heuristic approach for VANET clustering is [37], in which honey bee optimization was integrated with a genetic algorithm to construct VANET clusters in a distributed way. The fitness value considered various factors, namely, neighbor quality, vehicle degree, and vehicle mobility, formulated as a single objective function.
The clustering structure categorizes vehicles into four categories: cluster head, ordinary, border, and member. Ordinary nodes include all vehicles before the clustering results are generated, after which each node becomes either a member or a cluster head; a cluster member may also be selected as a border node. Geometrical models for assisting clustering in a distributed way have also been proposed.
In [38], a Voronoi diagram was used to decompose the environment into local regions, in each of which a cluster head is selected based on a newly defined metric named vehicle lifetime, which links cluster head selection to clustering stability. The metric uses a prediction model of the vehicle's future movement, compared against the overhead needed for changing the cluster. The cluster head selection process is performed to make clustering more efficient. Graph-based methods have been proposed by other researchers. In [39], the mobility rate of the vehicles is used as a clustering criterion, using an internet connection.
The actual implementation of the protocol assumes that the vehicles have internet access and upload their scanned neighbor lists to the internet, where the cluster head selection algorithm runs using a graph search algorithm. Hence, this work combines distributed local information gathering with center-based clustering over the internet. Previous VANET literature has focused on using IEEE 802.11p for data exchange to build clusters in a distributed way. IEEE 802.11p is based on broadcasting data or sending it directly to nearby vehicles.
However, with the increasing number of vehicles, an explosion in the number of control packets is a concern, as it causes issues such as data collisions and packet loss, which hinder VANET scalability and reliability. To overcome this problem, an LTE base station is crucial for managing various aspects of VANETs, such as limiting the number of control packets and assisting in clustering and routing. According to [18], the LTE base station is much more important than IEEE 802.11p for these tasks. Hence, the typical VANET architecture combines V2V communications under IEEE 802.11p and V2I communications under an LTE base station. Using the LTE base station to perform clustering based on V2I communications is called center-based clustering [40].
In the work of [41] a center-based VANETs clustering is proposed based on both modified k-means and Floyd-Warshall algorithm.
The cluster head is selected as the vehicle that achieves the least variance in velocity and has a central position in terms of distance. The number of clusters is fixed (3 clusters), which does not account for traffic dynamics in practice. The problem with k-means based clustering is its assumption of static nodes, which conflicts with the dynamic nature of VANETs; hence, some researchers have aimed to solve this by proposing dynamic k-means [42]. In the work of [20], the researchers proposed center-based clustering in VANETs using an eNodeB and k-means to segment the vehicles into appropriate clusters. The main role of the eNodeB is to maintain the interaction between a standalone vehicle and the CH before deciding the appropriate cluster for the vehicle based on its mobility information. Using k-means carries an implicit restriction to spherical cluster shapes, which is applicable most of the time in the road environment. [20] targeted the delivery of safety messages, which requires high reliability and avoidance of the data explosion caused by rebroadcasting in classical routing.
In the work of [43], the eNodeB was used for clustering management in VANETs, and a protocol named LTE4V2X was proposed for this purpose. This work used extended mobility information describing the vehicle's movement, which is strongly associated with cluster persistence; i.e., acceleration is a key variable for stable clustering.
Overall, the majority of methods perform clustering in a distributed manner, assuming that the base station takes no role in the clustering algorithm itself. Excluding the base station from clustering produces results that ignore the global view of the distribution of vehicles in the road environment.
On the other hand, incorporating a center-based clustering decision provider requires multi-level data aggregation, meaning that the clustering decision is based on a data reduction and summarization phase; this can be accomplished through grid partitioning. This article focuses on studies that have carried out clustering in VANETs by exploiting the base stations on the road, which are increasingly common in today's infrastructure.

III. METHODOLOGY
This section introduces the developed methodology for CEC-GP.
It starts with presenting the framework in sub-section A. Next, the traffic generator is provided in sub-section B. The feature extraction is discussed in sub-section C. The clustering algorithm is presented in sub-section D. Afterwards, the cluster head selection is explained in sub-section E. Next, the removing and merging are explained in sub-sections F and G respectively.

A. FRAMEWORK
The framework for center-based VANET clustering is shown in fig. 2. It starts with the traffic generation model, which provides various scenarios covering the dynamics of vehicle generation, vehicle mobility, and driver behavior. Next, the information is collected by the center-based LTE unit, which is responsible for extracting the mobility features globally. Then, the network features are added to the feature vector, which enters the cluster creation phase, responsible for creating clusters and selecting CHs. After a cluster is created, the cluster management phase, called cluster maintenance, takes over. This phase consists of three separate tasks: cluster joining and leaving, cluster merging, and cluster removing. The details of each phase are as follows.

B. TRAFFIC GENERATOR
This study simulates traffic generation on a highway. The highway model is adapted from the model developed by [44], with an added lane change model. Vehicles are generated in the environment based on two probability density functions: the first generates vehicles on the highway in batches and follows a normal distribution, and the second generates the time interval between batches and follows an exponential distribution. Equations 1 and 2 show the generation of the vehicles and of the time interval between batches, respectively.
$$\mathrm{pdf}(N) = \frac{1}{\sigma\sqrt{2\pi}}\exp\left(-\frac{(N-\mu)^2}{2\sigma^2}\right) \quad (1)$$
where µ denotes the expected size of one batch, σ denotes the standard deviation of the batch size, and pdf(N) denotes the probability of generating a batch of size N.
$$\mathrm{pdf}(T) = \lambda e^{-\lambda T}, \quad T \ge 0 \quad (2)$$
where T denotes the time interval between two consecutive batches and 1/λ denotes the expected value of this interval. In this model, vehicle mobility is based on the generated accelerations and their integrals. Each vehicle is assigned an acceleration generated using Equation 3, which is integrated using Equation 4 to obtain the velocity, which in turn is integrated using Equation 5 to obtain the distance.
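The batch arrival process of Equations 1 and 2 can be sketched in Python as follows. This is a minimal illustration, assuming that batch sizes drawn from the normal distribution are rounded to the nearest integer and clamped at zero; the function name `generate_arrivals` is hypothetical.

```python
import random


def generate_arrivals(mu, sigma, lam, horizon, seed=None):
    """Return (arrival_time, batch_size) pairs up to `horizon` seconds.

    Batch sizes are drawn from N(mu, sigma) (Equation 1), then rounded and
    clamped at zero (assumption). Gaps between batches are exponential with
    rate lam, i.e. mean 1/lam (Equation 2).
    """
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    while True:
        t += rng.expovariate(lam)  # interval T to the next batch
        if t > horizon:
            break
        size = max(0, round(rng.gauss(mu, sigma)))  # batch size N
        arrivals.append((t, size))
    return arrivals
```

For example, `generate_arrivals(mu=4, sigma=1.5, lam=0.1, horizon=300)` produces batches separated by 10 s on average over a 300 s run; raising `mu` or `lam` yields denser traffic levels.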
The acceleration is generated from two random variables, U_1 and U_2. U_2 gives the vehicle a random acceleration value within [0, A_max] or a deceleration within [−D_max, 0], while U_1 gives the vehicle one of three decisions: accelerate, decelerate, or neither.
The acceleration decision is controlled by acc_i and p_r, and the deceleration decision by dacc_i and p_r; both are governed by the aggressiveness of the driving behavior (AGG). The goal of AGG is to make acceleration more probable than deceleration, with a share of 70%, based on statistical studies of drivers' behavior on highways. Equations 3, 6, and 7 show the acceleration generation of the model.
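A sketch of the three-way acceleration decision and of the integration chain (Equations 3 to 5) is shown below. Because Equations 6 and 7 are not reproduced here, the probabilities `p_acc` and `p_dec` are hypothetical stand-ins for the combined effect of acc_i, dacc_i, p_r, and AGG, and the integration uses a semi-implicit Euler step with a fixed time step `dt`, which is an assumption.

```python
import random


def sample_acceleration(rng, p_acc, p_dec, a_max, d_max):
    """Three-way decision driven by two uniform variables U1 and U2.

    U1 chooses accelerate / decelerate / neither; U2 scales the magnitude.
    p_acc and p_dec are hypothetical stand-ins for Equations 6 and 7.
    """
    u1, u2 = rng.random(), rng.random()
    if u1 < p_acc:
        return u2 * a_max  # acceleration in [0, a_max)
    if u1 < p_acc + p_dec:
        return -u2 * d_max  # deceleration in (-d_max, 0]
    return 0.0  # neither: keep the current speed


def integrate(v, x, a, dt):
    """Semi-implicit Euler step: acceleration -> velocity -> distance."""
    v_new = v + a * dt
    x_new = x + v_new * dt
    return v_new, x_new
```

With `p_acc = 0.35` and `p_dec = 0.15`, for instance, acceleration is chosen in 70% of the draws that change the speed, which is one way to reproduce the 70% share attributed to AGG above.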
The vehicle velocity obtained by integration in Equation 4 has to be clipped to the maximum and minimum velocities V_max and V_min, as shown in Equations 8 and 9.
where i denotes the vehicle index and t denotes the time. This study extends the existing mobility model by adding a lane change technique that uses two probabilities: p_1, the probability of keeping the lane when the relative distance between the vehicle and its leader is lower than the pre-defined threshold d_th, and p_2, the probability of keeping the lane when that distance is higher than d_th. When a lane change event happens, the complementary probability (1 − p_1 or 1 − p_2) is divided into two equal probabilities: one for a lane change to the right and one for a lane change to the left. This applies to all lanes except the border lanes, where only one lane change option, either right or left, exists. Equations 10 and 11 show the lane changing process.
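The velocity clipping and the lane change rule can be sketched as below. Lane numbering (0 as the rightmost lane), the helper names, and the handling of border lanes, where this sketch sends every lane change event to the only available side, are assumptions rather than the paper's Equations 10 and 11.

```python
import random


def clip_velocity(v, v_min, v_max):
    """Keep the integrated velocity inside [v_min, v_max]."""
    return max(v_min, min(v, v_max))


def next_lane(rng, lane, n_lanes, gap, d_th, p1, p2):
    """Return the lane index for the next step.

    Lanes are numbered 0 (rightmost, assumed) to n_lanes - 1. The vehicle keeps
    its lane with probability p1 if the gap to its leader is below d_th,
    otherwise with probability p2. Otherwise it changes lane, right or left
    with equal probability.
    """
    p_keep = p1 if gap < d_th else p2
    u = rng.random()
    if u < p_keep:
        return lane
    # Lane change event: split the complementary probability equally.
    step = -1 if u < p_keep + (1.0 - p_keep) / 2.0 else 1  # -1 = right, +1 = left
    if not 0 <= lane + step < n_lanes:
        step = -step  # border lane: only one direction is available
    target = lane + step
    return target if 0 <= target < n_lanes else lane  # single-lane road: stay
```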

C. FEATURES EXTRACTION
Two types of features are extracted. The first is the network features, summarized by the vehicle ID. The second is the mobility features, defined by three variables for each vehicle, each projected on the x and y axes of an inertial frame: the position, defined by x_i and y_i; the velocity, defined by v_{x,i} and v_{y,i}; and the acceleration, defined by a_{x,i} and a_{y,i}. The relation between these components and the body frame is shown in Equation 12.
where a_{xb,i}, a_{yb,i} denote the acceleration in the body frame of the vehicle, refreshed by an accelerometer connected to the vehicle; v_{xb,i}, v_{yb,i} denote the velocity in the body frame of the vehicle, refreshed by the vehicle's odometer; x_{gps,i}, y_{gps,i} denote the GPS coordinates of the vehicle; x_i, y_i, v_{x,i}, v_{y,i}, a_{x,i}, a_{y,i} represent the mobility variables of the vehicle after mapping from the body frame to an inertial frame using the rotation and translation matrices R(θ) and Trans(x_{gps,i}, y_{gps,i}), respectively; and θ denotes the angle between the vehicle's road and the inertial frame.
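The mapping from body-frame readings to the inertial-frame mobility features can be sketched as follows. This assumes a planar 2-D rotation by θ (in radians) and GPS coordinates already expressed in the inertial frame, so that the translation Trans(x_gps,i, y_gps,i) simply supplies the position; the function names are hypothetical.

```python
import math


def rotate(theta, bx, by):
    """Apply the 2-D rotation R(theta) to a body-frame vector (bx, by)."""
    c, s = math.cos(theta), math.sin(theta)
    return c * bx - s * by, s * bx + c * by


def to_inertial(theta, x_gps, y_gps, v_body, a_body):
    """Build the mobility feature vector (x, y, vx, vy, ax, ay) of one vehicle.

    Position comes from GPS (translation); odometer velocity and
    accelerometer readings are rotated from the body frame by theta.
    """
    vx, vy = rotate(theta, *v_body)
    ax, ay = rotate(theta, *a_body)
    return (x_gps, y_gps, vx, vy, ax, ay)
```

A vehicle driving at 20 m/s along its body x-axis on a road at θ = π/2 thus gets an inertial velocity of roughly (0, 20).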

D. CLUSTERING ALGORITHM
The main clustering algorithm uses the concept of a grid and consists of a set of steps. First, the central unit decomposes the road environment into a set of cells according to the grid granularity. Each cell is the smallest unit at which information about the distribution and density of the vehicles is collected from the road. The core of the algorithm is an infinite loop that checks for the entry of any new vehicle into the stream, or for a change in the position of any current vehicle that has not yet been assigned to a cluster or outlier, and decides whether to add it to an existing structure or to create a new outlier from it. A vehicle is added to an existing cluster or outlier if its Euclidean distance to the nearest structure (cluster or outlier) is smaller than the coverage radius. Once such a structure is found, the vehicle is added to it and the structure's information is updated. If the structure is an outlier, the condition for converting it into a cluster is checked, and the conversion is performed when the threshold condition is met. The cluster head is then found using Equation 13. Any vehicle that has no nearby structure within the threshold distance is regarded as an outlier, and its outlier structure is added to the outlier list for a potential conversion into a cluster once new vehicles join it. Note that a vehicle is mapped to a cell based on geographic distance, whereas the nearest structure is found using the feature distance, which includes velocity and acceleration in addition to position. The algorithm also removes vehicles from their clusters; this is done by the cluster head through another loop that checks whether each member vehicle is still present.
A vehicle is considered lost when its distance to the cluster head exceeds the radius.
In this case, the vehicle is removed from the cluster and the cluster head coordinate is updated, as given by Equations 16 and 17. The pseudocode of cluster creation and cluster head selection is presented in Algorithm 1.
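The assignment step can be sketched as below. This is a simplified illustration under several assumptions: the distance to a structure is measured to its mean feature vector, `min_cluster_size` is a hypothetical stand-in for the outlier-to-cluster threshold, and `cell_of` shows only how a geographic position maps to a grid cell; how the per-cell summaries feed the center is not reproduced.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Structure:
    """A cluster or an outlier kept by the central unit (sketch)."""
    members: dict = field(default_factory=dict)  # vehicle id -> feature tuple
    is_cluster: bool = False

    def center(self):
        """Mean feature vector of the members."""
        n = len(self.members)
        return tuple(sum(col) / n for col in zip(*self.members.values()))


def cell_of(x, y, cell_size):
    """Map a geographic position to a grid cell index (granularity = cell_size)."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def assign(vid, features, structures, radius, min_cluster_size):
    """Add a vehicle to the nearest structure in feature space, or open a new outlier."""
    best, best_d = None, math.inf
    for s in structures:
        d = math.dist(features, s.center())  # Euclidean feature distance
        if d < best_d:
            best, best_d = s, d
    if best is None or best_d >= radius:
        best = Structure()  # no structure within the radius: new outlier
        structures.append(best)
    best.members[vid] = features
    if not best.is_cluster and len(best.members) >= min_cluster_size:
        best.is_cluster = True  # outlier promoted to cluster
    return best
```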

E. CLUSTER HEAD SELECTION
The cluster head of an existing cluster with N vehicles, CH_N, is calculated from the features of the vehicles inside the cluster: the index of the features, index(t, C_j), is computed, and the vehicle whose coordinate is closest to this index is selected. This is done by Equation 13.
$$\mathrm{index}_N(t, C_j) = \frac{1}{|C_j|}\sum_{i \in C_j} x_i, \qquad CH_N(t, C_j) = \mathrm{ID}\big(\mathrm{index}_N(t, C_j)\big) \quad (13)$$
where ID(index) denotes the ID of the vehicle i whose features are closest to the index, |C_j| denotes the size of cluster j, x_i denotes the feature vector of vehicle i inside the cluster, and t denotes the time. To update the cluster head recursively whenever a new vehicle is added to the cluster, Equations 14 and 15 are used.
$$CH_N(t, C_j) = \mathrm{ID}\big(\mathrm{index}_N(t, C_j)\big) \quad (14)$$
$$\mathrm{index}_N(t, C_j) = \mathrm{index}_{N-1}(t, C_j)\,\frac{N-1}{N} + \frac{x_N}{N} \quad (15)$$
where index_N(t, C_j) is the index after the N-th vehicle has been added, index_{N−1}(t, C_j) is the index after the (N−1)-th vehicle was added, and x_N is the feature vector of vehicle N, the last one added to the cluster. When a vehicle with feature vector x_N is lost from the cluster, the cluster head is updated as shown in Equations 16 and 17.
$$CH_{N-1}(t, C_j) = \mathrm{ID}\big(\mathrm{index}_{N-1}(t, C_j)\big) \quad (16)$$
$$\mathrm{index}_{N-1}(t, C_j) = \frac{N\,\mathrm{index}_N(t, C_j) - x_N}{N-1} \quad (17)$$
The cluster head change algorithm is only enabled when the current cluster head loses partial coverage of … current moment.

F. REMOVING
As presented in Algorithm 2, in the case of a cluster, the update information is collected by the cluster head and sent to the eNodeB, whereas in the case of an outlier, the update information is collected by the eNodeB from each vehicle of the outlier separately. This is called the interval lifetime. At each update of the cluster or outlier, the lifetime is refreshed by assigning it a pre-defined threshold. If no update is received, the lifetime is decreased by one. When the lifetime reaches zero, the cluster or outlier is removed from the list.
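The lifetime-based removal can be sketched as follows. `lifetimes` maps a hypothetical structure key to its remaining lifetime, and a newly created cluster or outlier would start at `threshold`. Decrementing by one per update interval follows the description above; the function and variable names are illustrative.

```python
def tick(lifetimes, updated, threshold):
    """Process one update interval for all clusters and outliers.

    lifetimes: dict mapping a structure key to its remaining lifetime.
    updated: keys of the structures that reported an update this interval.
    Refreshes updated structures, ages the others, and removes expired ones.
    Returns the keys removed in this interval.
    """
    removed = []
    for key in list(lifetimes):  # copy keys: the dict shrinks while iterating
        if key in updated:
            lifetimes[key] = threshold  # update received: refresh lifetime
        else:
            lifetimes[key] -= 1  # no update: age the structure
            if lifetimes[key] <= 0:
                del lifetimes[key]  # lifetime exhausted: remove structure
                removed.append(key)
    return removed
```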

G. MERGING
Merging aims to prevent overlap between two clusters that are close to each other, since such overlap leads to interference. To perform the merging, the distance between the cluster heads of the two clusters is calculated and compared with a pre-defined value named the merging distance.

Algorithm 3 Merging Clusters Algorithm
Input: clusterList, mergingDistance
Output: clusterList
Start:
build the adjacency matrix of distances between the clusters in clusterList
for each pair of clusters with distance less than mergingDistance
    combine the two clusters into one cluster and add it to clusterList
    remove the two clusters from clusterList
end
End
After merging, a new cluster is created and the two merged clusters are removed from the cluster list. The cluster head of the new cluster is selected as the vehicle whose coordinate is closest to the weighted average of the cluster heads of the two clusters.
The information of the new cluster is also updated in terms of the number of vehicles and the lifetime. The pseudocode of merging is presented in Algorithm 3.
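Algorithm 3 can be sketched in Python as below. This sketch assumes each cluster is a dictionary from vehicle ID to feature vector, measures the merging distance between the two cluster heads, and merges greedily one pair at a time, recomputing distances after each merge rather than reusing a stale adjacency matrix. The new head is the member closest to the merged mean, which equals the size-weighted average of the two cluster indices when the clusters share no vehicles; treating the weights as cluster sizes is an assumption.

```python
import math


def center(cluster):
    """Mean feature vector ("index") of a cluster {vehicle_id: features}."""
    n = len(cluster)
    return tuple(sum(col) / n for col in zip(*cluster.values()))


def head(cluster):
    """Cluster head: the member whose features are closest to the index."""
    ctr = center(cluster)
    return min(cluster, key=lambda vid: math.dist(cluster[vid], ctr))


def merge_clusters(clusters, merging_distance):
    """Merge pairs of clusters whose heads are closer than merging_distance."""
    clusters = [dict(c) for c in clusters]  # do not mutate the caller's list
    changed = True
    while changed:  # repeat until no pair is close enough
        changed = False
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                a, b = clusters[i], clusters[j]
                if math.dist(a[head(a)], b[head(b)]) < merging_distance:
                    merged = {**a, **b}
                    clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
                    clusters.append(merged)
                    changed = True
                    break
            if changed:
                break
    return clusters
```

Calling `head()` on a merged cluster then gives the new cluster head, because the merged mean already weights each original index by its cluster size.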

H. COMPLEXITY ANALYSIS
The role of the complexity analysis is to obtain an analytical expression for how computation grows with the number of vehicles on the road. This can be performed for the cluster head selection (Equation 13).
Finding the cluster head directly with Equation 13 is O(N), where N = |C_j(t)|. Since this is repeated for each newly added vehicle, it becomes O(N^2) for N vehicles. However, with the recursive cluster head update equations (Equations 14 and 15 for addition, and 16 and 17 for removal), each update is only O(1), and for N vehicles it becomes O(N).
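The recursive update behind the complexity argument can be sketched as follows: adding a vehicle uses Equation 15, and removing one uses its algebraic inverse. Each index update touches only one vehicle's features, so its cost does not grow with N; the final scan that picks the vehicle closest to the index is still linear in the cluster size in this sketch. Function names are hypothetical.

```python
import math


def add_vehicle(index, n, x):
    """Equation 15: index_N = index_{N-1} * (N-1)/N + x_N / N.

    `n` is the cluster size before the call (N - 1); returns (index_N, N).
    """
    n_new = n + 1
    return tuple(m * n / n_new + xi / n_new for m, xi in zip(index, x)), n_new


def remove_vehicle(index, n, x):
    """Inverse of Equation 15: index_{N-1} = (N * index_N - x_N) / (N - 1)."""
    if n <= 1:
        return None, 0  # the cluster becomes empty
    return tuple((m * n - xi) / (n - 1) for m, xi in zip(index, x)), n - 1


def cluster_head(index, members):
    """ID of the member whose features are closest to the index (O(N) scan)."""
    return min(members, key=lambda vid: math.dist(members[vid], index))
```

Starting from a zero index with `n = 0`, repeated calls to `add_vehicle` reproduce the plain mean of Equation 13 without ever re-summing the whole cluster.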

I. EVALUATION METRICS
The evaluation of clustering approaches in VANETs concentrates on the stability of the generated clusters. More specifically, the length of time a vehicle remains in its role as cluster head or cluster member can be used as a clustering performance metric; these are called the cluster head duration and the cluster member duration, respectively. Another aspect of performance is clustering efficiency, which indicates the percentage of vehicles that participate in clusters.
This metric shows the effectiveness of the clustering approach. Linking it to other performance aspects, such as the number of clusters, which needs to be minimized, provides a wider view of stability and effectiveness [2], [29], [45]-[47].

1) CLUSTERING EFFICIENCY
The clustering efficiency is defined as the percentage of vehicles participating in a clustering procedure during the simulation.
It is calculated by dividing the number of vehicles that were part of a cluster (as cluster member or cluster head) by the total number of vehicles.
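A minimal sketch of the clustering efficiency computation follows, assuming a per-vehicle log of roles recorded at each simulation step with the hypothetical labels "CH" (cluster head), "CM" (cluster member), and "UN" (unclustered).

```python
def clustering_efficiency(states):
    """Percentage of vehicles that were ever a cluster member or cluster head.

    states maps vehicle id -> sequence of per-step roles ("CH", "CM", "UN").
    """
    if not states:
        return 0.0
    clustered = sum(
        1 for roles in states.values() if any(r in ("CH", "CM") for r in roles)
    )
    return 100.0 * clustered / len(states)
```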

2) AVERAGE CLUSTER HEAD DURATION
The average cluster head duration indicates the average time a vehicle remains in the cluster head state before the cluster head changes. As mentioned earlier, a longer CH duration indicates a more stable clustering approach. The average cluster head duration is calculated by dividing the total cluster head period by the number of changes to the cluster head state from any other state.

3) AVERAGE CLUSTER MEMBER DURATION
This measure is an indicator of the stability of the clustering approach. It refers to the average period spent in the cluster member state: the total time spent as a cluster member is divided by the total number of changes to the cluster member state. Our goal is to maximize the cluster member duration.
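The average cluster head and cluster member durations can be computed from the same kind of per-step role log ("CH", "CM", "UN" are hypothetical labels). In this sketch, the total time spent in a role is divided by the number of transitions into that role; counting a vehicle that starts in the role as one transition, and the fixed step `dt`, are assumptions.

```python
def average_role_duration(states, role, dt=1.0):
    """Average time spent in `role` per transition into it.

    states maps vehicle id -> sequence of per-step roles; dt is the step length.
    """
    total_time, entries = 0.0, 0
    for roles in states.values():
        prev = None
        for r in roles:
            if r == role:
                total_time += dt
                if prev != role:
                    entries += 1  # entered the role from another state
            prev = r
    return total_time / entries if entries else 0.0
```

Calling it with `role="CH"` gives the average cluster head duration, and with `role="CM"` the average cluster member duration.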

4) NUMBER OF CLUSTERS
The number of clusters is how many clusters resulted from the clustering algorithm during the whole lifetime of the network. Our goal is to minimize the number of clusters.
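The text does not state whether this metric counts every cluster created over the run or the clusters present at each instant (Section IV also lists an average number of clusters among its metrics). Under the assumption that each snapshot maps a vehicle to a cluster identifier or `None`, this sketch returns both quantities; `cluster_counts` is a hypothetical name.

```python
from typing import Optional


def cluster_counts(snapshots: list[dict[str, Optional[int]]]) -> tuple[int, float]:
    """Return (distinct clusters over the run, mean clusters per snapshot)."""
    seen: set[int] = set()
    per_snapshot: list[int] = []
    for snapshot in snapshots:
        active = {cid for cid in snapshot.values() if cid is not None}  # None = unclustered
        seen |= active
        per_snapshot.append(len(active))
    mean = sum(per_snapshot) / len(per_snapshot) if per_snapshot else 0.0
    return len(seen), mean


snapshots = [
    {"v1": 1, "v2": 1, "v3": 2},
    {"v1": 1, "v2": 2, "v3": 2},
    {"v1": 3, "v2": 2, "v3": None},
]
print(cluster_counts(snapshots))  # (3, 2.0)
```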

IV. EXPERIMENTAL WORKS AND RESULTS
This section analyzes the evaluation measures generated for VANET clustering under various simulation scenarios and compares CEC-GP with three benchmarks: the evolving data clustering algorithm EDCA [26], center-based stable clustering CBSC [9], and the mutated k-means algorithm [42].
EDCA was originally designed for data clustering. However, because it treats the data as a stream, it can be applied to VANET clustering when each vehicle is represented as a data point holding the features associated with that vehicle. Table 1 shows the settings of the main simulation parameters.
Three traffic levels are used in this study: level 1, level 2, and level 3. Moving from level 1 to level 2 and level 3 is controlled by changing the parameters of two probability density functions: one for the expected number of incoming vehicles and one for the interval between generating one batch and the next. The total number of generated vehicles is 120, 200, and 300 within 300 s for level 1, level 2, and level 3, respectively. Three main scenarios were evaluated, with the AGG value set to 0.2 in the first, 0.5 in the second, and 0.8 in the third. For each scenario, four metrics were generated: the efficiency, the average cluster head duration, the average cluster member duration, and the average number of clusters.

Fig. 3 shows that the clustering efficiency of CEC-GP is more than 92%. This can be attributed to the grid approach, which decomposes the environment into a set of adjacent regions that describe groups of vehicles geometrically before their features are used to generate clusters from outliers. This makes the approach sensitive to scattered vehicles, which yields higher efficiency. CEC-GP outperformed CBSC, whose efficiency increased only from 40% to 67% as the traffic level increased. In addition, EDCA shows decreasing efficiency as the traffic level increases, which can be explained by the algorithm's bias toward denser areas of vehicles when combining clusters, ignoring the sparser regions.
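The two probability density functions are not specified in this section. As an illustration only, the sketch below assumes a compound Poisson arrival process: exponentially distributed gaps between batches and a Poisson-distributed number of vehicles per batch. The parameters and function names are hypothetical and were chosen only so that the expected total, 4 × 300 / 10 = 120 vehicles in 300 s, matches the level 1 count; a larger batch mean or a shorter interval would move toward levels 2 and 3.

```python
import math
import random


def poisson(rng: random.Random, lam: float) -> int:
    """Draw a Poisson(lam) sample with Knuth's multiplication method (fine for small lam)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def generate_batches(mean_batch_size: float, mean_interval: float,
                     horizon: float, seed: int = 0) -> list[tuple[float, int]]:
    """Return (arrival_time, vehicle_count) batches up to `horizon` seconds."""
    rng = random.Random(seed)
    batches = []
    t = rng.expovariate(1.0 / mean_interval)  # exponential gap before the first batch
    while t <= horizon:
        batches.append((t, poisson(rng, mean_batch_size)))
        t += rng.expovariate(1.0 / mean_interval)
    return batches


# Hypothetical parameters: 4 vehicles per batch and one batch every 10 s, on average.
batches = generate_batches(mean_batch_size=4.0, mean_interval=10.0, horizon=300.0, seed=1)
total_vehicles = sum(count for _, count in batches)
print(len(batches), total_vehicles)
```

A fixed seed makes a scenario reproducible, so the four approaches can be compared on identical traffic.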
EDCA also showed low efficiency, below 25%, owing to several factors: it does not consider extended mobility features as CEC-GP does, and its cluster head update equation does not keep the cluster head at the geometric center of the cluster. Another reason for the better performance of CEC-GP compared with EDCA is cluster merging, which increases efficiency by enabling larger clusters. k-means had the lowest clustering efficiency of all the approaches.
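The merging rule is not specified in this section. A minimal sketch, assuming each cluster is summarized by its member count and centroid in feature space and that two clusters merge when their centroids lie within a hypothetical `radius`: the size-weighted mean places the merged centroid at the geometric center of the combined membership, in line with the text's emphasis on keeping the cluster head at the geometric center. `should_merge` and `merge` are hypothetical names.

```python
import math

Cluster = tuple[int, list[float]]  # (number of members, centroid in feature space)


def should_merge(a: Cluster, b: Cluster, radius: float) -> bool:
    """Hypothetical rule: merge when the centroids are within `radius` of each other."""
    return math.dist(a[1], b[1]) <= radius


def merge(a: Cluster, b: Cluster) -> Cluster:
    """Size-weighted mean keeps the merged centroid at the center of all members."""
    n_a, m_a = a
    n_b, m_b = b
    n = n_a + n_b
    return n, [(n_a * x + n_b * y) / n for x, y in zip(m_a, m_b)]


a = (3, [0.0, 0.0])
b = (1, [4.0, 8.0])
if should_merge(a, b, radius=10.0):
    print(merge(a, b))  # (4, [1.0, 2.0])
```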
The four approaches were also evaluated while increasing the AGG factor from 0.2 (fig. 3) to 0.5 (fig. 7) and 0.8 (fig. 8). The relative performance of the approaches remains similar, with lower efficiency overall due to the more aggressive driving behavior of the vehicles.
For more detail, this study presents the time series of CEC-GP and the three benchmarks for four lanes in fig. 4 for level 1, and in fig. 5 and fig. 6 for level 2 and level 3, respectively. The results reveal that CEC-GP has low volatility compared with the high volatility of CBSC at traffic levels 1 and 2, which is another indicator of good performance.
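The text compares volatility qualitatively and does not define a measure for it. One simple, assumed way to quantify the volatility of a time series such as those in fig. 4 to fig. 6 is the standard deviation, optionally normalized by the mean (the coefficient of variation) so that series at different levels can be compared. The function name `volatility` is hypothetical.

```python
import statistics


def volatility(series: list[float]) -> tuple[float, float]:
    """Population standard deviation and coefficient of variation of a time series."""
    sd = statistics.pstdev(series)
    mean = statistics.fmean(series)
    return sd, (sd / mean if mean else float("inf"))


# Synthetic series for illustration only (not values from the figures).
sd, cv = volatility([2, 4, 4, 4, 5, 5, 7, 9])
print(sd, cv)  # 2.0 0.4
```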
On the other hand, k-means had the worst efficiency, because it cannot handle dynamic changes in the environment.
Another metric, the cluster head duration, indicates the stability of the clustering approach and is shown in fig. 9, fig. 10, and fig. 11. CEC-GP achieved a higher normalized cluster head duration than CBSC. This indicates that in CEC-GP the cluster head, which is the essential vehicle of the cluster, stayed in its role longer than its counterpart in CBSC; hence, CEC-GP can maintain a cluster longer than CBSC. We also observe a decreasing trend for EDCA, k-means, and CEC-GP as the traffic level increases, which is caused by a greater likelihood of unstable group mobility in these scenarios. However, the effect of the increased traffic level on the average cluster head duration was not observed in CBSC. This can be explained by CBSC's reliance on image processing, which becomes more sensitive with dense clusters.
In all cases, however, CEC-GP has a longer cluster head duration, which reached 183.2 s for AGG = 0.2 (fig. 9) and did not decrease below 168 s (fig. 11). In addition, k-means had the lowest cluster head duration because of the unstable clusters it produces.
A third aspect of performance is the cluster member duration, shown in fig. 12, fig. 13, and fig. 14. The cluster member duration was higher for CEC-GP, which, together with the cluster head duration, indicates stable clustering. CEC-GP also provides cluster member durations that improve as the traffic level increases.
In addition, CBSC generated a higher cluster member duration than EDCA. Increasing AGG also leads to a lower ACM duration. As with the cluster head duration, k-means has the lowest cluster member duration of all the approaches.
The last metric is the number of clusters. Any clustering algorithm for VANETs aims to provide higher efficiency with a lower number of clusters.
As observed in fig. 15, fig. 19, and fig. 20, the number of clusters is lower for CEC-GP at all traffic levels than for the other approaches, CBSC, EDCA, and k-means.
This study also depicts the time series of the number of clusters over the experiment period in fig. 16, fig. 17, and fig. 18 for traffic levels 1, 2, and 3.
As observed, the number of clusters is more volatile in CBSC than in CEC-GP at traffic levels 1 and 2. This reflects the stability of CEC-GP even when vehicles are fewer and more sparsely distributed.
Another observation is that the number of clusters for CEC-GP remains consistent as the traffic level increases, which was not observed in the other three benchmarks. Hence, CEC-GP is superior from this perspective: it provides a lower number of clusters with higher efficiency. Lastly, k-means has the highest number of clusters of all the approaches.

V. CONCLUSION AND FUTURE WORKS
This article has discussed the clustering of VANETs in a highway environment using a center-based approach, in order to exploit the increasing range of V2I communication in LTE and to provide clustering based on a global view, which makes the decisions more stable and predictive. The approach incorporates vehicles' mobility information through an effective data-summary phase named grid partitioning, which divides the environment into grids. Each grid provides a high-level entity that assists the clustering decision of creating outliers or clusters. The approach is presented as a general framework for VANET clustering that covers all the clustering processes, including assigning, cluster head selection, merging, and removing. It was evaluated at three levels of traffic generation, and the clustering performance metrics were generated and analyzed. The results showed the superiority of the approach over the three benchmarks, which were selected based on center-based clustering. The superiority is observed from various perspectives, including efficiency, stability, and a lower number of clusters. Another aspect of the observed performance is the general consistency of the performance measures regardless of changes in the traffic level. Future work is to augment the mobility features so that, in addition to the current-time mobility variables, they include values forecast by a time-series prediction algorithm.
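Grid partitioning, as summarized here, divides the environment into grids, each giving a high-level entity for deciding whether to create clusters or outliers. The sketch below illustrates that idea only under assumptions: rectangular cells of a hypothetical `cell_length` along the road and `lane_width` across it, a per-cell summary of count, mean position, and mean speed, and a hypothetical `min_count` threshold separating cluster-seeding cells from outlier cells. The paper's actual cell geometry, features, and decision rule may differ.

```python
from collections import defaultdict


def grid_partition(vehicles, cell_length, lane_width):
    """Group vehicles into rectangular cells and summarize each cell.

    `vehicles` holds (vehicle_id, x, y, speed); x runs along the road, y across it.
    """
    cells = defaultdict(list)
    for vid, x, y, speed in vehicles:
        cells[(int(x // cell_length), int(y // lane_width))].append((vid, x, speed))
    summary = {}
    for key, members in cells.items():
        n = len(members)
        summary[key] = {
            "count": n,
            "mean_x": sum(m[1] for m in members) / n,
            "mean_speed": sum(m[2] for m in members) / n,
            "members": [m[0] for m in members],
        }
    return summary


def label_cells(summary, min_count):
    """Hypothetical rule: dense cells seed clusters, sparse cells are treated as outliers."""
    return {key: ("cluster" if s["count"] >= min_count else "outlier")
            for key, s in summary.items()}


vehicles = [("a", 10.0, 1.0, 25.0), ("b", 40.0, 1.0, 27.0), ("c", 130.0, 5.0, 30.0)]
summary = grid_partition(vehicles, cell_length=100.0, lane_width=3.5)
print(summary[(0, 0)]["mean_speed"])      # 26.0
print(label_cells(summary, min_count=2))  # {(0, 0): 'cluster', (1, 1): 'outlier'}
```

Only the per-cell summaries, not every vehicle's raw state, would then need to reach the clustering center.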