Multi-Dimensional Affinity Propagation Clustering Applying a Machine Learning in 5G-Cellular V2X

Cellular systems are facing ever-increasing demand for vehicular communication aimed at applications such as advanced driving assistance and, ultimately, fully autonomous driving. Cellular Vehicle to Anything (C-V2X) has become more applicable with the release of the first sets of 5G (5th Generation) system specifications. The highly capable 5G systems will therefore support an even larger number of moving objects. This study presents a clustering mechanism that enables cellular systems to accommodate a massive number of moving Machine Type Communication (MTC) objects with a minimum set of connections while maintaining system scalability. Specifically, we propose the Normalized Multi Dimension-Affinity Propagation Clustering (NMDP-APC) scheme and apply it to Vehicular Ad hoc Network (VANET) clustering. For VANET cluster formation, our study employs Machine Learning (ML) to determine the granularity, i.e., the size and span of clusters desirable for use in dynamic motion environments. The study achieved a sufficient level of prediction accuracy with fewer training data through a learned prediction function based on the selected key criteria. This paper also proposes a system sequence designed with a series of procedures fully compliant with C-V2X systems. We conducted extensive simulations and numerical experiments with theoretical analysis, specifically applying a soft-margin-based Support Vector Machine (SVM) algorithm. The simulation results confirmed that the granularity parameter we applied controls the size of VANET clusters reasonably well even though vehicles are in motion, and that the prediction performance can be adjusted by controlling the key SVM parameters.


I. INTRODUCTION
The concept of connecting vehicles to anything (V2X) opens a new paradigm in vehicle transportation, leveraging the power of wireless communications [1]. It is easy to imagine that our daily life will be greatly changed even by the realization of just one of the potential scenarios, e.g., autonomous driving. To realize such highly demanded applications and use cases of V2X, intensive inter-industry discussions have been underway. The 5G Automotive Association (5GAA), a newly created cross-industry organization of the automotive and telecommunication sectors, has been working on next-generation connected mobility and automated vehicle solutions, showing strong initiative and collaboration toward V2X service realization [2]. IEEE 802.11p-based Dedicated Short Range Communication (DSRC) technology is already available for inter-vehicular communication [3]. In response to more advanced demands, requirements and scenarios of V2X [4], a global standard called C-V2X has recently been specified by 3GPP [5], [6] together with the 5G network systems [7], [8]. C-V2X can provide both short- and long-range communication modes that work interactively to fulfill various scenarios. Scenarios of short-range direct connectivity include vehicle-to-vehicle (V2V), vehicle-to-infrastructure (V2I) and vehicle-to-pedestrian (V2P) communications. The short-range mode works independently of cellular systems in the dedicated ITS 5.9 GHz spectrum band [9]. Long-range network connectivity (V2N) of C-V2X provides broadband and inter-system communication capability through a Public Land Mobile Network (PLMN), i.e., a cellular network via LTE and advanced 5G access [10], [11]. The typical demand for low-latency, broadband V2N connectivity stems from the demand for vehicular edge computing. The Automotive Edge Computing Consortium (AECC) [12] has been established specifically to drive vehicular computing infrastructure for forthcoming automotive big data applications, intelligent drive management, acquisition of 3D map data and much safer driving assistance through reliable, real-time and constant connectivity between the edge cloud system and vehicles [13].

The associate editor coordinating the review of this manuscript and approving it for publication was Daxin Tian.
For consistent system connectivity and quality, scalable access management is essential as it helps systems to avoid access storms caused by a massive number of MTC objects. Widely available cellular networks are susceptible to access stagnation caused by spontaneous access attempts of a large number of vehicles, moving drone objects and dynamic IoT devices. The clustering capability of VANETs largely contributes to reducing the number of direct connections to cellular networks. Such clusters formed by a set of vehicles connected via short-range communication can mitigate access storms by grouping member vehicles and eventually provide further access stability for PLMNs through such aggregation effects [14].
Our study aims to identify a valid VANET clustering scheme that is able to meet such demands effectively. In the process of VANET cluster formation, a number of considerations need to be taken into account since vehicles are in motion. Fig. 1 depicts VANET cluster formation and its reformation due to motion dynamics. Once a cluster is formed through message exchanges via inter-vehicular communication, the identified Cluster Head (CH) becomes a proxy toward the PLMN as shown in Fig. 1(a). At time t + n in Fig. 1(b), where t and n are positive real numbers, the form of the original cluster has changed due to further dynamics. A clustering scheme used in such dynamic environments should therefore be able to adapt effectively to constantly changing traffic. To address this requirement, we selected Affinity Propagation (AP), a mathematically well-proven and highly reliable scheme, and applied its logic to VANET clustering. Building on the fundamental concept of AP, this study extended its key decision function in order to obtain a scheme highly adaptable to the motion dynamics of a VANET in real time.
To identify an effective scheme, we had to determine a desirable VANET cluster size by exploring appropriate criteria and conditions. Specifically, we employed ML to dynamically determine the cluster size based on minimum criteria. Namely, the aim of this study is to propose a novel vehicle clustering scheme that applies AP to adapt to the real-time motion dynamics of target vehicles by extending previous work [15]. Moreover, this study employed an ML scheme for deducing adequate clustering granularity in consideration of the key attributes of 5G Cellular V2X.
The contributions of this paper are summarized as follows:
- Vehicle clusters are autonomously formed through the NMDP-APC process in consideration of real-time motion dynamics;
- The cluster granularity is designed to be derived from an ML algorithm with minimum criteria;
- Traffic density and cellular congestion states are specifically considered for granularity determination;
- A fully distributed system is designed with its entire procedure and sequence in line with 5G C-V2X; and
- Numerical evaluation and substantial simulations are performed by reflecting theoretical analysis.
The proposed scheme is applicable not only to vehicular communication but also to a wide range of other moving MTC objects, including a large number of mobile robots and emerging drone-like objects aiming for future applications [16]. This study therefore has the potential to be applied to a wide variety of mobile MTC applications.
The rest of this paper is organized as follows. Section II provides a review of the related works. Section III explains the proposed VANET clustering scheme applying NMDP-APC to message sequences. Section IV introduces our granularity determination scheme employing soft-margin-based SVM machine learning to find an appropriate cluster size. This section also provides mathematical analysis of SVM. Section V provides sets of simulation results of NMDP-APC clustering and the prediction accuracy of granularity with different key SVM parameters. Finally, Section VI concludes this study by summarizing the findings.

II. RELATED WORKS
A. VANET CLUSTERING TECHNIQUES
Identifying a valid clustering technique is an important research subject for VANETs not only to organize and manage VANETs but also to efficiently integrate a short-range vehicle network with a wide-range network. Vehicular clustering offers various benefits such as dynamic topology stabilization, improvement of routing efficiency and minimization of control overhead, and specifically provides scalability by limiting the number of accesses to a PLMN [14], [16]. Clustering improves system stability by localizing the control targets and reducing signals for avoiding global propagation. Additionally, as VANET clustering reduces the impact of dynamic topology changes, the structure of the network system becomes more manageable and stable [16]. An overall taxonomy of clustering techniques and protocols in various VANET applications in each category is well summarized in [17]. An extensive survey on clustering techniques is conducted by Cooper et al. [18], which analyses a variety of VANET clustering techniques to address such points as cluster head selection, cluster affiliation, and cluster management.
VOLUME 8, 2020
The selection metrics used by VANET clustering schemes vary: some schemes use a single metric and others use multiple metrics. Such variations can be seen in [17], [18]. Specifically, mobility-based clustering, which focuses on inter-vehicular mobility dynamics, is identified as a key direction in contemporary study. Reference [19] outlines a mechanism of VANET-UMTS integration and proposes a dynamic VANET clustering mechanism using specific multi-metrics for finding a minimum number of vehicular CHs. Although it proposes a comprehensive mechanism targeting a dynamic mobility environment, its realization would be challenging as it requires the inclination angles toward eNBs and between vehicles. Studies on a multi-hop clustering scheme with IEEE 802.11p and 4G hybrid systems are introduced in [20], [21]. For CH selection, this scheme proposes a relative mobility metric calculated from the averaged relative speed with respect to the neighbor vehicles. However, the proposed procedure requires a large state management effort using a specific state transition matrix in order to account for multi-hop inter-vehicular relations.
Generally, VANET clustering schemes can be categorized as either centralized or distributed clustering schemes. In centralized clustering schemes, cluster formation is performed via an NB or Road Side Unit (RSU) with periodical message exchanges to the target vehicles. Qi et al. [22] proposed an SDN-based centralized clustering scheme by exploiting a social pattern, i.e., knowing the vehicles' traffic pattern to deduce an expected route, where the metrics of inter-vehicular distance, relative speed and vehicle attributes are used. In distributed clustering schemes, cluster formation and CH selection are performed only between the vehicles through message exchanges, in a fully distributed manner and without management by a central node. The centralized clustering approach is advantageous in that more computational resources and management functions are available in the central entity. On the other hand, from the standpoint of the nomadicity of vehicular mobility, the distributed clustering approach has the advantages that it avoids signaling concentration and a single point of failure; specifically, it allows autonomous decision making among widely spread vehicular targets.
In contrast to those conventional clustering schemes, an advanced, fully distributed VANET clustering scheme was proposed in [23], by Hassanabadi et al. [24] and by Shahwani et al. [25], based on a mathematically well-proven concept called Affinity Propagation, which was originally published in Science by Frey and Dueck [26]. In these initial Affinity Propagation Clustering (APC) applications [24], however, the metrics used in the similarity function are the vehicles' distances at a future prediction time point set to 30 sec, which means that individual vehicles' positions are forecast 30 sec ahead. Such forecasts may be subject to large prediction errors because the vehicles are in motion. The scheme described above also exchanges APC information by message every second. This messaging process needs to be iterated around 10 times beforehand, which requires 10 sec for cluster formation. This value is not trivial, as vehicles are in motion and their locations could change by more than 200 meters during the elapsed time in a high-mobility environment such as a highway.
In contrast to the conventional application of APC, our proposed NMDP-APC in C-V2X is designed to target the following three key advantages. Firstly, our proposed scheme uses the vehicle's current position and velocity, which are not associated with any predicted value. Secondly, our scheme autonomously deduces a desirable clustering granularity by applying ML, not only from the vehicles' motion dynamics but also from the traffic density and access congestion status. Thirdly, it leverages the power of C-V2X technology for improved performance, i.e., a short uplink transmission interval of 0.1 sec [27] with higher reliability and lower latency [9]. From a system-wide perspective, we designed our scheme by applying mobility-based metrics, specifically position and velocity, to single-hop cluster formation, taking advantage of C-V2X wireless, whose one-hop range is sufficient, and of a distributed clustering scheme, which suits vehicles' nomadicity.
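The timing argument above can be checked with simple arithmetic. The sketch below assumes a highway speed of 100 km/h and assumes that the same 10 message iterations are needed at both messaging intervals; neither figure is stated in this paper for C-V2X, so the result is only indicative.

```python
# Distance a vehicle travels while a clustering round is still exchanging messages.
# Assumptions (not from the paper): 100 km/h speed, 10 iterations in both cases.

def drift_m(speed_kmh: float, interval_s: float, iterations: int) -> float:
    """Metres travelled during `iterations` message exchanges."""
    speed_ms = speed_kmh * 1000.0 / 3600.0
    return speed_ms * interval_s * iterations


conventional = drift_m(100.0, 1.0, 10)  # 1 s message interval, as described for [24]
cv2x = drift_m(100.0, 0.1, 10)          # 0.1 s interval cited for C-V2X [27]
print(f"1 s interval:   {conventional:.0f} m")
print(f"0.1 s interval: {cv2x:.0f} m")
```

Under these assumptions the displacement during cluster formation shrinks by a factor of ten, which is the motivation for using the shorter C-V2X interval.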
Traffic modeling is an additional field that we surveyed to understand the behavior of automotive traffic dynamics. Studies in this field have traditionally been performed mainly in civil engineering and traffic science for the analysis and design of safe and efficient real traffic mechanisms. The Gazis-Herman-Rothery (GHR) car following model [28], which we applied in our work, is a well-studied and sufficiently proven model. We applied the traffic model in our simulation by referring to the mathematical explanation shown in [29]. This model is particularly useful as it enables us to obtain details of vehicles' motion dynamics, such as the acceleration, velocity and position at every moment. Since data acquisition for dynamically moving objects is sometimes limited, the GHR car following model is significantly helpful, and the details obtained through the use of the model are applied to real traffic data, as will be demonstrated in Section V.

B. ML FOR VEHICULAR COMMUNICATION
Key salient features of vehicular communication to be applied to succeed in 5G systems are summarized by Shah et al. [30]. The importance and mechanism of 5G Mobile Edge Computing (MEC) with VANETs are analyzed in [31] and Ning et al. [32]. Low latency is an essential key capability for VANET networks, especially when the target application involves unmanned vehicles that generally require ultra-low latency capabilities of the system. MEC can process and deliver data quite efficiently by pushing the computing resources to the Mobile Vehicular Cloud (MVC) located at the edge of the network as shown in [33] and Ma et al. [34]. It is also notable that a game theory for VANET clustering is employed by Khan et al. [35] to find a balance of total throughput capacity in consideration of the clustering granularity, i.e., population share. In [35], the RSU is used as a centralized controller to signal an overall observation, compute the average payoff of the entire population of clusters and then broadcast it to all clusters.
Potential application of ML over vehicular networks was investigated by Ye et al. [36]. A framework for ML on vehicular networks was reviewed by Liang et al. [37]. Tian et al. proposed an application of a Hebb neural network, which intelligently learns the topology and forms vehicle clusters [38]. An application of Q-learning for cluster head identification is introduced by Khan et al. in the proposal of two-level clustering [39]. Level-1 CHs are selected by a fuzzy logic algorithm using three metrics: relative velocity, multiple-connectivity factor and link-reliability factor. Level-2 CHs are determined by an improved Q-learning theory as an optimal gateway to the 5G node-B. However, no clear proposal has been presented as to how an identical number of clusters can be identified in these studies. Support Vector Machine (SVM) is one of the well-known classification schemes in ML, which enables categorization of data into several classes in a sophisticated manner. A systematic study of the classification mechanism of SVM was provided by Junli and Licheng [40]. The fundamental logic and mathematical model of SVM are well explained in [41]-[43]. The application of SVM for the purpose of data offloading in a VANET was provided by Wu et al. [44], proposing the use of the SVM classification scheme for deciding whether divided tasks are executed locally or offloaded. Application of ML to VANET clustering still seems to be in an early phase. Hence, in this study, we employed the power of SVM specifically for determining the VANET granularity to fit the characteristics of nomadic vehicle dynamics, which represents our novelty.

III. PROPOSED VANET CLUSTERING SCHEME
A. AFFINITY PROPAGATION FUNDAMENTAL
We applied AP to VANET clustering because the vehicles' short-range communication capability enables the information exchange needed for clustering. Such inter-vehicular communication is an essential function of APC, as clusters are formed through iterative message exchanges among the surrounding neighbors. Eventually, the scheme identifies the best set of clusters by recursively exchanging specific messages until a set of exemplars, i.e., CHs, is identified through evaluation of similarities via a specific function.
In APC, a similarity measure for each pair of data points k and l, denoted s(k, l), is iteratively evaluated, and thereby exemplars emerge, where k ∈ {1, 2, ..., K}, l ∈ {1, 2, ..., L}, and k, l ∈ N. This process allows each data point to be a possible exemplar, which exchanges messages with other points until a set of high-quality exemplars emerges. Two kinds of messages are exchanged between the points, i.e., vehicles k and l. First, a responsibility message r(k, l) sent from k to l represents how adequate vehicle l is as an exemplar when other competing exemplars are taken into account. It is deduced from the similarity evaluation (1), and both r(k, l) and the self-responsibility message r(l, l) are recomputed in each iteration following the AP update rules of [26]. Second, an availability message a(k, l), sent in the reverse direction from the exemplar l back to the point k, shows l's suitability to be an exemplar for k, considering the feedback from other competing potential exemplars. The availability message a(k, l) and the self-availability message a(l, l) are likewise recomputed in each iteration [26]. In the iterative process, the responsibility and availability messages are processed with a damping factor λ to avoid computational oscillations: each message is set to λ · message_old + (1 − λ) · message_new, where message_old is the value from the previous iteration, message_new is the freshly computed value, and λ is assigned a value between 0 and 1 [45]. Once the value of the net similarity converges, the APC process terminates its iteration. Eventually, a specific data point is identified as a CH among the cluster members when its self-responsibility and self-availability meet the following condition: if r(k, k) + a(k, k) > 0, then the point k is an exemplar; likewise, if r(l, l) + a(l, l) > 0, then the point l is an exemplar. Exemplars are thus identified from the responsibilities and availabilities obtained through the iterations.
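For reference, the update rules described above can be written in the standard form given by Frey and Dueck [26], transcribed here into the k, l notation of this section. The transcription, the primed indices k' and l', and the hat notation for a freshly computed message are assumptions of this sketch rather than this paper's own typeset equations.

```latex
\begin{align*}
r(k,l) &\leftarrow s(k,l) - \max_{l' \neq l} \bigl\{ a(k,l') + s(k,l') \bigr\}, \\
a(k,l) &\leftarrow \min\Bigl\{ 0,\; r(l,l) + \sum_{k' \notin \{k,l\}} \max\{0,\, r(k',l)\} \Bigr\}, \quad k \neq l, \\
a(l,l) &\leftarrow \sum_{k' \neq l} \max\{0,\, r(k',l)\}, \\
m^{(t)} &= \lambda\, m^{(t-1)} + (1-\lambda)\, \hat{m}^{(t)}, \quad m \in \{r, a\}.
\end{align*}
```

The self-responsibility r(l, l) follows from the first rule with k = l, where s(l, l) is the preference of point l.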
Upon convergence, each node k's CH is identified as the point l that maximizes a(k, l) + r(k, l) [26]. The identified CH acts as a gateway for the VANET cluster members, which greatly contributes to reducing the number of connections to a PLMN.
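The message-passing loop described in this subsection can be sketched in plain Python as follows. This is a minimal illustration of standard AP [26] with damping, not the simulation code used in this study; the function name, the fixed iteration count used in place of a convergence test, and the toy one-dimensional positions are assumptions. The diagonal of the similarity matrix holds each point's preference.

```python
import statistics


def affinity_propagation(s, damping=0.5, iterations=200):
    """Run damped AP on an n x n similarity matrix `s` (list of lists, n >= 2).

    s[k][k] is the preference of point k. Returns (exemplars, labels), where
    labels[k] is the index of the exemplar (CH) chosen for point k.
    """
    n = len(s)
    r = [[0.0] * n for _ in range(n)]  # responsibilities r(k, l)
    a = [[0.0] * n for _ in range(n)]  # availabilities a(k, l)
    for _ in range(iterations):
        # Responsibility: how well l suits k compared with k's best alternative.
        for k in range(n):
            for l in range(n):
                rival = max(a[k][j] + s[k][j] for j in range(n) if j != l)
                r[k][l] = damping * r[k][l] + (1 - damping) * (s[k][l] - rival)
        # Availability: support for l gathered from the other points.
        for k in range(n):
            for l in range(n):
                support = sum(max(0.0, r[j][l]) for j in range(n) if j not in (k, l))
                fresh = support if k == l else min(0.0, r[l][l] + support)
                a[k][l] = damping * a[k][l] + (1 - damping) * fresh
    exemplars = [l for l in range(n) if r[l][l] + a[l][l] > 0]
    labels = [max(range(n), key=lambda l: a[k][l] + r[k][l]) for k in range(n)]
    return exemplars, labels


# Toy data (assumed): two well-separated platoons on a straight road.
positions = [0, 1, 2, 10, 11, 12]
s = [[-float((p - q) ** 2) for q in positions] for p in positions]
off_diag = [s[k][l] for k in range(6) for l in range(6) if k != l]
preference = statistics.median(off_diag)  # common default choice in AP
for k in range(6):
    s[k][k] = preference
exemplars, labels = affinity_propagation(s)
print(exemplars, labels)
```

In this toy case the middle vehicle of each platoon emerges as the exemplar. Lowering the preference makes points less willing to act as exemplars and therefore yields fewer, larger clusters.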

B. NMDP-APC CLUSTERING
Building on the fundamental concept of APC, our proposed VANET clustering scheme NMDP-APC enhances its similarity function by combining the Euclidean spatial distance, i.e., the vehicle's position information, with its velocity information, normalizing each term. We identified the vehicle's current position and velocity as the most critical metrics representing the motion dynamics of a VANET [19], [21]. Some studies consider the link strength as a clustering criterion with multiple wireless hops. Instead, we designed our system to consider the inter-vehicle distance and relative velocity in the formation of single wireless-hop clusters. This yields stable VANET clusters, and the inter-vehicular distance can be considered to reflect the radio strength between a pair of vehicles. Therefore, we designed our similarity function by associating these two essential terms in a single formula as shown in (6). It is formed by combining the negative Euclidean distance between the vehicles' current positions with the negative difference between the vehicles' current velocities. In order to associate the two terms in a single similarity function, each term has been normalized to the range from 0 to 1, as denoted ''nor'' in (6). We also assigned the weight parameters ϕ and ψ to the normalized components as the control knobs of each term. This capability provides significant extensibility, since it allows terms of multiple dimensions to fit into a single similarity function with the proportion adjusted by the weight parameters.
We also assume that this scheme is mainly applied to the uplink connection to a PLMN. Our enhanced similarity function is given in (6), subject to ϕ + ψ = 1, where x_k is the vector of node k's current position in the x and y coordinates and v_k is the vector of node k's current velocity in the x and y coordinates. Note that, in contrast to the previous study [24], our proposed similarity function in NMDP-APC does not include any predicted future value. This helps avoid any associated prediction errors, which is a key difference from related previous works. The following subsection explains the clustering sequence that we built as an enhancement of the PC5-based C-V2X message sequence.
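As a minimal sketch of the similarity function described above, the code below computes s(k, l) = −(ϕ · d_nor + ψ · dv_nor) for every pair of distinct vehicles, where d is the Euclidean distance between positions and dv the Euclidean distance between velocity vectors. Min-max scaling over all pairs as the normalization, the use of unsquared norms, and the function names are assumptions; the text only states that each term is normalized to the range from 0 to 1.

```python
import math


def _minmax(values):
    """Scale values to [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span > 0 else 0.0 for v in values]


def nmdp_similarity(positions, velocities, phi=0.5, psi=0.5):
    """Return an n x n matrix with s[k][l] = -(phi * d_nor + psi * dv_nor), k != l.

    The diagonal is left at 0.0 so that a preference can be set separately.
    """
    assert abs(phi + psi - 1.0) < 1e-9, "weights must satisfy phi + psi = 1"
    n = len(positions)
    pairs = [(k, l) for k in range(n) for l in range(n) if k != l]
    d = [math.dist(positions[k], positions[l]) for k, l in pairs]
    dv = [math.dist(velocities[k], velocities[l]) for k, l in pairs]
    d_nor, dv_nor = _minmax(d), _minmax(dv)
    s = [[0.0] * n for _ in range(n)]
    for (k, l), dist_term, vel_term in zip(pairs, d_nor, dv_nor):
        s[k][l] = -(phi * dist_term + psi * vel_term)
    return s


# Toy data (assumed): positions in metres, velocities in m/s.
positions = [(0.0, 0.0), (10.0, 0.0), (500.0, 0.0)]
velocities = [(30.0, 0.0), (30.0, 0.0), (20.0, 0.0)]
s = nmdp_similarity(positions, velocities)
for row in s:
    print([round(v, 3) for v in row])
```

The two nearby vehicles moving at the same velocity receive the highest similarity, while the distant, slower vehicle receives the lowest. Shifting weight from ϕ to ψ makes velocity agreement count more than proximity.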

C. PROPOSED CLUSTERING SEQUENCE
Through message exchanges between the points, the similarity function identifies the homogeneity of vehicles' motion dynamics and then decides the formation of sets of clusters. In the C-V2X direct communication mode, the members use the PC5 interface designed for direct V2V communication. The vehicles communicate with the neighbors within C-V2X PC5 radio range. All required APC messages are sent on the application layer over the short-range interface, whose reliable communication range is considered to be a radius of approximately 500 m to 1 km [27]. As previously shown in Fig. 1, cluster formation takes place at Time = t and again after a time interval. At Time = t, a cluster is formed based on the APC clustering scheme, and a CH is identified with vehicle-B and vehicle-C as its members in the diagram.
Reflecting the vehicles' motion dynamics, the associated members are updated at Time = t + n as in Fig. 1(b). In this case, although the CH has not changed, some members have: vehicle-C has left the original wireless coverage and vehicle-D has become a new member because of its similarity as evaluated by the clustering scheme. In this diagram, vehicle-E stays independent of the cluster, either because it intends to remain stand-alone or because its dynamics make it less likely to be a member of the cluster. Fig. 2 shows the enhanced signaling sequence exchanged in the above scenario. VE represents a Vehicle-UE that complies with the 3GPP C-V2X scheme.
Step 1 indicates the provisioning process for one-to-many, PC5-based communication between the VEs. The provisioning includes three important aspects: a) identifying authorization policies and parameters; b) setting up V2X direct communication parameters; and c) setting up radio parameters for VEs not served by 5G-RAN. The authorization policies and parameters include the UE's authorization parameter from a PLMN, used to prevent malicious UEs from engaging. The V2X direct communication parameters include the V2X-Layer2-ID, IPv4/v6 preferences and Application Layer IDs, which are used by the VEs to perform one-to-many communications. The radio parameters for VEs not served by 5G-RAN provide the capability for one-to-many V2X communication in consideration of the geographical dependencies of frequency bands. In Step 2, each VE starts to exchange messages with surrounding vehicles by looking at the destination and source V2X L2-IDs on a specific C-V2X short-range communication link.
Step 3 is the message exchange of the NMDP-APC process using one-to-many PC5 direct communication among the neighbors. Clustering Request is a specific message sent from a VE intending to form a cluster. Such a request message can be sent from any of the neighbors such as VE-A, B and C. Triggered by reception of the Clustering Request message, these vehicles initiate the message exchange of NMDP-APC.
Step 4 indicates the result of cluster formation and the CH identified after the APC process for the cluster indicated in Fig. 1(a). When a VE becomes a member of the cluster, the VE enters the In-Clustering State and then informs its neighbors that it has become a member of the cluster. Some time later, as in Fig. 1(b), VE-D initiates Steps 6 and 7, which are identical to Steps 1 and 2 respectively, and it also intends to form a cluster. After the provisioning process, an additional Clustering Request message is sent from VE-D as shown in Step 8, and then an updated cluster is formed via an additional NMDP-APC process considering the dynamics of the neighbors. Note that VE-E remains stand-alone, as it has independent dynamics or intends to be in stand-alone communication mode. All parts shown in red are the specific enhancements that we proposed over the original C-V2X sequence [46]. The Fig. 2 sequence covers most of the scenarios in which a new VE joins or leaves a cluster. When the CH needs to be changed or replaced, the V2X-to-NW Relaying function can be applied to the CH relocation process between the new and old VEs, which will be explained later in Section IV-C together with its procedures.
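The VE-side behaviour in Steps 1 to 4 can be pictured as a small state model. The sketch below is purely illustrative: the class, attribute and function names are hypothetical, provisioning is reduced to a flag, all requesting VEs are assumed to end up in a single cluster, and the clustering itself is passed in as a placeholder callable rather than implemented.

```python
from enum import Enum, auto


class VeState(Enum):
    STAND_ALONE = auto()
    IN_CLUSTERING = auto()  # "In-Clustering State" in the text


class VehicleUE:
    """Hypothetical model of a VE; attribute names are illustrative."""

    def __init__(self, name, wants_cluster=True):
        self.name = name
        self.wants_cluster = wants_cluster  # VE-E in the text stays stand-alone
        self.provisioned = False
        self.state = VeState.STAND_ALONE
        self.cluster_head = None

    def provision(self):
        # Steps 1-2: authorization, V2X and radio parameters, L2-ID exchange.
        self.provisioned = True


def form_cluster(ves, run_nmdp_apc):
    """Steps 3-4: requesting VEs run the clustering and record the result."""
    requesters = [ve for ve in ves if ve.provisioned and ve.wants_cluster]
    if len(requesters) < 2:
        return None
    head = run_nmdp_apc([ve.name for ve in requesters])
    for ve in requesters:
        ve.state = VeState.IN_CLUSTERING
        ve.cluster_head = head
    return head


ves = [VehicleUE("VE-A"), VehicleUE("VE-B"), VehicleUE("VE-C"),
       VehicleUE("VE-E", wants_cluster=False)]
for ve in ves:
    ve.provision()
# Placeholder for the NMDP-APC exchange: simply picks the first requester.
head = form_cluster(ves, run_nmdp_apc=lambda names: names[0])
print(head, [(ve.name, ve.state.name) for ve in ves])
```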
We have so far explained the fundamental mechanism of NMDP-APC, referring to a scenario that takes account of the VEs' motion dynamics. We have yet to explain, however, how the clustering granularity should be determined. We employed an ML scheme to address this question while vehicles are dynamically in motion.
The following section will explain the ML scheme, starting from its concept.

IV. GRANULARITY DETERMINATION BY SVM ML
A. OVERVIEW OF VARIATIONS OF ML
ML enables computers to identify hidden insights through iterative learning of given data sets. Table 1 summarizes the variations of ML in terms of categories, objectives, algorithms and application examples in communication fields [36]. Broadly, each ML application is classified as either a supervised or an unsupervised learning scheme. In this study, we chose a supervised scheme because training data can be obtained readily and the prediction results can be evaluated by comparing them with those given by the decision tree. In comparison with SVM, a Neural Network (NN) is generally considered more applicable when the dimension of the explanatory variable is large. In addition, SVM is known to enable tuning of the prediction accuracy with a limited number of parameters, namely γ and C. Reflecting these properties, we decided to employ SVM to identify the clustering granularity in this study.
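To give a feel for the γ parameter mentioned above, the sketch below evaluates a radial basis function (RBF) kernel, K(x, z) = exp(−γ‖x − z‖²), for several values of γ. That the RBF kernel is the one intended here is an assumption based on the appearance of γ; the function name and sample points are illustrative.

```python
import math


def rbf_kernel(x, z, gamma):
    """K(x, z) = exp(-gamma * ||x - z||^2)."""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * squared_distance)


x, z = (0.0, 0.0), (1.0, 1.0)
for gamma in (0.1, 1.0, 10.0):
    print(gamma, rbf_kernel(x, z, gamma))
```

A larger γ makes the kernel value fall off faster with distance, so each training sample influences a smaller neighbourhood and the decision boundary can become more intricate. C, in turn, sets how strongly margin violations are penalized in a soft-margin SVM.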
In general, the process of supervised ML consists of two stages: training and testing. In the training stage, a model is learned from a set of pre-prepared training data. Once the function has been trained on these data, it is applied to provide predictions in the testing stage. The next sub-section explains the details of the task vector and the decision tree we propose, both of which are essential elements of our SVM scheme.
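The two stages can be illustrated with a small linear soft-margin SVM trained by subgradient descent on the objective 0.5‖w‖² + C Σ max(0, 1 − y(w·x + b)). This is a deliberately reduced sketch: it is binary and linear, whereas this study classifies four levels with a kernel-based SVM, and the toy data, learning rate and epoch count are assumptions.

```python
def train_linear_svm(samples, labels, C=1.0, lr=0.01, epochs=200):
    """Training stage: minimise the soft-margin objective by subgradient descent."""
    dim = len(samples[0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        grad_w, grad_b = list(w), 0.0  # gradient of the 0.5 * ||w||^2 term
        for x, y in zip(samples, labels):
            margin = y * (sum(wi * xi for wi, xi in zip(w, x)) + b)
            if margin < 1:  # sample violates the margin: add hinge subgradient
                grad_w = [g - C * y * xi for g, xi in zip(grad_w, x)]
                grad_b -= C * y
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def predict(w, b, x):
    """Testing stage: the sign of the decision function f(x) = w.x + b."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else -1


# Toy training data (assumed): two linearly separable groups labelled +1 / -1.
train_x = [(2, 2), (3, 3), (2, 3), (-2, -2), (-3, -3), (-2, -3)]
train_y = [1, 1, 1, -1, -1, -1]
w, b = train_linear_svm(train_x, train_y)

# Held-out test data, used only for evaluation.
test_x = [(4, 4), (-4, -1), (1, 5)]
test_y = [1, -1, 1]
predictions = [predict(w, b, x) for x in test_x]
accuracy = sum(p == y for p, y in zip(predictions, test_y)) / len(test_y)
print(predictions, accuracy)
```

Raising C penalizes margin violations more heavily and fits the training data more tightly, while lowering it tolerates more violations in exchange for a wider margin.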

B. PROPOSED TASK VECTOR AND DECISION TREE
It is generally known that obtaining an applicable trained decision function f(X) requires a certain volume of training data. In this study, a predefined decision tree is applied for identifying the level of clustering granularity, i.e., an objective variable y, from a set of explanatory variables, which constitute the task-vector X. SVM is known to work well when a sufficient volume of data is available and the explanatory variables have a clear relation to the objective variable [47].
With the mapping of the task-vector X to an objective variable y, the classification is provided as D = {(X, y) | y = sign[f(X)]}. In this study, we decided to classify the clusters into four levels based on their size: large, medium, small-medium, and small. We set a four-level classification in consideration of the obtained real traffic data and the span of C-V2X radio propagation, although theoretically the number of levels can be set to any value. Numerically, this can be shown as D ∈ {0, 1, 2, 3}, corresponding to the above four levels of cluster size. We designed the task-vector X to consist of the following minimum set of explanatory variables for deciding the cluster size:

$$\mathbf{X} = (I_v,\ F_p,\ T_d,\ Nb_c) \tag{8}$$

We kept the number of elements to a minimum; otherwise it would have led to a longer computational time and potential deterioration of prediction performance, because the target vehicles are in motion.
In (8), I_v is the communication volume, which we assumed to be 100 kbps of uplink data randomly generated at a certain interval by each vehicle. F_p represents an initial preference as to which mode is used to establish a session, clustered mode or individual mode, denoted as F_p ∈ {0, 1}. F_p = 0 represents an intention to establish an independent session, in which case a session is established directly with a cellular network; a broadband data upload could be a use case. On the other hand, F_p = 1 represents an intention to form a cluster, in which case a session will be established via a CH. The third attribute constituting the task vector is the traffic density of vehicles, T_d. We obtained real traffic data from a highway and applied them in our simulation. According to the obtained traffic data, the maximum number of observed vehicles in a 2 km span, i.e., within a 1 km radius, was almost 60 in a single direction. The vehicle density was obtained from the number of vehicles in the observed range. The last attribute constituting the task vector is the congestion status of PLMN access, denoted as Nb_c ∈ {0, 1}, where Nb_c = 0 represents a normal access state and Nb_c = 1 represents a congested access state. When a particular access is congested, the number of clusters is made smaller, as the aggregation effect contributes to reducing the number of cellular access connections. This parameter is therefore factored into our task-vector. By incorporating these explanatory variables in the task-vector X, we developed the following decision tree to process a set of training data that enables deduction of an appropriate cluster size. Fig. 3 shows our proposed four-layer decision tree with an essential and minimum set of decision criteria to deduce the level of clustering granularity.
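The task vector and a labelling rule of the kind a decision tree provides can be sketched as follows. The branch order and the density thresholds below are hypothetical stand-ins and do not reproduce the tree of Fig. 3. Only two points are taken from the text: an individual-mode preference bypasses clustering, and congestion pushes toward fewer, larger clusters. The sketch records I_v but does not branch on it, since the text does not state how it enters the decision.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaskVector:
    i_v: float  # communication volume in kbps (100 kbps assumed in the text)
    f_p: int    # 0 = individual session, 1 = clustered session
    t_d: int    # vehicles observed in the 2 km span (up to about 60 in the data)
    nb_c: int   # 0 = normal PLMN access, 1 = congested access


LEVELS = {0: "large", 1: "medium", 2: "small-medium", 3: "small"}


def label_granularity(x, dense=40, sparse=20):
    """Hypothetical labelling rule; thresholds `dense`/`sparse` are illustrative."""
    if x.f_p == 0:
        return 3  # vehicle prefers its own session: smallest granularity
    if x.nb_c == 1:
        return 0 if x.t_d >= sparse else 1  # congestion favours larger clusters
    if x.t_d >= dense:
        return 1
    if x.t_d >= sparse:
        return 2
    return 3


samples = [TaskVector(100.0, 1, 50, 1), TaskVector(100.0, 1, 30, 0),
           TaskVector(100.0, 0, 50, 0)]
for x in samples:
    print(x, "->", LEVELS[label_granularity(x)])
```

Pairs of (task vector, level) produced this way are the kind of labelled training data from which the SVM decision function f(X) can then be learned.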
The aforementioned NMDP-APC enables the formation of a desired clustering granularity by assigning a granularity parameter px. The following sub-section introduces the overall clustering procedure and shows how the granularity parameter px is applied from the system perspective.
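One way to picture the role of px is to treat it as the preference of standard AP, i.e., the self-similarity s(k, k), where a more negative value yields fewer and larger clusters. That reading of px, the multiplier table and the function names below are assumptions made for illustration only.

```python
import statistics

# Hypothetical multipliers per granularity level (0 = large ... 3 = small).
# Similarities are negative, so a larger multiplier gives a more negative px.
PX_SCALE = {0: 4.0, 1: 2.0, 2: 1.0, 3: 0.5}


def px_from_level(level, similarity):
    """Scale the median off-diagonal similarity by the level's multiplier."""
    n = len(similarity)
    off_diag = [similarity[k][l] for k in range(n) for l in range(n) if k != l]
    return PX_SCALE[level] * statistics.median(off_diag)


def with_preference(similarity, px):
    """Return a copy of the matrix whose diagonal s(k, k) is set to px."""
    return [[px if k == l else value for l, value in enumerate(row)]
            for k, row in enumerate(similarity)]


similarity = [[0.0, -1.0, -4.0],
              [-1.0, 0.0, -1.0],
              [-4.0, -1.0, 0.0]]
px_large = px_from_level(0, similarity)
px_small = px_from_level(3, similarity)
print(px_large, px_small)
print(with_preference(similarity, px_large))
```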

C. PROCEDURE WITH ML DEDUCED px-VALUE
Once a px value is given, NMDP-APC can control the clustering granularity in the process. We employed an ML scheme to obtain the desired clustering size from the trained decision function $f(X)$, as this process identifies the desired px value. Table 2 explains how the NMDP-APC process uses the ML-deduced px value, dividing the procedure into three parts. As Part-A has mostly been explained with the sequence in Section III-C, we focus on the procedures in Part-B and Part-C for the sake of space.
Part-B covers the procedure of an NMDP-APC re-clustering case for a member change.
Steps 06 and 07 are provisioning steps identical to steps 01 and 02, respectively.
Step 08 initiates the re-clustering process, triggered by a Clustering Request Message (CRM) from a VE that has come within the V2X wireless range of a cluster.
Step 09 starts another NMDP-APC process with an ML-driven px value to form a new cluster, aiming at cluster maintenance upon receiving the CRM.
In step 10, the original CH remains in place and updates the context information of the associated members.
Step 11 is triggered by a VE that is about to leave the cluster because it is moving out of wireless range, or by the reception of a re-clustering request from a member. Then, step 12 starts another NMDP-APC process with an ML-driven px value to form a new cluster in the same manner as in step 09. The CH maintains only the context of the updated cluster members. Part-C provides a set of procedures for the case where replacement of the initial CH is required. In step 13, a CH receives a CRM from a VE.
Step 14 initiates cluster maintenance by executing another NMDP-APC process. In step 15, the original CH is recognized as requiring replacement and CH_NEW is identified. Then, in step 16, CH_OLD and CH_NEW exchange the context information of the associated members through the V2X-to-NW relay [46].
Step 17 is an annex procedure that enables recovery from an abnormal state to a normal state. The following section introduces the mathematical analysis of the soft-margin-based SVM classification model and the kernel operation that are specifically employed in this study.

D. SVM CLASSIFICATION MECHANISM

1) MECHANISM OF SOFT-MARGIN-BASED SVM
The fundamental concept of SVM originates from a neural model applying a linear-threshold logical unit, which eventually provides class discrimination. When a set of training data $X_1, \ldots, X_M$ is given with the corresponding correct class labels $y_1, \ldots, y_M$, the linear discriminant function can be denoted as

$$y = \operatorname{sign}\left(W^{T} X - h\right),$$

where $X$ is the aforementioned task vector consisting of explanatory variables, $W$ is the corresponding synapse weight factor, and $h$ represents a threshold value. This model provides an output $y \in \{\pm 1\}$. It returns $+1$ when the inner product $W^{T} X$ of the weight factor and the task vector exceeds the threshold, and it returns $-1$ when it is less than the threshold. Geometrically, this concept is applicable as a classification system of input data, which segments data into two fields as depicted in Fig. 4(a). The figure shows two class regions $R_1$ and $R_2$. They are distinguished by the corresponding hyperplanes $H_1: W^{T} X - h = 1$ and $H_2: W^{T} X - h = -1$, which are formed by a limited number of double-circled support vectors. Notably, no data exist within the margin area formed by the two hyperplanes [42]. An optimal hyperplane $H^{*}$ can be identified by maximizing the margin between the hyperplanes, which is proportional to $1/\|W\|$; this is equivalent to minimizing $\|W\|^{2}$. This leads to the following constrained optimization problem:

$$\min_{W,\,h} \; \frac{1}{2}\|W\|^{2} \quad \text{subject to} \quad y_i\left(W^{T} X_i - h\right) \ge 1, \quad i = 1, \ldots, M.$$

Although we assume the data are linearly separable, in practice this is often not the case. Furthermore, an overly strict policy that forbids any misclassification forces the margin to accommodate every point, including outliers. To cope with this problem, the concept of a slack variable $\xi_i$ has been introduced to allow some errors instead of enforcing the strict constraints. This is achieved by replacing the above inequality constraints with:

$$y_i\left(W^{T} X_i - h\right) \ge 1 - \xi_i, \quad \xi_i \ge 0, \quad i = 1, \ldots, M,$$

where the slack variable $\xi_i$ allows a datum to lie within the margin when $0 \le \xi_i \le 1$ and causes it to be misclassified when $\xi_i > 1$.
Since an input training datum is misclassified when its slack variable is larger than 1, $\sum_i \xi_i$ is an upper bound on the number of misclassified data. The objective of maximizing the margin, i.e., minimizing $\frac{1}{2}\|W\|^{2}$, is therefore augmented with the term $C \sum_i \xi_i$ to penalize misclassification and margin errors. The optimization problem can now be written as:

$$\min_{W,\,h,\,\xi} \; \frac{1}{2}\|W\|^{2} + C \sum_{i=1}^{M} \xi_i \quad \text{subject to} \quad y_i\left(W^{T} X_i - h\right) \ge 1 - \xi_i, \quad \xi_i \ge 0.$$

This formulation is called soft-margin SVM and is characterized by the cost parameter $C$. This parameter introduces additional control capability, which enables adjustment of the balance between the amount of margin allocation and the amount of slack allowance [41]. In the graphical representation in Fig. 4(a), when a smaller $C$ is assigned, the indicated margin becomes wider. It is known that the solution of this optimization problem is given by the saddle point of the Lagrangian [40]. Introducing two sets of Lagrange multipliers $\alpha_i \ge 0$ and $\beta_i \ge 0$, the objective function is reformulated into:

$$L(W, h, \xi, \alpha, \beta) = \frac{1}{2}\|W\|^{2} + C \sum_{i=1}^{M} \xi_i - \sum_{i=1}^{M} \alpha_i \left[ y_i\left(W^{T} X_i - h\right) - 1 + \xi_i \right] - \sum_{i=1}^{M} \beta_i \xi_i. \tag{13}$$

Lagrangian duality enables this primal problem to be transformed into the Wolfe dual problem, which is given by:

$$\max_{\alpha,\,\beta} \; \min_{W,\,h,\,\xi} \; L(W, h, \xi, \alpha, \beta).$$

To minimize the Lagrangian $L$ with respect to $W$, $h$, and $\xi_i$, the partial derivative with respect to each of them is set to zero:

$$\frac{\partial L}{\partial W} = W - \sum_{i=1}^{M} \alpha_i y_i X_i = 0,$$

$$\frac{\partial L}{\partial h} = \sum_{i=1}^{M} \alpha_i y_i = 0,$$

$$\frac{\partial L}{\partial \xi_i} = C - \alpha_i - \beta_i = 0.$$

By substituting them into (13), the optimization problem is transformed into the following dual Lagrangian formula [48]:

$$\max_{\alpha} \; \sum_{i=1}^{M} \alpha_i - \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} \alpha_i \alpha_j y_i y_j X_i^{T} X_j \tag{18}$$

subject to

$$0 \le \alpha_i \le C, \quad \sum_{i=1}^{M} \alpha_i y_i = 0. \tag{19}$$

Then, with $\alpha_i^{*}$ denoting the solution of (18), the optimal weight vector $W^{*}$ is given by:

$$W^{*} = \sum_{i=1}^{M} \alpha_i^{*} y_i X_i.$$

The optimal threshold $h^{*}$ is given, for any support vector $X_s$ with $0 < \alpha_s^{*} < C$, by:

$$h^{*} = W^{*T} X_s - y_s.$$

Finally, the decision function is provided by:

$$f(X) = \operatorname{sign}\left( \sum_{i=1}^{M} \alpha_i^{*} y_i X_i^{T} X - h^{*} \right).$$
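To make the role of the slack variables and the cost parameter in this subsection concrete, the following minimal sketch evaluates the slack $\xi_i = \max(0,\, 1 - y_i(W^{T}X_i - h))$ and the soft-margin objective $\frac{1}{2}\|W\|^{2} + C\sum_i \xi_i$ for a fixed hyperplane. The function names and the toy data are assumptions for illustration; the sketch evaluates the objective only and does not solve the optimization problem.

```python
def margin_score(w, h, x):
    """Signed score W^T X - h of the linear discriminant."""
    return sum(wi * xi for wi, xi in zip(w, x)) - h


def slack(w, h, x, y):
    """Smallest slack xi satisfying y (W^T X - h) >= 1 - xi with xi >= 0."""
    return max(0.0, 1.0 - y * margin_score(w, h, x))


def soft_margin_objective(w, h, data, c):
    """Primal objective 1/2 ||W||^2 + C * sum_i xi_i for data [(x, y), ...]."""
    norm_sq = sum(wi * wi for wi in w)
    return 0.5 * norm_sq + c * sum(slack(w, h, x, y) for x, y in data)


# Toy data (hypothetical): hyperplane x1 = 0, all three points labelled +1.
w, h = [1.0, 0.0], 0.0
data = [([2.0, 0.0], 1),    # outside the margin: xi = 0
        ([0.5, 0.0], 1),    # inside the margin, still correctly classified: xi = 0.5
        ([-0.5, 0.0], 1)]   # wrong side of the hyperplane: xi = 1.5

slacks = [slack(w, h, x, y) for x, y in data]
misclassified = sum(1 for s in slacks if s > 1.0)
objective = soft_margin_objective(w, h, data, c=1.0)
```

In this toy case one point is misclassified while the slack sum is 2.0, which is consistent with $\sum_i \xi_i$ bounding the number of misclassified data from above.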

2) SVM KERNEL OPERATION
The aforementioned soft-margin SVM provides an additional solution, introducing a cost and a margin allowance, for data that are not perfectly linearly separable. This approach still faces a limitation, however, when the data distribution is fundamentally linearly inseparable, e.g., when separating stars from dots as shown in Fig. 4(b). To cope with such cases, a kernel operation has been introduced to further enhance SVM. Where a linear boundary is inappropriate, the kernel operation maps the input vector into a higher-dimensional feature space through a mapping $\Phi$. By choosing a non-linear mapping, SVM becomes capable of constructing an optimal hyperplane in the higher-dimensional feature space, an image of which is shown in Fig. 4(b). $K(X_i, X_j)$ represents a kernel function that performs such a non-linear mapping into a higher-dimensional feature space, as in $\Phi(X_1)^{T} \Phi(X_2) = K(X_1, X_2)$. Such a kernel offers an efficient and less expensive way to transform data into a higher dimension. It allows operation in the original feature space without computing the data in the new dimension space [42], [49]. Polynomial, Gaussian and sigmoid kernels are known for their practicality and computational advantages [40], [41]. In particular, the Gaussian Radial Basis Function (GRBF) is one of the most widely used SVM kernel types, and is given in the following form:

$$K(X_i, X_j) = \exp\left(-\gamma \, \|X_i - X_j\|^{2}\right),$$

where the newly introduced parameter $\gamma$ controls the degree of non-linearity, i.e., the curvature of the separating boundary, which can then fit the distribution of the sample data more flexibly. Therefore, by leveraging the kernel operation, the prediction performance of SVM can be controlled by adjusting both the $C$ and $\gamma$ parameters. This is a major reason why SVM is widely applied to various classification problems. With the kernel operation, the optimization problem of (18) can be written as:

$$\max_{\alpha} \; \sum_{i=1}^{M} \alpha_i - \frac{1}{2} \sum_{i=1}^{M} \sum_{j=1}^{M} \alpha_i \alpha_j y_i y_j K(X_i, X_j) \tag{24}$$

with the same constraints as provided in (19).
Once (24) is solved with constraints (19) by applying the same approach, it becomes possible to determine the Lagrange multipliers and a classifier with an optimal hyperplane in the transformed feature space.
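The kernel-based classifier described in this subsection can be sketched as follows. The sketch computes the Gaussian RBF kernel and evaluates a decision of the form $\operatorname{sign}\left(\sum_i \alpha_i y_i K(X_i, X) - h\right)$. The multipliers, support vectors, and threshold are hand-picked hypothetical values rather than the output of solving (24), and the function names are assumptions.

```python
import math


def grbf_kernel(x, z, gamma):
    """Gaussian RBF kernel K(x, z) = exp(-gamma * ||x - z||^2)."""
    dist_sq = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * dist_sq)


def kernel_decision(x, support_vectors, labels, alphas, h, gamma):
    """Sign of sum_i alpha_i y_i K(X_i, x) - h; returns +1 or -1."""
    score = sum(a * y * grbf_kernel(sv, x, gamma)
                for sv, y, a in zip(support_vectors, labels, alphas)) - h
    return 1 if score >= 0 else -1


# Hypothetical support vectors and multipliers, chosen by hand for illustration only.
support_vectors = [[0.0, 0.0], [1.0, 1.0]]
labels = [-1, 1]
alphas = [1.0, 1.0]
gamma = 0.25  # illustrative value

near_positive = kernel_decision([0.9, 1.1], support_vectors, labels, alphas, 0.0, gamma)
near_negative = kernel_decision([0.1, -0.1], support_vectors, labels, alphas, 0.0, gamma)
```

A query point is assigned the label of the support vector whose kernel value dominates the sum, and a larger $\gamma$ makes each support vector's influence more local.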
These are the fundamental mechanisms that we employed in our classification scheme. The following section discusses the performance evaluation and describes the sets of simulations conducted. We used the GRBF kernel with SVM and observed the prediction performance while adjusting the previously explained parameters C and γ.

V. SIMULATION RESULTS
This section describes the extensive sets of simulations we conducted in order to evaluate and verify the proposed VANET clustering scheme. Before explaining the simulation results, let us introduce the concept and mechanism of the GHR car-following model employed, which we briefly touched upon in Section II. A well-organized reference [29] shows how the GHR model predicts a follower's acceleration $ac$ using the following formula:

$$ac_n^{t} = \lambda_{l,m} \, \frac{\left(v_n^{t}\right)^{m}}{\left(x_{n-1}^{t} - x_n^{t}\right)^{l}} \left(v_{n-1}^{t} - v_n^{t}\right)$$

for the $n$th vehicle ($n = 1, 2, \ldots$) at discrete time $t$, where $l$ represents a headway exponent, $m$ represents a speed exponent, and $\lambda_{l,m}$ represents a sensitivity coefficient of the GHR model. From the derived acceleration, both the velocity $v_n^{t}$ and the position $x_n^{t}$ are obtained from Newton's laws of motion. The follower's acceleration behaves as if a spring were placed between the follower and the vehicle ahead, as depicted in Fig. 5. According to [28], [29], we applied the following parameters in the GHR model to perform the simulations: $l = 1.0$, $m = 0$, and $\lambda_{l,m} = 18$. For this simulation, we captured real traffic data on a four-lane Interstate highway (I-294) in Chicago, Illinois, USA on a day in August 2017. From the sampled real traffic data, we obtained the position and velocity information of leading vehicles and then derived the traffic information of following vehicles via the aforementioned GHR car-following model. A leading vehicle is identified as one with a sufficient distance from the vehicle in front; such a vehicle tends to drive at its own pace. In contrast, a following vehicle tends to follow the leading car's driving pace as it has a limited headway distance. Using the data obtained through the above process, we traced each vehicle's traffic data in four lanes at 5 sec intervals from 5 to 60 sec within the one-minute window. Then, the data were processed in our NMDP-APC MATLAB simulation to analyze how the proposed scheme identifies clusters with their CHs in the real traffic data.
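A minimal sketch of one GHR update step is given below, assuming the commonly cited form of the model in which the follower's acceleration equals the sensitivity coefficient times $v^{m}$ times the relative speed, divided by the headway raised to $l$. The reaction delay of the general GHR model is omitted, the constant-acceleration update over a step `dt` is an assumption, and the function names are hypothetical. The default parameters follow the values stated above ($l = 1.0$, $m = 0$, sensitivity 18).

```python
def ghr_acceleration(x_lead, v_lead, x_follow, v_follow,
                     sensitivity=18.0, l=1.0, m=0.0):
    """Follower acceleration: c * v_f^m * (v_lead - v_f) / (x_lead - x_f)^l."""
    headway = x_lead - x_follow
    if headway <= 0:
        raise ValueError("the leader must be ahead of the follower")
    return sensitivity * (v_follow ** m) * (v_lead - v_follow) / (headway ** l)


def step_follower(x_lead, v_lead, x_follow, v_follow, dt):
    """Advance the follower by dt, holding the acceleration constant over the step."""
    a = ghr_acceleration(x_lead, v_lead, x_follow, v_follow)
    x_next = x_follow + v_follow * dt + 0.5 * a * dt * dt
    v_next = max(0.0, v_follow + a * dt)  # assumed: vehicles do not reverse
    return x_next, v_next, a


# Example (hypothetical values): the leader is 30 units ahead and 5 units/s faster.
x_next, v_next, a = step_follower(x_lead=50.0, v_lead=30.0,
                                  x_follow=20.0, v_follow=25.0, dt=1.0)
```

With $m = 0$ the follower's own speed drops out, so the acceleration depends only on the relative speed and the headway, which matches the spring-like behavior described above.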

A. GRANULARITY PARAMETER px AND CLUSTERS
The number, or granularity, of clusters can be controlled by varying the granularity parameter px in the NMDP-APC process. Fig. 6 shows the relation between the number of clusters formed and the px value in our simulation. The px value acts as a multiplier applied to the median of the distribution of the similarity function in APC. The graph indicates how the cluster granularity changes as the value is set to 0.1, 0.2, 0.5, 1.0, 1.5 and 2.0. The result clearly shows that the clusters are controllable through adjustment of the px value in NMDP-APC. As indicated in Fig. 3, when a small px value is applied, a large number of small clusters are formed; in contrast, when a large px value is applied, a small number of large clusters are formed. Again, the px value is deduced from the trained SVM prediction function and is applied during NMDP-APC according to the procedure shown in Table 2. The following sub-section shows more comprehensive sets of simulation results of NMDP-APC clustering at different observation time points.
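The way px enters the clustering can be sketched as a scaling of the APC preference, taken here as px times the median of the pairwise similarities. The similarity used below, a negative weighted sum of max-normalized position and velocity gaps with weights 0.8 and 0.2, is an assumed stand-in for the NMDP-APC similarity function, and the one-dimensional positions are a simplification. Only the preference computation is shown, not the message-passing iterations.

```python
import statistics


def similarity_matrix(positions, velocities, phi=0.8, psi=0.2):
    """Negative weighted sum of max-normalized position and velocity gaps."""
    n = len(positions)
    d_pos = [[abs(positions[i] - positions[k]) for k in range(n)] for i in range(n)]
    d_vel = [[abs(velocities[i] - velocities[k]) for k in range(n)] for i in range(n)]
    max_pos = max(max(row) for row in d_pos) or 1.0  # avoid division by zero
    max_vel = max(max(row) for row in d_vel) or 1.0
    return [[-(phi * d_pos[i][k] / max_pos + psi * d_vel[i][k] / max_vel)
             for k in range(n)] for i in range(n)]


def preference(sim, px):
    """px times the median of the off-diagonal similarities."""
    n = len(sim)
    off_diag = [sim[i][k] for i in range(n) for k in range(n) if i != k]
    return px * statistics.median(off_diag)


# Hypothetical vehicles: two close together at equal speed, one far away and slower.
sim = similarity_matrix([0.0, 10.0, 100.0], [30.0, 30.0, 20.0])
pref_large_px = preference(sim, 2.0)
pref_small_px = preference(sim, 0.1)
```

Because the similarities are negative, a larger px yields a more negative preference, which in affinity propagation favors fewer exemplars and therefore fewer, larger clusters.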

B. CLUSTERING FORMATION IN SIMULATION
This sub-section shows the results of the NMDP-APC simulation conducted in MATLAB to observe cluster formation for different values of the granularity parameter px, which are derived from the proposed ML scheme. With one minute of real traffic data at hand, we evaluated the clustering performance based on the data collected at different time points. Fig. 7-9 display the clustering results obtained up to the point where the observation field is filled, using the traffic data collected up to 35 sec, which is the middle part of the entire data. This choice reflects our consideration of the practical radio propagation span of C-V2X. We set the observation range to within 1 km, i.e., a 500 m radius, in this simulation, as the reliable transmission range is considered to be a 500 m to 1 km radius [1], [27]. A 2 km range, i.e., a 1 km radius, seems to be a critical distance for reliable communication on PC5-based C-V2X [27]. Fig. 7 provides the result at t = 15 sec, a snapshot from the overall observation of 35 sec. Fig. 7(a), in the top row, shows the original vehicle distribution, with the number of vehicles indicated as N = 16. Fig. 7(b) shows the NMDP-APC clustering result with px = 2.0, which was assigned to produce a small number of large clusters. The NMDP-APC weight parameters ϕ and ψ represent the position component and the velocity component, respectively; they are assigned as ϕ = 0.8 and ψ = 0.2, and the same values were applied throughout this simulation. The number of clusters formed in this condition is Clusters = 3. CHs are located mostly in the middle of each group, with the associated members linked by dotted lines. The results show that a few large clusters were produced because a large px value was assigned. With this granularity assignment, vehicles are forced to be clustered due to the extreme px value applied. Fig. 7(c) shows the net similarity value in the APC iteration process.
By reaching a stable net similarity value, the APC process identifies the final shape of stable sets of clusters. Fig. 8 and 9 provide the results obtained with the same px value, with the NMDP-APC clustering status observed at t = 25 and 35 sec, respectively, during the overall observation time of 35 sec. It can be observed that a larger number of vehicles come into the observed space, and NMDP-APC produces the same level of granularity because the same px value has been applied. Fig. 10(b) displays the NMDP-APC clustering result obtained with the granularity parameter px = 1.4, assigned to form medium-sized clusters, on the data at t = 35 sec, where the observation space is filled. The number of clusters has increased to Clusters = 5, and the number of members in a cluster clearly tends to be smaller. Similarly, CHs are located mostly at the center of each group and linked with the members by dotted lines. The rest of the parameters are assigned in the same manner. Fig. 10(c) provides evidence that the clusters have reached stability. Fig. 11(b) displays the clustering result obtained by assigning the granularity parameter px = 0.5, which is targeted at forming small-to-medium-sized clusters, on the same data at t = 35 sec. As a smaller value of the granularity parameter is assigned, the number of clusters has changed to Clusters = 6, and members are more inclined to form independent clusters compared to Fig. 10(b). Fig. 12(b) displays the clustering result obtained by assigning the smallest granularity parameter value, px = 0.1, at t = 35 sec on the same data used in Fig. 9-11. The number of clusters has now increased considerably to Clusters = 13, and members are clearly more isolated from other clusters because an extreme px value has been applied. Fig. 13-15 display the NMDP-APC clustering results at t = 60 sec, the time at which the entire traffic data are used.
Now the observation span has exceeded 2 km and the number of vehicles has reached N = 61. In the same manner, NMDP-APC is applied with the clustering parameter values px = 1.4, 0.5 and 0.1, respectively, to observe the clustering status on the traffic data. Fig. 13(b) displays the clustering result obtained by applying px = 1.4 to form medium-sized clusters. The number of clusters in this condition is Clusters = 6. The same NMDP-APC weight parameter values, ϕ = 0.8 and ψ = 0.2, are applied. Comparing this result with Fig. 10(b), although medium-sized clusters are formed in both cases, the clusters in Fig. 13(b) tend to consist of more members. We attribute this increase to the observation range, which spans 2 km, and to the number of target vehicles, which has almost doubled. The number of target vehicles performing NMDP-APC directly relates to the wireless propagation range of C-V2X.
As the NMDP-APC process is performed only for those vehicles within wireless propagation range, we assume a 1 km radius to be a critical span. Fig. 14(b) displays the clustering result obtained by applying px = 0.5, which aims to form small-to-medium-sized clusters, on the same data at t = 60 sec; the number of clusters identified is Clusters = 9. Finally, Fig. 15(b) displays the clustering result obtained by applying px = 0.1, which aims to form the smallest clusters, on the same data at t = 60 sec. The number of clusters in this condition rises to Clusters = 18 because an extreme px value is applied. The same tendency has been observed consistently throughout the simulations.

C. PREDICTION PERFORMANCE IN SIMULATION
In this part, we present sets of simulation results on the prediction performance of the soft-margin-based SVM-ML with GRBF kernel operation. We conducted these simulations in a Python coding environment, as it has a series of well-prepared functions and libraries specifically for ML. As shown in Table 1, SVM and NN are the major candidate algorithms for classification. The simulation results obtained using NN, however, were inferior to those obtained using SVM. We attribute this to the fact that our task vector X has only four dimensions, whereas NN-based algorithms would be more applicable to higher-dimensional problems. Thus, we conducted further simulations focusing on SVM by varying the previously explained C and γ parameters. Fig. 16 shows the prediction accuracy obtained by applying a constant C = 1.0 with two types of γ parameter: SVM-default γ and SVM-scaled γ. The SVM-default γ uses γ = 1/(number of features), i.e., 0.25, as our task vector has four features. The SVM-scaled γ uses γ = 1/(number of features × variance of the task vector X). As the result indicates, the SVM-default γ provided better prediction performance than the SVM-scaled γ. It is notable that the SVM-default γ reached 100% prediction accuracy with around 140 training samples. It is remarkable that such high performance was obtained even with a small number of training data. This is important specifically in a dynamic vehicular application scenario. We attribute this to the compact structure of the proposed decision tree, which consists only of essential criteria with a four-dimensional task vector. Fig. 17 compares default-γ with C = 1.0, which gave the best performance above, against scaled-γ with varying values of C, to observe how the value of C influences the result.
As reviewed in the theoretical analysis, the cost parameter C controls the balance between margin allowance and misclassification. In this simulation, increasing the C value improved the prediction performance of SVM scaled-γ. However, SVM scaled-γ did not reach 100% accuracy even with increased C values and a large volume of training data. These results suggest that the combination of SVM default-γ with C = 1.0 provides desirable prediction performance under the simulation conditions.
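The two γ settings compared above can be computed as in the following minimal sketch. The default setting is 1/(number of features); the scaled setting divides additionally by the variance of X, taken here over all entries of the training matrix, which is an assumption about how the variance is computed. These definitions coincide with the `gamma='auto'` and `gamma='scale'` options of scikit-learn's `SVC`, although the library used in the simulation is not stated. The sample values are hypothetical.

```python
import statistics


def gamma_default(n_features):
    """gamma = 1 / (number of features)."""
    return 1.0 / n_features


def gamma_scaled(samples):
    """gamma = 1 / (number of features * variance over all entries of X)."""
    n_features = len(samples[0])
    flat = [value for row in samples for value in row]
    variance = statistics.pvariance(flat)  # population variance
    if variance == 0:
        raise ValueError("variance of X is zero; scaled gamma is undefined")
    return 1.0 / (n_features * variance)


# Hypothetical task vectors [I_v, F_p, T_d, Nb_c]; values are illustrative only.
samples = [[100.0, 1.0, 30.0, 0.0],
           [100.0, 0.0, 10.0, 1.0],
           [100.0, 1.0, 25.0, 1.0]]

g_default = gamma_default(len(samples[0]))
g_scaled = gamma_scaled(samples)
```

When the features span very different numeric ranges, as in this toy input, the variance is large and the scaled γ becomes much smaller than the default value of 0.25.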
Finally, we observed the statistical average access latency in 5G compared with LTE through a simulation, as shown in Fig. 18. The access latency was statistically measured from a CH to the gNB and eNB, i.e., over the Uu interface, by assigning the target latency as the median value, i.e., $\mu_{5G} = 4$ ms and $\mu_{LTE} = 11.5$ ms [3], [50]. The results were obtained by applying a truncated normal distribution [51] to avoid producing extreme and unrealistic values. The simulation drew 10,000 samples for each value of the latency distribution variance, $\sigma^{2} = 1.0$, 5.0 and 10.0, and we observed how system stability influenced the average access latency. The outcomes showed that the average access latency observed in 5G was 3.9 ms at $\sigma^{2} = 1.0$, 5.8 ms at $\sigma^{2} = 5.0$ and 7.8 ms at $\sigma^{2} = 10.0$. In LTE access, the average access latency observed was 11.0 ms at $\sigma^{2} = 1.0$, 12.0 ms at $\sigma^{2} = 5.0$ and 15.4 ms at $\sigma^{2} = 10.0$. A clear tendency was that when a large variance is applied, i.e., when the system is unstable, the average access latency deteriorates. This result suggests that the more stable the access is, the lower the access latency that can be achieved. Therefore, our objective of minimizing the number of accessing objects can also contribute to reducing access latency and to system scalability, by avoiding access storms from the individual vehicles and mobile MTC terminals that may be deployed in large numbers in the future.
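The sampling procedure described above can be sketched with rejection sampling from a normal distribution. The truncation bounds are not given in the text, so the sketch assumes a lower bound of 0 ms and no upper bound; it is therefore not expected to reproduce the reported averages, only the qualitative trend. The function name and the fixed seed are likewise assumptions.

```python
import math
import random


def average_latency(mean_ms, variance, n_samples=10_000, lower_ms=0.0, seed=0):
    """Average and minimum of truncated-normal latency draws (lower bound only)."""
    rng = random.Random(seed)
    sigma = math.sqrt(variance)
    draws = []
    while len(draws) < n_samples:
        value = rng.gauss(mean_ms, sigma)
        if value >= lower_ms:  # reject draws below the truncation bound
            draws.append(value)
    return sum(draws) / len(draws), min(draws)


# 5G target latency of 4 ms with the smallest and largest variance from the text.
avg_stable, min_stable = average_latency(4.0, 1.0)
avg_unstable, min_unstable = average_latency(4.0, 10.0)
```

Under this one-sided truncation, a larger variance cuts off more of the lower tail, so the sample average moves above the target value, in line with the tendency described above.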

VI. CONCLUSION
This paper has proposed an advanced VANET clustering scheme called NMDP-APC to form stable single-hop clusters in a distributed control manner. Specifically, in consideration of the real-time motion dynamics of vehicles, two parameters, i.e., inter-vehicular Euclidean distances and velocities, are assigned as the metrics without associating them with any prediction values. In the NMDP-APC process, we controlled the clustering granularity level by adjusting the granularity parameter. In particular, this study deduced the desirable clustering size by applying an ML scheme that employs soft-margin-based SVM-ML with the Gaussian Radial Basis Function kernel.
As ML is known as a powerful tool in decision making, we applied it to find an ideal VANET granularity incorporating a minimum set of decision criteria. Owing to the limited and essential criteria, the ML prediction performance achieved satisfactory results with fewer training data. In addition, we designed a message sequence and procedure for NMDP-APC by associating them with the ML-deduced clustering granularity. The procedure and sequence build on the existing 3GPP C-V2X specifications. With the PC5 interface, therefore, the proposed scheme can be easily implemented with emerging 5G cellular systems. In particular, the proposed scheme is designed with a distributed control approach to adapt to the mobility of nomadic vehicles. Through the simulations, cluster formation and granularity control were clearly observed for different values of the px parameter. To improve the ML prediction performance, the two key parameters C and γ were adjusted. The simulation results indicated that a particular selection of parameters resulted in better performance. We also observed lower access latency in a stable PLMN system in the simulation, which indicates that the clustering capability contributes to aggregation effects and thus to a stable system even when a large number of objects are connected to a PLMN.
Although this study originally targeted vehicular applications, it is potentially applicable to any moving machine-type objects, the number of which is expected to increase with the growing demand for such applications in 5G and beyond. Future work should focus on enhancing the proposed scheme by applying it to more complicated scenarios and/or using different types of ML algorithms.