UniDRM: Unified Data and Resource Management for Federated Vehicular Cloud Computing

The demand for computational resources in vehicular environments has increased due to the deployment of numerous intelligent transportation systems in the last decade. The federated vehicular cloud, a variant of vehicular cloud computing where resources embedded in individual vehicles are organized as a single unit to provide cloud services, is considered as an emerging alternative to the conventional cloud platforms for the execution of computationally intensive and delay-sensitive applications. However, the federated vehicular cloud is beset with a capacity-constrained communication channel and limited resource capacity in individual vehicles, leading to challenges in data and resource management. To address these challenges, we propose UniDRM, a unified data and resource management framework for the federated vehicular cloud. The UniDRM organizes vehicles on the road into clusters based on their mobility and resource characteristics, such as resource cost, resource credibility level, resource type, and available resource capacity. The data of computationally intensive tasks are then partitioned using our proposed analytical model and assigned to individual vehicles in the cluster for parallel execution. Three data partitioning and scheduling schemes: time-aware, cost-aware, and reliability-aware, are proposed in this study to execute time-critical tasks, low-cost tasks, and high-security tasks, respectively. Through realistic simulations, a comparative analysis of the proposed partitioning and scheduling schemes is presented.


I. INTRODUCTION
In the last two decades, Intelligent Transport System (ITS), which is the introduction of technology to make transportation safer and more efficient [1], has experienced many advancements. Notable developments in ITS include the design and deployment of smart vehicles that communicate with other devices.
Communication among vehicles, personal devices, and infrastructure along roads is enabled by the Vehicular Ad-hoc Network (VANET), an infrastructure-less wireless network of vehicles. Through VANET protocols such as IEEE 802.11p [2] and IEEE 1609 [3], vehicles can connect to and communicate with devices in their transmission range, resulting in the Vehicle-to-Everything (V2X) communication network.
Currently, V2X is receiving much attention and is expected to be deployed in cities and urban areas as an essential component of smart city systems. The full deployment of V2X will lead to a wireless network with hundreds to thousands of connected vehicles and devices. This envisioned large network in vehicular environments has led to the emergence of new computing paradigms such as the Internet of Vehicles (IoV). (The associate editor coordinating the review of this manuscript and approving it for publication was Jagruti Sahoo.)
IoV is defined as the intelligent integration of vehicles, people, and devices as a network to provide services [4]. It is expected to support ITS and smart city applications such as collision avoidance systems and citywide intelligent traffic control systems. With advances in communication technologies such as 5G [5], we believe that in the next decade, IoV will also serve as a dynamic backbone network framework for sharing Big Data and vehicular resources.
Through the IoV, owners of vehicles with underutilized onboard computing resources can donate them via Vehicular Volunteer Computing (VVC) [6] or lease them in the form of cloud services. Similar to utility computing [7], clients can access these cloud services on a pay-as-you-go basis.
This novel paradigm of vehicular resource sharing is referred to as vehicular cloud computing. It was first proposed by [8]- [10] to provide cloud services such as storage as a service (StaaS), computation as a service (CaaS), network as a service (NaaS), and cooperation as a service (CaaS) [10], [11].
In order to efficiently provide the different cloud services, two main classes of vehicular cloud provisioning models have been proposed in the literature: the peer-to-peer provisioning model and the federated provisioning model [12]. This classification is based on the type of services provided, the number of vehicles involved in the service provision, and the resource management scheme used.
The peer-to-peer provisioning model is a decentralized model in which vehicle owners individually rent their resources to customers in the form of cloud services. Considering that the services are provided independently of the resources of other vehicles, all resource management tasks, including resource monitoring and task scheduling, are performed by the individual resource owners.
Since the peer-to-peer vehicular cloud provisioning model consists of resources from a single vehicle, it may not be suitable for running applications that require high resource capacity beyond what a single vehicular resource node can provide.
In the federated provisioning model, a centralized resource manager in a Region of Interest (RoI) of a road acquires resources from different vehicle owners and organizes them in the form of a pool to provide cloud services to clients as a single logical entity. A typical example of a federated vehicular cloud proposed in [13] is a citywide vehicular cloud platform consisting of resources from different vehicles under the management of a city government to provide security services to vehicles.
Unlike the peer-to-peer provisioning model, the federated provisioning model can serve applications that require resources with high computing and storage capacities by pooling many locally available resources together to form a single but larger resource image.
The federated vehicular cloud model is promising due to its potential commercial benefits and ability to perform computationally-intensive and data-intensive tasks. However, the main challenges of the federated vehicle cloud, namely, the capacity-constrained communication channel and limited resource capacity in individual vehicles, need to be addressed.
To address these challenges, we propose UniDRM, a Unified Data and Resource Management framework that manages data and resources in a federated vehicular cloud by forming resource-based clusters and by partitioning and scheduling data to the vehicular resources in those clusters.
The main contributions of the study are summarized as follows:
• We propose and implement a resource-based clustering scheme that groups vehicles with similar mobility and resource characteristics to form a dynamic federated vehicular cloud.
• We model a mathematical scheme inspired by divisible load scheduling for data partitioning and scheduling in the dynamic federated vehicular cloud. A probabilistic approach to determining network delay values is also modeled and incorporated into the partitioning and scheduling scheme.
• Using the formulated mathematical scheme and the multi-criteria decision tool TOPSIS, three different data partitioning and scheduling schemes for the dynamic federated vehicular cloud, namely, Cost-Aware, Time-Aware, and Reliability-Aware partitioning and scheduling, are proposed in this study.

The rest of the paper is organized as follows. Section II discusses the motivation and related work of the study. Section III provides details of the architecture of the proposed UniDRM. In Section IV, the resource-based clustering algorithm for forming a federated vehicular cloud is presented. Section V presents the proposed data partitioning and scheduling schemes, and Section VI shows the performance analysis of the proposed schemes. Finally, Section VII concludes the paper with a summary of the results and future work.

II. MOTIVATION AND RELATED WORK
The process of forming a federated vehicular cloud in an RoI on the road consists of organizing vehicles with sufficient resources into clusters and selecting cluster heads to serve as the cloud controller to manage the resources in the cluster.
Clustering is the grouping of vehicles in a region of interest based on predefined criteria such as speed, direction, and density of vehicles [14]. Many clustering techniques have been proposed in the literature to overcome node instability and provide an efficient message dissemination strategy in VANET.
Although clustering in VANET and in the vehicular cloud are both carried out in a vehicular environment, some key differences exist. In VANET clustering, the focus is to make network protocols such as routing and medium access control more scalable [15] for communication among all the vehicles in an RoI.
On the other hand, vehicular cloud clustering also attempts to optimize the selection of vehicles with sufficient resources and similar mobility characteristics to form clusters that provide cloud services. Therefore, only nodes with sufficient resources that are willing to contribute to the vehicular cloud are considered part of the clusters.
Considering the differences between VANET and vehicular cloud clustering, existing VANET clustering techniques need to be improved to optimize their performance in the vehicular cloud.
In this regard, Arkian et al. [16] considered the formation of clusters in an RoI of a road to provide vehicular cloud services. A cluster head selection scheme based on a fit factor computed for each node was modeled using fuzzy logic. The study focused on cluster head selection for the vehicular cloud and did not provide any specific algorithm or criteria for clustering vehicles.
Studies in [17] proposed a framework that organizes vehicles into non-overlapping clusters with varying numbers of communication hops to offer Data as a Service (DaaS). They also proposed a scheduling algorithm that determines when transmission links are activated and deactivated within a cluster to ensure contention-free medium access among cluster members within a cluster. The scheduling scheme maximized the throughput and minimized the delays in providing vehicular cloud services.
In [18], virtual partitions serving as logical cluster zones were marked on roads in an RoI. The vehicles in a partition were considered as cluster members to provide vehicular cloud services. A leading vehicle is then selected as the cluster head based on its expected duration in the virtual partition. The lead vehicle is assigned the responsibility of managing the resources in its virtual partition and the communication between its cluster members and that of other virtual partitions.
For stationary vehicular cloud, linear clusters of parked vehicles were formed along roads in an RoI to provide storage services in [19]. The most suitable vehicles at the front and rear ends of a linear cluster were selected as cluster heads to perform resource and data management tasks.
Although some clustering algorithms have been proposed for vehicular cloud computing, all but one of them ignore the characteristics of the resources embedded in the vehicles, such as the resource type, the cost of the resource, and the reliability level of the resource, as criteria for forming vehicular cloud clusters.
In the only exception, Ridhawi et al. [20] proposed a service provisioning framework that organizes vehicles with similar mobility and service types into clusters. A cluster head, which serves as a directory for vehicles and the services they offer, is selected based on link duration, connection distance, neighbors, and service availability. The clustering scheme mainly considers the scenario where individual vehicles provide cloud services independently of other vehicles (peer-to-peer provisioning model). The scenario where a group of vehicles works together as a logical unit to provide only one type of cloud service (federated vehicular cloud) was not considered.
In contrast to the approach of Ridhawi et al. [20], we propose a clustering scheme in which vehicles with the same resource type (i.e., storage or computation) that meet the requirements of the vehicular cloud to be provisioned are grouped as a logical unit to provide one particular cloud service under centralized management, i.e., a cluster head (federated vehicular cloud). This study also addresses the challenge of data partitioning and scheduling for the federated vehicular cloud.
Data partitioning is very significant in the vehicular cloud, as it ensures that parallelism is achieved through the Single Program Multiple Data (SPMD) paradigm. In SPMD, the data for a program is partitioned into multiple independent parts and executed simultaneously by coordinated computing units [21]. Examples of SPMD-based applications include feature extraction and edge detection in image processing.
Several techniques have been used to partition data in distributed computing environments, key among them the MapReduce technique. In MapReduce, data is analyzed in parallel by decomposing tasks and specifying a map function for processing and a reduce function for aggregating results [22], [23].
In [24] and [25], the MapReduce model was considered for partitioning tasks among compute nodes in conventional cloud computing and vehicular cloud computing, respectively. However, the MapReduce method does not explicitly consider vehicular nodes' frequent departure and joining of the vehicular cloud. This leads to higher waiting times when transmitting the results of partitioned tasks [26].
An optimal partitioning and allocation scheme based on the dynamic characteristics of vehicular cloud resources was proposed by [26]. In the study, the arrival and departure rates of vehicles in a parking lot were considered to model a distribution to determine the available resource capacity of vehicles. Based on the available resource capacities of vehicles in a parking lot, the optimal number of partitions for a given job and the chunk for each vehicle were determined.
Although the proposed algorithm is promising, the capacity of the communication link between vehicles and the delay in data transmission due to channel access delay were not considered in the algorithm design. Moreover, the static vehicular cloud environment (parked vehicles) used in the algorithm design may lead to performance degradation if the proposed scheme is deployed in a dynamic vehicular cloud environment (moving vehicles on the road) without any enhancement.
Considering that scheduling schemes that account for both resource capacity (computation and storage) and network bandwidth requirements perform better than those that consider only one of them [27], we propose and implement an optimal load partitioning and scheduling algorithm for the dynamic vehicular cloud that takes into account both the resource capacity and the network data transmission capacity under different network congestion scenarios in the vehicular environment.
Our proposed partitioning and scheduling scheme was inspired by divisible load partitioning and scheduling schemes such as the Periodic Write-Read-Compute (PWRC) [28] and the Divisible Load Theory (DLT) [29], [30].
Divisible load partitioning and scheduling schemes provide a tractable mathematical framework that considers both the data transfer cost and computational costs for partitioning and scheduling data to multi-connected computing systems to minimize the execution time.
Applications in data mining [31], multimedia, and video processing systems [32], [33] have been developed and deployed on different distributed computing platforms, such as sensor networks [34], [35] and conventional cloud computing [36], using divisible data partitioning and scheduling schemes. However, to the best of our knowledge, no divisible load theory-based partitioning and scheduling schemes have been implemented or deployed in vehicular cloud computing.
In this study, we therefore propose and implement a divisible data partitioning and scheduling scheme that considers the mobility and the characteristics of the resources, such as resource availability, cost, and credibility. Considering these characteristics, three task-based partitioning and scheduling schemes, namely, time-aware (based on expected completion time), cost-aware, and reliability-aware, are proposed and implemented in this study.

III. THE UniDRM ARCHITECTURE
The UniDRM, as a unified framework, is designed to manage the resources, application data, and communication network of the vehicular cloud. It can be deployed in the roadside cloud (layer-1) and the inter-vehicular cloud (layer-2), as shown in Figure 1. The roadside cloud consists of roadside infrastructure, such as traffic lights and billboards, equipped with high-performance resources such as sensors, storage, and computing units organized to provide cloud services. The roadside cloud controller manages the roadside cloud resources, registers vehicles, and assigns and manages the credibility scores of vehicles in the RoI on the road. In addition, the UniDRM performs tasks such as service level agreement negotiations with clients, task data partitioning, and scheduling in the roadside cloud.
The inter-vehicular cloud layer consists of mobile vehicles in an RoI on a road with sufficient resources. The UniDRM organizes resources in the inter-vehicular cloud in the form of a pool to provide cloud services to clients as a single logical entity. Figure 1 is an illustration of the UniDRM deployed in a section of the road. The vehicles in a cluster are grouped into task-specific sub-clusters to execute different tasks.
The architecture of the UniDRM, shown in Figure 2, consists of data management, resource management, and network management layers. In the data management layer, requests from clients are accepted after negotiating the terms of the service level agreement. The task class is then determined at this layer to select the appropriate partitioning and scheduling scheme. Resource management activities, such as resource clustering, resource ranking, and the selection of appropriate resources for task execution, are carried out at the resource management layer.
The network management layer determines the channel capacity and network access delay values that are considered in the mathematical model to determine the data fractions for vehicles (further details are provided in Section VI). The network management layer also ensures successful data transmission by the vehicular nodes.
For brevity, this study focuses on the operations performed by the UniDRM in the inter-vehicular cloud layer, specifically the formation of federated vehicular cloud clusters and the partitioning and scheduling of application data to resources in a cluster.

IV. THE FORMATION OF FEDERATED VEHICULAR CLOUD
The process of forming a federated vehicular cloud consists of organizing vehicles into resource-based clusters and selecting the most suitable vehicle as the cluster head.
Considering that various types of resources are embedded in vehicles, different resource-based clusters can be formed in an RoI on the road. However, this study provides details on the formation of only computation as a service (CaaS) and storage as a service (StaaS) clusters.
In the formation of the CaaS and StaaS clusters, willingness and credibility values are assigned to each vehicle in the network. The willingness flag is a binary value that indicates whether a vehicle agrees to be part of the federated vehicular cloud cluster.
The credibility value is assigned and managed by the cloud controller in the roadside cloud. This ensures that only trusted nodes join the resource pool of the federated vehicular cloud. Vehicle owners who perform unethical and malicious activities, such as abrupt termination of task execution, receive low credibility scores.
In this study, two different models, namely, the infrastructure-based model and the infrastructure-less model, are considered for the formation of vehicular cloud clusters. These models are based on the type of node that initiates the formation of vehicular cloud clusters.
In the infrastructure-based clustering model, a Road Side Unit (RSU) in an RoI initiates clustering by sending a join request message to all nodes in its transmission range. A node willing to become part of the cluster then replies with a response message containing its resource and mobility details. The response message consists of the direction of the vehicle, type of resource, the capacity of the resource, availability time, credibility value, and cost of the resource per unit time.
The RSU then organizes the vehicles that meet the resource capacity and credibility requirements (more than a given threshold) into clusters based on the type of resource and the vehicle's direction of travel. The node with the highest availability time is then selected as the cluster head.
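As a minimal sketch of the grouping logic just described (the paper does not specify an implementation, so all names, thresholds, and field choices below are illustrative), the RSU could filter responding vehicles by capacity and credibility thresholds, bucket them by resource type and travel direction, and select the member with the highest availability time as cluster head:

```python
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Vehicle:
    vid: str
    direction: str        # e.g. "east" or "west"
    resource_type: str    # "CaaS" or "StaaS"
    capacity: float       # declared resource capacity
    availability_time: float
    credibility: float    # credibility score from the roadside cloud
    cost_per_unit: float

def form_clusters(responses, min_capacity=1.0, min_credibility=0.5):
    """Group qualifying vehicles into (resource_type, direction) clusters
    and pick the node with the highest availability time as cluster head."""
    clusters = defaultdict(list)
    for v in responses:
        # Only nodes meeting the capacity and credibility thresholds join.
        if v.capacity >= min_capacity and v.credibility >= min_credibility:
            clusters[(v.resource_type, v.direction)].append(v)
    return {key: (max(members, key=lambda v: v.availability_time), members)
            for key, members in clusters.items()}
```

The same filtering-and-grouping step also fits the infrastructure-less model described next, with the inquiry node playing the role of the RSU.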
The infrastructure-less cluster formation model is employed in vehicular environments with few or no RSUs, such as highways. In this model, the cluster formation process is initiated by vehicles in an RoI through the exchange of messages. The infrastructure-less cluster formation procedure is illustrated by the flowchart in Figure 3 and explained below.
A node that decides to join a federated vehicular cloud cluster (inquiry node) broadcasts a discovery message to find nodes in its transmission range. The discovery message consists of the mobility and resource properties of the node, namely direction, speed, current location, resource availability time, type, credibility, and capacity of the resource node.
Upon receiving the discovery message, unclustered nodes willing to become part of the federated vehicular cloud cluster then respond with an update message to establish a neighborhood association. If there are no cluster heads in the established neighborhood, nodes moving in the same direction and with credibility and capacity values more than a threshold are organized into StaaS and (or) CaaS clusters depending on the type of resources they declare. The node with the highest availability time is selected as the cluster head.
If the inquiry node has a cluster head moving in the same direction within its transmission range, the cluster head ascertains whether the inquiry node meets the cluster's resource credibility and capacity requirements. An inquiry node that meets the requirements is sent a join-accept message to update its cluster member status; one that does not is sent a join-reject message from the cluster head. If the inquiry node receives only join-reject messages, or receives no response to its discovery message after a given wait period, it rebroadcasts discovery messages until it receives a response or exits the RoI.
After cluster formation, the selected cluster head computes the expiration time of the communication link between itself and each cluster member node.
According to [37], the Link Expiration Time (LET_{i,j}) between the cluster head i and any cluster member node j in the same transmission range r can be calculated from the velocities (v_i, v_j), the position coordinates (x_i, y_i) and (x_j, y_j), and the directions θ_i and θ_j (0 ≤ θ_i, θ_j < 2π) of vehicles i and j, as shown in equation (1).
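Equation (1) itself is not reproduced in this excerpt. As a hedged reconstruction, the standard link-expiration-time expression from the mobility literature cited as [37] takes the following form (our notation; the paper's exact equation should be checked against the original reference):

```latex
\begin{align*}
a &= v_i\cos\theta_i - v_j\cos\theta_j, & b &= x_i - x_j,\\
c &= v_i\sin\theta_i - v_j\sin\theta_j, & d &= y_i - y_j,\\
LET_{i,j} &= \frac{-(ab + cd) + \sqrt{(a^2 + c^2)\,r^2 - (ad - bc)^2}}{a^2 + c^2}.
\end{align*}
\]
```

Intuitively, a and c are the components of the relative velocity and b and d the components of the relative position; the LET is the time until the inter-vehicle distance first reaches the transmission range r.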
The computed LET and other parameters of the vehicular resource are used to rank the resources to determine the lead node for a given task, as explained in more detail in the subsequent sections of this paper.

V. DATA PARTITIONING AND SCHEDULING
The user task data partitioning and scheduling process considered in this study is shown in Figure 4. It consists of user task admission, application task classification, resource ranking, data partitions determination, and data partitions to resources mapping.

A. USER TASK ADMISSION AND TASK CLASSIFICATION
A client who wants to access the federated vehicular cloud negotiates the terms of the Service Level Agreement (SLA). If the terms are accepted, the client's tasks are admitted by the cloud controller at the task admission phase. During the SLA negotiations, the cloud controller also determines the class of each task based on the user requirements.
The classification of tasks plays a vital role in determining the type of partitioning and scheduling scheme used by the cluster head to meet the user requirements. This study considers three main task types: time-critical tasks, low-cost tasks, and high-security tasks. Based on these task types, three partitioning and scheduling schemes, time-aware, cost-aware, and reliability-aware, are proposed.

1) TIME-AWARE PARTITIONING AND SCHEDULING (TAPS)
Applications such as disaster warning systems, disaster evacuation modeling, and simulations are considered time-critical tasks. They have a relatively short expected execution time and, therefore, must be partitioned and scheduled to minimize the total execution time of the task.
To meet this requirement, TAPS prioritizes computational capacity over the cost and credibility of resources during data partitioning and scheduling to ensure that the minimum execution time is achieved. For example, nodes with high computational capacity that are expected to remain in the cluster until their assigned tasks are completed are allocated larger portions of the data for execution.

2) COST-AWARE PARTITIONING AND SCHEDULING (CAPS)
The cost of renting resources from vehicle owners to form a federated vehicular cloud varies and affects the overall cost of the service offered. Generally, vehicular resources with a high level of credibility have higher costs. For example, resources from public transport may be considered more trustworthy and, therefore, have a higher cost than those from private third-party vehicles.
The CAPS scheme assigns larger data chunks to resources with low cost for a user task that requires a lower execution cost.

3) RELIABILITY-AWARE PARTITIONING AND SCHEDULING (RAPS)
The distributed ownership of resources in the vehicular cloud poses many security risks in executing applications such as personal health information systems.
Through the RAPS scheme, some security issues in the vehicular cloud can be addressed by ensuring that reliable vehicles are more involved in task execution. As a result, resources with higher credibility values are allocated larger percentages of data during data partitioning and scheduling.
Although the TAPS, CAPS, and RAPS schemes prioritize one criterion over the others in data partitioning and scheduling, it is desirable to consider other resource parameters to achieve optimal results for the other criteria. Therefore, the TOPSIS ranking and selection technique is used to rank resources so that high-ranked resources have optimal values for all the criteria under consideration.

B. TASK BASED RESOURCE RANKING
The Technique for Order of Preference by Similarity to Ideal Solution (TOPSIS) is a multi-criteria decision analysis technique used to evaluate multiple conflicting criteria. It was proposed by [38] to select the option closest to the ideal best solution (best ideal alternative) and the farthest from the ideal negative solution (worst ideal alternative).
In the literature, TOPSIS has been considered to make decisions in cloud computing, such as the selection of cloud service providers [39] due to its simplicity and computational efficiency. It consists of the following steps.
Step 1: Establish a decision matrix for the criteria and alternatives.
Step 2: Normalize the decision matrix.
Step 3: Assign a weight to each criterion and compute the weighted normalized decision matrix.
Step 4: Identify the ideal positive and negative solutions.
Step 5: Compute the separation measures for each alternative.
Step 6: Calculate the relative closeness to the ideal solution.
Step 7: Rank the alternatives according to their closeness coefficients.

In Step 1, the performance (decision) matrix X_ij is constructed based on the number of resources in a cluster, N, and the number of resource attributes, M. The impact of each attribute is also determined: benefit criteria are maximized while cost criteria are minimized. The resource attributes considered are the link expiration time between a node and the cluster head (LET), the unit cost of the resource, the credibility value of the resource, and the reciprocal computational capacity of the resource.
The decision matrix is then normalized in Step 2 using equation (4). In Step 3, each attribute is assigned a weight W_j = [w_1, w_2, w_3, w_4] based on the partitioning and scheduling scheme under consideration. The weights of the attributes for the TAPS, CAPS, and RAPS schemes are listed in Table 1; for each scheme, the attribute weights sum to one (∑_{j=1}^{4} w_j = 1). It is worth noting that the assigned weights are empirical values that serve as case-study input to the algorithm. A system designer or the cloud controller may define a new Data Partitioning and Scheduling Scheme (DPSS) by specifying different weight values for the parameters.
The weighted normalized matrix is then computed according to equation (5).
In determining the ideal best solution in Step 4, the largest weighted attribute value is selected if the relevant attribute is maximized, while the smallest weighted attribute value is selected if the relevant attribute is minimized, as depicted in equation (7).
For the ideal worst solution, the smallest weighted attribute value is selected if the relevant attribute is maximized, and the largest value is selected if the relevant attribute is minimized, according to equation (8).
The Euclidean distance between each alternative resource and the ideal best solution is computed according to equation (9), and the distance between each resource and the ideal worst solution is computed according to equation (10).
The TOPSIS performance score (closeness coefficient) for each resource is then computed using equation (11).
Based on the performance score, the resources are ranked in descending order and the highest-ranked node is then selected as the lead node.
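To make the ranking concrete, the sketch below implements Steps 1–6 with NumPy. The weight vector and criteria directions are illustrative stand-ins for Table 1 (not reproduced here), with LET and credibility treated as benefit criteria and unit cost and reciprocal capacity as cost criteria:

```python
import numpy as np

def topsis_scores(X, weights, benefit):
    """TOPSIS closeness coefficients for a cluster of resources.

    X       : (N resources x M criteria) decision matrix
    weights : length-M weight vector summing to one
    benefit : length-M booleans, True where the criterion is maximized
    """
    X = np.asarray(X, dtype=float)
    # Step 2: vector-normalize each criterion column.
    R = X / np.linalg.norm(X, axis=0)
    # Step 3: apply the scheme-specific weights (e.g. TAPS/CAPS/RAPS).
    V = R * np.asarray(weights)
    # Step 4: ideal best/worst values per column, direction-aware.
    best = np.where(benefit, V.max(axis=0), V.min(axis=0))
    worst = np.where(benefit, V.min(axis=0), V.max(axis=0))
    # Step 5: Euclidean separation of each resource from the ideals.
    d_best = np.linalg.norm(V - best, axis=1)
    d_worst = np.linalg.norm(V - worst, axis=1)
    # Step 6: relative closeness; higher score means a better-ranked node.
    return d_worst / (d_best + d_worst)
```

Step 7 is then a descending sort of the scores; the top-ranked resource becomes the lead node.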

C. DATA PARTITION DETERMINATION AND DATA MAPPING
The data partitioning and scheduling scheme was modeled as a one-round scheme in this study. The data of an application is partitioned and scheduled for resources in a time frame.
The data to be executed by the federated vehicular cloud are sent from the cluster head to the lead node. The lead node, acting as the load originator for the cluster, partitions the data into α_0, α_1, ..., α_n units for vehicles V_0, V_1, ..., V_n.
For each data chunk α_i, it takes Z_i α_i time units to be sent from the lead node to a cluster member. The value of Z_i is the inverse of the data transmission rate T of the communication channel (Z_i = 1/T). Similarly, for a vehicle V_i with reciprocal computation capacity ω_i, the time required to execute a data chunk α_i is ω_i α_i. The reciprocal computation capacity is the time it takes a vehicle node to execute one unit of data of a given task.
The following conditions and assumptions were considered in modeling the data partition and scheduling scheme.
1) The processing units of the vehicles are not equipped with front-end processors; therefore, sending and computing data partitions do not occur simultaneously. A vehicle can either process, send, or receive data partitions at any one time.
2) Since communication among nodes is carried out on a single channel, the transmission of data chunks from the lead node is sequential. Similarly, the other nodes transfer their processed data chunks back to the lead node sequentially.
3) The order in which nodes receive data partitions from the lead node is based on their TOPSIS rank: higher-ranked nodes receive their data partitions before lower-ranked nodes. However, processed data chunks are transmitted back in the reverse order of data reception (FILO).
4) The size of each data chunk before processing is the same as after processing, and the sum of all data chunks assigned to vehicular nodes is one (normalized), that is, ∑_{i=0}^{n} α_i = 1.
5) The lead node starts processing its partition after distributing the chunks of task data to all other nodes, while each cluster member starts processing its partition upon receiving its data chunk.
6) For all vehicles other than the lead node, the reception, processing, and submission time of the data falls within the execution time of the preceding vehicle.
Given these assumptions, the timing diagram in Figure 5 was produced for the partitioning and scheduling scheme. From the timing diagram, the processing time is defined as the time taken to perform operations on a data chunk (ω_i·α_i), while the execution time is the sum of the processing time and the communication times for receiving the chunk and returning the result (Z·α_i + ω_i·α_i + Z·α_i). Based on the assumptions and the features of the timing diagram, we determined the partition for each node using an analytical model consisting of the following system of equations.
The times required by the lead node, EX(V_0), and by the cluster members, EX(V_i), to complete the execution of their assigned tasks are given by equation (13) and equation (14):

EX(V_0) = Σ_{i=1}^{n} Z·α_i + ω_0·α_0    (13)

EX(V_i) = Z·α_i + ω_i·α_i + Z·α_i = (2Z + ω_i)·α_i,  i = 1, 2, ..., n    (14)

From the timing diagram, the processing times of the lead vehicle V_0 and the last vehicle V_n are the same and can be expressed as follows:

ω_0·α_0 = ω_n·α_n    (15)

Thus, the data chunk α_n can be expressed in terms of the data chunk α_0 in equation (16) as:

α_n = (ω_0 / ω_n)·α_0    (16)

The execution time of each node other than the lead node can be expressed in terms of the processing and communication times of its successor node as follows:

ω_i·α_i = Z·α_{i+1} + ω_{i+1}·α_{i+1} + Z·α_{i+1},  i = 1, 2, ..., n − 1    (17)

For each node except the lead node and the last node, its partition can be expressed in terms of that of its successor node as:

α_i = ((2Z + ω_{i+1}) / ω_i)·α_{i+1}    (18)

Equivalently, each partition can be written in terms of that of its predecessor:

α_{i+1} = (ω_i / (2Z + ω_{i+1}))·α_i    (20)

From equation (20), the partition of the (n − 1)th node can be written as:

α_{n−1} = ((2Z + ω_n) / ω_{n−1})·α_n = (ω_0·(2Z + ω_n) / (ω_{n−1}·ω_n))·α_0    (21)

Based on equations (16) and (20), the partitions of all vehicles can be expressed in terms of α_0 through the recursive equations:

α_i = k_i·α_0,  with k_0 = 1, k_n = ω_0/ω_n, and k_i = ((2Z + ω_{i+1}) / ω_i)·k_{i+1} for i = n − 1, ..., 1    (22)

Given that all data chunks sum up to one, Σ_{i=0}^{n} α_i = 1 (equation (23) and equation (24)), and each data chunk can be expressed in terms of α_0, a closed-form equation can be derived to compute the data partition α_0 of the lead node, as shown in equation (25):

α_0 = 1 / (Σ_{i=0}^{n} k_i)    (25)
The partitions of all other vehicles are then obtained by substituting α 0 into the set of equations in (22).
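The closed-form computation can be sketched as follows. This is a minimal illustration of the recursive relations above, assuming a common inverse channel rate Z and known reciprocal computation capacities; the numeric values are illustrative, not from the paper.

```python
# Sketch of the closed-form data partitioning: alpha_n = (w_0/w_n)*alpha_0,
# the backward recursion alpha_i = ((2Z + w_{i+1})/w_i)*alpha_{i+1}, and the
# normalization sum(alpha) = 1 that yields alpha_0 for the lead node.

def compute_partitions(omega, Z):
    """Return [alpha_0, ..., alpha_n] for lead node V_0 and members V_1..V_n."""
    n = len(omega) - 1
    k = [0.0] * (n + 1)
    k[0] = 1.0                       # alpha_0 = k_0 * alpha_0
    k[n] = omega[0] / omega[n]       # lead and last processing times are equal
    for i in range(n - 1, 0, -1):    # express alpha_i via its successor
        k[i] = (2 * Z + omega[i + 1]) / omega[i] * k[i + 1]
    alpha_0 = 1.0 / sum(k)           # all chunks sum to one (normalized)
    return [alpha_0 * ki for ki in k]

# Illustrative cluster: lead node V_0 plus three members, shared channel Z = 0.1
partitions = compute_partitions(omega=[1.0, 1.2, 0.9, 1.5], Z=0.1)
```

A quick consistency check: the partitions sum to one, the lead and last nodes have equal processing times, and each member's processing time covers its successor's reception, processing, and submission times.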
After the partitions of the vehicles are determined, the lead node distributes them sequentially to the ordered resources.

VI. SYSTEM SIMULATION
The simulation of the UniDRM was performed using the OMNeT++ [40] simulator and the VANET simulation framework Veins [41]. The parameters of the vehicular nodes in the Veins simulator were modified to include the willingness flag, credibility value, cost, type, and resource capacity. Realistic mobility and traffic scenarios were also modeled using the mobility simulator SUMO [42] and map data from OpenStreetMap [43].
To analyze the performance of the proposed UniDRM framework, we conducted simulations for all phases of the UniDRM framework: resource management, data management, and communication channel management phases.
In the resource management phase, resource-based clustering and task-based resource ranking were implemented. Although the cluster formation schemes proposed for the UniDRM cover the formation of both StaaS and CaaS clusters, as described in Section IV, for simplicity, we considered the infrastructure-less CaaS clusters for the implementation of the data management phase of the UniDRM framework. In the data management phase, the data partitioning and scheduling schemes were implemented. For the data partitioning and scheduling simulation, we considered a merge sort application because it reflects one of the key assumptions of the data partitioning and scheduling scheme: the size of a data fraction before execution is the same as after execution. The application consists of a large dataset file that is partitioned and distributed among the nodes of the cluster. The partitioned data chunks were then sorted by the cluster members, and the results were sent to the lead node, which merged all the results.
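The merge-sort workload described above can be sketched as follows. The chunking helper and the value choices are assumptions for illustration; the structure mirrors the described flow: the lead node partitions the data, members sort their chunks, and the lead node merges the sorted results.

```python
# Illustrative sketch of the merge-sort application used in the simulation:
# partition -> sort chunks (cluster members) -> merge (lead node).
import heapq

def partition_data(data, fractions):
    """Split `data` into chunks proportional to the normalized fractions."""
    chunks, start = [], 0
    for frac in fractions:
        end = start + round(frac * len(data))
        chunks.append(data[start:end])
        start = end
    chunks[-1] += data[start:]      # keep any rounding remainder
    return chunks

data = [5, 3, 8, 1, 9, 2, 7, 4, 6, 0]
chunks = partition_data(data, [0.4, 0.3, 0.3])
sorted_chunks = [sorted(c) for c in chunks]       # done by cluster members
result = list(heapq.merge(*sorted_chunks))        # merged by the lead node
```

Note that the total data size is unchanged by sorting, which is exactly the size-preservation assumption of the partitioning scheme.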
The communication channel management phase of the simulation focused on modeling the channel access delay and incorporating it into the data partitioning and scheduling schemes. Three different network congestion scenarios were considered in the simulation, namely, low network traffic congestion (LCS), medium network traffic congestion (MCS), and high network traffic congestion (HCS).
Several simulation runs were carried out according to the system parameters presented in Table 2, and the average results of each scenario were analyzed and presented.

A. ANALYSIS OF SIMULATION RESULTS
From the preliminary simulation results, it was observed that the execution time of the partitioning and scheduling scheme varied under network traffic congestion when the same resources (with unchanged parameters) were used for the task execution.
In the ideal network traffic scenario, where only data partitions were transmitted during the scheduling phase of the simulation, the expected execution time was achieved because there was no channel congestion or channel access delay during the transmission of the data chunks.
However, in the instances where different types of communication from both cluster and non-cluster members took place during the data scheduling phase of the simulation (in LCS, MCS, and HCS), the execution times were higher than that of the ideal network traffic scenario. This was due to the high channel access delay caused by channel congestion during the transmission of the data chunks.
As shown in Figure 6a and Figure 6b, the execution times for the ideal network traffic scenario and the HCS are 0.587s and 8.883s, respectively (the execution times in this study are reported as a fraction of the total data size). It was also observed that the channel access delay yielded unforeseen internal idle times in the scheduling scheme.
Although an ideal network traffic scenario is desirable, it is not realistic in a vehicular environment because of the large number of vehicles and devices that communicate in a vehicular environment at any given time. The transmission of data chunks in a vehicular environment experiences channel access delays, leading to an increase in the expected execution time of tasks.
To address this challenge and reduce the internal idle times, we reformulated the communication parameter (Z) in the analytical model in Section V to account for the channel access delay. The expression for Z now becomes Z_i = λ_i + 1/T, where λ_i is the communication channel access delay.
Using the modified communication parameter Z in the partitioning and scheduling scheme produced a better execution time under the same high network traffic congestion (HCS). As shown in Figure 6c, the execution time of the scheduling scheme with the modified communication parameter is 0.741s. Although this value is higher than that of the ideal scenario (without delay), when subjected to the same network traffic congestion scenario, the execution time decreased from 8.883s in Figure 6b to 0.754s in Figure 6d, which corresponds to a reduction in execution time of approximately 91.5%.
Because the value of the communication channel access delay is dependent on the network condition, such as the number of vehicles communicating in the same transmission range, the actual values cannot be determined before the computation of data chunks for vehicles. Therefore, a predictive approach based on the results from different simulation scenarios is considered in this study to determine the delay values at different time instances of the simulation.

B. THE DISTRIBUTION OF ACCESS DELAY FACTOR (λ)
Similar to the studies in [26], the results of our simulations show that non-deterministic features such as data transmission failure and channel access delay λ in a vehicular environment can be considered as monotonically increasing or monotonically decreasing functions.
The channel access delay values increased monotonically in the initial simulation stage as the vehicles entered the simulation. This is due to the exchange of messages to establish neighborhood associations.
After all the vehicles entered the simulation, the channel access delay values appeared to be constant and then decreased monotonically to the end of the simulation as the vehicles left the simulation network.
This behavior of the channel access delay during the simulation can be represented as a sigmoid function. Therefore, to predict the channel access delay values, we considered the Boltzmann sigmoid distribution shown in Figure 7 to model the characteristics of our simulation network scenario. The values of the Boltzmann sigmoidal function were obtained using equations (26) and (27), the monotonically increasing and monotonically decreasing sigmoidal functions, respectively. The network traffic congestion scenarios considered in the simulation, the HCS, MCS, and LCS, were generated with the following values for the Boltzmann sigmoid distribution: the minimum delay value d_1 and the slope y were 0.002s and 6, respectively, for all congestion scenarios, while the maximum delay value d_2 was 0.035s for the LCS, 0.053s for the MCS, and 0.074s for the HCS. The distribution curves of the access delay for the three congestion scenarios obtained with these values are shown in Figure 8a.
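A minimal sketch of such a Boltzmann sigmoid predictor is shown below, under the standard Boltzmann parameterization. The midpoint t0 and the time scale are assumptions of this sketch; d_1, d_2, and the slope follow the values reported for the congestion scenarios.

```python
# Sketch of a Boltzmann sigmoid for predicting the channel access delay.
# Standard form: y(t) = A2 + (A1 - A2) / (1 + exp((t - t0) / slope)).
import math

def boltzmann_delay(t, d1, d2, t0, slope, increasing=True):
    """Boltzmann sigmoid between the minimum delay d1 and maximum delay d2."""
    lo, hi = (d1, d2) if increasing else (d2, d1)
    return hi + (lo - hi) / (1.0 + math.exp((t - t0) / slope))

# HCS scenario: d1 = 0.002 s, d2 = 0.074 s, slope = 6 (midpoint t0 = 50 s assumed)
delays = [boltzmann_delay(t, 0.002, 0.074, t0=50, slope=6)
          for t in range(0, 101, 10)]
```

With `increasing=True` the curve rises from d_1 toward d_2 as vehicles enter the network; the decreasing variant models the tail of the simulation as vehicles leave.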
Although the Boltzmann sigmoidal model is considered in this study, due to the modularity of our system, other models that reflect the characteristics of vehicular environment channel access can be considered for predicting the channel access delay values.
Subsequent simulations were carried out with the modified communication variable Z , which included predicted channel access delay values for different network congestion scenarios.

C. PERFORMANCE EVALUATION
For the performance evaluation, we considered fully homogeneous and fully heterogeneous cluster scenarios. The fully homogeneous scenario was modeled on the assumption that all resources in a cluster have the same parameter values (i.e., cost, credibility, computation capacity, data transfer rate) and are available throughout the task execution.
In the performance analysis of the fully homogeneous scenario, the metrics considered were the execution time, speedup, and parallel execution efficiency.
The execution time is the total time required by the lead node to distribute the data chunks and aggregate the results from all cluster members after they have processed their data chunks. Figure 8b shows the execution time for a varying number of vehicular processors. In the figure, three performance curves for the different network traffic congestion scenarios are presented. It can be deduced that the execution time decreases as the number of vehicle processors involved in the task execution increases. Furthermore, the curves flatten and converge as more processing units are involved in the task execution, which implies that the execution times do not improve significantly beyond a certain number of processing nodes. For this simulation instance, beyond approximately 18 vehicular nodes in the LCS, 17 in the MCS, and 15 in the HCS, there is little improvement in the execution time.
The speedup and efficiency are two essential metrics used to determine the performance of parallel processing systems. Speedup is the ratio of the execution time on a single vehicular processor to the execution time obtained with the number of vehicular processors involved in the partitioning and scheduling scheme. The parallel efficiency is the ratio between the speedup and the number of vehicular processors considered. Figure 8c and Figure 8d show the speedup and efficiency values obtained with an increasing number of vehicular nodes for the low (LCS), medium (MCS), and high (HCS) network traffic congestion scenarios. It can be observed that the speedup values generally increase up to a certain point and then flatten out, while the parallel efficiency curves decrease until they flatten out. This is expected because, beyond that point, the addition of more vehicles does not further improve the execution time. However, the speedup and efficiency values for the LCS were always better than those of the MCS and HCS because the LCS has the lowest data transmission delay. The speedup and parallel efficiency values can be used to determine the optimal number of vehicular nodes to include in the data partitioning and scheduling for a given task. For example, achieving a speedup of 2.0 required four vehicular nodes in the HCS but only two vehicular nodes in the LCS and MCS.
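These two metrics can be computed directly from measured execution times, as in the sketch below. The timing values are hypothetical, chosen only to illustrate the definitions.

```python
# Speedup and parallel efficiency as defined above (illustrative timings).
def speedup(t_single, t_parallel):
    """Ratio of single-processor execution time to n-processor execution time."""
    return t_single / t_parallel

def parallel_efficiency(t_single, t_parallel, n):
    """Speedup divided by the number of vehicular processors used."""
    return speedup(t_single, t_parallel) / n

# Hypothetical execution times (seconds) for 1..4 vehicular nodes
times = {1: 8.0, 2: 4.4, 3: 3.2, 4: 2.8}
speedups = {n: speedup(times[1], t) for n, t in times.items()}
efficiencies = {n: parallel_efficiency(times[1], t, n) for n, t in times.items()}
```

As in Figures 8c and 8d, the speedup grows with diminishing returns while the efficiency falls as more nodes are added.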
As a fully homogeneous scenario may not be realistic in a vehicular environment, we also considered a fully heterogeneous scenario where all the properties of the vehicular resources vary to implement the task-based data partitioning and scheduling schemes (TAPS, CAPS, and RAPS).
The parameters of the resources in the cluster considered for task execution are shown in Table 3. This includes the cost per unit time usage, credibility value, and inverse computational capacity per unit data. In addition, the ranks of the resources based on the type of partitioning scheme are also shown in the table. The performance metrics considered are the execution time, the total cost of all resource usage, average reliability score, and resource efficiency.
The cost of each resource usage is calculated as the product of the execution time and the cost per unit time usage of the resource. The total cost for all resources used in the schedule is calculated as follows:

TotalCost = EX(V_0)·Cost_0 + Σ_{j=1}^{n} EX(V_j)·Cost_j    (28)

where V_0 denotes the lead node and j = 1, 2, 3, ..., n denotes all nodes other than the lead node.
The reliability score for each resource usage is also computed as the product of the data partition assigned to a node (α_j) and the credibility value of the node (Cred_j). The average reliability score (AvReScore) is then computed in equation (29) as the total reliability score of all resources divided by the number of resources used for task execution (n):

AvReScore = (Σ_j α_j·Cred_j) / n    (29)
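Both metrics can be sketched directly from these definitions. The execution times, costs, partitions, and credibility values below are illustrative, not measurements from the study.

```python
# Sketch of the cost and reliability metrics described above.
def total_cost(exec_times, costs):
    """Sum over all nodes of execution time x cost per unit time."""
    return sum(t * c for t, c in zip(exec_times, costs))

def avg_reliability(alphas, creds):
    """Mean of (partition x credibility) over the resources used."""
    scores = [a * cr for a, cr in zip(alphas, creds)]
    return sum(scores) / len(scores)

exec_times = [0.6, 0.5, 0.5, 0.4]   # lead node V_0 first, then V_1..V_3
costs = [2.0, 1.5, 1.0, 2.5]        # cost per unit time of each resource
alphas = [0.4, 0.3, 0.2, 0.1]       # normalized data partitions
creds = [0.9, 0.8, 0.7, 0.6]        # credibility values of the nodes
```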
The results of the TAPS, CAPS, and RAPS schemes showing the percentages of data partitions for each node are shown in Tables 4, 5 and 6. The vehicular nodes in the table  are ordered based on their TOPSIS scores, where the first node serves as the lead node, and the other nodes are the cluster members. It can be observed that the percentage of partitions of cluster member nodes decreases as the ranks of resources decrease. This ensures that highly ranked resources are assigned larger percentages of data.
The execution time, total cost, average reliability score, and parallel efficiency for the different task-based partitioning and scheduling schemes shown in the tables are compared using the graphs in Figures 9, 10, 11, and 12.
From the execution time graph in Figure 9, the values of the curves decrease and converge as more nodes are considered for task execution. The rate of decrease depends on the processor-to-channel capacity ratio, that is, the ratio between the reciprocal processor speed (ω) and the inverse data transfer rate of the channel (Z). When the nodes involved in the task execution have a large processor-to-channel capacity ratio, the execution time decreases steadily. However, nodes with a low processor-to-channel capacity ratio negatively affect the execution time: it may decrease insignificantly, remain unchanged, or even increase. Nodes with a processor-to-channel capacity ratio of less than two increase the execution time because their data transfer times exceed their data processing times. Such nodes can be excluded from the task execution to achieve an optimal execution time. It can be observed that, as expected, the TAPS performs better than the RAPS and CAPS.
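The node-exclusion rule above can be sketched as a simple filter. The rationale: with a send and a return transfer, a chunk α costs 2·Z·α in communication against ω·α in processing, so ω/Z < 2 means the node transfers longer than it computes. The parameter values are illustrative assumptions.

```python
# Sketch of the exclusion rule: keep only nodes whose processor-to-channel
# capacity ratio (omega / Z) is at least 2, so that processing time (omega*a)
# covers the round-trip transfer time (2*Z*a).
def eligible_nodes(omegas, Z, threshold=2.0):
    """Return indices of nodes whose omega/Z ratio meets the threshold."""
    return [i for i, w in enumerate(omegas) if w / Z >= threshold]

omegas = [0.8, 0.3, 1.2, 0.15]    # reciprocal computation capacities
Z = 0.1                           # inverse channel transfer rate
keep = eligible_nodes(omegas, Z)  # the last node (ratio 1.5) is excluded
```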
For the total cost of resource utilization shown in Figure 10, the values generally increase when more nodes are included in the task execution. As expected, the CAPS scheme has a relatively lower cost than TAPS and RAPS because the resources assigned large percentages of data are relatively cheap. For the average reliability score shown in Figure 11, the performance curves decrease when more resources are involved in the task execution, since the resources with higher credibility scores receive larger shares of data; the average reliability scores therefore decrease when more nodes with lower credibility values are considered. It was observed that the average reliability score of the RAPS was relatively higher than those of the TAPS and CAPS. However, the scores of the TAPS are closer to those of the RAPS because the weight assigned to the credibility attribute of resources in the TOPSIS ranking for TAPS is higher than that for CAPS.
The parallel efficiency of the task-based partitioning and scheduling schemes is compared in Figure 12. In general, the efficiency values of TAPS are better than those of RAPS and CAPS because the efficiency is computed from the execution time, which is lowest for TAPS.
In all task-based scheduling schemes, the appropriate number of vehicles for task execution can be selected based on the execution time, total resource cost, average reliability score, and efficiency values required by the client.

VII. CONCLUSION
In this study, we proposed the UniDRM, a unified data and resource management framework for partitioning and scheduling divisible loads in a federated vehicular cloud. The framework organizes vehicles with sufficient resources in a region of interest into clusters, based on the types of resources available in the vehicles and their mobility parameters, so that they can collaborate as a single unit to execute a task.
To execute a computationally intensive task, the UniDRM framework partitions its data into independent fragments and distributes them to selected vehicles in the cluster for parallel execution.
Mathematical models were used to determine the partitions for each vehicle. Three different partitioning and scheduling schemes (time-aware, cost-aware, and reliability-aware) were proposed to obtain optimal execution times, total costs, and average reliability values for the task execution. Realistic simulations of the modeled schemes were performed, taking into account the dynamic characteristics of the vehicular resources.
Although the UniDRM is proposed for the federated vehicular cloud, some of its techniques, such as resource ranking for data partitioning, can be applied to other distributed computing platforms such as mobile cloud computing.
Future works will examine the effect of clustering issues such as the cluster head change rate and cluster member exit rate on the proposed scheduling schemes. Data partitioning and scheduling in multi-clusters, where two or more clusters execute a single application with multiple data instances in a vehicular environment, will also be investigated.