I. Introduction
It is widely recognized that federated learning (FL) can enable cooperation while preserving data privacy in many fields, such as medical and industrial cyber-physical systems [1], [2]. Vehicular ad hoc networks (VANETs), which play a significant role in intelligent transportation systems (ITSs), are expected to bring extensive benefits to ITSs, such as high traffic efficiency, in-vehicle entertainment, autonomous vehicle safety, and, by extension, road safety [3]. In particular, many data-driven cooperative learning applications in VANETs, e.g., trajectory prediction, steering angle prediction, and pedestrian behavior prediction, are promising directions for achieving the above-mentioned benefits [4], [5]. However, such approaches rely heavily on central computation over the cooperating users’ driving data, which may lead to privacy leakage and heavy central computation overhead. Specifically, existing studies show that a driver’s private information may be exposed if the whereabouts and driving pattern of a vehicle can be tracked [6]. Although several methods, such as naive data anonymization [7], differential privacy [8], and dataset distillation [9], have been proposed to preserve data privacy, they may compromise learning model performance and computational efficiency. In addition, the central computation overhead becomes heavy when cooperative learning tasks are conducted on data uploaded by a large number of cooperating participants. Therefore, employing FL in VANETs is a promising direction for achieving cooperative data-driven approaches with the desired privacy preservation. In FL, participants’ data is stored and fed into the learning model locally, thus preserving data privacy and reducing central computation.
Recent years have witnessed increasing interest in applying FL to the Internet of Vehicles (IoV) [10], [11]. Unfortunately, most existing FL studies are designed for conventional center-client structures, which could cause redundant communication and compromised model accuracy under non-independent and identically distributed (Non-IID) data when applied in VANETs.