Abstract:
The widespread deployment of government-established pollution monitoring stations has produced geo-tagged time series of air pollution data. Data at multiple sites/stations can be lost at different, unanticipated times due to problems such as equipment failures and power outages. In this context, this study explores the most effective site selection technique and predictive model for imputing missing values in multi-site time series. Specifically, three site selection methodologies (radius-based, Kullback-Leibler (KL) divergence-based, and cluster-based) are explored together with a LightGBM-based predictive model for estimating missing values. Performance is evaluated on a real-world ozone dataset (containing 275,616 samples) from Delhi, India. The proposed approach provides better predictive accuracy (MAE = 7.39, RMSE = 10.17, correlation coefficient = 0.89, R² = 0.79) than several baselines.
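The abstract names the site selection strategies without detailing them, so the following is only a minimal sketch of what radius-based and KL divergence-based selection could look like, not the authors' procedure. The station names, coordinates, readings, bin edges, and all function names are hypothetical, and the add-one smoothing of the histograms is an assumption made here to keep the divergence finite.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def select_by_radius(target, stations, radius_km):
    """Names of stations within radius_km of the target (target excluded)."""
    t_lat, t_lon = stations[target]
    return sorted(
        name for name, (lat, lon) in stations.items()
        if name != target and haversine_km(t_lat, t_lon, lat, lon) <= radius_km
    )

def histogram(values, edges):
    """Normalised histogram with add-one smoothing (an assumption of this sketch)."""
    n_bins = len(edges) - 1
    counts = [1.0] * n_bins
    for v in values:
        for i in range(n_bins):
            is_last = i == n_bins - 1
            if edges[i] <= v < edges[i + 1] or (is_last and v == edges[-1]):
                counts[i] += 1
                break
    total = sum(counts)
    return [c / total for c in counts]

def kl_divergence(p, q):
    """KL(p || q) for two discrete distributions over the same bins."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def select_by_kl(target, readings, edges, k):
    """The k stations whose reading distribution is closest to the target's."""
    p = histogram(readings[target], edges)
    scored = [
        (kl_divergence(p, histogram(vals, edges)), name)
        for name, vals in readings.items() if name != target
    ]
    return [name for _, name in sorted(scored)[:k]]

# Hypothetical stations (lat, lon) and ozone readings, for illustration only.
stations = {"A": (28.60, 77.20), "B": (28.62, 77.22), "C": (28.90, 77.60)}
readings = {
    "A": [10, 12, 14, 30, 32],
    "B": [11, 13, 15, 31, 33],
    "C": [50, 55, 60, 65, 70],
}
edges = [0, 20, 40, 60, 80]

near_by_distance = select_by_radius("A", stations, radius_km=5.0)
near_by_distribution = select_by_kl("A", readings, edges, k=1)
```

In a pipeline of this kind, readings from the selected stations would then serve as input features to a regressor (LightGBM in the paper) that estimates the target station's missing values.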
Published in: 2022 IEEE Calcutta Conference (CALCON)
Date of Conference: 10-11 December 2022
Date Added to IEEE Xplore: 13 March 2023