Abstract:
In the current data-driven era, large volumes of data are generated and collected at a rapid rate. Examples of such big data include transportation data (e.g., public transit data). Integrating different transportation data, and reusing past knowledge and information on public transit, can serve the social good (e.g., by helping to improve public transit services for bus riders). Specifically, bus riders want a precise and accurate schedule for their transit system. On-time bus arrivals and departures are desirable because an early departure or a late arrival of a bus may inconvenience riders. To this end, we present in this paper a regression-based data science solution for transportation analytics. It integrates heterogeneous data on bus stops, bus arrival times, road networks, traffic counts, construction sites, lane closures, etc. It reuses past knowledge and information discovered from historical data to handle future situations. Evaluation on real-life transportation data from the Canadian city of Winnipeg shows that our regression-based data science solution achieves a high R² score. This demonstrates the practicality of our solution for transportation analytics and bus arrival time prediction, as well as the benefits of data integration and information (and knowledge) reuse. Moreover, although we illustrate our solution on Winnipeg transit data, it is expected to be reusable for transportation analytics at other locations.
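The idea summarized in the abstract (integrate several data sources, fit a regression, and judge it by its R² score) can be illustrated with a minimal sketch. This is not the paper's implementation: the stop identifiers, delay values, traffic counts, the single-feature least-squares model, and the helper names `integrate`, `fit_line`, and `r2_score` are all hypothetical, chosen only to make the steps concrete.

```python
from statistics import mean

# Hypothetical arrival records: (stop_id, observed delay in seconds).
arrivals = [("A", 30.0), ("B", 65.0), ("C", 95.0), ("D", 130.0)]
# Hypothetical traffic counts (vehicles per hour) near each stop.
traffic = {"A": 100.0, "B": 200.0, "C": 300.0, "D": 400.0}


def integrate(arrivals, traffic):
    """Join the two sources on stop_id, dropping stops with no traffic count."""
    return [(traffic[stop], delay) for stop, delay in arrivals if stop in traffic]


def fit_line(pairs):
    """Ordinary least squares for delay = slope * traffic_count + intercept."""
    xs = [x for x, _ in pairs]
    ys = [y for _, y in pairs]
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in pairs)
    slope = sxy / sxx
    return slope, y_bar - slope * x_bar


def r2_score(pairs, slope, intercept):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ys = [y for _, y in pairs]
    y_bar = mean(ys)
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in pairs)
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return 1.0 - ss_res / ss_tot


pairs = integrate(arrivals, traffic)
slope, intercept = fit_line(pairs)
r2 = r2_score(pairs, slope, intercept)
print(f"slope={slope:.3f}, intercept={intercept:.3f}, R^2={r2:.4f}")
```

A real pipeline would use many more features (road networks, construction sites, lane closures) and would compute R² on held-out data rather than on the training records, as this toy example does.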
Published in: 2022 IEEE 23rd International Conference on Information Reuse and Integration for Data Science (IRI)
Date of Conference: 09-11 August 2022
Date Added to IEEE Xplore: 08 September 2022