An Autonomous Data Collection Pipeline for Online Time-Sync Comments | IEEE Conference Publication | IEEE Xplore

Abstract:

Time-Sync Comments (TSCs) are sequences of comments associated with video content at specific timestamps. By applying textual analysis, researchers can transform TSCs into labels that represent the semantic meaning of the original video content. Multiple studies have used TSCs for video segmentation and tagging. TSCs can be created by a single user or generated by many users. Thanks to the rapid growth of multimedia platforms, online comments have proved to be an efficient source of TSC data in multiple studies since 2014. However, previous TSC studies mainly focused on data sources targeting young, non-English-speaking audiences, potentially introducing data bias due to the limited geographic regions and groups covered. This paper aims to solve this problem by proposing a universal framework for collecting TSCs generated by audiences worldwide on popular social media platforms. We first introduce an efficient data mining strategy for gathering such TSC data in general. Then, we demonstrate how to build the autonomous pipeline and collect two large-scale TSC datasets with different sets of keywords, namely LST-YF20 and LST-YT1000, directly from YouTube Live streams. We also conduct an extensive experiment on the efficiency of our data pipeline with a fixed group of keywords. The results suggest that our data pipeline can efficiently produce high-quality TSC datasets within a constrained budget. We believe our framework can contribute to future research in the multimedia field.
Date of Conference: 27 June 2022 - 01 July 2022
Date Added to IEEE Xplore: 10 August 2022
ISBN Information:
Print on Demand (PoD) ISSN: 0730-3157
Conference Location: Los Alamitos, CA, USA
