Abstract:
Previous research shows that the domain of the training data has a large impact on downstream task performance, and that selecting data from an appropriate domain improves it. Text classification can help distinguish data belonging to different domains. In this paper, we use a text classification method to select data from a particular, task-specific target domain. We experiment with target-domain corpora of different sizes to explore the effect of the method. A pretrained RoBERTa model is adapted to the target domain using the selected data before it is trained on the downstream tasks. Our experiments show that using a simple domain classifier to select a small dataset for adapting the model can help stabilize performance on downstream tasks.
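The selection step described in the abstract can be illustrated with a minimal sketch. The abstract does not name the classifier, so the smoothed per-token log-odds scorer below is an assumption chosen for brevity, not the authors' method. The function names (`train_domain_classifier`, `domain_score`, `select_in_domain`) and the toy sentences are hypothetical. The later adaptation of RoBERTa on the selected data is not shown.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenization (an assumption; any tokenizer would do)."""
    return text.lower().split()


def train_domain_classifier(target_docs, general_docs):
    """Return per-token log-odds of target vs. general domain, with add-one smoothing."""
    target_counts = Counter(tok for doc in target_docs for tok in tokenize(doc))
    general_counts = Counter(tok for doc in general_docs for tok in tokenize(doc))
    vocab = set(target_counts) | set(general_counts)
    target_total = sum(target_counts.values()) + len(vocab)
    general_total = sum(general_counts.values()) + len(vocab)
    return {
        tok: math.log((target_counts[tok] + 1) / target_total)
        - math.log((general_counts[tok] + 1) / general_total)
        for tok in vocab
    }


def domain_score(text, log_odds):
    """Mean log-odds over tokens; unseen tokens contribute 0. Higher means more target-like."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0
    return sum(log_odds.get(tok, 0.0) for tok in tokens) / len(tokens)


def select_in_domain(candidates, log_odds, k):
    """Keep the k candidates that score as most target-like."""
    ranked = sorted(candidates, key=lambda doc: domain_score(doc, log_odds), reverse=True)
    return ranked[:k]


# Toy data: a small target-domain seed corpus and a general-domain corpus.
target_docs = ["the protein binds the receptor", "gene expression in tumor cells"]
general_docs = ["the team won the match", "stock prices fell on monday"]
candidates = [
    "the receptor gene is expressed in cells",
    "the match was played on monday",
    "tumor protein expression",
    "stock market team",
]

log_odds = train_domain_classifier(target_docs, general_docs)
selected = select_in_domain(candidates, log_odds, k=2)
```

In this sketch, `k` plays the role of the selected dataset size that the experiments vary; the selected texts would then serve as the adaptation corpus.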
Date of Conference: 25-27 March 2022
Date Added to IEEE Xplore: 19 September 2022