Abstract:
In this paper, we explore the effect of pre-training objectives on BERT's success. Various pre-training objectives were applied to BERT and their performance was evaluated on downstream tasks such as text classification, sentiment analysis, masked word prediction, and NER. In addition, a new pre-training objective called SSP (Same Sentence Prediction) is introduced. SSP predicts whether two segments come from the same sentence. Segments are created by splitting sentences in half; 50% of the first segments are replaced by a token sequence of the same length taken from the beginning of another sentence, while the remaining first segments are left unchanged. The main advantage of the proposed pre-training task over NSP and SOP is that it uses the dataset more efficiently and produces more training examples from the same data, so models can be trained for more steps than with NSP or SOP. Models trained with SSP and SOP achieved better results than models trained with NSP, showing that NSP alone is not sufficient and that other tasks can lead to better results. Moreover, models trained with SSP obtained results close to those trained with SOP, while outperforming them on masked word prediction, indicating that SSP can serve as an alternative to SOP. It was also observed that the auxiliary tasks used alongside MLM during training do not sufficiently improve sentiment analysis; their effect on sentiment analysis was smaller than on text classification, named entity recognition, and masked word prediction. This suggests that other tasks should be used during training to improve performance on sentiment analysis.
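To make the SSP data construction concrete, the following is a minimal sketch of how such pairs could be built from the abstract's description alone: each sentence is split in half, and for 50% of examples the first segment is swapped with an equally long token span from the start of another sentence. The function name `make_ssp_examples` and the whitespace tokenizer are illustrative assumptions, not the authors' implementation (a real setup would use BERT's WordPiece tokenizer).

```python
import random

def make_ssp_examples(sentences, tokenizer, seed=0):
    """Build hypothetical SSP pairs: split each sentence in half; with
    probability 0.5 replace the first segment by a same-length token span
    from the beginning of another sentence (label 0), otherwise keep the
    original pair (label 1)."""
    rng = random.Random(seed)
    examples = []
    for i, sent in enumerate(sentences):
        tokens = tokenizer(sent)
        if len(tokens) < 2:
            continue
        mid = len(tokens) // 2
        seg_a, seg_b = tokens[:mid], tokens[mid:]
        if rng.random() < 0.5:
            # Replace the first segment with a same-length span taken from
            # the start of a different, randomly chosen sentence.
            j = rng.choice([k for k in range(len(sentences)) if k != i])
            other = tokenizer(sentences[j])
            if len(other) < len(seg_a):
                continue  # not enough tokens for a same-length replacement
            seg_a, label = other[:len(seg_a)], 0  # segments from different sentences
        else:
            label = 1  # both segments come from the same sentence
        examples.append((seg_a, seg_b, label))
    return examples

# Usage with a trivial whitespace tokenizer for illustration only.
if __name__ == "__main__":
    corpus = [
        "the quick brown fox jumps over the lazy dog",
        "transformers are pre-trained on large unlabeled corpora",
        "sentence level objectives complement masked language modeling",
    ]
    for a, b, y in make_ssp_examples(corpus, str.split):
        print(y, a, "||", b)
```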
Date of Conference: 06-08 October 2021
Date Added to IEEE Xplore: 18 November 2021