Abstract:
The transformer architecture, based on self-attention, offers a versatile structure that has given rise to many deep learning models for natural language processing tasks. The purpose of this work is to analyze two objectives for pretraining bidirectional encoders such as BERT: the Masked Language Model (MLM) and the Conditional Masked Language Model (CMLM), the latter designed for learning sentence embeddings. The main focus of our investigation is how sentence-level representations impact sequence classification: is there a significant difference in quality between these two pretrained language models? We evaluate the pretrained models by fine-tuning them on a downstream sequence classification task.
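To illustrate the MLM objective mentioned above, the following is a minimal sketch of BERT-style token masking: roughly 15% of input positions are selected, and of those, 80% are replaced with a [MASK] token, 10% with a random token, and 10% kept unchanged. The token id 103 and vocabulary size 30522 are assumptions matching BERT's standard WordPiece vocabulary; this is not the paper's implementation.

```python
import random

MASK_ID = 103      # [MASK] id in BERT's WordPiece vocab (assumption)
VOCAB_SIZE = 30522 # bert-base vocabulary size (assumption)

def mask_tokens(token_ids, mask_prob=0.15, rng=None):
    """BERT-style MLM corruption.

    Selects ~mask_prob of positions; of those, 80% become [MASK],
    10% become a random token, 10% stay unchanged. Returns
    (corrupted_ids, labels), where labels hold the original token at
    selected positions and -100 elsewhere (ignored by the loss).
    """
    rng = rng or random.Random()
    corrupted = list(token_ids)
    labels = [-100] * len(token_ids)
    for i, tok in enumerate(token_ids):
        if rng.random() < mask_prob:
            labels[i] = tok          # predict the original token here
            r = rng.random()
            if r < 0.8:
                corrupted[i] = MASK_ID
            elif r < 0.9:
                corrupted[i] = rng.randrange(VOCAB_SIZE)
            # else: keep the original token (10% of selected cases)
    return corrupted, labels
```

The CMLM objective extends this scheme by conditioning the masked-token predictions on a sentence embedding of the surrounding context, which is what makes it suitable for learning sentence-level representations.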
Published in: 2022 International Conference on Mechatronics, Electronics and Automotive Engineering (ICMEAE)
Date of Conference: 05-09 December 2022
Date Added to IEEE Xplore: 01 February 2024