Abstract:
Multitask learning helps to obtain a meaningful representation of the data while keeping the number of parameters needed to train the model small. In natural language processing, models often reach a few hundred million trainable parameters, which makes adaptation to new tasks computationally infeasible. Sharing layers across multiple tasks saves resources and often leads to better data representations. In this work, we propose a new approach to training a BERT model in a multitask setup, which we call EmBERT. To introduce information about the task, we inject task-specific embeddings into the multi-head attention layers. Our modified architecture requires a minimal number of additional parameters relative to the original BERT model (+0.025% per task) while achieving state-of-the-art results on the GLUE benchmark.
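The abstract only states that task-specific embeddings are injected into the multi-head attention layers, not how. The sketch below is one plausible reading, not the authors' implementation: a learned per-task vector is added to the hidden states entering a standard attention layer, so each task costs only a handful of extra parameters. The module name `TaskConditionedAttention`, the injection point, and all sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn

class TaskConditionedAttention(nn.Module):
    """Multi-head self-attention conditioned on a learned task embedding.

    Illustrative sketch only: the exact injection mechanism used by EmBERT is
    not given in the abstract. Here the task embedding is simply added to the
    hidden states before they enter a standard attention layer.
    """

    def __init__(self, hidden_size: int, num_heads: int, num_tasks: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(hidden_size, num_heads, batch_first=True)
        # One extra vector of size hidden_size per task -> very few new parameters.
        self.task_embedding = nn.Embedding(num_tasks, hidden_size)

    def forward(self, hidden_states: torch.Tensor, task_id: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_size); task_id: (batch,)
        task_vec = self.task_embedding(task_id).unsqueeze(1)  # (batch, 1, hidden_size)
        x = hidden_states + task_vec                          # broadcast over the sequence
        out, _ = self.attn(x, x, x)
        return out

# Toy usage: 2 sequences of 16 tokens, BERT-base-like width, 4 GLUE-style tasks.
layer = TaskConditionedAttention(hidden_size=768, num_heads=12, num_tasks=4)
x = torch.randn(2, 16, 768)
task_id = torch.tensor([0, 3])
print(layer(x, task_id).shape)  # torch.Size([2, 16, 768])
```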
Date of Conference: 18-22 July 2021
Date Added to IEEE Xplore: 20 September 2021