Abstract:
In recent years, model compression techniques have greatly advanced the industrial use of large-scale language models. However, the structure of existing compressed models mostly relies on manual design, which can cause the real-world data distribution to be lost or overfit. In addition, existing compressed models are usually effective for only a single task, whereas real scenarios often involve multiple tasks that are complex, changeable, or concurrent, so a single-task compressed model is of limited practical use. In this work we propose AutoDistiller, an automatic compression method for multi-task language models. We build a meta-network that generates the structure of the compressed model, in which encoder units are produced by a Transformer layer sampling method and trained with Bernoulli-distributed sampling. AutoDistiller learns the data distribution of multiple tasks from a large-scale language model and transfers the knowledge to a compressed model end-to-end. Preliminary results on fine-tuned text classification tasks suggest that AutoDistiller can help automate the compression of large-scale language models.
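To make the layer-sampling idea in the abstract concrete, the following is a minimal Python/PyTorch sketch of Bernoulli-distributed sampling over a teacher's Transformer encoder layers; it is not the paper's implementation, and the function name, keep probabilities, and layer configuration are assumptions chosen for illustration.

import torch

def sample_student_layers(teacher_layers, keep_probs):
    """Draw a compressed encoder structure by keeping each teacher layer
    with its Bernoulli keep probability (hypothetical meta-network output)."""
    assert len(teacher_layers) == len(keep_probs)
    mask = torch.bernoulli(torch.tensor(keep_probs, dtype=torch.float32))
    kept = [layer for layer, m in zip(teacher_layers, mask) if m > 0.5]
    # Guard against an empty student: always keep at least the first layer.
    return kept if kept else [teacher_layers[0]]

if __name__ == "__main__":
    # Example: a 12-layer teacher (e.g. a BERT-base-sized encoder) with
    # illustrative keep probabilities; a new student depth is drawn each call.
    teacher = [torch.nn.TransformerEncoderLayer(d_model=768, nhead=12)
               for _ in range(12)]
    probs = [0.9, 0.3] * 6
    student = sample_student_layers(teacher, probs)
    print(f"Sampled student depth: {len(student)}")

In a training loop of this kind, a new student structure would be re-sampled each step and trained with distillation losses against the teacher, so that layers with high keep probability end up forming the final compressed model; how the actual meta-network parameterizes these probabilities is described in the full paper, not here.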
Published in: 2022 41st Chinese Control Conference (CCC)
Date of Conference: 25-27 July 2022
Date Added to IEEE Xplore: 11 October 2022