Abstract:
Pre-trained Language Models bring about an increasing computational and memory cost. The recently proposed computation-flexible BERT models facilitate their deployment in varied computational environments. Training such flexible BERT models involves jointly optimizing multiple BERT subnets, which inevitably interfere with one another. Moreover, although existing methods manage to enhance the smaller subnets, the performance of the larger subnets remains limited when there is a significant performance gap between the smallest subnet and the supernet. We propose layer-wise Neural Grafting to boost BERT subnets, particularly the larger ones. The proposed method improves the average performance of BERT subnets on six out of eight GLUE tasks. Furthermore, we build a flexible BERT model that enables practical width- and depth-dynamic inference for different inputs by combining width-dynamic gating modules with early-exit off-ramps in the depth dimension. Experimental results demonstrate that the proposed framework achieves a better dynamic inference range than other methods in the trade-off between performance and computational complexity on four GLUE tasks and the SQuAD dataset. In particular, our optimal-tradeoff inference result outperforms related fixed-size models with comparable computational complexity. Compared to the supernet, BERT-Base, this inference result improves the average GLUE score and the F1 score on SQuAD by 1.3 and 2.2 absolute points, respectively, and reduces computation by around 45%.
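To make the abstract's notion of input-dependent width- and depth-dynamic inference concrete, the following is a minimal Python sketch of the control flow only: a per-layer gating module selects a width ratio and a per-layer off-ramp classifier permits early exit when its prediction confidence crosses a threshold. All names here (FlexibleEncoderLayer, width_gate, off_ramp, EXIT_THRESHOLD) and the toy computations are hypothetical illustrations, not the authors' implementation or trained components.

```python
# Sketch of width- and depth-dynamic inference with confidence-based early exit.
# The layer internals are toy stand-ins; only the gating/exit control flow is the point.
import math
import random
from dataclasses import dataclass
from typing import List, Tuple

EXIT_THRESHOLD = 0.9  # assumed confidence threshold for taking an off-ramp


def softmax(logits: List[float]) -> List[float]:
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]


@dataclass
class FlexibleEncoderLayer:
    """One encoder layer of the supernet (hypothetical stand-in)."""
    layer_id: int

    def width_gate(self, hidden: List[float]) -> float:
        # Width-dynamic gating: choose a width ratio per input.
        # Here a toy heuristic based on mean activation magnitude.
        score = sum(abs(h) for h in hidden) / len(hidden)
        return 0.25 if score < 0.5 else (0.5 if score < 1.0 else 1.0)

    def forward(self, hidden: List[float], width: float) -> List[float]:
        # Run only a `width` fraction of the layer's units (toy transformation).
        keep = max(1, int(len(hidden) * width))
        return [h * 0.9 for h in hidden[:keep]] + hidden[keep:]

    def off_ramp(self, hidden: List[float]) -> List[float]:
        # Early-exit classifier attached to this layer (toy 2-class logits).
        return [sum(hidden[0::2]), sum(hidden[1::2])]


def dynamic_inference(hidden: List[float],
                      layers: List[FlexibleEncoderLayer]) -> Tuple[List[float], int]:
    """Depth- and width-dynamic forward pass: exit at the first confident off-ramp."""
    probs: List[float] = []
    for layer in layers:
        width = layer.width_gate(hidden)          # width-dynamic step
        hidden = layer.forward(hidden, width)     # run the slimmed layer
        probs = softmax(layer.off_ramp(hidden))   # off-ramp prediction
        if max(probs) >= EXIT_THRESHOLD:          # depth-dynamic early exit
            return probs, layer.layer_id + 1      # number of layers actually used
    return probs, len(layers)


if __name__ == "__main__":
    random.seed(0)
    model = [FlexibleEncoderLayer(i) for i in range(12)]  # BERT-Base depth
    x = [random.gauss(0, 1) for _ in range(16)]
    probs, depth_used = dynamic_inference(x, model)
    print(f"prediction={probs}, layers used={depth_used}/12")
```

In this sketch, easy inputs exit after a few narrow layers while harder inputs traverse more (and wider) layers, which is the mechanism behind the abstract's reported trade-off between performance and computational complexity.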
Date of Conference: 18-23 June 2023