I. Introduction
Sequence-to-sequence (seq2seq) pretrained language models (PLMs) [1], [2], [3], [4], [5] are widely used in the natural language processing community and have achieved remarkable success on numerous downstream tasks in both natural language generation (NLG) and natural language understanding (NLU), such as machine translation [2], [6], [7], text summarization [5], [8], grammatical error correction [9], and other discriminative tasks [3], [10], [11]. Specifically, seq2seq models are generally implemented with an encoder-decoder framework [12], where the encoder first models the input sentence and the decoder then generates the output tokens auto-regressively, conditioned on the encoder's representation.
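To make this factorization concrete, writing the source sequence as $x$ and the target sequence as $y = (y_1, \dots, y_T)$ (symbols introduced here purely for illustration, not tied to any particular cited model), the encoder-decoder formulation can be sketched as
\begin{equation}
P_\theta(y \mid x) \;=\; \prod_{t=1}^{T} P_\theta\big(y_t \mid y_{<t}, \mathrm{Enc}(x)\big),
\end{equation}
where $\mathrm{Enc}(x)$ denotes the encoder's contextual representation of the input and $y_{<t}$ denotes the previously generated tokens on which the decoder conditions at step $t$.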