Abstract:
Complexity-controllable definition generation is the task of providing definitions of varying readability for words in specific contexts. It can help language learners overcome reading barriers and can facilitate language acquisition. However, training data for this task remains scarce because reliable definition data is difficult to obtain and data standardization is costly. To tackle these challenges, we introduce a general solution from both data-driven and method-driven perspectives. We construct a large-scale standard Chinese dataset, COMPILING, which contains both difficult and simple definitions and can serve as a benchmark for future research. In addition, we propose a multitask framework, SimpDefiner, for unsupervised controllable definition generation. Through a parameter-sharing scheme between two decoders, the framework can extract complexity information from a non-parallel corpus. Moreover, we propose the SimpDefiner guided prompting (SGP) method, in which simple definitions generated by SimpDefiner are used to construct prompts for GPT-4, yielding more realistic and contextually appropriate definitions. The results demonstrate SimpDefiner's strong ability to achieve controllable generation and show that better results can be achieved when GPT-4 is incorporated.
Published in: IEEE Transactions on Big Data (Early Access)