Abstract:
With the growth of the academic engines, the mining and analysis acquisition of massive researcher data, such as collaborator recommendation and researcher retrieval, has...Show MoreMetadata
Abstract:
With the growth of the academic engines, the mining and analysis acquisition of massive researcher data, such as collaborator recommendation and researcher retrieval, has become indispensable for improving the quality and intelligence of services. However, most of the existing studies for researcher data mining focus on a single task for a particular application scenario and learning a task-specific model, which is usually unable to transfer to out-of-scope tasks. In this paper, we propose a multi-task self-supervised learning-based researcher data pre-training model named RPT, which is efficient to accomplish multiple researcher data mining tasks. Specifically, we divide the researchers’ data into semantic document sets and community graph. We design the hierarchical Transformer and the local community encoder to capture information from the two categories of data, respectively. Then, we propose three self-supervised learning objectives to train the whole model. For RPT’s main task, we leverage contrastive learning to discriminate whether these captured two kinds of information belong to the same researcher. In addition, two auxiliary tasks, named hierarchical masked language model and community relation prediction for extracting semantic and community information, are integrated to improve pre-training. Finally, we also propose two transfer modes of RPT for fine-tuning in different scenarios. We conduct extensive experiments to evaluate RPT, results on three downstream tasks verify the effectiveness of pre-training for researcher data mining.
Published in: IEEE Transactions on Big Data ( Volume: 9, Issue: 1, 01 February 2023)