Abstract:
Unsupervised sentence embedding methods based on contrastive learning have gained attention for effectively representing sentences in natural language processing. Retrieving additional samples via a nearest-neighbor approach can enhance the model's ability to learn relevant semantics and distinguish sentences. However, previous work mainly retrieved neighboring samples either within a single batch or over the global range; the former may fail to capture effective semantic information, while the latter incurs excessive time cost. Furthermore, previous methods treat the retrieved neighbor samples as hard negatives. We argue that nearest-neighbor samples carry relevant semantic information, and treating them as hard negatives risks discarding valuable semantic knowledge. In this work, we introduce Neighbor Contrastive learning for unsupervised Sentence Embeddings (NCSE), which combines contrastive learning with nearest-neighbor retrieval. Specifically, we maintain a candidate set that stores sentence embeddings across multiple batches. Retrieving from this candidate set guarantees a sufficient pool of samples, making it easier for the model to learn relevant semantics. We use the retrieved nearest neighbors as positives and apply a self-attention mechanism to aggregate each sample with its neighbors, encouraging the model to learn relevant semantics from multiple neighbors. Experiments on semantic textual similarity tasks demonstrate the effectiveness of our method for sentence embedding learning.
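The abstract outlines a three-step pipeline: a cross-batch candidate set of embeddings, nearest-neighbor retrieval from that set, and self-attention aggregation of each sample with its retrieved neighbors, used as the positive in a contrastive objective. The following is a minimal PyTorch sketch of that pipeline, not the authors' implementation; the queue size, the number of neighbors k, the single-head attention, and the InfoNCE-style loss are illustrative assumptions.

```python
# Minimal sketch of cross-batch neighbor retrieval + self-attention aggregation
# for contrastive sentence embedding learning. Hyperparameters and the loss
# form are assumptions for illustration only.
import torch
import torch.nn.functional as F

class NeighborContrastiveSketch(torch.nn.Module):
    def __init__(self, dim=768, queue_size=4096, k=4, temperature=0.05):
        super().__init__()
        self.k = k
        self.temperature = temperature
        # Candidate set: a FIFO queue of normalized embeddings kept across batches.
        self.register_buffer("queue", F.normalize(torch.randn(queue_size, dim), dim=-1))
        self.queue_ptr = 0
        # Single-head self-attention to aggregate a sample with its neighbors.
        self.attn = torch.nn.MultiheadAttention(dim, num_heads=1, batch_first=True)

    @torch.no_grad()
    def enqueue(self, emb):
        """Insert the current batch's embeddings into the cross-batch candidate set."""
        n = emb.size(0)
        idx = (self.queue_ptr + torch.arange(n, device=emb.device)) % self.queue.size(0)
        self.queue[idx] = F.normalize(emb, dim=-1)
        self.queue_ptr = (self.queue_ptr + n) % self.queue.size(0)

    def forward(self, anchor, augmented):
        """anchor / augmented: (batch, dim) embeddings of two views of each sentence."""
        anchor = F.normalize(anchor, dim=-1)
        augmented = F.normalize(augmented, dim=-1)

        # 1) Retrieve the k nearest neighbors of each anchor from the candidate set.
        sims = anchor @ self.queue.t()                        # (batch, queue_size)
        _, nn_idx = sims.topk(self.k, dim=-1)                 # (batch, k)
        neighbors = self.queue[nn_idx]                        # (batch, k, dim)

        # 2) Aggregate each sample with its neighbors via self-attention;
        #    the anchor's output token serves as the neighbor-aware positive.
        tokens = torch.cat([anchor.unsqueeze(1), neighbors], dim=1)  # (batch, k+1, dim)
        fused, _ = self.attn(tokens, tokens, tokens)
        positive = F.normalize(fused[:, 0], dim=-1)           # (batch, dim)

        # 3) InfoNCE-style loss: the aggregated representation is the positive,
        #    other in-batch augmented views act as negatives.
        logits = positive @ augmented.t() / self.temperature  # (batch, batch)
        labels = torch.arange(anchor.size(0), device=anchor.device)
        loss = F.cross_entropy(logits, labels)

        # 4) Refresh the candidate set with the current batch.
        self.enqueue(anchor.detach())
        return loss
```

In a typical unsupervised setup, `anchor` and `augmented` would be two dropout-perturbed encoder passes over the same sentences; the abstract does not specify the encoder or training details, so those are left out here.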
Date of Conference: 30 June 2024 - 05 July 2024