Efficient Long Document Ranking via Adaptive Token Pruning with Query-Document Alignment


Abstract:

Transformer-based models have achieved great success in document ranking, yet they suffer from substantial computational costs due to the quadratic complexity of attention, particularly in Long Document Ranking (LDR). Token pruning is a promising approach to reducing these costs, but existing methods largely overlook the interaction and alignment between query and document when guiding the pruning process, and may therefore mistakenly prune tokens that are unimportant within the document yet critical to the query. Moreover, these methods often lack the flexibility to adapt to varying input samples. To this end, we propose a novel framework, Adaptive Token Pruning with Query-Document Alignment (QD-ATP), for accelerating LDR. Specifically, we first introduce a Query-Document Alignment Guidance (QDAG) module that aligns the semantic and matching information between query and document, ensuring that pruned tokens are unimportant for both fields. We further design an Adaptive Token Pruning (ATP) module that dynamically adjusts pruning ratios for different input samples. Experimental results on three benchmark datasets demonstrate that QD-ATP achieves up to a 7.3× latency speedup while preserving competitive ranking performance.
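The abstract's core idea — scoring document tokens by their alignment with the query and keeping a per-sample adaptive subset rather than a fixed ratio — can be illustrated with a toy sketch. This is not the paper's QDAG/ATP implementation (which operates inside transformer layers); it is a minimal, hypothetical illustration in which each document token is scored by its best dot-product match against the query tokens, and the number of kept tokens adapts to cover a fixed fraction of the softmax score mass:

```python
# Hypothetical sketch only: toy alignment-guided adaptive token pruning.
# The function names, the dot-product scoring, and the score-mass stopping
# rule are illustrative assumptions, not the paper's actual modules.
import math

def alignment_scores(query_emb, doc_emb):
    """Score each document token by its best dot product with any query token."""
    scores = []
    for d in doc_emb:
        best = max(sum(qi * di for qi, di in zip(q, d)) for q in query_emb)
        scores.append(best)
    return scores

def adaptive_prune(doc_tokens, scores, mass=0.8):
    """Keep the highest-scoring tokens covering `mass` of the softmax score mass.

    The kept count adapts per input sample instead of using a fixed ratio.
    """
    exps = [math.exp(s) for s in scores]
    total = sum(exps)
    probs = [e / total for e in exps]
    order = sorted(range(len(doc_tokens)), key=lambda i: probs[i], reverse=True)
    kept, cum = [], 0.0
    for i in order:
        kept.append(i)
        cum += probs[i]
        if cum >= mass:
            break
    kept.sort()  # preserve original token order for the ranking model
    return [doc_tokens[i] for i in kept]

# Toy usage with 2-d embeddings: tokens "a" and "c" align with the query,
# "b" and "d" do not, so an "easy" sample keeps fewer tokens.
query = [[1.0, 0.0]]
doc = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.0, 0.0]]
kept = adaptive_prune(["a", "b", "c", "d"], alignment_scores(query, doc))
```

Because the stopping rule depends on the score distribution, sharply query-aligned documents retain fewer tokens than diffuse ones, which is the flexibility the fixed-ratio baselines described above lack.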
Date of Conference: 06-11 April 2025
Date Added to IEEE Xplore: 07 March 2025
Conference Location: Hyderabad, India

