Abstract:
Self-attention based models such as the Transformer have achieved great success on a variety of Natural Language Processing tasks. However, the conventional fixed fully-connected structure faces many challenges in practice, such as computational redundancy, fixed granularity, and poor interpretability. In this paper, we present BiG-Transformer, which employs attention with a bipartite-graph structure to replace the fully-connected self-attention mechanism in the Transformer. Specifically, the two parts of the graph are designed to integrate hierarchical semantic information, and two types of connections are proposed to fuse information from different positions. Experiments on four tasks show that BiG-Transformer achieves better performance than Transformer-like models and Recurrent Neural Networks.
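To make the idea of bipartite-graph attention concrete, the following is a minimal sketch (not the authors' exact method): it assumes one partition holds token-level positions and the other holds coarser, hierarchical nodes (e.g. phrase-level), and that attention is restricted to edges crossing the two partitions via a mask. The function name, the partition scheme, and the toy dimensions are illustrative assumptions only.

```python
# Hypothetical sketch: scaled dot-product attention restricted to a bipartite
# graph by masking out all within-partition edges. The real BiG-Transformer
# connection types are not reproduced here.
import numpy as np

def bipartite_attention(Q, K, V, part_a, part_b):
    """Attention where only cross-partition (bipartite) edges are kept.

    Q, K, V: (n, d) arrays covering all n positions (both partitions).
    part_a, part_b: index arrays giving the two sides of the bipartite graph.
    """
    n, d = Q.shape
    # Mask allowing attention only between the two partitions.
    mask = np.full((n, n), -np.inf)
    mask[np.ix_(part_a, part_b)] = 0.0
    mask[np.ix_(part_b, part_a)] = 0.0

    scores = Q @ K.T / np.sqrt(d) + mask          # masked attention logits
    scores -= scores.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V                            # (n, d) attended values

# Toy usage: 4 token-level positions and 2 assumed phrase-level nodes.
rng = np.random.default_rng(0)
Q, K, V = (rng.standard_normal((6, 8)) for _ in range(3))
out = bipartite_attention(Q, K, V,
                          part_a=np.array([0, 1, 2, 3]),
                          part_b=np.array([4, 5]))
print(out.shape)  # (6, 8)
```

Compared with fully-connected self-attention, which scores all n x n position pairs, a bipartite mask of this kind only activates cross-partition edges, which is one way the computational redundancy mentioned in the abstract can be reduced.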
Date of Conference: 19-24 July 2020
Date Added to IEEE Xplore: 28 September 2020