
BiG-Transformer: Integrating Hierarchical Features for Transformer via Bipartite Graph


Abstract:

Self-attention based models such as the Transformer have achieved great success on a variety of Natural Language Processing tasks. However, the traditional fixed, fully-connected structure faces several challenges in practice, such as computational redundancy, fixed granularity, and poor interpretability. In this paper, we present BiG-Transformer, which employs attention with a bipartite-graph structure to replace the fully-connected self-attention mechanism in the Transformer. Specifically, the two parts of the graph are designed to integrate hierarchical semantic information, and two types of connection are proposed to fuse information from different positions. Experiments on four tasks show that BiG-Transformer achieves better performance than Transformer-like models and Recurrent Neural Networks.
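The abstract does not give implementation details, so the sketch below is only a rough illustration of the general idea: attention restricted to a bipartite graph can be realized as a sparse mask over standard scaled dot-product attention, where token-level nodes attend only across the bipartition to higher-level "segment" nodes and vice versa. The node grouping, the function name `bipartite_attention`, and the `is_segment` flag are assumptions for illustration, not the authors' actual construction.

```python
import torch
import torch.nn.functional as F

def bipartite_attention(q, k, v, is_segment):
    """Scaled dot-product attention restricted to a bipartite graph.

    q, k, v:     (n_nodes, d) tensors; the node set mixes token-level
                 and (hypothetical) segment-level nodes.
    is_segment:  (n_nodes,) boolean tensor, True for segment nodes.

    An edge (i, j) is allowed only if i and j lie on opposite sides of
    the bipartition, so each node attends across the two parts instead
    of to every other node as in fully-connected self-attention.
    """
    n, d = q.shape
    # mask[i, j] is True iff exactly one of node i, node j is a segment node.
    mask = is_segment.unsqueeze(1) ^ is_segment.unsqueeze(0)
    scores = (q @ k.transpose(0, 1)) / d ** 0.5
    scores = scores.masked_fill(~mask, float("-inf"))
    weights = F.softmax(scores, dim=-1)
    return weights @ v

# Toy usage: 6 token nodes plus 2 segment nodes.
d = 16
x = torch.randn(8, d)
is_segment = torch.tensor([False] * 6 + [True] * 2)
out = bipartite_attention(x, x, x, is_segment)
print(out.shape)  # torch.Size([8, 16])
```

Compared with full self-attention, such a pattern replaces the dense n-by-n connectivity with edges only between the two parts, which is the source of the redundancy reduction the abstract alludes to; how the paper actually builds and connects the two parts is described in the full text.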
Date of Conference: 19-24 July 2020
Date Added to IEEE Xplore: 28 September 2020
Conference Location: Glasgow, UK
