Cross-Modal Multistep Fusion Network With Co-Attention for Visual Question Answering | IEEE Journals & Magazine | IEEE Xplore