Taking a HINT: Leveraging Explanations to Make Vision and Language Models More Grounded | IEEE Conference Publication | IEEE Xplore