AVCaps: An Audio-Visual Dataset With Modality-Specific Captions | IEEE Journals & Magazine | IEEE Xplore