Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning | IEEE Conference Publication