AVE-CLIP: AudioCLIP-based Multi-window Temporal Transformer for Audio Visual Event Localization (IEEE Conference Publication, IEEE Xplore)