CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation | IEEE Conference Publication | IEEE Xplore