Language-Guided Audio-Visual Source Separation via Trimodal Consistency | IEEE Conference Publication | IEEE Xplore