X-STA: Cross-Modal Spatial-Temporal Alignment Network for Unified Audio-Visual Segmentation | IEEE Journals & Magazine | IEEE Xplore