A multimodal approach to initialisation for top-down speaker diarization of television shows (IEEE Conference Publication)