Audio-visual speech activity detection in a two-speaker scenario incorporating depth information from a profile or frontal view (IEEE Conference Publication)