Automatic visual language identification (VLID) is the technology of using information derived from the visual appearance and movement of the speech articulators to identify the language being spoken, without the use of any audio information. This technique for language identification (LID) is useful in situations in which conventional audio processing is ineffective (very noisy environments) or impossible (no audio signal is available). Research in this field also benefits the related field of automatic lip-reading. This paper introduces several methods for VLID. They are based upon audio LID techniques, which exploit language phonology and phonotactics to discriminate languages. We show that VLID is possible in a speaker-dependent mode by discriminating different languages spoken by an individual, and we then extend the technique to speaker-independent operation, taking pains to ensure that discrimination is not due to artifacts, either visual (e.g., skin tone) or audio (e.g., rate of speaking). Although the low accuracy of visual speech recognition currently limits the performance of VLID, we can obtain an error rate of less than 10% in discriminating between Arabic and English on 19 speakers, using about 30 s of visual speech.
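As a rough illustration of the phonotactic approach mentioned above (not the paper's actual system), one can score a recognized token sequence, e.g. viseme labels, under a smoothed bigram model per language and pick the language with the higher log-likelihood. All token strings and labels below are toy, hypothetical data.

```python
from collections import Counter
import math

def bigram_model(sequences):
    """Collect unigram and bigram counts over token sequences (e.g. viseme labels)."""
    uni, bi = Counter(), Counter()
    for seq in sequences:
        for a, b in zip(seq, seq[1:]):
            uni[a] += 1
            bi[(a, b)] += 1
    return uni, bi

def log_likelihood(seq, model, vocab_size, alpha=1.0):
    """Add-alpha smoothed bigram log-likelihood of a token sequence."""
    uni, bi = model
    ll = 0.0
    for a, b in zip(seq, seq[1:]):
        ll += math.log((bi[(a, b)] + alpha) / (uni[a] + alpha * vocab_size))
    return ll

# Toy "viseme strings" standing in for training data from two languages.
english = ["pbmaa", "pbmii", "aapbm"]
arabic = ["kgxaa", "kgxuu", "aakgx"]
vocab = {c for s in english + arabic for c in s}

m_en = bigram_model(english)
m_ar = bigram_model(arabic)

# Classify an unseen sequence by which language model scores it higher.
test_seq = "pbmaa"
scores = {
    "English": log_likelihood(test_seq, m_en, len(vocab)),
    "Arabic": log_likelihood(test_seq, m_ar, len(vocab)),
}
print(max(scores, key=scores.get))  # → English
```

In real audio LID (and by extension VLID), the token sequences come from a phone or viseme recognizer and the n-gram models are trained per language; the classification principle is the same.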