We present an audio-visual user authentication system that extracts multiple face profiles through a simple user interface and compresses them into compact transformed face profile (TFP) feature vectors. The MFCC vectors extracted from the spoken password are combined with the TFP vectors to form a joint AV feature vector, which is processed by a simple nearest neighbour classifier. The speaking style of the user is also efficiently captured by a compact FGRAM-CFD representation of the entire spoken password and merged with the face information by a post-fusion classification. The resulting person authentication framework delivers high performance and resilience against imposter attacks, as demonstrated in extensive simulations conducted on an in-house audio-visual userID database collected in a real-life office environment.
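The joint-feature step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not specify the distance metric, the decision rule, or the feature dimensions, so the Euclidean distance, the acceptance threshold, the function names, and the toy vectors below are all assumptions made for illustration. The MFCC and TFP extraction stages are not shown; short hand-written lists stand in for their outputs.

```python
import math

def joint_vector(audio_vec, face_vec):
    """Concatenate audio and face features into one joint AV vector."""
    return list(audio_vec) + list(face_vec)

def euclidean(a, b):
    """Euclidean distance (an assumed metric; the abstract names none)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest_neighbour(probe, gallery):
    """Return (user_id, distance) of the closest enrolled template."""
    best_id, best_dist = None, float("inf")
    for user_id, template in gallery:
        d = euclidean(probe, template)
        if d < best_dist:
            best_id, best_dist = user_id, d
    return best_id, best_dist

def authenticate(claimed_id, probe, gallery, threshold):
    """Accept only if the nearest template belongs to the claimed user
    and lies within the (assumed) distance threshold."""
    user_id, dist = nearest_neighbour(probe, gallery)
    return user_id == claimed_id and dist <= threshold

# Toy enrolment data: two hypothetical users, with 2-D stand-ins for
# the audio (MFCC) and face (TFP) feature vectors.
gallery = [
    ("alice", joint_vector([0.1, 0.2], [0.9, 0.8])),
    ("bob", joint_vector([0.7, 0.6], [0.2, 0.1])),
]
probe = joint_vector([0.12, 0.18], [0.88, 0.82])

print(authenticate("alice", probe, gallery, threshold=0.1))
print(authenticate("bob", probe, gallery, threshold=0.1))
```

In this sketch a claim is rejected either when the closest enrolled template belongs to a different user or when the match is too distant, which is one simple way to turn a nearest neighbour classifier into an accept/reject decision against imposter attempts.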