In this paper, we report on the development of an efficient user authentication system based on the combined acquisition of online pen and speech signals. The novelty of our approach lies in recording the two modalities simultaneously, simply by asking the user to utter what she or he is writing. The main benefit of this multimodal approach is better accuracy at no extra cost in access time or inconvenience. A further benefit is that imitation attacks become harder, since a forger must reproduce two signals at once. We compare two potential scenarios of use. The first, called spoken signatures, has the user sign and say the content of the signature. The second, called spoken handwriting, prompts the user to write and read aloud sentences randomly extracted from a text. Data for both scenarios were recorded from a set of 70 users. In the first part of this paper, we describe the acquisition procedure and comment on the viability and usability of such simultaneous recordings. Our conclusions are supported by a short survey conducted with the users. In the second part, we present the authentication systems that we developed for both scenarios. More specifically, our strategy was to model the two data streams independently and to fuse them at the score level. Starting from a state-of-the-art modeling algorithm based on Gaussian Mixture Models trained with an Expectation-Maximization procedure, we report several significant improvements over this baseline. As a general observation, using both modalities together significantly outperforms either modality used alone.
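
To make the idea of score-level fusion concrete, the following is a minimal sketch and not the system described in this paper. It assumes that each modality already has a trained diagonal-covariance GMM for the claimed user, and that a score is formed as a log-likelihood ratio against a background model; the background model, the weighted-sum fusion rule, the `pen_weight` parameter, the threshold, and all function names are illustrative assumptions that the abstract does not specify.

```python
import math


def diag_gmm_loglik(frames, weights, means, variances):
    """Average per-frame log-likelihood under a diagonal-covariance GMM."""
    total = 0.0
    for x in frames:
        comp_logs = []
        for w, mu, var in zip(weights, means, variances):
            log_p = math.log(w)
            for xi, mi, vi in zip(x, mu, var):
                log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
            comp_logs.append(log_p)
        # Log-sum-exp over mixture components for numerical stability.
        m = max(comp_logs)
        total += m + math.log(sum(math.exp(c - m) for c in comp_logs))
    return total / len(frames)


def modality_score(frames, client, background):
    """Log-likelihood ratio of one modality (background model is an assumption).

    `client` and `background` are (weights, means, variances) tuples.
    """
    return diag_gmm_loglik(frames, *client) - diag_gmm_loglik(frames, *background)


def fuse_scores(pen_score, speech_score, pen_weight=0.5):
    """Weighted-sum fusion at the score level (hypothetical fusion rule)."""
    return pen_weight * pen_score + (1.0 - pen_weight) * speech_score


def accept(fused_score, threshold=0.0):
    """Accept the identity claim when the fused score reaches the threshold."""
    return fused_score >= threshold
```

In use, `modality_score` would be called once on the pen features and once on the speech features of the same access attempt, and the two results passed to `fuse_scores` and then `accept`. In practice the fusion weight and the threshold would be tuned on held-out data, and the two scores may need normalization before they are combined.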