1 INTRODUCTION
Storytelling is almost as old as mankind’s history itself, as it has always been used to pass on knowledge to subsequent generations [41]. Traditional storytelling is characterized by a "single teller addressing an audience using speech, physical gestures, possibly props, and non-speech sounds" [86, p. 1]. As the foundation of human knowledge structures and memory, stories are an integral part of social interaction [85]. In human-human communication, the multimodality described above plays an important role, as it benefits the speaker at various levels. It not only helps to communicate more efficiently by providing complementary information; providing redundant information through several modalities simultaneously also makes communication more robust. Thus, information provided in one modality can be used to enhance, disambiguate, or highlight information conveyed via another modality [81]. Consistency is crucial here. Walker-Andrews et al., for example, have observed that babies can only recognize emotional facial expressions when they are accompanied by appropriate vocal expressions [82], [84]. Consistency, i.e., corresponding verbal and non-verbal behaviour, also leads to a higher level of perceived likability in adults [84]. Similarly, human-robot interaction (HRI) benefits from multimodality (for a survey see [48]). The importance of consistency has also been indicated in HRI, for example between vocal expressions and body postures [77] or facial expressions [4].