When we begin to build and interact with machines or robots that either look like humans or have human functionalities and capabilities, people may well interact with these human-like machines in ways that mimic human-human communication. For example, if a robot has a face, a human might interact with it much as humans interact with other creatures that have faces. Specifically, a human might talk to it, gesture to it, smile at it, and so on. If a human interacts with a computer or a machine that understands spoken commands, the human might converse with the machine, expecting it to have competence in spoken language. In our research on a multimodal interface to mobile robots, we have assumed a model of communication and interaction that, in a sense, mimics how people communicate with one another. Our interface therefore incorporates both natural language understanding and gesture recognition as communication modes. We limited the interface to these two modes to simplify their integration and to make our research more tractable. We believe that with an integrated system, the user is less concerned with how to communicate (that is, which interactive mode to employ for a task) and is therefore free to concentrate on the tasks and goals at hand. Because we integrate all of our system's components, users can choose any combination of our interface's modalities. The onus is on our interface to integrate the input, process it, and produce the desired results.