This paper investigates situated utterance generation in human-robot interaction. We also study the achievement of joint attention, since a person must share attention with a robot to identify the object referred to by a situated utterance the robot generates. We propose a joint attention mechanism for achieving such shared attention, together with an utterance generation system named Linta-III. Using the joint attention mechanism, Linta-III can omit from an utterance any information that is obvious in the situation. The mechanism employs eye-contact and attention-expression functions: physical expressions by the robot that draw the person's attention to the same sensor information the robot has noticed. We also conducted a psychological experiment to evaluate the mechanism. The results indicated that the eye-contact and attention-expression functions are effective for establishing joint attention.