This paper introduces a high-performance vision tracking system for mobile robots based on sensor data fusion. A mobile robot has difficulty collecting continuous vision information because of its own motion. To solve this problem, the proposed vision tracking system estimates the robot's position relative to a target and rotates the camera toward the target. This concept is derived from the human eye reflex known as the Vestibulo-Ocular Reflex (VOR), which compensates for head motion. Tracking the target in this way yields much higher performance than the conventional method, which rotates the camera using only vision information. The proposed system does not require heavy computation to process image data and can track the target continuously even during vision occlusion. The robot's motion is estimated using data from an accelerometer, a gyroscope, and encoders, and this multi-sensor data fusion is achieved using a Kalman filter. The proposed vision tracking system is implemented on a two-wheeled robot. The experimental results show that the proposed system achieves excellent tracking and recognition performance in various motion scenarios, including scenarios where the camera is temporarily blocked from the target.
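
The camera-pointing step described above can be illustrated with a minimal geometric sketch. It assumes a planar world frame, a known target position, and a robot pose estimate (position and heading) that is already available, for example from the Kalman filter. The function names `wrap_angle` and `camera_pan_angle` and the coordinate conventions are hypothetical and chosen for illustration only; they are not taken from the proposed system.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians to the interval [-pi, pi)."""
    return (angle + math.pi) % (2.0 * math.pi) - math.pi


def camera_pan_angle(robot_x, robot_y, robot_heading, target_x, target_y):
    """Return the pan angle (radians) that points the camera at the target.

    Assumptions (illustrative, not from the paper):
    - the pose (robot_x, robot_y, robot_heading) is expressed in a planar
      world frame, with heading measured counterclockwise from the x-axis;
    - the pan angle is measured relative to the robot's forward direction;
    - the camera sits at the robot's reference point.
    """
    # Bearing from the robot to the target in the world frame.
    bearing = math.atan2(target_y - robot_y, target_x - robot_x)
    # Subtract the robot heading to express the bearing in the robot frame.
    return wrap_angle(bearing - robot_heading)
```

Because the pan angle depends only on the estimated pose and the target position, a command of this kind can still be computed while the target is occluded in the image, which is consistent with the occlusion behavior described above.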