In this paper, we describe the development of a multi-touch tabletop display system and the classification of hand-gesture commands for interacting with it. We also analyze the suitability of each gesture for interactive tabletop use in light of its input and output degrees of freedom, as well as the precision and completeness it provides. Our system is based on the FTIR (frustrated total internal reflection) principle, and the hand gestures for the necessary instructions are predefined using position sensing and tracking of multi-touch points and the number of fingertips. The system consists of two beam projectors, a diffuser film, infrared cameras, and a large acrylic screen with infrared LEDs attached. In the recognition process, gesture commands are analyzed by comparing them with the predefined gesture instructions according to the number of contacted fingertips and the Euclidean distances and angles between pairs of bright spots. The proposed vision-based tabletop display system offers significant advantages for studying human-computer interaction, and the efficiency of the proposed method is demonstrated by controlling Google Earth.
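As a minimal sketch of the recognition idea described above (classifying a gesture from the number of fingertips and the change in Euclidean distance and angle between two bright spots), the following Python snippet distinguishes zoom, rotate, and pan for two tracked touch points. The thresholds and gesture names are illustrative assumptions, not values taken from the paper.

```python
import math

def classify_two_finger_gesture(prev, curr, dist_thresh=10.0, angle_thresh=0.1):
    """Classify a two-fingertip gesture from the previous and current
    (x, y) positions of the two bright spots.

    Thresholds and gesture labels are hypothetical choices for this sketch.
    """
    (p1, p2), (c1, c2) = prev, curr

    # Euclidean distance between the two spots, before and after the motion.
    d_prev = math.dist(p1, p2)
    d_curr = math.dist(c1, c2)

    # Angle of the line joining the two spots, before and after the motion.
    a_prev = math.atan2(p2[1] - p1[1], p2[0] - p1[0])
    a_curr = math.atan2(c2[1] - c1[1], c2[0] - c1[0])

    # A large change in separation suggests a zoom gesture; a large change
    # in orientation suggests a rotation; otherwise treat it as a pan.
    if abs(d_curr - d_prev) > dist_thresh:
        return "zoom_in" if d_curr > d_prev else "zoom_out"
    if abs(a_curr - a_prev) > angle_thresh:
        return "rotate"
    return "pan"

# Example: the two spots move apart, so the gesture is classified as a zoom.
print(classify_two_finger_gesture([(0, 0), (100, 0)], [(0, 0), (150, 0)]))
```

A real FTIR pipeline would first segment and track the bright spots across camera frames; this sketch assumes that correspondence between frames has already been established.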