Mobile devices are becoming ubiquitous. People take pictures with their phone cameras to explore the world on the go, and in many cases they want information related to what they have photographed. Understanding the user intent conveyed by those pictures therefore becomes important. Existing mobile applications employ visual search to connect the captured picture with the physical world. However, they achieve only limited success because user intent in a picture is inherently ambiguous: one picture usually contains multiple objects. By taking advantage of multitouch interactions on mobile devices, this paper presents a prototype of interactive mobile visual search, named TapTell, to help users formulate their visual intent more conveniently. This kind of search leverages limited yet natural user interactions on the phone to achieve more effective visual search while maintaining a satisfying user experience. We make three contributions in this work. First, we conduct a focus study on the usage patterns and factors of concern for mobile visual search, which in turn leads to an interactive design for expressing visual intent by gesture. Second, we introduce four modes of gesture-based interaction (crop, line, lasso, and tap) and develop a mobile prototype. Third, we perform an in-depth usability evaluation of these modes, which demonstrates the advantage of interaction and shows that lasso is the most natural and effective interaction mode. We show that TapTell provides a natural user experience for exploring the world with the phone camera and gestures. Based on these observations and conclusions, we also suggest some design principles for future interactive mobile visual search.