Recent years have witnessed exciting progress in mobile visual search, with applications to location recognition and streaming augmented reality. Most existing systems are deployed with reference images drawn from street views of urban scenes. In this setting, an interesting yet untouched problem is how to determine the viewing angle of the visual query in addition to performing the search, which could benefit multiple applications such as filtering false visual matches and accelerating streaming AR. In this paper, we study viewing angle estimation by exploiting the visual appearance of the query, which can be further improved by incorporating coarse mobile context such as gyroscope or compass readings. Our main idea is to cast this problem as scene classification, for which the key design is a visual signature that reveals the appearance differences among viewing angles. We introduce a novel layout-based viewing angle descriptor, built on a carefully designed spatial division together with appearance features such as color, texture, and gradient. We validate our approach on our dataset of 1232 street view images from urban areas of Manhattan, New York City, and show that the proposed descriptor outperforms several alternative holistic image representations, including GIST, HOG, and bag-of-features with spatial pyramid matching.
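To make the idea concrete, below is a minimal sketch of a layout-based descriptor in the spirit described above: the image is divided into a spatial grid, and per-cell color, texture, and gradient statistics are concatenated into one vector that a standard classifier could then map to viewing-angle classes. The grid size, histogram bins, and feature choices here are illustrative assumptions, not the paper's actual design.

```python
import numpy as np

def cell_features(cell, n_bins=8):
    """Per-cell statistics: a gray-level histogram (color proxy), a mean
    gradient magnitude (texture proxy), and a magnitude-weighted gradient
    orientation histogram."""
    gray = cell.mean(axis=2) if cell.ndim == 3 else cell
    color_hist, _ = np.histogram(gray, bins=n_bins, range=(0, 255),
                                 density=True)
    gy, gx = np.gradient(gray)          # image gradients along y and x
    mag = np.hypot(gx, gy)
    ori = np.arctan2(gy, gx)            # orientation in [-pi, pi]
    ori_hist, _ = np.histogram(ori, bins=n_bins, range=(-np.pi, np.pi),
                               weights=mag)
    ori_hist = ori_hist / (ori_hist.sum() + 1e-8)
    return np.concatenate([color_hist, [mag.mean() / 255.0], ori_hist])

def layout_descriptor(image, rows=4, cols=4):
    """Divide the image into a rows x cols grid (an assumed stand-in for the
    paper's spatial division) and stack per-cell features, so the descriptor
    encodes where appearance statistics occur in the image layout."""
    h, w = image.shape[:2]
    feats = []
    for i in range(rows):
        for j in range(cols):
            cell = image[i * h // rows:(i + 1) * h // rows,
                         j * w // cols:(j + 1) * w // cols]
            feats.append(cell_features(cell))
    return np.concatenate(feats)

# Usage: each descriptor would feed a multi-class classifier over the
# discretized viewing angles (scene-classification formulation).
img = (np.random.rand(240, 320, 3) * 255).astype(np.float32)
desc = layout_descriptor(img)
print(desc.shape)  # (rows * cols * (2 * n_bins + 1),) = (272,)
```

Concatenating per-cell features rather than pooling globally is what distinguishes a layout descriptor from holistic baselines such as GIST or a bag-of-features: the same facade texture contributes differently depending on where it falls in the grid, which is precisely the cue that varies with viewing angle.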