Abstract:
With the widespread proliferation of location-aware devices and social media applications, more and more people share information on location-based social networks such as Twitter. Such data can help to better plan and manage individuals' activities and can support other social applications, e.g., location-based advertisement or recommendation. However, only a very small proportion of tweets are geotagged, owing to privacy concerns or the lack of underlying positioning infrastructure. Hence it is meaningful to estimate the geographic information of non-geotagged tweets, i.e., geocoding, which can improve the applicability and utility of social media data. In contrast to existing geocoding approaches, this paper addresses the privacy risk while providing a fine-grained estimation. We propose Privacy-preserving GEocoding of Non-geotagged Tweets (P-GENT), which geocodes non-geotagged tweets with fine-grained estimation whilst protecting privacy. Our approach estimates the geographic location of a non-geotagged tweet from the similarity between the content of the tweet and the keyword lists of local events detected from archived geotagged tweets of the same time period. The approach implements a spatio-temporal clustering algorithm to discover local events at a fine granularity, together with an important-keyword extraction mechanism to describe each detected local event. In addition, a density-seed discovery approach is used to reduce both the sparseness of geotagged tweets and the time complexity of the clustering approach. The experimental evaluation with real-world data demonstrates that our approach achieves up to 92% precision for one time slot, with 33-43% precision remaining across all time slots after the privacy-preserving mechanisms are applied.
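The matching step described in the abstract can be illustrated with a minimal sketch. It is not the P-GENT implementation: the abstract does not name the similarity measure, so Jaccard similarity over lowercase whitespace tokens is assumed here, and `LocalEvent`, `geocode`, and `min_similarity` are hypothetical names introduced for illustration. The events are assumed to have been detected already and restricted to the same time slot as the tweet; clustering, keyword extraction, and the privacy-preserving mechanisms are out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LocalEvent:
    """Hypothetical record for a detected local event (not from the paper)."""
    keywords: frozenset  # keyword list describing the event
    lat: float           # representative latitude of the event cluster
    lon: float           # representative longitude of the event cluster


def jaccard(a, b):
    """Jaccard similarity of two sets; assumed here, not specified in the abstract."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def geocode(tweet_text, events, min_similarity=0.1):
    """Return (lat, lon) of the most similar event, or None if nothing matches.

    `events` is assumed to hold only events from the tweet's time slot.
    `min_similarity` is an illustrative threshold, not a value from the paper.
    """
    tokens = set(tweet_text.lower().split())
    best = max(events, key=lambda e: jaccard(tokens, e.keywords), default=None)
    if best is None or jaccard(tokens, best.keywords) < min_similarity:
        return None
    return (best.lat, best.lon)


events = [
    LocalEvent(frozenset({"concert", "stadium", "band"}), 1.0, 2.0),
    LocalEvent(frozenset({"marathon", "runners", "finish"}), 3.0, 4.0),
]
print(geocode("Great band at the stadium tonight", events))
```

Returning `None` when no event is similar enough reflects one possible design choice: leaving a tweet unlocated rather than assigning it to an unrelated event.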
Date of Conference: 01-03 August 2018
Date Added to IEEE Xplore: 06 September 2018
ISBN Information:
Electronic ISSN: 2324-9013