One approach to the problem of too much information about a user being disclosed on social networking services (by the user or by the user's friends) through natural language texts (blogs, comments, status updates, etc.) is to anonymize the texts. However, determining which information is sensitive and should therefore be anonymized is challenging. Here, sensitive information is any information about a user that could be used to identify that user. We have developed an algorithm that uses generalization to anonymize sensitive information in text before it is posted. Synonyms for the anonymized information serve as fingerprints for detecting who disclosed the information. The fingerprints are quantified using the modified discernability metric so that an appropriate level of anonymity can be applied to each group of the user's friends. One fingerprint cannot be converted into another, so a person cannot be incorrectly identified as having revealed sensitive information. Use of the algorithm to control the disclosure of information on Facebook demonstrated that it works well not only in social networking but also in other areas (health, religion, politics, military, etc.) that store sensitive information.
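The general idea of generalization with synonym fingerprints can be illustrated with a minimal Python sketch. It is not the authors' algorithm: the hierarchy `GENERALIZE`, the synonym table `SYNONYMS`, the group names, and the functions `generalize`, `fingerprint_texts`, and `detect_discloser` are all hypothetical, and the modified discernability metric is omitted because the abstract does not define it. The sketch assumes each friend group is assigned a generalization level by hand and that a leak is an exact copy of the text the group received.

```python
# Hypothetical generalization hierarchy: specific term -> more general term.
GENERALIZE = {
    "Tokyo": "Japan",
    "Japan": "Asia",
}

# Hypothetical synonym sets; each synonym acts as one distinct fingerprint.
SYNONYMS = {
    "Japan": ["Japan", "Nippon"],
    "Asia": ["Asia", "the Asian continent"],
}


def generalize(term, levels):
    """Climb the hierarchy `levels` steps, stopping at the top if reached."""
    for _ in range(levels):
        term = GENERALIZE.get(term, term)
    return term


def fingerprint_texts(text, sensitive, groups):
    """Return one version of `text` per group.

    `groups` maps a group name to its generalization level. Groups that
    share a generalized term receive different synonyms of that term.
    """
    versions = {}
    used = {}  # generalized term -> number of synonyms already handed out
    for group, level in groups.items():
        general = generalize(sensitive, level)
        options = SYNONYMS.get(general, [general])
        index = used.get(general, 0)
        if index >= len(options):
            raise ValueError(f"not enough synonyms for {general!r}")
        used[general] = index + 1
        versions[group] = text.replace(sensitive, options[index])
    return versions


def detect_discloser(leaked_text, versions):
    """Return the groups whose version matches the leaked text exactly."""
    return [group for group, text in versions.items() if text == leaked_text]


versions = fingerprint_texts(
    "I live in Tokyo.",
    "Tokyo",
    {"family": 0, "colleagues": 1, "classmates": 1, "public": 2},
)
suspects = detect_discloser("I live in Nippon.", versions)
```

In this sketch, `colleagues` and `classmates` share the same anonymity level but see different synonyms, so a leaked copy can be traced back to the group that received it. A realistic system would need a much larger synonym source and a matching step that tolerates edits to the leaked text.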
Date of Conference: 20-24 Aug. 2012