Mashup is a web technology that allows different service providers to flexibly integrate their expertise and to deliver highly customizable services to their customers. Data mashup is a special type of mashup application that integrates data from multiple data providers according to the user's request. However, integrating data from multiple sources raises three challenges: 1) Simply joining multiple private data sets would reveal each provider's sensitive information to the other data providers. 2) The integrated (mashup) data could sharpen the identification of individuals and, therefore, reveal person-specific sensitive information that was not available before the mashup. 3) Mashup data from multiple sources often contain many attributes. When a traditional privacy model such as K-anonymity is enforced, such high-dimensional data suffer from the curse of high dimensionality, which leaves the data useless for further analysis. In this paper, we study and resolve a privacy problem in a real-life mashup application for the online advertising industry in social networks, and propose a service-oriented architecture along with a privacy-preserving data mashup algorithm to address these challenges. Experiments on real-life data suggest that the proposed architecture and algorithm are effective at simultaneously preserving both privacy and information utility in the mashup data. To the best of our knowledge, this is the first work that integrates high-dimensional data for a mashup service.
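
Challenges 2 and 3 can be illustrated with a small sketch. It is not the algorithm proposed in the paper; it only checks K-anonymity on toy data. The provider tables, the attribute names, and the helper `min_group_size` are all hypothetical and chosen for illustration. Each provider's table is 2-anonymous on its own attributes, but joining them leaves every individual in a group of size 1.

```python
from collections import Counter


def min_group_size(records, quasi_identifiers):
    """Size of the smallest group of records that share the same values
    on the given attributes. A table satisfies K-anonymity on those
    attributes exactly when this value is at least K."""
    groups = Counter(
        tuple(record[attr] for attr in quasi_identifiers) for record in records
    )
    return min(groups.values())


# Hypothetical toy data: two providers describe the same four people,
# keyed by a shared identifier, but each holds different attributes.
provider_a = {
    1: {"age": "30-39", "sex": "F"},
    2: {"age": "30-39", "sex": "F"},
    3: {"age": "40-49", "sex": "M"},
    4: {"age": "40-49", "sex": "M"},
}
provider_b = {
    1: {"job": "nurse"},
    2: {"job": "teacher"},
    3: {"job": "nurse"},
    4: {"job": "teacher"},
}

# Naive mashup: join the two tables on the shared identifier.
mashup = [{**provider_a[i], **provider_b[i]} for i in provider_a]

k_a = min_group_size(list(provider_a.values()), ["age", "sex"])
k_b = min_group_size(list(provider_b.values()), ["job"])
k_mashup = min_group_size(mashup, ["age", "sex", "job"])

print(k_a, k_b, k_mashup)  # 2 2 1
```

Every added attribute can only split the existing groups further, so the smallest group tends to shrink as dimensionality grows. Restoring K-anonymity by generalizing values then removes more and more information, which is the utility loss the abstract attributes to the curse of high dimensionality.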