Privacy concerns over the ever-increasing gathering of personal information by various institutions have led to the development of privacy-preserving data mining. Two main approaches to privacy-preserving data mining have emerged in recent years. The first approach protects the privacy of the data by perturbing it through a random process. The second approach uses cryptographic techniques to perform secure multi-party computation. While the second approach is generally viewed as superior due to its strong assurance of privacy, there are reasons why it might not be appropriate in certain applications. For instance, some data collection channels, such as ballots cast via the short message service (SMS), do not allow for complicated processing. In other cases, such as paper surveys, the data flow is unidirectional from the survey taker to the survey collector: the requested data is sent once, with no further interaction. In both of these cases, the complicated computations and iterative interaction between data originators and data collectors that secure multi-party computation requires cannot be used. We show how, in these cases, random perturbation of data can be a useful approach to privacy-preserving data mining. In particular, we study a data perturbation scheme that was shown to have asymptotically small privacy loss and information loss. We illustrate the ideas using an example of a privacy-preserving paper-based survey in which no computation is done by the users. Finally, we apply this method to privacy-preserving association rule mining.
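
To make the idea of random perturbation concrete, the following is a minimal sketch of classical randomized response for a single yes/no survey question. It is an illustrative assumption, not the specific perturbation scheme studied in this paper. The names `perturb`, `estimate_true_fraction`, and `simulate`, and the parameter `p` (the probability of reporting the true answer), are hypothetical. In a paper survey the randomness might come from a coin or die rather than from software.

```python
import random


def perturb(answer: bool, p: float, rng: random.Random) -> bool:
    """Report the true answer with probability p, otherwise its negation."""
    return answer if rng.random() < p else not answer


def estimate_true_fraction(reports, p: float) -> float:
    """Estimate the true fraction of 'yes' answers from perturbed reports.

    If pi is the true fraction, the expected observed fraction is
        observed = p * pi + (1 - p) * (1 - pi),
    which is solved for pi below. Undefined when p == 0.5, because the
    reports then carry no information about the true answers.
    """
    if p == 0.5:
        raise ValueError("p must differ from 0.5")
    observed = sum(reports) / len(reports)
    return (observed - (1 - p)) / (2 * p - 1)


def simulate(n: int = 100_000, true_fraction: float = 0.3,
             p: float = 0.75, seed: int = 0) -> float:
    """Simulate n respondents and return the collector's estimate."""
    rng = random.Random(seed)
    # Hypothetical population: each respondent's true answer is 'yes'
    # with probability true_fraction.
    truths = [rng.random() < true_fraction for _ in range(n)]
    # Each respondent perturbs locally; the collector sees only the reports.
    reports = [perturb(t, p, rng) for t in truths]
    return estimate_true_fraction(reports, p)


if __name__ == "__main__":
    print(f"estimated fraction of 'yes': {simulate():.3f}")
```

The collector never sees an individual's true answer, yet the aggregate estimate approaches the true fraction as the number of respondents grows. Values of `p` closer to 0.5 give stronger privacy at the cost of a noisier estimate, which is the privacy loss versus information loss trade-off the abstract refers to.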