Large datasets are being mined to extract hidden knowledge and patterns that help decision makers make effective, efficient, and timely decisions in an increasingly competitive world. This type of “knowledge-driven” data mining is not possible without sharing the datasets between their owners and data mining experts (or corporations); as a consequence, protecting ownership of the datasets (by embedding a watermark) is becoming relevant. The most important challenge in watermarking datasets that are to be mined is how to preserve the knowledge in their features or attributes. Usually, an owner needs to manually define “usability constraints” for each type of dataset to preserve the contained knowledge. The major contribution of this paper is a novel formal model that enables a data owner to define usability constraints, which preserve the knowledge contained in the dataset, in an automated fashion. The model aims at preserving the “classification potential” of each feature and other major characteristics of datasets that play an important role during the mining process; as a result, learning statistics and decision-making rules also remain intact. We have implemented our model and integrated it with a new watermark embedding algorithm to demonstrate that the inserted watermark not only preserves the knowledge contained in a dataset but also significantly enhances watermark security compared with existing techniques. We have tested our model on 25 different data-mining datasets to show its efficacy, effectiveness, and ability to adapt and generalize.