Protecting sensitive information while preserving the shareability and usability of data is becoming increasingly important in the outsourced business process industry. In call centers in particular, a large amount of sensitive customer information is stored in audio recordings. In this work, we address the problem of protecting sensitive customer information in audio recordings and Automatic Speech Recognition (ASR) transcripts. High word error rates, the spontaneous nature of the communication, and the variability of agent-customer interactions make it difficult and expensive to craft rules or build annotators to detect sensitive information. We propose a semi-supervised method that models sensitive information as a directed graph generated automatically from ASR transcripts. Node-specific vocabularies are generated using features of context-sensitive clusters. The direction and weight of each edge capture the ordering and timing constraints, respectively, on these features. These constraints are learnt from the time stamps associated with the ASR transcripts. We demonstrate the effectiveness of the approach by applying it to the problem of detecting and locating credit card transactions in real-life conversations between agents and customers of a call center.
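
To make the graph model concrete, the following is a minimal Python sketch of how node vocabularies, edge directions, and edge weights could be used to locate a sensitive span in a time-stamped transcript. Everything specific in it is an assumption for illustration: the node names, the vocabularies, the maximum time gaps, the sample transcript, and the helper names `label` and `locate` are all hypothetical. The vocabularies and gaps are hand-written here, whereas the method described above derives them from context-sensitive clusters and ASR time stamps, and the greedy matching loop is a simple stand-in rather than the authors' detection procedure.

```python
# Hypothetical node vocabularies (hand-written here; assumed, not learnt).
vocab = {
    "card_type": {"visa", "mastercard"},
    "card_number": {"zero", "one", "two", "three", "four",
                    "five", "six", "seven", "eight", "nine"},
    "expiry": {"expires", "expiry", "expiration"},
}

# Directed edges: (source, target) -> maximum allowed gap in seconds.
# Direction encodes ordering; the weight encodes the timing constraint.
edges = {
    ("card_type", "card_number"): 5.0,
    ("card_number", "expiry"): 10.0,
}


def label(word, vocab):
    """Return the node whose vocabulary contains the word, or None."""
    for node, words in vocab.items():
        if word.lower() in words:
            return node
    return None


def locate(transcript, vocab, edges, path):
    """Greedily find the first (start, end) time span in which the nodes of
    `path` occur in order, each within the edge's maximum gap of the
    previous matched node. `transcript` is a list of (word, start_time)."""
    for i, (word, start) in enumerate(transcript):
        if label(word, vocab) != path[0]:
            continue
        step, prev = 1, start
        for w, t in transcript[i + 1:]:
            if step == len(path):
                break
            if t - prev > edges[(path[step - 1], path[step])]:
                break  # timing constraint on this edge is violated
            if label(w, vocab) == path[step]:
                step, prev = step + 1, t
        if step == len(path):
            return (start, prev)
    return None


# Hypothetical ASR output: (word, start time in seconds).
transcript = [
    ("i", 0.0), ("will", 0.3), ("pay", 0.6), ("by", 0.9), ("visa", 1.2),
    ("card", 1.5), ("number", 2.4), ("four", 3.0), ("one", 3.4),
    ("expires", 6.0), ("march", 6.5),
]
path = ["card_type", "card_number", "expiry"]

span = locate(transcript, vocab, edges, path)
print(span)  # (1.2, 6.0)
```

In this sketch the returned span could then be used to mask the corresponding region of the audio or transcript. Tightening an edge weight (for example, allowing only 1.0 second between `card_type` and `card_number`) makes the same transcript fail to match, which illustrates how the timing constraint filters out loosely related mentions.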