Abstract:
Natural language processing (NLP) systems usually require large amounts of textual data, but the publication of such datasets is often hindered by privacy and data protection issues. Here, we discuss de-identification questions in three NLP areas: clinical NLP, NLP for social media, and information extraction from resumes. We also illustrate how de-identification is related to named entity recognition, and we argue that de-identification tools can be successfully built on named entity recognizers.
Published in: 2014 37th International Convention on Information and Communication Technology, Electronics and Microelectronics (MIPRO)
Date of Conference: 26-30 May 2014
Date Added to IEEE Xplore: 24 July 2014