Abstract:
First-name-to-gender mappings are widely recognized as a critical tool for completing, studying, and validating data records in many areas. In this study, we investigate how organizations with large databases of existing entities can create their own mappings between first names and gender, and how these mappings can be improved and utilized. To this end, we first explore a dataset with demographic information on more than 6 million people, provided by a car insurance company, and study how naming conventions have changed over time and how they differ by nationality. Second, we build a probabilistic first-name-to-gender mapping and augment it with nationality and decade of birth to improve its performance. We test our mapping in a two-label and a three-label setting and further validate it by categorizing patent filings by the gender of the inventor. We compare the results with the outcomes of previous studies and find that our mapping produces high-precision results. We also confirm that the additional information on nationality and year of birth improves the recall scores of name-to-gender mappings. The approach therefore constitutes an efficient process for improving the data quality of organizations' records whenever the gender attribute is missing or unreliable.
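As a rough illustration of the idea, a probabilistic mapping augmented with nationality and decade of birth could be sketched as below. This is a minimal sketch under stated assumptions, not the authors' implementation: the function names `build_mapping` and `predict`, the back-off order (name + nationality + decade, then name + nationality, then name only), the 0.9 confidence threshold, the use of an "unknown" label as the third class, and the toy records are all hypothetical.

```python
from collections import Counter, defaultdict

def build_mapping(records):
    """Count genders per (name, nationality, decade) key.

    records: iterable of (first_name, nationality, decade_of_birth, gender).
    Counts are kept at three levels of specificity so lookups can back off.
    """
    counts = defaultdict(Counter)
    for name, nationality, decade, gender in records:
        key = name.strip().lower()
        counts[(key, nationality, decade)][gender] += 1
        counts[(key, nationality, None)][gender] += 1
        counts[(key, None, None)][gender] += 1
    return counts

def predict(counts, name, nationality=None, decade=None, threshold=0.9):
    """Return (label, probability) using the most specific key available.

    The label is "unknown" when the name is unseen or when the most
    frequent gender falls below the (assumed) confidence threshold.
    """
    key = name.strip().lower()
    candidates = (
        (key, nationality, decade),
        (key, nationality, None),
        (key, None, None),
    )
    for k in candidates:
        c = counts.get(k)
        if c:
            gender, n = c.most_common(1)[0]
            p = n / sum(c.values())
            return (gender if p >= threshold else "unknown"), p
    return "unknown", 0.0

# Made-up toy records, for illustration only (not data from the study).
toy = (
    [("Andrea", "IT", 1970, "M")] * 3
    + [("Andrea", "DE", 1970, "F")] * 3
    + [("Maria", "DE", 1980, "F")] * 2
)
mapping = build_mapping(toy)

name_only = predict(mapping, "Andrea")                # ambiguous without context
with_context = predict(mapping, "Andrea", "IT", 1970)  # resolved by nationality
backed_off = predict(mapping, "Andrea", "DE", 1990)    # decade unseen, backs off
```

In this toy data the name alone is split evenly between the two genders and is labelled "unknown", while adding nationality resolves it. That is the kind of recall gain the abstract attributes to the additional attributes, although the actual mechanism used in the study may differ.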
Date of Conference: 11-14 December 2017
Date Added to IEEE Xplore: 15 January 2018