Name-Centric Gender Inference Using Data Analytics
S. Pudaruth, Upasana Singh, Hoshiladevi Ramnial
2016 3rd International Conference on Soft Computing & Machine Intelligence (ISCMI), published 2016-11-01
DOI: 10.1109/ISCMI.2016.44 (https://doi.org/10.1109/ISCMI.2016.44)
Citations: 1
Abstract
In this era of globalisation and technology, determining a person's gender from their forename has numerous applications, especially in machine translation and natural language processing. In this paper, we used a supervised machine learning approach to classify 10,000 first names as either male or female. The names were manually extracted from an online telephone directory and then manually labelled with their gender. Support vector machines achieved the highest accuracy (88.0%), while Naïve Bayes produced the lowest (84.7%). A total of 15 features were used in this study. Traditionally, such systems have relied on a name dictionary to look up the gender of a forename, so they cannot handle names that are absent from the dictionary. In contrast, our proposed system can predict the gender of unseen or unknown names. Furthermore, unlike previous studies, which used names of a single origin only, our dataset includes names of various origins, such as European, African, Arabic, Indian and Chinese.
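The abstract does not list the 15 features or the classifier settings, so the following is only a minimal sketch of the general idea, assuming a handful of hypothetical character-level features (last letter, last two letters, first letter, whether the name ends in a vowel, a length bucket) and a hand-written categorical Naive Bayes classifier with add-one smoothing. The toy training names and labels are invented for illustration and are not the paper's dataset. The sketch contrasts the classifier with a dictionary lookup, which can return nothing for a name it has never seen.

```python
import math
from collections import Counter, defaultdict

VOWELS = set("aeiou")


def name_features(name):
    """Hypothetical character-level features (NOT the paper's 15 features)."""
    n = name.lower()
    return {
        "last1": n[-1:],                      # final letter
        "last2": n[-2:],                      # final two letters
        "first1": n[:1],                      # initial letter
        "ends_vowel": str(n[-1:] in VOWELS),  # "True" / "False"
        "len_bucket": "short" if len(n) <= 4 else "long",
    }


class NaiveBayesNames:
    """Categorical Naive Bayes over name_features, with add-alpha smoothing."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha

    def fit(self, names, labels):
        self.class_counts = Counter(labels)
        self.total = len(labels)
        # counts[label][feature][value] = number of training names with that value
        self.counts = defaultdict(lambda: defaultdict(Counter))
        self.values = defaultdict(set)  # distinct values observed per feature
        for name, label in zip(names, labels):
            for feat, val in name_features(name).items():
                self.counts[label][feat][val] += 1
                self.values[feat].add(val)
        return self

    def predict(self, name):
        best_label, best_score = None, -math.inf
        for label, n_label in self.class_counts.items():
            score = math.log(n_label / self.total)  # log prior
            for feat, val in name_features(name).items():
                # +1 reserves probability mass for values never seen in training
                k = len(self.values[feat]) + 1
                count = self.counts[label][feat][val]
                score += math.log((count + self.alpha) / (n_label + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def dictionary_lookup(name, dictionary):
    """Dictionary baseline: returns None for any name not in the dictionary."""
    return dictionary.get(name.lower())


# Toy training data invented for illustration; NOT the paper's 10,000-name dataset.
TRAIN = [
    ("maria", "female"), ("anna", "female"), ("fatima", "female"),
    ("priya", "female"), ("sarah", "female"), ("emma", "female"),
    ("john", "male"), ("peter", "male"), ("david", "male"),
    ("omar", "male"), ("rahul", "male"), ("james", "male"),
]
DICTIONARY = dict(TRAIN)

if __name__ == "__main__":
    model = NaiveBayesNames().fit([n for n, _ in TRAIN], [g for _, g in TRAIN])
    for unseen in ["sophia", "robert"]:
        print(unseen, dictionary_lookup(unseen, DICTIONARY), model.predict(unseen))
```

Run as a script, this prints `None` for the dictionary lookup of both unseen names, while the classifier assigns `female` to "sophia" (largely because of its final "a") and `male` to "robert". The abstract reports support vector machines as the most accurate model; an SVM could consume the same kind of features after one-hot encoding, but this sketch stays with Naive Bayes so that it needs only the standard library.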