Name-Centric Gender Inference Using Data Analytics

S. Pudaruth, Upasana Singh, Hoshiladevi Ramnial
Published in: 2016 3rd International Conference on Soft Computing & Machine Intelligence (ISCMI), 2016-11-01
DOI: https://doi.org/10.1109/ISCMI.2016.44
Citations: 1

Abstract

In this era of globalisation and technology, determining a person's gender from their forename has numerous applications, especially in machine translation and natural language processing. In this paper, we used a supervised machine learning approach to classify 10,000 first names as either male or female. The names were manually extracted from an online telephone directory and then manually assigned to the appropriate category. A total of 15 features were used in this study. Support vector machines achieved the highest accuracy, 88.0%, while Naïve Bayes produced the lowest, 84.7%. Traditionally, such systems have relied on a name dictionary to output the gender of a forename. Our proposed system, however, can predict the gender of unseen or unknown names. Furthermore, our dataset consists of names from different origins, such as European, African, Arabic, Indian and Chinese, unlike previous studies, which used names from a single origin only.
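The abstract does not list the 15 features or describe the implementation, so the following is only a minimal sketch of the general idea: derive surface features from a forename and train a supervised classifier on labelled names, so that names absent from any dictionary can still be classified. It uses a small Naïve Bayes classifier (one of the two model families named in the abstract) written with the Python standard library. The five features, the `extract_features` and `NaiveBayes` names, the smoothing scheme, and the ten toy training names are all assumptions made for illustration. None of them comes from the paper or its dataset.

```python
import math
from collections import Counter, defaultdict

VOWELS = tuple("aeiou")


def extract_features(name):
    """Hypothetical surface features of a forename (NOT the paper's 15 features)."""
    n = name.strip().lower()
    return {
        "last1": n[-1:],                      # final letter
        "last2": n[-2:],                      # final two letters
        "first1": n[:1],                      # initial letter
        "ends_in_vowel": n.endswith(VOWELS),  # False for an empty string
        "length_bucket": min(len(n), 10),     # name length, capped at 10
    }


class NaiveBayes:
    """Categorical Naive Bayes with add-alpha smoothing (illustrative only)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha
        self.class_counts = Counter()
        # (label, feature name) -> Counter of observed feature values
        self.feature_counts = defaultdict(Counter)
        # feature name -> set of all values seen during training
        self.feature_values = defaultdict(set)

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for feat, value in extract_features(name).items():
                self.feature_counts[(label, feat)][value] += 1
                self.feature_values[feat].add(value)
        return self

    def predict(self, name):
        total = sum(self.class_counts.values())
        feats = extract_features(name)
        best_label, best_score = None, -math.inf
        for label, count in self.class_counts.items():
            score = math.log(count / total)  # log prior
            for feat, value in feats.items():
                seen = self.feature_counts[(label, feat)][value]
                # Reserve one extra slot for values never seen in training.
                vocab = len(self.feature_values[feat]) + 1
                score += math.log((seen + self.alpha) / (count + self.alpha * vocab))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Toy training set, invented for this sketch (not taken from the paper's data).
TRAIN = [
    ("maria", "F"), ("anna", "F"), ("sophia", "F"), ("priya", "F"), ("fatima", "F"),
    ("john", "M"), ("david", "M"), ("kevin", "M"), ("rajesh", "M"), ("omar", "M"),
]

model = NaiveBayes().fit([n for n, _ in TRAIN], [g for _, g in TRAIN])

# Names that do not appear in the training set can still be classified.
print(model.predict("olivia"))  # F
print(model.predict("edwin"))   # M
```

Because the prediction depends only on features computed from the spelling, the classifier returns a label for any input string, which is the property the abstract contrasts with dictionary lookup. With ten training names the output says nothing about real-world accuracy. The 88.0% and 84.7% figures reported above refer to the authors' own features, models and 10,000-name dataset.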