A Language-independent gender classifier for Online Social Networks

Nancy Agarwal, M. A. Wani, Patrick A. H. Bours, S. Jabin, S. Z. Hussain
Published in: 2019 IEEE Conference on Information and Communication Technology
Publication date: 2019-12-01
DOI: 10.1109/CICT48419.2019.9066196 (https://doi.org/10.1109/CICT48419.2019.9066196)
Citations: 0

Abstract

Designing gender predictors for Online Social Networks (OSNs) is receiving considerable attention from research communities in different domains. However, the gender classifiers proposed so far for social media content rely heavily on the language in which users write that content. This implies that a prediction model trained on one language (say, English) will likely fail to identify the gender of users who write in other languages (for example, Spanish). The study conducted in this paper aims to identify features of user content on an OSN that can assist in devising a Language-Independent Gender Classifier (LIGC). The experiments are performed on the Facebook networking site. The site provides users with a list of personal attributes that they may or may not reveal to other users on the network. The presented work collects this information from Facebook users and carries out a rigorous feature analysis to determine whether it varies between men and women on Facebook. Furthermore, several machine learning algorithms, including Random Forest, SVM, Naïve Bayes, and kNN, are employed to determine the potential of the proposed feature set. The Random Forest approach achieves the highest value (70%) of the performance metric, AUROC (Area Under the Receiver Operating Characteristic curve). The current study is the first attempt to use information revelation to design a gender identifier for OSNs that is independent of the language used by the members.
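The abstract reports classifier quality as AUROC. As a reading aid, the sketch below shows one common way to compute that metric from scratch: the fraction of (positive, negative) pairs in which the positive example receives the higher score, with ties counted as half. The function name `auroc` and the toy labels and scores are hypothetical illustrations, not the authors' code or data.

```python
def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    labels: sequence of 0/1 class labels (which class is coded 1 is arbitrary).
    scores: sequence of classifier scores, higher meaning "more likely class 1".
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUROC needs at least one example of each class")

    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # a tie counts as half a win
    return wins / (len(positives) * len(negatives))


# Hypothetical toy data, NOT taken from the paper.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.7, 0.4, 0.6, 0.3, 0.2]
toy_auc = auroc(toy_labels, toy_scores)
print(f"toy AUROC = {toy_auc:.3f}")
```

In the toy data, 8 of the 9 positive/negative pairs are ranked correctly, so the value is 8/9. Under this reading, an AUROC of 0.70 means a randomly chosen user of one class is scored above a randomly chosen user of the other class about 70% of the time, against 50% for chance-level scoring.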