A Language-independent gender classifier for Online Social Networks
Nancy Agarwal, M. A. Wani, Patrick A. H. Bours, S. Jabin, S. Z. Hussain
2019 IEEE Conference on Information and Communication Technology, 2019-12-01
DOI: https://doi.org/10.1109/CICT48419.2019.9066196
Abstract
Designing gender predictors for Online Social Networks (OSNs) is receiving considerable attention from research communities in different domains. However, the gender classifiers proposed so far for social media content rely heavily on the language in which users write that content. This implies that a prediction model trained on one language (say, English) will likely fail to identify the gender of users who write in another language (for example, Spanish). This paper aims to identify features of user content on an OSN that can help in devising a Language-Independent Gender Classifier (LIGC). The experiments are performed on the Facebook networking site. The site offers users a list of personal attributes that they may or may not reveal to other users on the network. The presented work collects this information for Facebook users and carries out a rigorous feature analysis to determine whether what is revealed differs between men and women on Facebook. Furthermore, several machine learning algorithms, including Random Forest, SVM, Naïve Bayes, and kNN, are employed to assess the potential of the proposed feature set. The Random Forest approach achieves the highest value (70%) of the performance metric, AUROC (Area Under the Receiver Operating Characteristic curve). The current study is the first attempt to use information revelation to design a gender identifier for OSNs that is independent of the language used by the members.
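The two ideas the abstract relies on, language-independent disclosure features and AUROC as the comparison metric, can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the attribute names in `ATTRIBUTES`, the `disclosure_vector` helper, and the toy labels and scores are hypothetical and are not taken from the paper, whose actual feature set and data are not given in the abstract.

```python
# Hypothetical profile attributes; the paper's real feature list is not
# given in the abstract.
ATTRIBUTES = ["hometown", "relationship_status", "workplace", "birthday"]


def disclosure_vector(profile):
    """Encode a profile as 0/1 flags: 1 if the attribute is revealed.

    Only the presence of a value is used, never its text, so the encoding
    does not depend on the language the user writes in.
    """
    return [1 if profile.get(name) else 0 for name in ATTRIBUTES]


def auroc(labels, scores):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive example gets a
    higher score than a randomly chosen negative one; ties count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUROC needs at least one example of each class")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy data (made up): class labels and scores from some classifier.
toy_labels = [1, 1, 1, 0, 0, 0]
toy_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2]
toy_auc = auroc(toy_labels, toy_scores)  # 7 of 9 pairs ranked correctly
```

An AUROC of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives context for the 70% reported for Random Forest. Any of the classifiers named in the abstract could supply the `scores` argument, as long as it outputs a numeric confidence per user.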