社交媒体中阿拉伯语的性别推断

A. I. Al-Ghadir, Abdullatif M. AlAbdullatif, Aqil M. Azmi
{"title":"社交媒体中阿拉伯语的性别推断","authors":"A. I. Al-Ghadir, Abdullatif M. AlAbdullatif, Aqil M. Azmi","doi":"10.4018/IJKSR.2014100101","DOIUrl":null,"url":null,"abstract":"The widespread usage of social media has attracted a new group of researchers seeking information on who, what and, where the users are. Some of the information retrieval researchers are interested in identifying the gender, age group, and the educational level of the users. The objective of this work is to identify the gender in the Arabic posts in the social media. Most of the works related to gender classification has been for English based content in the social media. Work for other languages, such as Arabic, is almost next to none. Typically people express themselves in the social media using colloquial, so this study is geared towards the identification of genders using the Saudi dialect of the Arabic language. To solve the gender identification problem the authors, a novel method called k-Top Vector k-TV, which is based on the k-top words based on the words occurrences and the frequency of the stems, was introduced. Part of this work required compiling a dataset of Saudi dialect words. For this, a well-known widely used social site was relied on. To test the system, we compiled 1200 samples equally split between both genders. The authors trained Support Vector Machine SVM and k-NN classifiers using different number of samples for training and testing. SVM did a better job and achieved an accuracy of 95% for gender classification.","PeriodicalId":296518,"journal":{"name":"Int. J. Knowl. Soc. Res.","volume":"28 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2014-10-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"4","resultStr":"{\"title\":\"Gender Inference for Arabic Language in Social Media\",\"authors\":\"A. I. Al-Ghadir, Abdullatif M. AlAbdullatif, Aqil M. Azmi\",\"doi\":\"10.4018/IJKSR.2014100101\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"The widespread usage of social media has attracted a new group of researchers seeking information on who, what and, where the users are. Some of the information retrieval researchers are interested in identifying the gender, age group, and the educational level of the users. The objective of this work is to identify the gender in the Arabic posts in the social media. Most of the works related to gender classification has been for English based content in the social media. Work for other languages, such as Arabic, is almost next to none. Typically people express themselves in the social media using colloquial, so this study is geared towards the identification of genders using the Saudi dialect of the Arabic language. To solve the gender identification problem the authors, a novel method called k-Top Vector k-TV, which is based on the k-top words based on the words occurrences and the frequency of the stems, was introduced. Part of this work required compiling a dataset of Saudi dialect words. For this, a well-known widely used social site was relied on. To test the system, we compiled 1200 samples equally split between both genders. The authors trained Support Vector Machine SVM and k-NN classifiers using different number of samples for training and testing. SVM did a better job and achieved an accuracy of 95% for gender classification.\",\"PeriodicalId\":296518,\"journal\":{\"name\":\"Int. J. Knowl. Soc. Res.\",\"volume\":\"28 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-10-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"4\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Int. J. Knowl. Soc. Res.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.4018/IJKSR.2014100101\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Int. J. Knowl. Soc. Res.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.4018/IJKSR.2014100101","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 4

摘要

社交媒体的广泛使用吸引了一群新的研究人员,他们在寻找关于用户是谁、什么、在哪里的信息。一些信息检索研究者对识别用户的性别、年龄和教育程度感兴趣。这项工作的目的是识别社交媒体上阿拉伯语帖子中的性别。与性别分类相关的工作大多是针对社交媒体中基于英语的内容。其他语言的工作,比如阿拉伯语,几乎没有。通常人们在社交媒体上使用口语表达自己,所以这项研究旨在使用阿拉伯语的沙特方言来识别性别。为了解决性别识别问题,作者提出了一种基于词干出现频率和k-Top词的k-Top向量k-TV方法。这项工作的一部分需要汇编沙特方言词汇的数据集。为此,依赖于一个知名的广泛使用的社交网站。为了测试这个系统,我们编译了1200个男女平均分配的样本。作者使用不同数量的样本训练支持向量机SVM和k-NN分类器进行训练和测试。SVM在性别分类方面做得更好,准确率达到95%。
本文章由计算机程序翻译,如有差异,请以英文原文为准。
Gender Inference for Arabic Language in Social Media
The widespread usage of social media has attracted a new group of researchers seeking information on who, what and, where the users are. Some of the information retrieval researchers are interested in identifying the gender, age group, and the educational level of the users. The objective of this work is to identify the gender in the Arabic posts in the social media. Most of the works related to gender classification has been for English based content in the social media. Work for other languages, such as Arabic, is almost next to none. Typically people express themselves in the social media using colloquial, so this study is geared towards the identification of genders using the Saudi dialect of the Arabic language. To solve the gender identification problem the authors, a novel method called k-Top Vector k-TV, which is based on the k-top words based on the words occurrences and the frequency of the stems, was introduced. Part of this work required compiling a dataset of Saudi dialect words. For this, a well-known widely used social site was relied on. To test the system, we compiled 1200 samples equally split between both genders. The authors trained Support Vector Machine SVM and k-NN classifiers using different number of samples for training and testing. SVM did a better job and achieved an accuracy of 95% for gender classification.
求助全文
通过发布文献求助,成功后即可免费获取论文全文。 去求助
来源期刊
自引率
0.00%
发文量
0
×
引用
GB/T 7714-2015
复制
MLA
复制
APA
复制
导出至
BibTeX EndNote RefMan NoteFirst NoteExpress
×
提示
您的信息不完整,为了账户安全,请先补充。
现在去补充
×
提示
您因"违规操作"
具体请查看互助需知
我知道了
×
提示
确定
请完成安全验证×
copy
已复制链接
快去分享给好友吧!
我知道了
右上角分享
点击右上角分享
0
联系我们:info@booksci.cn Book学术提供免费学术资源搜索服务,方便国内外学者检索中英文文献。致力于提供最便捷和优质的服务体验。 Copyright © 2023 布克学术 All rights reserved.
京ICP备2023020795号-1
ghs 京公网安备 11010802042870号
Book学术文献互助
Book学术文献互助群
群 号:481959085
Book学术官方微信