N-Gram Approach for Gender Prediction

T. Raghunadha Reddy, B. V. Vardhan, P. Vijayapal Reddy
2017 IEEE 7th International Advance Computing Conference (IACC)
DOI: 10.1109/IACC.2017.0176 (https://doi.org/10.1109/IACC.2017.0176)
Citations: 14

Abstract

The Internet is growing with a huge amount of information in the form of blogs, tweets, reviews, social media networks, and other content. Most text on the Internet is unstructured and anonymous. Author Profiling is a text classification technique that predicts characteristics of authors, such as gender, age, country, native language, and educational background, by analyzing their texts. Researchers have proposed different types of features, including lexical, content-based, structural, and syntactic features, to identify authors' writing styles. Most existing approaches to Author Profiling use a combination of features to represent a document vector for classification. This paper proposes a new model in which document weights are calculated from a combination of POS N-grams and the most frequent terms. These document weights are used to represent the document vectors for classification. The experiment was carried out on the reviews domain to predict the gender of the authors, and the results were promising compared with existing approaches to Author Profiling.
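The abstract does not give the weighting formula, so the following is only a minimal sketch of the general idea of combining POS N-grams with most frequent terms in one document vector. It assumes the reviews are already POS-tagged; the toy reviews, the Penn Treebank style tags, the function names, and the use of relative frequencies as weights are all illustrative assumptions, not the authors' method.

```python
from collections import Counter


def pos_ngrams(tags, n):
    """Return all contiguous n-grams of a POS-tag sequence as tuples."""
    return [tuple(tags[i:i + n]) for i in range(len(tags) - n + 1)]


def build_vocabulary(tagged_docs, n=2, top_ngrams=3, top_terms=2):
    """Pick the most frequent POS n-grams and terms across the corpus."""
    ngram_counts = Counter()
    term_counts = Counter()
    for doc in tagged_docs:
        term_counts.update(word.lower() for word, _ in doc)
        ngram_counts.update(pos_ngrams([tag for _, tag in doc], n))
    ngram_vocab = [g for g, _ in ngram_counts.most_common(top_ngrams)]
    term_vocab = [w for w, _ in term_counts.most_common(top_terms)]
    return ngram_vocab, term_vocab


def document_vector(doc, ngram_vocab, term_vocab, n=2):
    """Relative frequency of each vocabulary item in one document.

    Relative frequency is an assumed stand-in for the paper's
    document weights, which the abstract does not define.
    """
    words = [word.lower() for word, _ in doc]
    grams = pos_ngrams([tag for _, tag in doc], n)
    gram_counts = Counter(grams)
    word_counts = Counter(words)
    gram_total = max(len(grams), 1)  # avoid division by zero
    word_total = max(len(words), 1)
    return ([gram_counts[g] / gram_total for g in ngram_vocab]
            + [word_counts[w] / word_total for w in term_vocab])


# Hypothetical, hand-tagged toy reviews (not data from the paper).
reviews = [
    [("The", "DT"), ("room", "NN"), ("was", "VBD"), ("lovely", "JJ")],
    [("The", "DT"), ("staff", "NN"), ("was", "VBD"), ("rude", "JJ")],
]

ngram_vocab, term_vocab = build_vocabulary(reviews)
vec = document_vector(reviews[0], ngram_vocab, term_vocab)
```

Vectors of this kind could then be passed to any standard classifier together with gender labels; which classifier the authors used is not stated in the abstract.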