Stylometric and Semantic Analysis of Demographically Diverse Non-native English Review Data

Salim Sazzed
Published in: 2022 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM)
Publication date: 2022-11-10
DOI: 10.1109/ASONAM55673.2022.10068612 (https://doi.org/10.1109/ASONAM55673.2022.10068612)
Citations: 2

Abstract

Demographic knowledge facilitates a fine-grained interpretation of user-generated review text and enables better decision-making. In this study, we aim to comprehend how various attributes of non-native English text vary across demographically distinct groups. We introduce a non-native English corpus of around 1150 reviews representing four demographically diverse country-specific groups: Finland, Kenya, Bangladesh, and China. The groups differ in several respects, including geography, native language family, race and culture, and the English proficiency levels of the reviewers. We then perform stylometric and semantic analyses on these distinct sets of reviews to reveal how linguistic characteristics differ across demographic groups. The investigation shows that stylometric features are mostly similar across the groups; nevertheless, dissimilarities are observed in attributes such as review length and the presence of articles and prepositions. We employ classical machine learning (ML) algorithms and fine-tuned transformer-based language models to categorize the reviews into distinct demographic groups. We observe that semantic features are slightly more effective than syntactic features at distinguishing the demography-specific reviews.
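
To make the stylometric attributes named in the abstract concrete (review length, presence of articles, presence of prepositions), the following is a minimal sketch of how such features might be computed per review. The function name `stylometric_features`, the tokenization rule, and both word lists are assumptions introduced here for illustration; the abstract does not specify the paper's actual feature definitions or tooling.

```python
import re

# Assumed, illustrative word lists. The paper's own definitions are not
# given in the abstract, and a real analysis would likely use a POS tagger.
ARTICLES = {"a", "an", "the"}
# Note: "to" is counted as a preposition even when it marks an infinitive.
PREPOSITIONS = {"in", "on", "at", "of", "for", "with", "to", "from", "by", "about"}


def stylometric_features(review: str) -> dict:
    """Return review length (in tokens) and article/preposition rates."""
    # Lowercase the text and keep runs of letters and apostrophes as tokens.
    tokens = re.findall(r"[a-z']+", review.lower())
    n = len(tokens)
    if n == 0:
        # Avoid division by zero for empty or non-alphabetic reviews.
        return {"length": 0, "article_rate": 0.0, "preposition_rate": 0.0}
    articles = sum(t in ARTICLES for t in tokens)
    prepositions = sum(t in PREPOSITIONS for t in tokens)
    return {
        "length": n,
        "article_rate": articles / n,
        "preposition_rate": prepositions / n,
    }


if __name__ == "__main__":
    # Hypothetical example review, not taken from the corpus.
    print(stylometric_features("The food at the hotel was good"))
```

Per-review dictionaries like this could then be averaged within each country group to compare the groups, or passed as a feature vector to a classical classifier; either use is a design choice of this sketch rather than something the abstract describes.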