SCRAPING NEWS SITES AND SOCIAL NETWORKS FOR PREJUDICE TERM ANALYSIS

P. Henriques, Cristiana Araújo, Isabel Ermida, Idalete Dias
DOI: 10.33965/ac2019_201912l022
Published in: Proceedings of the 16th International Conference on Applied Computing 2019 (2019-11-07)
Citations: 1

Abstract

Computer-Mediated Communication (CMC) has paved the way for new patterns of linguistic aggravation. Hidden behind a screen, anyone can comment on another person's opinion in an offensive or injurious tone. Moreover, forms of prejudice such as homophobia, sexism, racism, xenophobia, anticlericalism, body/addiction shaming, among others, are now easily found on social networks and on other interactive Web sites enabled by Web 2.0. This growing violence deserves further investigation from several academic perspectives, among which Sociolinguistics stands out. This paper describes the design and development of a set of computer-based tools that collect articles and posts, together with their comment threads, to serve as sources from which prejudice terms can be extracted and on which different analyses can be conducted. The prejudice terms were devised using a sociolinguistic variable stratification approach. We focus on the filters used to extract the relevant fields from the collected Web pages, and on the converters that adapt the various source formats into a common format for information representation. We also introduce the statistical analysis processor, which explores the extracted data in that common format and outputs a set of indicators.
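The abstract names three stages (filters that extract fields from collected pages, converters that produce a common format, and a statistical processor that outputs indicators) without describing how they are implemented. The following is a minimal Python sketch of that flow, offered only as an illustration under stated assumptions: comments are taken to sit in `<div class="comment">` elements, the common format is taken to be a plain JSON-ready dictionary, and the single indicator computed is the share of comments containing at least one term from each prejudice category. The names `CommentFilter`, `to_common_format` and `indicators`, the record fields, and the placeholder lexicon are all hypothetical and are not the authors' tools or term lists.

```python
import json
import re
from collections import Counter
from html.parser import HTMLParser


class CommentFilter(HTMLParser):
    """Hypothetical filter: collects the text of every <div class="comment">."""

    def __init__(self):
        super().__init__()
        self.comments = []
        self._depth = 0  # nesting depth inside a comment div (0 = outside)
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self._depth:
            if tag == "div":  # track nested divs so we close on the right one
                self._depth += 1
        elif tag == "div" and "comment" in classes:
            self._depth = 1
            self._buffer = []

    def handle_endtag(self, tag):
        if self._depth and tag == "div":
            self._depth -= 1
            if self._depth == 0:
                # Join the collected text and normalise whitespace.
                self.comments.append(" ".join("".join(self._buffer).split()))

    def handle_data(self, data):
        if self._depth:
            self._buffer.append(data)


def to_common_format(source, url, comments):
    """Hypothetical converter: wraps extracted fields in one shared schema."""
    return {
        "source": source,
        "url": url,
        "comments": [{"id": i, "text": t} for i, t in enumerate(comments)],
    }


def indicators(record, lexicon):
    """Per category: number and share of comments containing any lexicon term."""
    hits = Counter()
    for comment in record["comments"]:
        tokens = set(re.findall(r"\w+", comment["text"].lower()))
        for category, terms in lexicon.items():
            if tokens & terms:
                hits[category] += 1
    total = len(record["comments"])
    return {
        cat: {"comments": hits[cat], "share": hits[cat] / total if total else 0.0}
        for cat in lexicon
    }


if __name__ == "__main__":
    # Toy page; "term_a" and "term_b" are placeholders, not real prejudice terms.
    html = """
    <article><h1>Some headline</h1>
      <div class="comment">I agree with term_a here</div>
      <div class="comment reply">Nothing offensive</div>
      <div class="comment">term_b and term_a</div>
    </article>
    """
    parser = CommentFilter()
    parser.feed(html)
    record = to_common_format("example-news-site", "https://example.org/article", parser.comments)
    lexicon = {"category_1": {"term_a"}, "category_2": {"term_b"}}
    print(json.dumps(indicators(record, lexicon), indent=2))
```

In a design like this, each news site or social network would need its own filter because their markup differs, while the converter output and the indicator step stay shared; that separation is what a common representation format makes possible.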