The SFU Opinion and Comments Corpus: A Corpus for the Analysis of Online News Comments.

Impact Factor: 1.3 | JCR: N/A | Category: LANGUAGE & LINGUISTICS
Corpus Pragmatics | Pub Date: 2020-01-01 | Epub Date: 2019-11-02 | DOI: 10.1007/s41701-019-00065-w
Varada Kolhatkar, Hanhan Wu, Luca Cavasso, Emilie Francis, Kavan Shukla, Maite Taboada
Citations: 4

Abstract


The SFU Opinion and Comments Corpus: A Corpus for the Analysis of Online News Comments.

We present the SFU Opinion and Comments Corpus (SOCC), a collection of opinion articles and the comments posted in response to the articles. The articles include all the opinion pieces published in the Canadian newspaper The Globe and Mail in the 5-year period between 2012 and 2016, a total of 10,339 articles and 663,173 comments. SOCC is part of a project that investigates the linguistic characteristics of online comments. The corpus can be used to study a host of pragmatic phenomena. Among other aspects, researchers can explore: the connections between articles and comments; the connections of comments to each other; the types of topics discussed in comments; the nice (constructive) or mean (toxic) ways in which commenters respond to each other; how language is used to convey very specific types of evaluation; and how negation affects the interpretation of evaluative meaning in discourse. Our current focus is the study of constructiveness and evaluation in the comments. To that end, we have annotated a subset of the large corpus (1043 comments) with four layers of annotations: constructiveness, toxicity, negation and Appraisal (Martin and White, The language of evaluation, Palgrave, New York, 2005). This paper describes our corpus, the data collection process, the characteristics of the corpus, and the annotations. While our focus is comments posted in response to opinion news articles, the phenomena in this corpus are likely to be present in many commenting platforms: other news comments, comments and replies in fora such as Reddit, feedback on blogs, or YouTube comments.

Source journal
Corpus Pragmatics (Arts and Humanities: Language and Linguistics)
CiteScore: 2.60
Self-citation rate: 0.00%
Articles published: 15
Journal description: Corpus Pragmatics offers a forum for theoretical and applied linguists who carry out research in the new linguistic discipline that stands at the interface between corpus linguistics and pragmatics. The journal promotes the combination of the two approaches through research on new topics in linguistics, with a particular focus on interdisciplinary studies, and to enlarge and implement current pragmatic theories that have hitherto not benefited from empirical corpus support. Authors are encouraged to describe the statistical analyses used in their research and to supply the data and scripts in R when possible. The objective of Corpus Pragmatics is to develop pragmatics with the aid of quantitative corpus methodology. The journal accepts original research papers, short research notes, and occasional thematic issues. The journal follows a double-blind peer review system.