Supporting the Annotation Experience Through CorEx and Word Mover's Distance

Stefania Pecore
International Conference on Language, Data, and Knowledge. DOI: 10.4230/OASIcs.LDK.2021.12 (https://doi.org/10.4230/OASIcs.LDK.2021.12)

Abstract

Online communities can be used to promote destructive behaviours, as in pro-Eating Disorder (ED) communities. Research needs annotated data to study these phenomena. Even though many platforms have already moderated this type of content, Twitter has not, and its data can still be used for research purposes. In this paper, we unveiled emojis, words, and uncommon linguistic patterns within the ED Twitter community by applying the Correlation Explanation (CorEx) algorithm to unstructured, non-annotated data to retrieve the topics. We then annotated the dataset following these topics. Finally, we analysed the use of CorEx and Word Mover's Distance to automatically retrieve similar new sentences and augment the annotated dataset.

2012 ACM Subject Classification: Applied computing → Document management and text processing; Applied computing → Annotation
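The retrieval step mentioned in the abstract, finding new sentences close to already annotated ones, can be illustrated with a small sketch. This is not the paper's implementation: it uses the relaxed Word Mover's Distance (a cheaper lower bound on the full Word Mover's Distance, in which each word moves all of its weight to the nearest word in the other sentence) instead of solving the full transport problem, and the two-dimensional word vectors, the vocabulary, and the function names are all hypothetical, chosen only so that the example runs with the Python standard library.

```python
import math
from collections import Counter

# Hypothetical 2-d word vectors, for illustration only; a real setup would
# load pretrained embeddings instead of this toy table.
TOY_VECTORS = {
    "run": (1.0, 0.0),
    "jog": (0.9, 0.1),
    "park": (0.0, 1.0),
    "garden": (0.1, 0.9),
    "read": (-1.0, 0.0),
    "book": (0.0, -1.0),
}


def nbow(tokens):
    """Normalised bag-of-words weights over tokens that have a vector."""
    counts = Counter(t for t in tokens if t in TOY_VECTORS)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def one_sided_cost(src, dst):
    """Move each source word's whole weight to its nearest target word."""
    return sum(
        weight * min(math.dist(TOY_VECTORS[w], TOY_VECTORS[v]) for v in dst)
        for w, weight in src.items()
    )


def relaxed_wmd(tokens_a, tokens_b):
    """Relaxed WMD: the larger of the two one-sided nearest-word costs."""
    a, b = nbow(tokens_a), nbow(tokens_b)
    if not a or not b:
        # No known words on one side: treat the pair as maximally distant.
        return math.inf
    return max(one_sided_cost(a, b), one_sided_cost(b, a))


def rank_candidates(seed, candidates):
    """Sort candidate sentences from most to least similar to the seed."""
    seed_tokens = seed.split()
    return sorted(candidates, key=lambda c: relaxed_wmd(seed_tokens, c.split()))


if __name__ == "__main__":
    # An annotated seed sentence and two unlabelled candidates.
    for sentence in rank_candidates("run park", ["read book", "jog garden"]):
        print(sentence, round(relaxed_wmd("run park".split(), sentence.split()), 3))
```

In a workflow like the one the abstract describes, the top-ranked candidates would be proposed to the annotator as likely members of the seed sentence's topic; any distance threshold for accepting them would have to be tuned on real data.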