Leveraging syntactic parsing to improve event annotation matching

Camiel Colruyt, Orphée De Clercq, Veronique Hoste
{"title":"Leveraging syntactic parsing to improve event annotation matching","authors":"Camiel Colruyt, Orphée De Clercq, Veronique Hoste","doi":"10.18653/v1/D19-5903","DOIUrl":null,"url":null,"abstract":"Detecting event mentions is the first step in event extraction from text and annotating them is a notoriously difficult task. Evaluating annotator consistency is crucial when building datasets for mention detection. When event mentions are allowed to cover many tokens, annotators may disagree on their span, which means that overlapping annotations may then refer to the same event or to different events. This paper explores different fuzzy-matching functions which aim to resolve this ambiguity. The functions extract the sets of syntactic heads present in the annotations, use the Dice coefficient to measure the similarity between sets and return a judgment based on a given threshold. The functions are tested against the judgment of a human evaluator and a comparison is made between sets of tokens and sets of syntactic heads. The best-performing function is a head-based function that is found to agree with the human evaluator in 89% of cases.","PeriodicalId":129206,"journal":{"name":"Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP","volume":"2 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"1","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the First Workshop on Aggregating and Analysing Crowdsourced Annotations for NLP","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.18653/v1/D19-5903","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 1

Abstract

Detecting event mentions is the first step in event extraction from text, and annotating them is a notoriously difficult task. Evaluating annotator consistency is crucial when building datasets for mention detection. When event mentions are allowed to cover many tokens, annotators may disagree on their spans, which means that overlapping annotations may refer either to the same event or to different events. This paper explores different fuzzy-matching functions that aim to resolve this ambiguity. The functions extract the sets of syntactic heads present in the annotations, use the Dice coefficient to measure the similarity between the sets, and return a judgment based on a given threshold. The functions are tested against the judgment of a human evaluator, and a comparison is made between sets of tokens and sets of syntactic heads. The best-performing function is a head-based function that agrees with the human evaluator in 89% of cases.
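
As a rough illustration of the matching procedure described above, the sketch below (not the authors' implementation) extracts syntactic heads from two overlapping annotation spans, computes the Dice coefficient between the head sets, and applies a threshold. The use of spaCy for dependency parsing, the particular head-extraction heuristic, and the 0.5 threshold are all assumptions made for illustration, not details taken from the paper.

```python
# Minimal sketch of a head-based fuzzy-matching function: extract the set of
# syntactic heads in each annotation span, compute the Dice coefficient
# between the two sets, and judge "same event" if it meets a threshold.
# spaCy, the head heuristic, and the threshold value are illustrative choices.

import spacy

nlp = spacy.load("en_core_web_sm")


def syntactic_heads(doc, start, end):
    """Return the tokens in span [start, end) whose dependency head lies
    outside the span (or which are the sentence root): the span's heads."""
    return {tok.text for tok in doc[start:end]
            if tok.head.i < start or tok.head.i >= end or tok.head == tok}


def dice(a, b):
    """Dice coefficient between two sets: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0
    return 2 * len(a & b) / (len(a) + len(b))


def same_event(doc, span1, span2, threshold=0.5):
    """Judge whether two overlapping annotations refer to the same event."""
    heads1 = syntactic_heads(doc, *span1)
    heads2 = syntactic_heads(doc, *span2)
    return dice(heads1, heads2) >= threshold


# Example: two annotators mark overlapping spans in the same sentence.
doc = nlp("The company announced the acquisition of its rival on Monday.")
print(same_event(doc, (1, 6), (2, 9)))  # True when the head sets are similar enough
```

A token-based variant of the same scheme would simply replace `syntactic_heads` with the full set of tokens in the span; the abstract reports that the head-based version agrees better with the human evaluator.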