Building a Chinese AMR Bank with Concept and Relation Alignments

Bin Li, Y. Wen, Li Song, Weiguang Qu, Nianwen Xue
Journal: Linguistic Issues in Language Technology
Publication date: 2019-08-13
DOI: 10.33011/lilt.v18i.1429 (https://doi.org/10.33011/lilt.v18i.1429)
Citations: 11

Abstract

Abstract Meaning Representation (AMR) is a meaning representation framework in which the meaning of a full sentence is represented as a single-rooted, acyclic, directed graph. In this article, we describe an ongoing project to build a Chinese AMR (CAMR) corpus, which currently includes 10,149 sentences from the newsgroup and weblog portion of the Chinese TreeBank (CTB). We describe the annotation specifications for the CAMR corpus, which follow the annotation principles of English AMR but make adaptations where needed to accommodate the linguistic facts of Chinese. The CAMR specifications also include a systematic treatment of sentence-internal discourse relations. One significant change we have made to the AMR annotation methodology is the inclusion of alignments between the word tokens in the sentence and the concepts and relations in the CAMR annotation, which makes it easier for automatic parsers to model the correspondence between a sentence and its meaning representation. We develop an annotation tool for CAMR, and the inter-annotator agreement between the two annotators, as measured by the Smatch score, is 0.83, indicating reliable annotation. We also present some quantitative analysis of the CAMR corpus. Of the sentence-level AMRs, 46.71% are non-tree graphs. Moreover, the AMRs of 88.95% of the sentences contain concepts that are inferred from the context of the sentence and do not correspond to a specific word.
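To make the notions of a "non-tree graph" and a Smatch-style agreement score concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the article or its annotation tool. The toy graph, the names `EDGES`, `ROOT`, `is_tree`, and `triple_f1` are all hypothetical, and the example uses an English sentence for readability. The F-score shown is a simplification: it assumes the variables of the two annotations are already matched, whereas the actual Smatch metric searches for the variable mapping that maximizes the triple overlap.

```python
from collections import defaultdict

# Hypothetical toy AMR for "The boy wants to go", written as
# (source, relation, target) edges between concept variables:
# w = want-01, b = boy, g = go-01.
EDGES = [
    ("w", ":ARG0", "b"),  # want-01 -> boy
    ("w", ":ARG1", "g"),  # want-01 -> go-01
    ("g", ":ARG0", "b"),  # go-01 -> boy (re-entrancy: "b" has two parents)
]
ROOT = "w"


def is_tree(root, edges):
    """Return True if the root has no incoming edge and every other node
    has exactly one. Assumes the graph is acyclic and connected from root."""
    indegree = defaultdict(int)
    nodes = {root}
    for src, _, tgt in edges:
        indegree[tgt] += 1
        nodes.update((src, tgt))
    return indegree[root] == 0 and all(
        indegree[n] == 1 for n in nodes if n != root
    )


def triple_f1(gold, pred):
    """F1 over exactly matching triples. Simplified: assumes both graphs
    already use the same variable names (real Smatch searches for the
    best variable mapping)."""
    gold, pred = set(gold), set(pred)
    matched = len(gold & pred)
    if matched == 0:
        return 0.0
    precision = matched / len(pred)
    recall = matched / len(gold)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    print(is_tree(ROOT, EDGES))         # graph with a re-entrant node
    print(is_tree(ROOT, EDGES[:2]))     # same graph without the re-entrancy
    print(triple_f1(EDGES, EDGES[:2]))  # second annotation misses one edge
```

In this sketch, the re-entrant edge from `g` to `b` is what turns the graph into a non-tree; the corpus statistic above counts sentences whose AMRs contain at least one such structure.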