Leveraging graph algorithms to speed up the annotation of large rhymed corpora

Julien Baley
Cahiers de Linguistique Asie Orientale (JCR Q2, Arts and Humanities)
Journal Article, published 2022-03-17
DOI: https://doi.org/10.1163/19606028-bja10019
Citations: 1

Abstract

Rhyming patterns play a crucial role in the phonological reconstruction of earlier stages of Chinese. The past few years have seen the emergence of the use of graphs to model rhyming patterns, notably with List’s (2016) proposal to use graph community detection as a way to go beyond the limits of the link-and-bind method and test new hypotheses regarding phonological reconstruction. List’s approach requires the existence of a rhyme-annotated corpus; such corpora are rare and prohibitively expensive to produce. The present paper solves this problem by introducing several strategies to automate annotation. Among others, the main contribution is the use of graph community detection itself to build an automatic annotator. This annotator requires no previous annotation, no knowledge of phonology, and automatically adapts to corpora of different periods by learning their rhyme categories. Through a series of case studies, we demonstrate the viability of the approach in quickly annotating hundreds of thousands of poems with high accuracy.
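The general idea of annotating rhymes through graph community detection can be illustrated with a minimal sketch. This is not the paper's implementation: the abstract does not say which algorithm or graph construction is used, so the code below assumes a simple co-occurrence graph over line-final words and swaps in a deterministic weighted label propagation as the community detection step. The romanized words in `POEMS` and the function names `build_rhyme_graph`, `detect_communities`, and `annotate` are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical toy corpus: each poem is the list of its line-final words.
POEMS = [
    ["tian", "nian", "qian"],
    ["hua", "jia", "xia"],
    ["tian", "qian"],
    ["jia", "xia"],
    ["nian", "tian"],
    ["hua", "xia"],
    ["tian", "hua", "nian", "jia"],  # interleaved rhymes, adds noisy edges
]


def build_rhyme_graph(poems):
    """Link every pair of distinct line-final words that share a poem."""
    graph = defaultdict(lambda: defaultdict(int))
    for finals in poems:
        for a, b in combinations(sorted(set(finals)), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    return graph


def detect_communities(graph, max_rounds=20):
    """Deterministic weighted label propagation (assumed stand-in algorithm)."""
    labels = {node: node for node in graph}
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):
            # Sum edge weights per label currently held by the neighbours.
            scores = defaultdict(int)
            for neighbour, weight in graph[node].items():
                scores[labels[neighbour]] += weight
            best = max(scores.values())
            candidates = sorted(l for l, s in scores.items() if s == best)
            # Keep the current label on ties, otherwise take the smallest.
            new = labels[node] if labels[node] in candidates else candidates[0]
            if new != labels[node]:
                labels[node] = new
                changed = True
        if not changed:
            break
    return labels


def annotate(finals, labels):
    """Map community labels to rhyme-scheme letters in order of appearance."""
    letters = {}
    scheme = []
    for word in finals:
        community = labels.get(word, word)  # unseen words form their own class
        if community not in letters:
            letters[community] = chr(ord("A") + len(letters))
        scheme.append(letters[community])
    return scheme


graph = build_rhyme_graph(POEMS)
labels = detect_communities(graph)
print(annotate(["tian", "hua", "nian", "jia"], labels))
```

On this toy corpus the two word groups end up in separate communities, so the interleaved poem is annotated as A B A B even though its lines also contribute cross-group edges. No prior annotation or phonological knowledge enters the sketch; the categories come only from co-occurrence counts, which is the property the abstract highlights.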
Source journal
Cahiers de Linguistique Asie Orientale (Arts and Humanities: Language and Linguistics)
CiteScore: 0.90
Self-citation rate: 0.00%
Articles published: 11
About the journal: The Cahiers is an international linguistics journal whose mission is to publish new and original research on the analysis of languages of the Asian region, be they descriptive or theoretical. This clearly reflects the broad research domain of our laboratory: the Centre for Linguistic Research on East Asian Languages (CRLAO). The journal was created in 1977 by Viviane Alleton and Alain Peyraube and has been directed by three successive teams of editors, all professors based at the CRLAO in Paris. An Editorial Board, composed of scholars from around the world, assists in the reviewing process and in a consultative role.