Syntactic annotation for Portuguese corpora: standards, parsers, and search interfaces

Impact factor 1.7 · CAS zone 3 (Computer Science) · JCR Q3, COMPUTER SCIENCE, INTERDISCIPLINARY APPLICATIONS
Pablo Faria, Charlotte Galves, Catarina Magro
Language Resources and Evaluation · Journal Article · Published 2023-12-26 · DOI: 10.1007/s10579-023-09699-4

Abstract

In the last two decades, four syntactically annotated Portuguese corpora were built along the lines initially defined for the Penn Parsed Historical Corpora (Santorini, 2016). They cover the old, middle, classical, and modern periods of European Portuguese, as well as nineteenth- and twentieth-century Brazilian Portuguese, and include different textual genres and excerpts of oral discourse. Together they provide a fundamental resource for the study of variation and change in Portuguese. In recent years, an effort was made to unify the annotation scheme applied to these corpora as far as possible, so that a search run on one corpus can be run in exactly the same way on the others. This effort resulted in the Portuguese Syntactic Annotation Manual (Magro & Galves, 2019). In this paper, we present the syntactic annotation of the Portuguese Corpora. We describe the functioning of ParsPort, a rule-based parser that makes use of the revision mode of the query language Corpus Search (Randall, 2005–2015). We argue that ParsPort is more efficient for our annotation efforts than the probabilistic parser developed by Bikel (2004), which was previously used for the syntactic annotation of the Portuguese Corpora. Finally, we mention recent advances towards more user-friendly tools for syntactic searches.
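The abstract does not detail ParsPort's rules. As a rough illustration only, the Python sketch below shows the two kinds of operation the abstract refers to, applied to Penn-style labeled bracketing: a search for clauses that immediately dominate a subject, and a rule-based revision that adds structure to an existing tree. The sample sentence, the rule, and the functions `parse`, `idoms`, and `wrap_subject_pronouns` are hypothetical; the labels (IP-MAT, PRO, VB-P, NP-SBJ) follow the general Penn historical annotation style, and neither the code nor its rule reproduces ParsPort or CorpusSearch syntax.

```python
# Toy sketch: searching and revising Penn-style labeled bracketing.
# Hypothetical illustration; NOT ParsPort's rules or CorpusSearch syntax.
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    label: str                          # syntactic label or POS tag, e.g. "IP-MAT", "PRO"
    word: Optional[str] = None          # set only on leaves: (TAG word)
    children: List["Node"] = field(default_factory=list)

    def to_brackets(self) -> str:
        if self.word is not None:
            return f"({self.label} {self.word})"
        return f"({self.label} " + " ".join(c.to_brackets() for c in self.children) + ")"


def parse(text: str) -> Node:
    """Parse ONE labeled tree, e.g. '(IP-MAT (PRO Ele) (VB-P canta))'."""
    tokens = text.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        label = tokens[pos + 1]
        pos += 2
        if tokens[pos] != "(":          # leaf: (TAG word)
            node = Node(label, word=tokens[pos])
            pos += 2                    # skip the word and its closing ')'
            return node
        node = Node(label)
        while tokens[pos] == "(":
            node.children.append(read())
        pos += 1                        # skip this node's closing ')'
        return node

    return read()


def idoms(node: Node, parent_prefix: str, child_prefix: str) -> List[Node]:
    """Return nodes labeled parent_prefix* that immediately dominate a child
    labeled child_prefix* (loosely inspired by CorpusSearch's iDoms relation)."""
    hits = []
    if node.label.startswith(parent_prefix) and any(
        c.label.startswith(child_prefix) for c in node.children
    ):
        hits.append(node)
    for c in node.children:
        hits.extend(idoms(c, parent_prefix, child_prefix))
    return hits


def wrap_subject_pronouns(node: Node) -> int:
    """Hypothetical revision rule: inside any IP-* node, wrap each bare PRO
    child that precedes the first verb (tag VB*) in a new NP-SBJ node.
    Returns the number of revisions made."""
    count = 0
    if node.label.startswith("IP-"):
        for i, child in enumerate(node.children):
            if child.label.startswith("VB"):
                break                   # only pre-verbal pronouns count
            if child.label == "PRO":
                node.children[i] = Node("NP-SBJ", children=[child])
                count += 1
    for child in node.children:
        count += wrap_subject_pronouns(child)
    return count


if __name__ == "__main__":
    tree = parse("(IP-MAT (PRO Ele) (VB-P canta) (. .))")
    print(len(idoms(tree, "IP", "NP-SBJ")))   # 0: no subject node yet
    print(wrap_subject_pronouns(tree))        # 1 revision applied
    print(tree.to_brackets())                 # (IP-MAT (NP-SBJ (PRO Ele)) (VB-P canta) (. .))
    print(len(idoms(tree, "IP", "NP-SBJ")))   # 1
```

The parser reads a single labeled tree; it does not handle the unlabeled outer bracket and ID node that typically wrap each token in Penn historical corpus files. Running the rule a second time makes no further changes, since the pronoun is already inside an NP-SBJ node.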

Source journal
Language Resources and Evaluation
Category: Engineering & Technology, Computer Science: Interdisciplinary Applications
CiteScore: 6.50
Self-citation rate: 3.70%
Articles published: 55
Review time: >12 weeks
Journal description: Language Resources and Evaluation is the first publication devoted to the acquisition, creation, annotation, and use of language resources, together with methods for evaluation of resources, technologies, and applications. Language resources include language data and descriptions in machine readable form used to assist and augment language processing applications, such as written or spoken corpora and lexica, multimodal resources, grammars, terminology or domain specific databases and dictionaries, ontologies, multimedia databases, etc., as well as basic software tools for their acquisition, preparation, annotation, management, customization, and use. Evaluation of language resources concerns assessing the state-of-the-art for a given technology, comparing different approaches to a given problem, assessing the availability of resources and technologies for a given application, benchmarking, and assessing system usability and user satisfaction.