Uma gramática computacional de um fragmento do nheengatu / A computational grammar for a fragment of Nheengatu

IF 0.2 · LANGUAGE & LINGUISTICS

Revista de Estudos da Linguagem · Pub. date: 2021-04-08 · DOI: 10.17851/2237-2083.29.3.1717-1777

L. F. D. Alencar

Citations: 2

Abstract

The availability of resources for computational processing is one of the factors in a language's survival. The goal of this work was to implement a fragment of Nheengatu in Grammatical Framework, a formalism specially designed for the development of multilingual applications. Once more widely spoken than Portuguese in the Amazon region, Nheengatu is now threatened with extinction, although it still has an estimated 14,000 speakers. The fragment is restricted to sentences that express contingent and non-contingent states, but it includes structurally complex grammatical phenomena typical of the Tupi-Guarani family, which contrast strongly with the equivalent constructions in Portuguese and English. It is one of the modules of GrammYEP, a multilingual computational grammar that also comprises equivalent English and Portuguese modules. The implementation took as its starting point the non-formalized grammatical descriptions of Navarro (2011) and Cruz (2011). Formalization revealed gaps and inconsistencies in these descriptions, which were partly remedied through a reanalysis of the data. GrammYEP achieved quite satisfactory results in translation to and from Nheengatu. It translated all 142 sentences of a Nheengatu test set into Portuguese and English. Conversely, it rendered 98.18% of the corresponding Portuguese test set and 84.11% of the corresponding English test set into Nheengatu. At the same time, it parsed only two of the 171 ungrammatical constructions in a negative Nheengatu test set. This evaluation yielded a treebank of 243 Nheengatu sentences paired with their Portuguese and English equivalents.

Keywords: Amazonian Lingua Franca (Língua Geral Amazônica, LGA); Modern Tupi; qualifying predication; possessive construction; machine translation; computational linguistics; natural language processing.
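The Nheengatu-side evaluation figures can be restated as simple rates. The minimal Python sketch below assumes nothing beyond the counts given in the abstract (142 of 142 sentences translated; 2 of 171 ungrammatical inputs parsed); the `rate` helper is hypothetical and not part of GrammYEP. The Portuguese and English figures (98.18% and 84.11%) are not recomputed, because the abstract does not state the sizes of those test sets.

```python
def rate(successes: int, total: int) -> float:
    """Return successes/total as a percentage rounded to two decimals (hypothetical helper)."""
    if total <= 0:
        raise ValueError("test set must be non-empty")
    return round(100 * successes / total, 2)

# Counts taken from the abstract (Nheengatu test sets only).
POSITIVE_TOTAL, POSITIVE_TRANSLATED = 142, 142   # grammatical sentences translated into PT/EN
NEGATIVE_TOTAL, NEGATIVE_PARSED = 171, 2         # ungrammatical constructions the grammar parsed

coverage = rate(POSITIVE_TRANSLATED, POSITIVE_TOTAL)
overgeneration = rate(NEGATIVE_PARSED, NEGATIVE_TOTAL)
rejection = rate(NEGATIVE_TOTAL - NEGATIVE_PARSED, NEGATIVE_TOTAL)

print(f"Positive test set translated: {coverage}%")
print(f"Ungrammatical inputs parsed (overgeneration): {overgeneration}%")
print(f"Ungrammatical inputs rejected: {rejection}%")
```

Read this way, full coverage of the positive set combined with a low parse rate on the negative set (about 1.17%) suggests that the grammar rarely overgenerates, which is the intended outcome of a negative test set.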