Natural Language Generation for Data-to-Text Conversion - Application to a Medication Assistant for Portuguese

Impact factor: 0.1 · JCR Q4 (Linguistics)

Linguamatica · Pub Date: 2015-07-31 · DOI: 10.21814/LM.7.1.206

J. C. Pereira, A. Teixeira

Linguamatica, volume 7, issue 1, pages 3-21. Publication type: Journal Article.

Citations: 1

Abstract


Geração de Linguagem Natural para Conversão de Dados em Texto - Aplicação a um Assistente de Medicação para o Português

New devices such as smartphones and tablets are changing human-computer interaction. These devices present several challenges, mainly because of their small screens and keyboards. To use text and voice in multimodal interaction, applications need modules that translate their internal information into sentences or texts, which can then be displayed on screen or synthesized as speech. These modules must also generate phrases and texts in the user's native language, their development should not require considerable resources, and the generated output should show a good degree of variability. Our main objective is to propose, implement, and evaluate a method for converting data to Portuguese text that can be developed with minimal time and knowledge without compromising the necessary variability and quality of the output. The system, developed for a Medication Assistant, is intended to create natural-language descriptions of the medication to be taken. Motivated by recent results, we opted for an approach based on machine translation, with models trained on a small parallel corpus. A new corpus was created for this purpose, and two variants of the system were trained on it: phrase-based translation and syntax-based translation. Both variants were evaluated with automatic metrics (BLEU and Meteor) and by human judges. The phrase-based approach produced better results than the syntax-based one: human evaluators rated 60% of the phrase-based responses as good or very good, compared with only 46% of the syntax-based responses. Considering the corpus size, we judge this value (60%) to be good.
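The abstract names BLEU as one of the automatic metrics. As a rough illustration of what such a score measures, the sketch below computes a simplified, single-reference BLEU (clipped n-gram precision up to bigrams, with a brevity penalty). It is not the authors' evaluation setup: the function names, the `max_n=2` choice, and the Portuguese example sentences are assumptions made for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def modified_precision(candidate, reference, n):
    """Clipped n-gram precision: candidate counts are capped by reference counts."""
    cand = ngrams(candidate, n)
    ref = ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    clipped = sum(min(count, ref[gram]) for gram, count in cand.items())
    return clipped / total


def simple_bleu(candidate, reference, max_n=2):
    """Geometric mean of the 1..max_n clipped precisions times a brevity penalty."""
    precisions = [modified_precision(candidate, reference, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0.0:
        return 0.0
    log_mean = sum(math.log(p) for p in precisions) / max_n
    # Candidates shorter than the reference are penalized.
    if len(candidate) >= len(reference):
        brevity = 1.0
    else:
        brevity = math.exp(1 - len(reference) / len(candidate))
    return brevity * math.exp(log_mean)


# Hypothetical sentences, not taken from the paper's corpus.
reference = "tome um comprimido de paracetamol às oito horas".split()
candidate = "tome um comprimido de paracetamol às oito".split()
score = simple_bleu(candidate, reference)
```

Here every n-gram of the candidate also appears in the reference, so the score is lowered only by the brevity penalty for the missing final word. Real evaluations typically use an established implementation, 4-gram precision, and corpus-level statistics rather than a per-sentence sketch like this one.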


Source journal

Linguamatica (Linguistics)

CiteScore: 1.40

Self-citation rate: 0.00%

Review time: 6 weeks