A Systematic Comparison of Different Approaches to Automatic Extractive Summarization of Portuguese Texts

Impact factor 0.3, Q4 (Linguistics)

Linguamatica | Publication date: 2015-07-31 | DOI: 10.21814/LM.7.1.203

M. Costa, Bruno Martins

Citations: 3

Abstract

Automatic document summarization is the task of automatically generating condensed versions of source texts, and it is one of the fundamental problems in Information Retrieval and Natural Language Processing. In this paper, we compare different extractive approaches for summarizing individual documents, namely journalistic texts written in Portuguese. Using the ROUGE package to measure the quality of the produced summaries, we report results for two experimental domains: (i) generating headlines for news articles written in European Portuguese, and (ii) generating summaries for news articles written in Brazilian Portuguese. The results show that methods based on selecting the first sentences of the article achieve the best results, on several ROUGE metrics, when building extractive news headlines. For summaries with more than one sentence, the LSA Squared algorithm achieved the best results across the various ROUGE metrics.
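The abstract pits a first-sentences (lead) baseline against other extractive methods and scores them with ROUGE. The Python below is a minimal sketch of those two ideas, not the authors' implementation and not the ROUGE package itself. It assumes naive regex sentence splitting and tokenization, applies no stemming or stopword removal, and scores against a single reference. The names `lead_summary`, `rouge_n_recall`, and the helpers are hypothetical, and the Portuguese example article is invented for illustration.

```python
import re
from collections import Counter


def split_sentences(text):
    # Assumption: naive splitter that breaks after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def lead_summary(text, k=1):
    """First-sentences baseline: keep the first k sentences of the document."""
    return " ".join(split_sentences(text)[:k])


def tokenize(text):
    # Lowercase and keep runs of word characters; \w also matches accented letters (e.g. "ç").
    return re.findall(r"\w+", text.lower())


def ngrams(tokens, n):
    # Multiset of contiguous n-grams.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def rouge_n_recall(candidate, reference, n=1):
    """Simplified single-reference ROUGE-N recall:
    clipped n-gram overlap divided by the number of reference n-grams."""
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    total = sum(ref.values())
    if total == 0:
        return 0.0
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    return overlap / total


if __name__ == "__main__":
    # Hypothetical example: build a one-sentence "headline" and score it.
    article = ("O governo aprovou hoje o novo orçamento. A oposição criticou a decisão. "
               "O debate continua no parlamento.")
    headline = "Governo aprova novo orçamento"
    summary = lead_summary(article, k=1)
    print(summary)  # O governo aprovou hoje o novo orçamento.
    print(rouge_n_recall(summary, headline, n=1))  # 0.75
```

Here three of the four reference unigrams ("governo", "novo", "orçamento") appear in the lead sentence, giving a ROUGE-1 recall of 0.75. "aprova" does not match "aprovou" because the sketch applies no stemming, which is one reason scores from this toy version will differ from the official package.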
