Pasokh: A standard corpus for the evaluation of Persian text summarizers

ICCKE 2013 | Pub Date: 2013-12-16 | DOI: 10.1109/ICCKE.2013.6682873

Behdad Behmadi Moghaddas, M. Kahani, Seyyed Ahmad Toosi, Asef Pourmasoumi, Ahmad Estiri

Citations: 16

Abstract

The rapidly growing volume of information, particularly on the Web, has created a strong need for automatic summarization systems. These systems, in turn, must be evaluated on how well they retrieve information. Evaluation is done by comparing machine-generated summaries against a standard reference corpus that contains a reasonably large number of source texts together with summaries of them written by humans. Because no such standard corpus existed for Persian, previously developed summarizers were evaluated against small corpora built by the systems' own developers, which made the systems impossible to compare with one another. Pasokh was therefore constructed as a standard reference corpus of sufficient size. Building it took over 2000 man-hours of work.
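As a rough illustration of what comparing a machine summary against human reference summaries can look like, the sketch below computes an averaged unigram recall. The abstract does not name the metric used with Pasokh, so the measure, the function name `unigram_recall`, and the whitespace tokenization are all assumptions made for this example; real Persian text would likely need normalization and a proper tokenizer first.

```python
from collections import Counter


def unigram_recall(candidate: str, references: list[str]) -> float:
    """Average clipped unigram recall of a candidate against each reference.

    Hypothetical helper for illustration only; not the Pasokh authors' method.
    """
    # Assumption: tokens are separated by whitespace.
    cand_counts = Counter(candidate.split())
    scores = []
    for ref in references:
        ref_counts = Counter(ref.split())
        total = sum(ref_counts.values())
        if total == 0:
            continue  # an empty reference carries no information to recall
        # Count each reference token at most as often as the candidate has it.
        overlap = sum(min(n, cand_counts[tok]) for tok, n in ref_counts.items())
        scores.append(overlap / total)
    # Average over the references that were actually scored.
    return sum(scores) / len(scores) if scores else 0.0
```

Scoring every system against the same shared set of references, rather than against each developer's private set, is what would make the resulting numbers comparable across systems.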
