Evaluation of Full-Text Retrieval System Using Collection of Serially Evolved Documents

Proceedings of the 3rd International Conference on Industrial and Business Engineering. Pub Date: 2017-08-17. DOI: 10.1145/3133811.3133817

Hwan-Gue Cho, H. Tak, Han-Ho Kim, Yeoneo Kim, Yongju Shin, Chulsu Lim, Kwangnam Choi

Abstract

Finding a document that is similar to a specified query document within a large document database is one of the important problems of the Big Data era, as most available data is in the form of unstructured text. Our test collection consists of two parts. In the first part, texts were produced by humans using an artificial plagiarism approach that follows a linear pipelined procedure. In the second part, texts were generated by software that inserts, deletes, and substitutes certain parts of a target document to make a similar document from an input document. This document set is known as the Serially Evolved Documents (SED). We propose two new methods, Order Preserving Precision (OPP) and Order Preserving Recall (OPR), to compute how well the evolutionary order is kept among the output documents obtained from the subject IR system. Using these test texts, we evaluated KONAN, a document retrieval system for Korean documents.
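The software-generated part of the collection can be pictured with a small sketch. The abstract only states that the generator inserts, deletes, and substitutes parts of a document, so everything else here is an assumption made for illustration: the function names `evolve` and `serially_evolve`, the word-level granularity, the filler vocabulary, and the fixed number of edits per generation are hypothetical and are not taken from the paper.

```python
import random


def evolve(words, rng, n_edits=3, vocab=("alpha", "beta", "gamma")):
    """Return a copy of `words` after n_edits random edit operations.

    Hypothetical helper: each edit is an insert, a delete, or a substitute
    applied at a random word position.
    """
    out = list(words)
    for _ in range(n_edits):
        op = rng.choice(("insert", "delete", "substitute"))
        if op == "insert" or not out:
            # Also used as a fallback when there is nothing left to edit.
            out.insert(rng.randrange(len(out) + 1), rng.choice(vocab))
        elif op == "delete":
            del out[rng.randrange(len(out))]
        else:
            out[rng.randrange(len(out))] = rng.choice(vocab)
    return out


def serially_evolve(text, generations, seed=0, n_edits=3):
    """Build a chain d0 -> d1 -> ... where each document derives from the previous one."""
    rng = random.Random(seed)
    chain = [text.split()]
    for _ in range(generations):
        chain.append(evolve(chain[-1], rng, n_edits))
    return [" ".join(words) for words in chain]


if __name__ == "__main__":
    for i, doc in enumerate(serially_evolve("a b c d e f g h", generations=4)):
        print(i, doc)
```

Because every document is derived from its immediate predecessor, the chain has a known evolutionary order, which is the property that order-based measures such as OPP and OPR are meant to check in a retrieval system's output.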
