Evaluation of Full-Text Retrieval System Using Collection of Serially Evolved Documents

Proceedings of the 3rd International Conference on Industrial and Business Engineering. Pub Date: 2017-08-17. DOI: 10.1145/3133811.3133817

Hwan-Gue Cho, H. Tak, Han-Ho Kim, Yeoneo Kim, Yongju Shin, Chulsu Lim, Kwangnam Choi



Abstract



Finding a document that is similar to a specified query document within a large document database is one of the important problems of the Big Data era, as most available data is in the form of unstructured text. Our test collection consists of two parts. In the first part, texts were written by people using an artificial plagiarism approach in a linear pipelined procedure. In the second part, texts were generated by software that inserts, deletes, and substitutes parts of an input document to produce a similar document. This document set is known as the Serially Evolved Documents (SED). We propose two new measures, Order Preserving Precision (OPP) and Order Preserving Recall (OPR), to compute how well the evolutionary order is preserved among the output documents returned by the IR system under test. Using these test texts, we evaluated KONAN, a document retrieval system for Korean documents.
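The software-generated part of the collection can be pictured with a small sketch. The abstract says only that parts of a document are inserted, deleted, and substituted to derive a similar document, so everything below is an assumption for illustration: the token-level granularity, the number of edits per generation, the filler vocabulary, and the hypothetical names `evolve` and `make_sed_chain`. It is not the authors' generator.

```python
import random


def evolve(tokens, rng, n_edits=3, vocab=("alpha", "beta", "gamma")):
    """Return a copy of tokens after n_edits random edit operations.

    Assumption: edits act on whitespace-separated tokens and new tokens
    are drawn from a small filler vocabulary.
    """
    out = list(tokens)
    for _ in range(n_edits):
        op = rng.choice(("insert", "delete", "substitute"))
        if op == "insert" or not out:
            # An empty document can only grow, so fall back to insertion.
            out.insert(rng.randrange(len(out) + 1), rng.choice(vocab))
        elif op == "delete":
            del out[rng.randrange(len(out))]
        else:
            out[rng.randrange(len(out))] = rng.choice(vocab)
    return out


def make_sed_chain(source, generations, seed=0):
    """Build a linear chain d0 -> d1 -> ... where each document is
    derived from its immediate predecessor (hypothetical helper)."""
    rng = random.Random(seed)
    chain = [source.split()]
    for _ in range(generations):
        chain.append(evolve(chain[-1], rng))
    return [" ".join(doc) for doc in chain]


if __name__ == "__main__":
    for i, doc in enumerate(make_sed_chain("the quick brown fox jumps over the lazy dog", 4)):
        print(i, doc)
```

Because each document is derived only from its predecessor, the position in the chain gives a known evolutionary order, which is the ground truth an order-aware measure would compare a retrieval ranking against.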
