Evaluation of Full-Text Retrieval System Using Collection of Serially Evolved Documents

Hwan-Gue Cho, H. Tak, Han-Ho Kim, Yeoneo Kim, Yongju Shin, Chulsu Lim, Kwangnam Choi
Published in: Proceedings of the 3rd International Conference on Industrial and Business Engineering, 2017-08-17
DOI: 10.1145/3133811.3133817 (https://doi.org/10.1145/3133811.3133817)
Citations: 0

Abstract

Finding a document that is similar to a specified query document within a large document database is one of the important issues in the Big Data era, as most available data is in the form of unstructured text. Our test collection consists of two parts. In the first part, texts were produced by human writers who artificially plagiarized documents through a linear pipelined procedure. In the second part, texts were generated by software that inserts, deletes, and substitutes certain parts of an input document to produce a similar document. This document set is called the Serially Evolved Documents (SED). We propose two new measures, Order Preserving Precision (OPP) and Order Preserving Recall (OPR), to compute how well the evolutionary order is kept among the output documents returned by the IR system under test. Using these test texts, we evaluated KONAN, a document retrieval system for Korean documents.
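
The software-generated part of the collection and the idea of checking evolutionary order can be illustrated with a minimal sketch. Everything named here is hypothetical: `evolve`, `make_chain`, and `pairwise_order_agreement` are illustrative helpers, the word-level edit granularity is an assumption, and the score is a simple pairwise order-agreement fraction used as a stand-in. The abstract does not give the formulas for OPP or OPR, so this is not the authors' definition.

```python
import random


def evolve(words, rng, n_edits=3):
    """Return a copy of `words` after n_edits random insert/delete/substitute steps.

    Word-level edits and the placeholder tokens are assumptions for illustration.
    """
    out = list(words)
    for _ in range(n_edits):
        op = rng.choice(["insert", "delete", "substitute"])
        if op == "insert" or not out:
            # Also used as a fallback when the document has become empty.
            out.insert(rng.randrange(len(out) + 1), "NEW")
        elif op == "delete":
            del out[rng.randrange(len(out))]
        else:
            out[rng.randrange(len(out))] = "SUB"
    return out


def make_chain(seed_words, generations, seed=0):
    """Build a serially evolved chain: each document is derived from the previous one."""
    rng = random.Random(seed)
    chain = [list(seed_words)]
    for _ in range(generations):
        chain.append(evolve(chain[-1], rng))
    return chain


def pairwise_order_agreement(ranked_generations):
    """Fraction of result pairs that appear in increasing generation order.

    `ranked_generations` lists the generation index of each retrieved document,
    in the order the retrieval system returned them. A stand-in score only.
    """
    n = len(ranked_generations)
    pairs = [(i, j) for i in range(n) for j in range(i + 1, n)]
    if not pairs:
        return 1.0
    kept = sum(1 for i, j in pairs if ranked_generations[i] < ranked_generations[j])
    return kept / len(pairs)


if __name__ == "__main__":
    chain = make_chain("a b c d e f g h".split(), generations=4)
    for generation, doc in enumerate(chain):
        print(generation, " ".join(doc))
    # Hypothetical ranking returned for a query with the generation-0 document.
    print(pairwise_order_agreement([1, 3, 2, 4]))
```

The sketch assumes that, when the generation-0 document is the query, an ideal system ranks later generations lower because each generation drifts further from the original. A score of 1.0 then means the returned list fully preserves the evolutionary order.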