{"title":"评估web上的文本重用发现","authors":"Stanford Chiu, Ibrahim Uysal, W. Bruce Croft","doi":"10.1145/1840784.1840829","DOIUrl":null,"url":null,"abstract":"Text reuse detection aims to identify duplicates, reformulations or partial rewrites of a given text. Some previous research has focused on determining text reuse instances accurately on local corpora. However, the practical usage of finding text reuse on the web has remained largely untested. In this work, we 1) introduce a novel text reuse searching interface for the web, based on a previously proposed architecture, 2) evaluate its feasibility, and 3) investigate techniques to improve both effectiveness and efficiency. Our results show that exhaustive query submission using n-grams can dramatically reduce the execution time with only small losses in accuracy.","PeriodicalId":413481,"journal":{"name":"International Conference on Information Interaction in Context","volume":"1 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2010-08-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"11","resultStr":"{\"title\":\"Evaluating text reuse discovery on the web\",\"authors\":\"Stanford Chiu, Ibrahim Uysal, W. Bruce Croft\",\"doi\":\"10.1145/1840784.1840829\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Text reuse detection aims to identify duplicates, reformulations or partial rewrites of a given text. Some previous research has focused on determining text reuse instances accurately on local corpora. However, the practical usage of finding text reuse on the web has remained largely untested. In this work, we 1) introduce a novel text reuse searching interface for the web, based on a previously proposed architecture, 2) evaluate its feasibility, and 3) investigate techniques to improve both effectiveness and efficiency. Our results show that exhaustive query submission using n-grams can dramatically reduce the execution time with only small losses in accuracy.\",\"PeriodicalId\":413481,\"journal\":{\"name\":\"International Conference on Information Interaction in Context\",\"volume\":\"1 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2010-08-18\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"11\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"International Conference on Information Interaction in Context\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/1840784.1840829\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"International Conference on Information Interaction in Context","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/1840784.1840829","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Text reuse detection aims to identify duplicates, reformulations or partial rewrites of a given text. Some previous research has focused on determining text reuse instances accurately on local corpora. However, the practical usage of finding text reuse on the web has remained largely untested. In this work, we 1) introduce a novel text reuse searching interface for the web, based on a previously proposed architecture, 2) evaluate its feasibility, and 3) investigate techniques to improve both effectiveness and efficiency. Our results show that exhaustive query submission using n-grams can dramatically reduce the execution time with only small losses in accuracy.