David A. Smith, Ryan Cordell, E. M. Dillon, Nicholas Stramp, J. Wilkerson
{"title":"检测和建模本地文本重用","authors":"David A. Smith, Ryan Cordell, E. M. Dillon, Nicholas Stramp, J. Wilkerson","doi":"10.5555/2740769.2740800","DOIUrl":null,"url":null,"abstract":"Texts propagate through many social networks and provide evidence for their structure. We describe and evaluate efficient algorithms for detecting clusters of reused passages embedded within longer documents in large collections. We apply these techniques to two case studies: analyzing the culture of free reprinting in the nineteenth-century United States and the development of bills into legislation in the U.S. Congress. Using these divergent case studies, we evaluate both the efficiency of the approximate local text reuse detection methods and the accuracy of the results. These techniques allow us to explore how ideas spread, which ideas spread, and which subgroups shared ideas.","PeriodicalId":92278,"journal":{"name":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","volume":"19 1","pages":"183-192"},"PeriodicalIF":0.0000,"publicationDate":"2014-09-08","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"43","resultStr":"{\"title\":\"Detecting and modeling local text reuse\",\"authors\":\"David A. Smith, Ryan Cordell, E. M. Dillon, Nicholas Stramp, J. Wilkerson\",\"doi\":\"10.5555/2740769.2740800\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Texts propagate through many social networks and provide evidence for their structure. We describe and evaluate efficient algorithms for detecting clusters of reused passages embedded within longer documents in large collections. We apply these techniques to two case studies: analyzing the culture of free reprinting in the nineteenth-century United States and the development of bills into legislation in the U.S. Congress. Using these divergent case studies, we evaluate both the efficiency of the approximate local text reuse detection methods and the accuracy of the results. These techniques allow us to explore how ideas spread, which ideas spread, and which subgroups shared ideas.\",\"PeriodicalId\":92278,\"journal\":{\"name\":\"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries\",\"volume\":\"19 1\",\"pages\":\"183-192\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2014-09-08\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"43\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.5555/2740769.2740800\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the ... ACM/IEEE Joint Conference on Digital Libraries. ACM/IEEE Joint Conference on Digital Libraries","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.5555/2740769.2740800","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Texts propagate through many social networks and provide evidence for their structure. We describe and evaluate efficient algorithms for detecting clusters of reused passages embedded within longer documents in large collections. We apply these techniques to two case studies: analyzing the culture of free reprinting in the nineteenth-century United States and the development of bills into legislation in the U.S. Congress. Using these divergent case studies, we evaluate both the efficiency of the approximate local text reuse detection methods and the accuracy of the results. These techniques allow us to explore how ideas spread, which ideas spread, and which subgroups shared ideas.