Duplicate Finder Toolkit
D. Luciv, D. Koznov, G. Chernishev, H. Basit, K. Romanovsky, A. Terekhov
Proceedings of the 40th International Conference on Software Engineering: Companion Proceedings, 2018-05-27
DOI: 10.1145/3183440.3195081 (https://doi.org/10.1145/3183440.3195081)
Citations: 5
Abstract
Software documentation is a significant component of modern software systems. Each year it becomes more complicated, just like the software itself. One of the aspects that negatively impact documentation quality is the presence of textual duplicates. Textual duplicates encountered in software documentation are inherently imprecise, i.e., in a single document the same information may be presented many times with different levels of detail and in various contexts. Documentation maintenance is an acute problem, and there is a strong demand for automation tools in this domain. In this study we present the Duplicate Finder Toolkit, a tool that assists an expert with duplicate maintenance-related tasks. Our tool can facilitate the maintenance process in a number of ways: 1) detection of both exact and near duplicates; 2) duplicate visualization via heat maps; 3) duplicate analysis: comparison of several duplicate instances, evaluation of their differences, and exploration of duplicate context; 4) duplicate manipulation and extraction.
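
To make the distinction between exact and near duplicates concrete, the following is a minimal illustrative sketch. It is not the toolkit's algorithm, which the abstract does not describe. It assumes the document has already been split into fragments, and it uses Python's standard `difflib.SequenceMatcher` as a stand-in similarity measure. The function name `find_duplicates` and the 0.8 threshold are hypothetical choices made for this example only.

```python
from difflib import SequenceMatcher
from itertools import combinations


def find_duplicates(fragments, threshold=0.8):
    """Return (i, j, ratio, kind) for fragment pairs at least `threshold` similar.

    kind is "exact" when the normalized texts are identical, otherwise "near".
    Illustration only: the pairwise comparison is quadratic in the number of
    fragments and is not the method used by the Duplicate Finder Toolkit.
    """
    # Normalize case and whitespace so purely cosmetic differences are ignored.
    normalized = [" ".join(f.lower().split()) for f in fragments]
    results = []
    for i, j in combinations(range(len(normalized)), 2):
        ratio = SequenceMatcher(None, normalized[i], normalized[j]).ratio()
        if ratio >= threshold:
            kind = "exact" if normalized[i] == normalized[j] else "near"
            results.append((i, j, round(ratio, 2), kind))
    return results


# Hypothetical documentation fragments, invented for this example.
fragments = [
    "Call init() before using the connection.",
    "Call init() before using the connection.",
    "Call init() first, before using the connection object.",
    "The license is described in a separate file.",
]
pairs = find_duplicates(fragments)
for i, j, ratio, kind in pairs:
    print(f"fragments {i} and {j}: {kind} duplicate (similarity {ratio})")
```

Here the first two fragments are reported as an exact duplicate, the third is reported as a near duplicate of each of them, and the unrelated fourth fragment is not reported at all. A matrix of such pairwise similarity scores is also the kind of data a heat-map view could display.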