Towards Accurate File Tracking Based on AST Differences
Akira Fujimoto, Yoshiki Higo, S. Kusumoto
2021 28th Asia-Pacific Software Engineering Conference (APSEC), 2021-12-01
https://doi.org/10.1109/APSEC53868.2021.00067
In software development, version control systems such as Git are essential tools that help teams manage source code. Git can trace the change history of each file individually. Even if a file was renamed in the past, Git can identify and track its pre-rename version based on content similarity, which is calculated as the ratio of lines that match between the pre- and post-change files to the total number of lines. However, line-based comparison techniques do not consider source code structure and have coarse granularity, which can result in misidentified pre-change files and interrupted tracking. To resolve these problems, this paper proposes a technique that calculates file content similarity using source code differences based on an abstract syntax tree (AST). In experiments conducted on 197 open source Java-based projects, we found that the number of rename detections increased by 3.3%, and that, on average, our technique tracked commits 1.37 times more frequently than the previous technique. We also measured accuracy and found that the maximum F-measure was 0.943, which is higher than the 0.926 maximum of the line-based technique.
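The contrast between line-based and tree-based similarity can be illustrated with a small sketch. It is not the paper's implementation: the paper targets Java and AST differencing, whereas this example uses Python's standard `ast` module as a stand-in and compares flattened node labels with `difflib`. The function names (`line_similarity`, `node_labels`, `ast_similarity`), the sample snippets, and the choice of the longer version's length as the denominator are all assumptions made for illustration, and the line-based score only approximates the ratio described above rather than reproducing Git's exact computation.

```python
import ast
import difflib


def _matched_ratio(a: list, b: list) -> float:
    """Matched elements divided by the length of the longer sequence (assumed denominator)."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    matched = sum(block.size for block in matcher.get_matching_blocks())
    total = max(len(a), len(b))
    return matched / total if total else 1.0


def line_similarity(before: str, after: str) -> float:
    """Line-based similarity: share of lines that match between the two versions."""
    return _matched_ratio(before.splitlines(), after.splitlines())


def node_labels(source: str) -> list:
    """Flatten the AST into a pre-order list of labels (node type plus any name or constant)."""
    labels = []

    def visit(node: ast.AST) -> None:
        label = type(node).__name__
        for attr in ("name", "id", "arg", "attr"):
            value = getattr(node, attr, None)
            if isinstance(value, str):
                label += ":" + value
        if isinstance(node, ast.Constant):
            label += ":" + repr(node.value)
        labels.append(label)
        for child in ast.iter_child_nodes(node):
            visit(child)

    visit(ast.parse(source))
    return labels


def ast_similarity(before: str, after: str) -> float:
    """Tree-based stand-in: share of AST node labels that match between the two versions."""
    return _matched_ratio(node_labels(before), node_labels(after))


# Hypothetical pre- and post-change contents: same code, reformatted and commented.
BEFORE = "def area(w, h):\n    return w * h\n"
AFTER = (
    "def area(\n"
    "    w,\n"
    "    h,\n"
    "):\n"
    "    # width times height\n"
    "    return w * h\n"
)

print("line-based:", line_similarity(BEFORE, AFTER))
print("AST-based: ", ast_similarity(BEFORE, AFTER))
```

Because reformatting and comments do not change the syntax tree, the AST-based score for this pair is 1.0, while the line-based score is much lower since only one line survives unchanged. A rename detector that applies a similarity threshold could therefore lose track of the file under the line-based measure but not under the tree-based one.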