N. Anquetil, T. Lethbridge
{"title":"从源文件的名称中恢复软件架构","authors":"N. Anquetil, T. Lethbridge","doi":"10.1002/(SICI)1096-908X(199905/06)11:3%3C201::AID-SMR192%3E3.0.CO;2-1","DOIUrl":null,"url":null,"abstract":"We discuss how to extract a useful set of subsystems from a set of existing source-code file names. This problem is challenging because many legacy systems use thousands of files names, including some that are very short and cryptic. At the same time the problem is important because software maintainers often find it difficult to understand such systems. We propose a general algorithm to cluster files based on their names, and a set of alternative methods for implementing the algorithm. One of the key tasks is picking candidate words to try to identify in file names. We do this by (a) iteratively decomposing file names, (b) finding common substrings, and (c) choosing words in routine names, in an English dictionary or in source-code comments. In addition, we investigate generating abbreviations from the candidate words in order to find matches in file names, as well as how to split file names into components given no word markers. To compare and evaluate our five approaches, we present two experiments. The first compares the ‘concepts’ found in each file name by each method with the results of manually decomposing file names. The second experiment compares automatically generated subsystems with subsystem examples proposed by experts. We conclude that two methods are most effective: extracting concepts using common substrings and extracting those concepts that relate to the names of routines in the files. Copyright © 1999 John Wiley & Sons, Ltd.","PeriodicalId":383619,"journal":{"name":"J. Softw. Maintenance Res. Pract.","volume":"14 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"1999-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"123","resultStr":"{\"title\":\"Recovering software architecture from the names of source files\",\"authors\":\"N. Anquetil, T. Lethbridge\",\"doi\":\"10.1002/(SICI)1096-908X(199905/06)11:3%3C201::AID-SMR192%3E3.0.CO;2-1\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"We discuss how to extract a useful set of subsystems from a set of existing source-code file names. This problem is challenging because many legacy systems use thousands of files names, including some that are very short and cryptic. At the same time the problem is important because software maintainers often find it difficult to understand such systems. We propose a general algorithm to cluster files based on their names, and a set of alternative methods for implementing the algorithm. One of the key tasks is picking candidate words to try to identify in file names. We do this by (a) iteratively decomposing file names, (b) finding common substrings, and (c) choosing words in routine names, in an English dictionary or in source-code comments. In addition, we investigate generating abbreviations from the candidate words in order to find matches in file names, as well as how to split file names into components given no word markers. To compare and evaluate our five approaches, we present two experiments. The first compares the ‘concepts’ found in each file name by each method with the results of manually decomposing file names. The second experiment compares automatically generated subsystems with subsystem examples proposed by experts. We conclude that two methods are most effective: extracting concepts using common substrings and extracting those concepts that relate to the names of routines in the files. Copyright © 1999 John Wiley & Sons, Ltd.\",\"PeriodicalId\":383619,\"journal\":{\"name\":\"J. Softw. Maintenance Res. Pract.\",\"volume\":\"14 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"1999-05-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"123\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"J. Softw. Maintenance Res. Pract.\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1002/(SICI)1096-908X(199905/06)11:3%3C201::AID-SMR192%3E3.0.CO;2-1\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"J. Softw. Maintenance Res. Pract.","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1002/(SICI)1096-908X(199905/06)11:3%3C201::AID-SMR192%3E3.0.CO;2-1","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 123
Recovering software architecture from the names of source files
We discuss how to extract a useful set of subsystems from a set of existing source-code file names. This problem is challenging because many legacy systems use thousands of files names, including some that are very short and cryptic. At the same time the problem is important because software maintainers often find it difficult to understand such systems. We propose a general algorithm to cluster files based on their names, and a set of alternative methods for implementing the algorithm. One of the key tasks is picking candidate words to try to identify in file names. We do this by (a) iteratively decomposing file names, (b) finding common substrings, and (c) choosing words in routine names, in an English dictionary or in source-code comments. In addition, we investigate generating abbreviations from the candidate words in order to find matches in file names, as well as how to split file names into components given no word markers. To compare and evaluate our five approaches, we present two experiments. The first compares the ‘concepts’ found in each file name by each method with the results of manually decomposing file names. The second experiment compares automatically generated subsystems with subsystem examples proposed by experts. We conclude that two methods are most effective: extracting concepts using common substrings and extracting those concepts that relate to the names of routines in the files. Copyright © 1999 John Wiley & Sons, Ltd.