Shaowei Wang, D. Lo, Zhenchang Xing, Lingxiao Jiang
{"title":"基于信息检索的关注定位:基于Linux内核的实证研究","authors":"Shaowei Wang, D. Lo, Zhenchang Xing, Lingxiao Jiang","doi":"10.1109/WCRE.2011.72","DOIUrl":null,"url":null,"abstract":"Many software maintenance activities need to find code units (functions, files, etc.) that implement a certain concern (features, bugs, etc.). To facilitate such activities, many approaches have been proposed to automatically link code units with concerns described in natural languages, which are termed as concern localization and often employ Information Retrieval (IR) techniques. There has not been a study that evaluates and compares the effectiveness of latest IR techniques on a large dataset. This study fills this gap by investigating ten IR techniques, some of which are new and have not been used for concern localization, on a Linux kernel dataset. The Linux kernel dataset contains more than 1,500 concerns that are linked to over 85,000 C functions. We have evaluated the effectiveness of the ten techniques on recovering the links between the concerns and the implementing functions and ranked the IR techniques based on their precisions on concern localization. Keywords-concern localization; information retrieval; Linux kernel; mean average precision;","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"49 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"53","resultStr":"{\"title\":\"Concern Localization using Information Retrieval: An Empirical Study on Linux Kernel\",\"authors\":\"Shaowei Wang, D. Lo, Zhenchang Xing, Lingxiao Jiang\",\"doi\":\"10.1109/WCRE.2011.72\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Many software maintenance activities need to find code units (functions, files, etc.) that implement a certain concern (features, bugs, etc.). To facilitate such activities, many approaches have been proposed to automatically link code units with concerns described in natural languages, which are termed as concern localization and often employ Information Retrieval (IR) techniques. There has not been a study that evaluates and compares the effectiveness of latest IR techniques on a large dataset. This study fills this gap by investigating ten IR techniques, some of which are new and have not been used for concern localization, on a Linux kernel dataset. The Linux kernel dataset contains more than 1,500 concerns that are linked to over 85,000 C functions. We have evaluated the effectiveness of the ten techniques on recovering the links between the concerns and the implementing functions and ranked the IR techniques based on their precisions on concern localization. Keywords-concern localization; information retrieval; Linux kernel; mean average precision;\",\"PeriodicalId\":350863,\"journal\":{\"name\":\"2011 18th Working Conference on Reverse Engineering\",\"volume\":\"49 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"53\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 18th Working Conference on Reverse Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WCRE.2011.72\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 18th Working Conference on Reverse Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WCRE.2011.72","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Concern Localization using Information Retrieval: An Empirical Study on Linux Kernel
Many software maintenance activities need to find code units (functions, files, etc.) that implement a certain concern (features, bugs, etc.). To facilitate such activities, many approaches have been proposed to automatically link code units with concerns described in natural languages, which are termed as concern localization and often employ Information Retrieval (IR) techniques. There has not been a study that evaluates and compares the effectiveness of latest IR techniques on a large dataset. This study fills this gap by investigating ten IR techniques, some of which are new and have not been used for concern localization, on a Linux kernel dataset. The Linux kernel dataset contains more than 1,500 concerns that are linked to over 85,000 C functions. We have evaluated the effectiveness of the ten techniques on recovering the links between the concerns and the implementing functions and ranked the IR techniques based on their precisions on concern localization. Keywords-concern localization; information retrieval; Linux kernel; mean average precision;