{"title":"从标识符中提取领域概念的研究","authors":"S. Abebe, P. Tonella","doi":"10.1109/WCRE.2011.19","DOIUrl":null,"url":null,"abstract":"Program identifiers represent an invaluable source of information for developers who are not familiar with the code to be evolved. Domain concepts and inter-concept relationships can be automatically extracted by means of natural language processing techniques applied to the program identifiers. However, the ontology produced by this approach tends to be very large and to include implementation details that reduce its usefulness for domain concept understanding. In this paper, we analyze the effectiveness of information retrieval based techniques used to filter domain concepts and relations from the implementation details, so as to obtain a smaller, more informative domain ontology. In particular, we show that fully automated techniques based on keywords or topics have quite poor performance, while a semi-automated approach, requiring limited user involvement, can highly improve the filtering of domain concepts.","PeriodicalId":350863,"journal":{"name":"2011 18th Working Conference on Reverse Engineering","volume":"8 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2011-10-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"24","resultStr":"{\"title\":\"Towards the Extraction of Domain Concepts from the Identifiers\",\"authors\":\"S. Abebe, P. Tonella\",\"doi\":\"10.1109/WCRE.2011.19\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Program identifiers represent an invaluable source of information for developers who are not familiar with the code to be evolved. Domain concepts and inter-concept relationships can be automatically extracted by means of natural language processing techniques applied to the program identifiers. However, the ontology produced by this approach tends to be very large and to include implementation details that reduce its usefulness for domain concept understanding. In this paper, we analyze the effectiveness of information retrieval based techniques used to filter domain concepts and relations from the implementation details, so as to obtain a smaller, more informative domain ontology. In particular, we show that fully automated techniques based on keywords or topics have quite poor performance, while a semi-automated approach, requiring limited user involvement, can highly improve the filtering of domain concepts.\",\"PeriodicalId\":350863,\"journal\":{\"name\":\"2011 18th Working Conference on Reverse Engineering\",\"volume\":\"8 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2011-10-17\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"24\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2011 18th Working Conference on Reverse Engineering\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/WCRE.2011.19\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2011 18th Working Conference on Reverse Engineering","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/WCRE.2011.19","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Towards the Extraction of Domain Concepts from the Identifiers
Program identifiers represent an invaluable source of information for developers who are not familiar with the code to be evolved. Domain concepts and inter-concept relationships can be automatically extracted by means of natural language processing techniques applied to the program identifiers. However, the ontology produced by this approach tends to be very large and to include implementation details that reduce its usefulness for domain concept understanding. In this paper, we analyze the effectiveness of information retrieval based techniques used to filter domain concepts and relations from the implementation details, so as to obtain a smaller, more informative domain ontology. In particular, we show that fully automated techniques based on keywords or topics have quite poor performance, while a semi-automated approach, requiring limited user involvement, can highly improve the filtering of domain concepts.