Naturalness in Source Code Summarization. How Significant is it?
C. Ferretti, Martina Saletta
2023 IEEE/ACM 31st International Conference on Program Comprehension (ICPC), 2023-05-01
DOI: 10.1109/ICPC58990.2023.00027 (https://doi.org/10.1109/ICPC58990.2023.00027)
Source code summarization, that is, the description of a program's functionality in short natural-language sentences, is a topic of great interest in the software engineering community, since it can help generate software documentation automatically and, more generally, ease the effort developers spend understanding the code they are working on. In this work, conceived as a negative results paper, we study existing neural models designed for this purpose, pointing out their high sensitivity to the natural elements present in source code (i.e., comments and identifiers) and the related drop in performance when such elements are ablated or masked. We then propose a novel source code summarization approach based on an intermediate pseudo-language, through which we are able to fine-tune the BRIO model for natural language on source code summarization and to achieve results comparable to those obtained by state-of-the-art source code competitors (e.g., PLBART and CodeBERT). Finally, we discuss the limitations of these NLP-based approaches when transferred to the domain of source code processing, and we provide some insights for further research directions.
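To make the idea of ablating or masking natural elements concrete, the following is a minimal sketch, not the authors' procedure. It assumes Python input (the abstract does not name the programming languages studied), and the function name mask_natural_elements and the VAR_n placeholder scheme are hypothetical choices made for illustration. It drops comments and replaces every non-keyword identifier, including built-in names, with an opaque placeholder, using only the standard tokenize module.

```python
import io
import keyword
import tokenize


def mask_natural_elements(source: str) -> str:
    """Drop comments and rename each distinct identifier to VAR_<n>.

    Illustrative only: the placeholder scheme is an assumption, and
    built-in names such as len or print are masked like any other name.
    """
    mapping = {}  # original identifier -> placeholder
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            continue  # ablate comments entirely
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            # The same identifier always maps to the same placeholder.
            placeholder = mapping.setdefault(tok.string, f"VAR_{len(mapping)}")
            out.append((tokenize.NAME, placeholder))
        else:
            out.append((tok.type, tok.string))
    # With (type, string) pairs, untokenize regenerates valid code,
    # although the original spacing is not preserved.
    return tokenize.untokenize(out)


if __name__ == "__main__":
    snippet = "def add(total, x):\n    # accumulate\n    return total + x\n"
    print(mask_natural_elements(snippet))
```

The masked code keeps its structure and keywords but loses the wording a summarization model could otherwise lean on, which is the kind of input under which the abstract reports a drop in performance.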