C. Psarras, Themistoklis G. Diamantopoulos, A. Symeonidis
{"title":"一种从源代码自动总结软件功能的机制","authors":"C. Psarras, Themistoklis G. Diamantopoulos, A. Symeonidis","doi":"10.1109/QRS.2019.00028","DOIUrl":null,"url":null,"abstract":"When developers search online to find software components to reuse, they usually first need to understand the container projects/libraries, and subsequently identify the required functionality. Several approaches identify and summarize the offerings of projects from their source code, however they often require that the developer has knowledge of the underlying topic modeling techniques; they do not provide a mechanism for tuning the number of topics, and they offer no control over the top terms for each topic. In this work, we use a vectorizer to extract information from variable/method names and comments, and apply Latent Dirichlet Allocation to cluster the source code files of a project into different semantic topics. The number of topics is optimized based on their purity with respect to project packages, while topic categories are constructed to provide further intuition and Stack Exchange tags are used to express the topics in more abstract terms.","PeriodicalId":122665,"journal":{"name":"2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS)","volume":"27 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2019-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"A Mechanism for Automatically Summarizing Software Functionality from Source Code\",\"authors\":\"C. Psarras, Themistoklis G. Diamantopoulos, A. Symeonidis\",\"doi\":\"10.1109/QRS.2019.00028\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"When developers search online to find software components to reuse, they usually first need to understand the container projects/libraries, and subsequently identify the required functionality. Several approaches identify and summarize the offerings of projects from their source code, however they often require that the developer has knowledge of the underlying topic modeling techniques; they do not provide a mechanism for tuning the number of topics, and they offer no control over the top terms for each topic. In this work, we use a vectorizer to extract information from variable/method names and comments, and apply Latent Dirichlet Allocation to cluster the source code files of a project into different semantic topics. The number of topics is optimized based on their purity with respect to project packages, while topic categories are constructed to provide further intuition and Stack Exchange tags are used to express the topics in more abstract terms.\",\"PeriodicalId\":122665,\"journal\":{\"name\":\"2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS)\",\"volume\":\"27 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2019-07-01\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1109/QRS.2019.00028\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"2019 IEEE 19th International Conference on Software Quality, Reliability and Security (QRS)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1109/QRS.2019.00028","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
A Mechanism for Automatically Summarizing Software Functionality from Source Code
When developers search online to find software components to reuse, they usually first need to understand the container projects/libraries, and subsequently identify the required functionality. Several approaches identify and summarize the offerings of projects from their source code, however they often require that the developer has knowledge of the underlying topic modeling techniques; they do not provide a mechanism for tuning the number of topics, and they offer no control over the top terms for each topic. In this work, we use a vectorizer to extract information from variable/method names and comments, and apply Latent Dirichlet Allocation to cluster the source code files of a project into different semantic topics. The number of topics is optimized based on their purity with respect to project packages, while topic categories are constructed to provide further intuition and Stack Exchange tags are used to express the topics in more abstract terms.