Nirjas: An open source framework for extracting metadata from the source code
Authors: Ayushi Bhardwaj, Sahil, Kaushlendra Pratap, Gaurav Mishra
Venue: 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
Published: 2022-01-27
DOI: 10.1109/confluence52989.2022.9734222
Metadata, in the form of comments, is a critical element of any software development process. In this paper, we explain how the metadata/comments in source code play an essential role in comprehending software. We introduce “Nirjas”, a Python-based open-source framework that extracts this metadata in a structured manner. Programming languages differ in the syntax, types, and widely accepted conventions for adding comments to a source file. Various edge cases can introduce noise into the extraction, so we use regular expressions (Regex) to retrieve the metadata accurately. A non-Regex method can produce a result but falls short on accuracy and noise separation. Nirjas also separates the different types of comments from the source code and reports details about those comments, such as line number, file name, language used, and total SLOC. Nirjas is a standalone Python framework/library that can be installed quickly from source or with pip (the package installer for Python). Nirjas started as a Google Summer of Code project and is currently developed and maintained under the FOSSology organization.
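To make the Regex-based idea concrete, the following is a minimal sketch of comment extraction for C-style syntax. It is not the Nirjas implementation or its API: the function name `extract_comments`, the output keys, and the sample input are assumptions chosen for illustration. The sketch matches string literals first, so that a comment marker inside a string (one of the edge cases that creates noise) is not reported as a comment.

```python
import re

# String and character literals are matched first so that comment markers
# inside them (for example "http://...") are skipped rather than reported.
STRING = r'"(?:\\.|[^"\\\n])*"' + r"|'(?:\\.|[^'\\\n])*'"
PATTERN = re.compile(
    rf"(?P<string>{STRING})|(?P<multi>/\*.*?\*/)|(?P<single>//[^\n]*)",
    re.DOTALL,
)


def extract_comments(source: str) -> dict:
    """Hypothetical helper: split C-style source into comments and code."""
    single, multi = [], []
    for match in PATTERN.finditer(source):
        if match.lastgroup == "string":
            continue  # a literal, not a comment
        # Line numbers are 1-based: count the newlines before the match.
        start_line = source.count("\n", 0, match.start()) + 1
        end_line = source.count("\n", 0, match.end()) + 1
        if match.lastgroup == "single":
            single.append(
                {"line_number": start_line, "comment": match.group()[2:].strip()}
            )
        else:
            multi.append(
                {
                    "start_line": start_line,
                    "end_line": end_line,
                    "comment": match.group()[2:-2].strip(),
                }
            )

    def blank_out(match: re.Match) -> str:
        # Keep literals; replace comments with their newlines only, so the
        # remaining code stays on its original lines.
        if match.lastgroup == "string":
            return match.group()
        return "\n" * match.group().count("\n")

    code_only = PATTERN.sub(blank_out, source)
    sloc = sum(1 for line in code_only.splitlines() if line.strip())
    return {
        "single_line_comment": single,
        "multi_line_comment": multi,
        "sloc": sloc,
    }


SAMPLE = '''\
// entry point
int main() {
    const char *url = "http://example.org"; // not part of the string
    /* multi
       line */
    return 0;
}
'''

if __name__ == "__main__":
    result = extract_comments(SAMPLE)
    for entry in result["single_line_comment"]:
        print(entry["line_number"], entry["comment"])
    print("SLOC:", result["sloc"])
```

A plain substring search for `//` would wrongly treat the tail of the URL in `SAMPLE` as a comment. The alternation above avoids this for the simple cases shown, though a production extractor would need per-language rules and further edge cases.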