Nirjas: An open source framework for extracting metadata from the source code

2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence). Pub Date: 2022-01-27. DOI: 10.1109/confluence52989.2022.9734222

Ayushi Bhardwaj, Sahil, Kaushlendra Pratap, Gaurav Mishra


Citations: 0

Abstract



Metadata/comments are a critical element of any software development process. In this paper, we explain how the metadata/comments in source code can play an essential role in comprehending the software. We introduce "Nirjas", a Python-based open-source framework that extracts this metadata in a structured manner. Different programming languages have various syntaxes, types, and widely accepted conventions for adding comments to a source file. Various edge cases can create noise in the extraction, so we use regular expressions (regex) to retrieve the metadata accurately. A non-regex method can produce a result but falls short on accuracy and noise separation. Nirjas can also separate different types of comments from the source code and report details about those comments, such as the line number, file name, language used, and total SLOC. Nirjas is a standalone Python framework/library and can be installed quickly either from source or with pip (the package installer for Python). Nirjas started out as one of the projects in the Google Summer of Code program and is currently developed and maintained under the FOSSology organization.
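The regex-based extraction described in the abstract can be illustrated with a minimal sketch. This is not the Nirjas API: the names `TOKEN`, `extract_comments`, and `SAMPLE` are hypothetical, the output layout is only loosely modeled on the details the abstract lists (line number, file name, SLOC), and only C-style `//` and `/* */` comments are handled. The pattern matches string literals first, which is one way to filter the kind of noise the abstract mentions, such as a `//` inside a URL string.

```python
import re

# Hypothetical pattern, not taken from Nirjas. String literals are listed
# first so that comment markers inside them are consumed as strings.
TOKEN = re.compile(
    r'(?P<string>"(?:\\.|[^"\\])*")'   # double-quoted string literal
    r"|(?P<char>'(?:\\.|[^'\\])*')"    # single-quoted literal
    r"|(?P<multi>/\*.*?\*/)"           # /* ... */ block comment
    r"|(?P<single>//[^\n]*)",          # // comment up to end of line
    re.DOTALL,
)


def extract_comments(source: str, filename: str = "<memory>") -> dict:
    """Return single-line comments, multi-line comments, and SLOC."""
    single, multi = [], []
    for m in TOKEN.finditer(source):
        # 1-based line number of the position where the match starts.
        start_line = source.count("\n", 0, m.start()) + 1
        if m.lastgroup == "single":
            single.append({"line": start_line,
                           "comment": m.group()[2:].strip()})
        elif m.lastgroup == "multi":
            end_line = source.count("\n", 0, m.end()) + 1
            multi.append({"start_line": start_line,
                          "end_line": end_line,
                          "comment": m.group()[2:-2].strip()})

    def blank_out(m: re.Match) -> str:
        # Drop comments but keep their newlines so line counts stay stable.
        if m.lastgroup in ("single", "multi"):
            return "\n" * m.group().count("\n")
        return m.group()

    code_only = TOKEN.sub(blank_out, source)
    sloc = sum(1 for line in code_only.splitlines() if line.strip())
    return {"filename": filename, "sloc": sloc,
            "single_line_comment": single, "multi_line_comment": multi}


SAMPLE = '''int main() {
    // entry point
    char *url = "http://example.com"; /* not
       part of the URL */
    return 0;
}
'''

if __name__ == "__main__":
    print(extract_comments(SAMPLE, "demo.c"))
```

A naive search for `//` would report `//example.com";` as a comment in this sample. Matching the string literal as its own alternative avoids that, at the cost of a pattern that must be tuned for each language's quoting and comment rules.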
