Nirjas: An open source framework for extracting metadata from the source code

Ayushi Bhardwaj, Sahil, Kaushlendra Pratap, Gaurav Mishra
DOI: 10.1109/confluence52989.2022.9734222 (https://doi.org/10.1109/confluence52989.2022.9734222)
Published in: 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence)
Publication date: 2022-01-27
Citations: 0

Abstract

Metadata, in the form of comments, is a critical element of any software development process. In this paper, we explain how the metadata/comments in source code can play an essential role in comprehending the software. We introduce a Python-based open-source framework, "Nirjas", that extracts this metadata in a structured manner. Programming languages differ in the syntax, types, and widely accepted conventions for adding comments to a source file. Various edge cases can create noise in the extraction, so we use regular expressions (Regex) to retrieve the metadata accurately. A non-Regex method can also produce a result, but it falls short on accuracy and noise separation. Nirjas can also separate the different types of comments from the source code and report details about those comments, such as line number, file name, language used, and total SLOC. Nirjas is a standalone Python framework/library that can be installed quickly from source or with pip (the package installer for Python). Nirjas started as a Google Summer of Code project and is currently developed and maintained under the FOSSology organization.
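The regex-based idea described in the abstract can be illustrated with a minimal sketch. This is not Nirjas's actual code or API; the names `TOKEN` and `extract_comments` and the shape of the returned dictionary are assumptions made for illustration only. The sketch handles just single-line `#` comments in Python source and shows one kind of noise a naive search for `#` would pick up: a `#` that sits inside a string literal.

```python
import re

# Hypothetical pattern (not taken from Nirjas): match a quoted string first so
# that any '#' inside it is consumed and ignored, otherwise capture a comment.
TOKEN = re.compile(
    r'"(?:\\.|[^"\\])*"'      # double-quoted string literal
    r"|'(?:\\.|[^'\\])*'"     # single-quoted string literal
    r"|(?P<comment>#.*)"      # '#' comment running to the end of the line
)


def extract_comments(source: str, filename: str = "<memory>") -> dict:
    """Return single-line comments with line numbers, plus a simple SLOC count."""
    comments = []
    sloc = 0
    for line_number, line in enumerate(source.splitlines(), start=1):
        comment = None
        for match in TOKEN.finditer(line):
            if match.group("comment") is not None:
                comment = match
                break
        # Everything before the comment (or the whole line) counts as code.
        code = line[: comment.start()] if comment else line
        if code.strip():
            sloc += 1
        if comment:
            text = comment.group("comment")[1:].strip()
            comments.append({"line": line_number, "comment": text})
    return {
        "filename": filename,
        "language": "Python",
        "sloc": sloc,
        "single_line_comments": comments,
    }


if __name__ == "__main__":
    sample = 'x = "a # not a comment"  # real comment\n# header\ny = 1\n'
    print(extract_comments(sample, "sample.py"))
```

The sketch works line by line, so it does not handle triple-quoted strings or multi-line comment styles; a full extractor would need language-specific patterns for those cases.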