A framework for creating knowledge graphs of scientific software metadata

Impact factor 4.1 | JCR Q1 | Information Science & Library Science
Aidan Kelley, D. Garijo
Journal: Quantitative Science Studies, volume 2 1, pages 1423-1446
Published: 2021-11-05
Type: Journal Article
DOI: 10.1162/qss_a_00167 (https://doi.org/10.1162/qss_a_00167)
Citations: 13

Abstract

An increasing number of researchers rely on computational methods to generate or manipulate the results described in their scientific publications. Software created to this end (scientific software) is key to understanding, reproducing, and reusing existing work in many disciplines, ranging from Geosciences to Astronomy and Artificial Intelligence. However, scientific software is usually hard to find, set up, and compare with similar software because its documentation is scattered across manuals, readme files, websites, and code comments, and because structured metadata describing it is lacking. As a result, researchers have to manually inspect existing tools to understand their differences and incorporate them into their work. This approach scales poorly with the number of publications and tools made available every year. In this paper we address these issues by introducing a framework for automatically extracting scientific software metadata from its documentation (in particular, its readme files); a methodology for structuring the extracted metadata in a Knowledge Graph (KG) of scientific software; and an exploitation framework for browsing and comparing the contents of the generated KG. We demonstrate our approach by creating a KG with metadata from over 10,000 scientific software entries from public code repositories.
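
The extract-then-structure idea in the abstract can be illustrated with a small sketch. This is not the authors' implementation: the heading-based heuristic, the `SECTION_TO_FIELD` mapping, the `ex:` predicate prefix, and the function names `extract_metadata` and `to_triples` are all assumptions made for illustration. It only shows how text under readme headings might be grouped into metadata fields and then expressed as subject-predicate-object triples, the basic shape of a knowledge graph.

```python
import re

# Hypothetical mapping from readme section headings to metadata fields.
SECTION_TO_FIELD = {
    "installation": "installation",
    "install": "installation",
    "usage": "usage",
    "citation": "citation",
    "license": "license",
}

# Matches Markdown ATX headings such as "## Installation".
HEADING = re.compile(r"^#{1,6}\s+(.*?)\s*$")


def extract_metadata(readme: str) -> dict:
    """Group readme text under known metadata fields, keyed by heading."""
    collected = {}
    field = None
    for line in readme.splitlines():
        match = HEADING.match(line)
        if match:
            # Unknown headings reset the field so their text is skipped.
            field = SECTION_TO_FIELD.get(match.group(1).lower())
            continue
        if field and line.strip():
            collected.setdefault(field, []).append(line.strip())
    return {name: " ".join(parts) for name, parts in collected.items()}


def to_triples(software_id: str, metadata: dict) -> list:
    """Turn extracted fields into (subject, predicate, object) triples."""
    return [
        (software_id, f"ex:{name}", text)
        for name, text in sorted(metadata.items())
    ]


README = """# demo-tool
A toy project.

## Installation
pip install demo-tool

## License
MIT
"""

metadata = extract_metadata(README)
triples = to_triples("ex:demo-tool", metadata)
for triple in triples:
    print(triple)
```

Triples from many repositories that share the same predicates can be loaded into one graph and queried side by side, which is what makes comparison across tools possible. A real pipeline would need more than heading matching, since readme files vary widely in structure.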
Source journal
Quantitative Science Studies (Information Science & Library Science)
CiteScore: 12.10
Self-citation rate: 12.50%
Articles published: 46
Review time: 22 weeks