A framework for creating knowledge graphs of scientific software metadata

Impact factor 4.1; JCR Q1 (Information Science & Library Science)

Quantitative Science Studies, pp. 1423-1446. Pub Date: 2021-11-05. DOI: 10.1162/qss_a_00167

Aidan Kelley, D. Garijo

Citations: 13

Abstract

An increasing number of researchers rely on computational methods to generate or manipulate the results described in their scientific publications. Software created to this end (scientific software) is key to understanding, reproducing, and reusing existing work in many disciplines, ranging from Geosciences to Astronomy and Artificial Intelligence. However, scientific software is usually challenging to find, set up, and compare with similar software, because its documentation is scattered across manuals, readme files, websites, and code comments, and because it lacks structured metadata describing it. As a result, researchers must manually inspect existing tools to understand their differences and incorporate them into their work, an approach that scales poorly as the number of publications and tools released every year grows. In this paper we address these issues by introducing a framework for automatically extracting scientific software metadata from its documentation (in particular, readme files); a methodology for structuring the extracted metadata in a Knowledge Graph (KG) of scientific software; and an exploitation framework for browsing and comparing the contents of the generated KG. We demonstrate our approach by creating a KG with metadata from over 10,000 scientific software entries from public code repositories.
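The extraction and structuring steps summarized above can be illustrated with a toy sketch. It is not the authors' implementation: it assumes readme files are written in Markdown, classifies sections with a simple heading-keyword heuristic chosen here purely for illustration, and emits subject-predicate-object triples serialized as N-Triples. The keyword table, the `https://example.org/` namespace, and every function name below are hypothetical.

```python
import re
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Hypothetical heading keywords for a few metadata categories (illustration only).
HEADER_KEYWORDS: Dict[str, List[str]] = {
    "installation": ["install", "setup", "getting started"],
    "usage": ["usage", "example", "how to use"],
    "citation": ["citation", "cite", "reference"],
    "license": ["license", "licence"],
}
BASE = "https://example.org/"  # hypothetical namespace for software and properties
FENCE = "`" * 3                # Markdown code-fence marker
HEADING = re.compile(r"^#{1,6}\s+(.*)$")  # ATX headings such as "## Usage"


def split_sections(readme: str) -> Dict[str, str]:
    """Split a Markdown readme into {heading: body}, ignoring '#' lines inside code fences."""
    sections: Dict[str, str] = {}
    current: Optional[str] = None
    body: List[str] = []
    in_fence = False
    for line in readme.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        match = None if in_fence else HEADING.match(line)
        if match:
            if current is not None:
                sections[current] = "\n".join(body).strip()
            current, body = match.group(1).strip(), []
        elif current is not None:  # text before the first heading is ignored
            body.append(line)
    if current is not None:
        sections[current] = "\n".join(body).strip()
    return sections


def classify(heading: str) -> Optional[str]:
    """Return the first category whose keywords occur in the heading, else None."""
    h = heading.lower()
    for category, keywords in HEADER_KEYWORDS.items():
        if any(kw in h for kw in keywords):
            return category
    return None


def readme_to_triples(software_id: str, readme: str) -> List[Triple]:
    """Turn classified, non-empty readme sections into (subject, predicate, literal) triples."""
    subject = BASE + "software/" + software_id
    triples: List[Triple] = []
    for heading, text in split_sections(readme).items():
        category = classify(heading)
        if category and text:
            triples.append((subject, BASE + "ontology/" + category, text))
    return triples


def to_ntriples(triples: List[Triple]) -> str:
    """Serialize triples as N-Triples, escaping the literal object."""
    def literal(s: str) -> str:
        s = s.replace("\\", "\\\\").replace('"', '\\"')
        return '"' + s.replace("\n", "\\n").replace("\r", "\\r") + '"'
    return "\n".join(f"<{s}> <{p}> {literal(o)} ." for s, p, o in triples)
```

For a readme with "Installation", "Usage", and "Contributors" sections, only the first two map to a category in this table, so two triples are produced; everything else is silently dropped. A real pipeline would need far more robust section classification (readme headings vary widely) and a shared vocabulary for predicates instead of the placeholder namespace used here.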


