ORKG-Leaderboards: a systematic workflow for mining leaderboards as a knowledge graph

Impact factor: 1.7; JCR Q2, Information Science & Library Science

International Journal on Digital Libraries. Published: 2023-06-15. DOI: 10.1007/s00799-023-00366-1

Salomon Kabongo, Jennifer D’Souza, Sören Auer


Citations: 0

Abstract

The purpose of this work is to describe the ORKG-Leaderboards software, which is designed to automatically extract leaderboards, defined as task–dataset–metric tuples, from large collections of empirical research papers in artificial intelligence (AI). The software supports both main workflows of scholarly publishing, viz. LaTeX files and PDF files. Furthermore, the system is integrated with the Open Research Knowledge Graph (ORKG) platform, which fosters the machine-actionable publishing of scholarly findings. Thus, when the system's output is integrated within the ORKG's Semantic Web infrastructure for representing machine-actionable ‘resources’ on the Web, it enables: (1) broadly, the integration of empirical results from researchers across the world, making empirical research transparent, with the potential to also be complete, contingent on the underlying data source(s) of publications; and (2) specifically, researchers to track progress in AI through an overview of the state of the art across the most common AI tasks and their corresponding datasets, via dynamic ORKG frontend views that present tables and visualization charts over the machine-actionable data. Our best model achieves above 90% F1 on the leaderboard extraction task, showing that ORKG-Leaderboards is a practically viable tool for real-world use. Going forward, ORKG-Leaderboards in a sense turns leaderboard extraction, long a crowdsourced endeavor in the community, into an automated digitalization task.
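To make the notion of a leaderboard as a task–dataset–metric tuple and the reported F1 score more concrete, the sketch below represents extracted tuples and scores a set of predictions against gold annotations for one paper. It is illustrative only: the abstract does not specify the evaluation protocol, so the class name `LeaderboardTuple`, the helper `tuple_f1`, the case- and whitespace-insensitive exact matching, and the example values are all hypothetical assumptions, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class LeaderboardTuple:
    """One (task, dataset, metric) triple. Hypothetical representation."""
    task: str
    dataset: str
    metric: str


def normalize(t: LeaderboardTuple) -> LeaderboardTuple:
    # Assumption: matching is exact after lowercasing and collapsing whitespace.
    return LeaderboardTuple(*(" ".join(s.lower().split()) for s in (t.task, t.dataset, t.metric)))


def tuple_f1(predicted: Iterable[LeaderboardTuple],
             gold: Iterable[LeaderboardTuple]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) over the sets of normalized tuples.

    Returns 0.0 for a measure whose denominator is zero (e.g. no predictions).
    """
    pred = {normalize(t) for t in predicted}
    ref = {normalize(t) for t in gold}
    tp = len(pred & ref)  # tuples found in both prediction and gold sets
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(ref) if ref else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical example: one of two predictions matches one of two gold tuples.
gold = [LeaderboardTuple("Question Answering", "SQuAD 1.1", "F1"),
        LeaderboardTuple("Question Answering", "SQuAD 1.1", "EM")]
pred = [LeaderboardTuple("question answering", "SQuAD  1.1", "F1"),
        LeaderboardTuple("Question Answering", "SQuAD 2.0", "F1")]

print(tuple_f1(pred, gold))  # (0.5, 0.5, 0.5)
```

Using frozen dataclasses makes the tuples hashable, so set intersection gives the true positives directly; a real evaluation might instead use partial matching or micro-averaging across many papers.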





Source journal

International Journal on Digital Libraries

CiteScore: 4.30

Self-citation rate: 6.70%

Journal description: The International Journal on Digital Libraries (IJDL) examines the theory and practice of the acquisition, definition, organization, management, preservation, and dissemination of digital information via global networking. It covers all aspects of digital libraries (DLs), from large-scale heterogeneous data and information management and access, to linking and connectivity, to security, privacy, and policies, to their application, use, and evaluation. The scope of IJDL includes, but is not limited to:

- The FAIR principle and the digital libraries infrastructure
  - Findable: information access and retrieval; semantic search; data and information exploration; information navigation; smart indexing and searching; resource discovery
  - Accessible: visualization and digital collections; user interfaces; interfaces for handicapped users; HCI and UX in DLs; security and privacy in DLs; multimodal access
  - Interoperable: metadata (definition, management, curation, integration); syntactic and semantic interoperability; linked data
  - Reusable: reproducibility; Open Science; sustainability, profitability, and repeatability of research results; confidentiality and privacy issues in DLs
- Digital library architectures, including heterogeneous and dynamic data management; data and repositories
- Acquisition of digital information: authoring environments for digital objects; digitization of traditional content
- Digital archiving and preservation
  - Digital preservation and curation
  - Digital archiving
  - Web archiving
  - Archiving and preservation strategies
- AI for digital libraries
  - Machine learning for DLs
  - Data mining in DLs
  - NLP for DLs
- Applications of digital libraries
  - Digital humanities
  - Open data and their reuse
  - Scholarly DLs (incl. bibliometrics, altmetrics)
  - Epigraphy and paleography
  - Digital museums
- Future trends in digital libraries
  - Definition of DLs in a ubiquitous digital library world
  - Datafication of digital collections
  - Interaction and user experience (UX) in DLs
  - Information visualization
  - Collection understanding
  - Privacy and security
  - Multimodal user interfaces
  - Accessibility (or "Access for users with disabilities")
  - UX studies