Automatic transparency evaluation for open knowledge extraction systems.

IF 2 3区工程技术 Q3 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Journal of Biomedical Semantics Pub Date : 2023-08-31 DOI:10.1186/s13326-023-00293-9

Maryam Basereh, Annalina Caputo, Rob Brennan

{"title":"Automatic transparency evaluation for open knowledge extraction systems.","authors":"Maryam Basereh, Annalina Caputo, Rob Brennan","doi":"10.1186/s13326-023-00293-9","DOIUrl":null,"url":null,"abstract":"Background: This paper proposes Cyrus, a new transparency evaluation framework, for Open Knowledge Extraction (OKE) systems. Cyrus is based on the state-of-the-art transparency models and linked data quality assessment dimensions. It brings together a comprehensive view of transparency dimensions for OKE systems. The Cyrus framework is used to evaluate the transparency of three linked datasets, which are built from the same corpus by three state-of-the-art OKE systems. The evaluation is automatically performed using a combination of three state-of-the-art FAIRness (Findability, Accessibility, Interoperability, Reusability) assessment tools and a linked data quality evaluation framework, called Luzzu. This evaluation includes six Cyrus data transparency dimensions for which existing assessment tools could be identified. OKE systems extract structured knowledge from unstructured or semi-structured text in the form of linked data. These systems are fundamental components of advanced knowledge services. However, due to the lack of a transparency framework for OKE, most OKE systems are not transparent. This means that their processes and outcomes are not understandable and interpretable. A comprehensive framework sheds light on different aspects of transparency, allows comparison between the transparency of different systems by supporting the development of transparency scores, gives insight into the transparency weaknesses of the system, and ways to improve them. Automatic transparency evaluation helps with scalability and facilitates transparency assessment. The transparency problem has been identified as critical by the European Union Trustworthy Artificial Intelligence (AI) guidelines. In this paper, Cyrus provides the first comprehensive view of transparency dimensions for OKE systems by merging the perspectives of the FAccT (Fairness, Accountability, and Transparency), FAIR, and linked data quality research communities.Results: In Cyrus, data transparency includes ten dimensions which are grouped in two categories. In this paper, six of these dimensions, i.e., provenance, interpretability, understandability, licensing, availability, interlinking have been evaluated automatically for three state-of-the-art OKE systems, using the state-of-the-art metrics and tools. Covid-on-the-Web is identified to have the highest mean transparency.Conclusions: This is the first research to study the transparency of OKE systems that provides a comprehensive set of transparency dimensions spanning ethics, trustworthy AI, and data quality approaches to transparency. It also demonstrates how to perform automated transparency evaluation that combines existing FAIRness and linked data quality assessment tools for the first time. We show that state-of-the-art OKE systems vary in the transparency of the linked data generated and that these differences can be automatically quantified leading to potential applications in trustworthy AI, compliance, data protection, data governance, and future OKE system design and testing.","PeriodicalId":15055,"journal":{"name":"Journal of Biomedical Semantics","volume":"14 1","pages":"12"},"PeriodicalIF":2.0000,"publicationDate":"2023-08-31","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC10468861/pdf/","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"Journal of Biomedical Semantics","FirstCategoryId":"5","ListUrlMain":"https://doi.org/10.1186/s13326-023-00293-9","RegionNum":3,"RegionCategory":"工程技术","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q3","JCRName":"MATHEMATICAL & COMPUTATIONAL BIOLOGY","Score":null,"Total":0}

引用次数: 0

Abstract

Background: This paper proposes Cyrus, a new transparency evaluation framework, for Open Knowledge Extraction (OKE) systems. Cyrus is based on the state-of-the-art transparency models and linked data quality assessment dimensions. It brings together a comprehensive view of transparency dimensions for OKE systems. The Cyrus framework is used to evaluate the transparency of three linked datasets, which are built from the same corpus by three state-of-the-art OKE systems. The evaluation is automatically performed using a combination of three state-of-the-art FAIRness (Findability, Accessibility, Interoperability, Reusability) assessment tools and a linked data quality evaluation framework, called Luzzu. This evaluation includes six Cyrus data transparency dimensions for which existing assessment tools could be identified. OKE systems extract structured knowledge from unstructured or semi-structured text in the form of linked data. These systems are fundamental components of advanced knowledge services. However, due to the lack of a transparency framework for OKE, most OKE systems are not transparent. This means that their processes and outcomes are not understandable and interpretable. A comprehensive framework sheds light on different aspects of transparency, allows comparison between the transparency of different systems by supporting the development of transparency scores, gives insight into the transparency weaknesses of the system, and ways to improve them. Automatic transparency evaluation helps with scalability and facilitates transparency assessment. The transparency problem has been identified as critical by the European Union Trustworthy Artificial Intelligence (AI) guidelines. In this paper, Cyrus provides the first comprehensive view of transparency dimensions for OKE systems by merging the perspectives of the FAccT (Fairness, Accountability, and Transparency), FAIR, and linked data quality research communities.

Results: In Cyrus, data transparency includes ten dimensions which are grouped in two categories. In this paper, six of these dimensions, i.e., provenance, interpretability, understandability, licensing, availability, interlinking have been evaluated automatically for three state-of-the-art OKE systems, using the state-of-the-art metrics and tools. Covid-on-the-Web is identified to have the highest mean transparency.

Conclusions: This is the first research to study the transparency of OKE systems that provides a comprehensive set of transparency dimensions spanning ethics, trustworthy AI, and data quality approaches to transparency. It also demonstrates how to perform automated transparency evaluation that combines existing FAIRness and linked data quality assessment tools for the first time. We show that state-of-the-art OKE systems vary in the transparency of the linked data generated and that these differences can be automatically quantified leading to potential applications in trustworthy AI, compliance, data protection, data governance, and future OKE system design and testing.

Abstract Image

查看原文本刊更多论文

开放知识提取系统的自动透明度评估。

背景:本文提出了一种新的面向开放知识抽取(OKE)系统的透明度评价框架Cyrus。Cyrus基于最先进的透明度模型和关联的数据质量评估维度。它汇集了对OKE系统透明度维度的全面视图。Cyrus框架用于评估三个关联数据集的透明度，这些数据集由三个最先进的OKE系统从相同的语料库构建而成。评估使用三种最先进的公平性(可查找性、可访问性、互操作性、可重用性)评估工具和一个被称为Luzzu的关联数据质量评估框架的组合自动执行。该评估包括六个Cyrus数据透明度维度，现有评估工具可以识别这些维度。OKE系统以关联数据的形式从非结构化或半结构化文本中提取结构化知识。这些系统是高级知识服务的基本组成部分。然而，由于缺乏一个透明的OKE框架，大多数OKE系统是不透明的。这意味着它们的过程和结果是不可理解和解释的。一个全面的框架阐明了透明度的不同方面，通过支持透明度分数的发展，允许对不同系统的透明度进行比较，深入了解系统的透明度弱点，以及改进它们的方法。自动透明度评估有助于可伸缩性和促进透明度评估。透明度问题已被欧盟可信赖人工智能(AI)指南确定为关键问题。在本文中，Cyrus通过合并FAccT(公平性、问责制和透明度)、FAIR和关联数据质量研究社区的观点，提供了OKE系统透明度维度的第一个全面视图。结果:在Cyrus中，数据透明度包括十个维度，分为两类。在本文中，使用最先进的度量和工具，对三个最先进的OKE系统自动评估了这些维度中的六个，即来源、可解释性、可理解性、许可、可用性、互连。网络上的冠状病毒被认为具有最高的平均透明度。结论:这是第一个研究OKE系统透明度的研究，该系统提供了一套全面的透明度维度，涵盖道德、可信赖的人工智能和数据质量透明度方法。它还首次演示了如何执行结合现有公平性和关联数据质量评估工具的自动化透明度评估。我们表明，最先进的OKE系统在生成的关联数据的透明度方面存在差异，这些差异可以自动量化，从而在可信赖的人工智能、合规性、数据保护、数据治理以及未来的OKE系统设计和测试中产生潜在的应用。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

求助全文

约1分钟内获得全文求助全文

来源期刊

Journal of Biomedical Semantics MATHEMATICAL & COMPUTATIONAL BIOLOGY-

CiteScore

4.20

自引率

5.30%

发文量

审稿时长

30 weeks

期刊介绍： Journal of Biomedical Semantics addresses issues of semantic enrichment and semantic processing in the biomedical domain. The scope of the journal covers two main areas: Infrastructure for biomedical semantics: focusing on semantic resources and repositories, meta-data management and resource description, knowledge representation and semantic frameworks, the Biomedical Semantic Web, and semantic interoperability. Semantic mining, annotation, and analysis: focusing on approaches and applications of semantic resources; and tools for investigation, reasoning, prediction, and discoveries in biomedicine.