Data journeys: Explaining AI workflows through abstraction

IF 2.9 · Zone 3 (Computer Science) · JCR Q2: COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE

Semantic Web · Pub Date: 2023-06-15 · DOI: 10.3233/sw-233407

E. Daga, Paul Groth


Citations: 2

Abstract

Artificial intelligence systems are not simply built on a single dataset or trained model. Instead, they are made by complex data science workflows involving multiple datasets, models, preparation scripts, and algorithms. Given this complexity, in order to understand these AI systems, we need to provide explanations of their functioning at higher levels of abstraction. To tackle this problem, we focus on the extraction and representation of data journeys from these workflows. A data journey is a multi-layered semantic representation of data processing activity linked to data science code and assets. We propose an ontology to capture the essential elements of a data journey and an approach to extract such data journeys. Using a corpus of Python notebooks from Kaggle, we show that we are able to capture high-level semantic data flow that is more compact than using the code structure itself. Furthermore, we show that introducing an intermediate knowledge graph representation outperforms models that rely only on the code itself. Finally, we report on a user survey to reflect on the challenges and opportunities presented by computational data journeys for explainable AI.
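The abstract's central claim, that a data journey captures data flow at a higher level than the code structure itself, can be illustrated with a small sketch. This is not the authors' extraction method or ontology. It assumes a hypothetical `ACTIVITY_LABELS` vocabulary that maps called function names to activities, uses Python's standard `ast` module to recover variable-level data flow between top-level notebook statements, and then collapses that flow into an activity-level graph. The notebook text is only parsed, never executed.

```python
import ast
from typing import Optional

# Hypothetical vocabulary mapping called function/method names to high-level
# activities. It is an illustrative assumption, not the paper's ontology.
ACTIVITY_LABELS = {
    "read_csv": "Load",
    "dropna": "Clean",
    "drop_duplicates": "Clean",
    "drop": "Transform",
    "get_dummies": "Transform",
    "train_test_split": "Split",
    "fit": "Train",
    "predict": "Predict",
}

# Example notebook code. It is only parsed, never run, so pandas and
# scikit-learn do not need to be installed.
NOTEBOOK = """
import pandas as pd
df = pd.read_csv("train.csv")
df = df.dropna()
df = df.drop_duplicates()
X, y = df.drop(columns=["label"]), df["label"]
X = pd.get_dummies(X)
X_train, X_test, y_train, y_test = train_test_split(X, y)
model = LogisticRegression().fit(X_train, y_train)
preds = model.predict(X_test)
"""


def statement_activity(stmt: ast.stmt) -> Optional[str]:
    """Label a statement by the first known call it contains.

    ast.walk is breadth-first, so outer calls are checked before nested ones.
    """
    for node in ast.walk(stmt):
        if isinstance(node, ast.Call):
            func = node.func
            name = func.attr if isinstance(func, ast.Attribute) else getattr(func, "id", None)
            if name in ACTIVITY_LABELS:
                return ACTIVITY_LABELS[name]
    return None


def statement_vars(stmt: ast.stmt) -> tuple[set, set]:
    """Return (names read, names assigned) in a statement."""
    reads, writes = set(), set()
    for node in ast.walk(stmt):
        if isinstance(node, ast.Name):
            (writes if isinstance(node.ctx, ast.Store) else reads).add(node.id)
    return reads, writes


def data_journey(source: str) -> tuple[list, list]:
    """Return one activity per labelled statement and (producer, consumer) edges."""
    steps = []          # activity label of each labelled statement, in order
    last_writer = {}    # variable name -> index of the step that last assigned it
    edges = set()
    for stmt in ast.parse(source).body:
        activity = statement_activity(stmt)
        if activity is None:
            continue  # statements without a known call (e.g. imports) are skipped
        reads, writes = statement_vars(stmt)
        index = len(steps)
        for var in reads:
            if var in last_writer:
                edges.add((last_writer[var], index))
        for var in writes:
            last_writer[var] = index
        steps.append(activity)
    return steps, sorted(edges)


def abstract_journey(steps: list, edges: list) -> list:
    """Collapse step-level edges into activity-level edges, dropping self-loops."""
    return sorted({(steps[i], steps[j]) for i, j in edges if steps[i] != steps[j]})


if __name__ == "__main__":
    steps, edges = data_journey(NOTEBOOK)
    print(f"{len(steps)} code steps, {len(edges)} edges -> {len(set(steps))} activities")
    for src, dst in abstract_journey(steps, edges):
        print(f"{src} -> {dst}")
```

On this example, eight labelled statements and nine variable-level edges collapse into six activity nodes joined by six edges. Labelling by call name is deliberately crude: a realistic extractor would also need to handle in-place mutation (such as a bare `model.fit(...)`), control flow, and library-specific semantics.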

Source journal

Semantic Web (COMPUTER SCIENCE, ARTIFICIAL INTELLIGENCE; COMPUTER SCIENCE, INFORMATION SYSTEMS)

CiteScore: 8.30

Self-citation rate: 6.70%

Journal description: The journal Semantic Web – Interoperability, Usability, Applicability brings together researchers from various fields who share the vision of, and need for, more effective and meaningful ways to share information across agents and services on the future internet and elsewhere. As such, Semantic Web technologies shall support the seamless integration of data, on-the-fly composition and interoperation of Web services, as well as more intuitive search engines. The semantics, or meaning, of information, however, cannot be defined without a context, which makes personalization, trust, and provenance core topics for Semantic Web research. New retrieval paradigms, user interfaces, and visualization techniques have to unleash the power of the Semantic Web and at the same time hide its complexity from the user. Based on this vision, the journal welcomes contributions ranging from theoretical and foundational research through methods and tools to descriptions of concrete ontologies and applications in all areas. We especially welcome papers that add a social, spatial, and temporal dimension to Semantic Web research, as well as application-oriented papers making use of formal semantics.