FAIR Machine Learning Model Pipeline Implementation of COVID-19 Data

IF 1.3 | Zone 3, Computer Science | JCR Q3, COMPUTER SCIENCE, INFORMATION SYSTEMS
Sakinat Folorunso, E. Ogundepo, Mariam Basajja, Joseph Awotunde, A. Kawu, Francisca Onaolapo Oladipo, Ibrahim Abdullahi
Journal: Data Intelligence, vol. 4, pp. 971-990
Published: 2022-08-18
Publication type: Journal Article
DOI: 10.1162/dint_a_00182 (https://doi.org/10.1162/dint_a_00182)
Source platform: Semanticscholar
Citations: 4

FAIR Machine Learning Model Pipeline Implementation of COVID-19 Data
Abstract

Research and development are gradually becoming data-driven, and the implementation of the FAIR Guidelines (that data should be Findable, Accessible, Interoperable, and Reusable) for scientific data administration and stewardship has the potential to remarkably enhance the framework for the reuse of research data. In this way, FAIR is aiding digital transformation. The ‘FAIRification’ of data increases the interoperability and (re)usability of data, so that new and robust analytical tools, such as machine learning (ML) models, can access the data to deduce meaningful insights, extract actionable information, and identify hidden patterns. This article aims to build a FAIR ML model pipeline using the generic FAIRification workflow to make the whole ML analytics process FAIR. Accordingly, FAIR input data was modelled using a FAIR ML model. The output data from the FAIR ML model was also made FAIR. For this, a hybrid hierarchical k-means (HHK) clustering ML algorithm was applied to group the data into homogeneous subgroups and ascertain the underlying structure of the data, using a Nigerian-based FAIR dataset that contains data on economic factors, healthcare facilities, and coronavirus occurrences in all 36 states of Nigeria. The model showed that research data and the ML pipeline can be FAIRified, shared, and reused by following the proposed FAIRification workflow and implementing the technical architecture.
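The abstract names hybrid hierarchical k-means (HHK) clustering but gives no implementation details. The sketch below illustrates one common formulation of the idea, not the authors' code: an agglomerative pass picks the starting centers, and k-means then refines them. The centroid-linkage merge rule, the function names, and the toy 2-D points are all assumptions made for illustration; the points are invented and are not drawn from the Nigerian dataset.

```python
def dist2(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def centroid(points):
    """Component-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(len(points[0])))


def hierarchical_centers(points, k):
    """Agglomerative step (assumed centroid linkage): repeatedly merge the
    two clusters whose centroids are closest until k clusters remain."""
    clusters = [[p] for p in points]
    while len(clusters) > k:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = dist2(centroid(clusters[i]), centroid(clusters[j]))
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]  # j > i, so index i is unaffected
    return [centroid(c) for c in clusters]


def kmeans(points, centers, max_iter=100):
    """Lloyd's algorithm started from the supplied centers."""
    centers = list(centers)
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)), key=lambda c: dist2(p, centers[c]))
            for p in points
        ]
        if new_labels == labels:  # assignments stable: converged
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster empties
                centers[c] = centroid(members)
    return labels, centers


def hybrid_hkmeans(points, k):
    """Hierarchical clustering chooses the seeds; k-means refines them."""
    return kmeans(points, hierarchical_centers(points, k))


# Hypothetical toy data: three well-separated groups of 2-D points.
points = [
    (1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
    (5.0, 5.0), (5.1, 4.9), (4.8, 5.2),
    (9.0, 1.0), (9.2, 1.1), (8.9, 0.8),
]
labels, centers = hybrid_hkmeans(points, 3)
print(labels)
```

Seeding k-means from a hierarchical pass removes the dependence on random initial centers, which is the usual motivation for the hybrid approach. In practice the features would be standardized first so that no single variable dominates the distance.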
Source journal: Data Intelligence (COMPUTER SCIENCE, INFORMATION SYSTEMS)
CiteScore: 6.50
Self-citation rate: 15.40%
Articles published: 40
Review time: 8 weeks