Data Intelligence: Latest Articles

Galaxy: A Decade of Realising CWFR Concepts
IF 3.9, Zone 3, Computer Science
Data Intelligence, Pub Date: 2022-04-01, DOI: 10.1162/dint_a_00136
Beatriz Serrano-Solano, A. Fouilloux, Ignacio Eguinoa, Matúš Kalaš, B. Grüning, Frederik Coppens
Abstract: Despite recent encouragement to follow the FAIR principles, day-to-day research practices have not changed substantially. Due to new developments and the increasing pressure to apply best practices, initiatives to improve the efficiency and reproducibility of scientific workflows are becoming more prevalent. In this article, we discuss the importance of well-annotated tools and the specific requirements to ensure reproducible research with FAIR outputs. We detail how Galaxy, an open-source workflow management system with a web-based interface, has implemented the concepts put forward by the Canonical Workflow Framework for Research (CWFR), whilst minimising changes to the practices of scientific communities. Although we showcase concrete applications from two different domains, this approach is generalisable to any domain and particularly useful in interdisciplinary research and science-based applications.
Citations: 3
Editors’ Note: Special Issue on Canonical Workflow Frameworks for Research
IF 3.9, Zone 3, Computer Science
Data Intelligence, Pub Date: 2022-04-01, DOI: 10.1162/dint_e_00122
P. Wittenburg, A. Hardisty, Amirpasha Mozaffari, Limor Peer, N. Skvortsov, A. Spinuso, Zhiming Zhao
Affiliations: (1) Gemeindweg 55, 47533 Kleve, Germany; (2) Cardiff University, Cardiff, South Glamorgan, CF14 3UX, Wales, UK; (3) Forschungszentrum Jülich GmbH, 52425 Jülich, Germany; (4) Institution for Social and Policy Studies, Yale University, New Haven, CT 06520, USA; (5) Vavilov 44/2, 121351 Moscow, Russia; (6) Utrechtseweg 297, 3731 GA De Bilt, the Netherlands; (7) University of Amsterdam, PO Box 94323, 1090 GH Amsterdam, the Netherlands
Citations: 0
Canonical Workflow for Experimental Research
IF 3.9, Zone 3, Computer Science
Data Intelligence, Pub Date: 2022-04-01, DOI: 10.1162/dint_a_00123
Dirk Betz, Claudia Biniossek, Christophe Blanchi, Felix Henninger, T. Lauer, P. Wieder, P. Wittenburg, M. Zünkeler
Abstract: The overall expectation of introducing Canonical Workflow for Experimental Research and FAIR Digital Objects (FDOs) can be summarised as reducing the gap between workflow technology and research practices, making experimental work more efficient and improving FAIRness without adding administrative load on researchers. In this document, we describe, with the help of an example, how CWFR could work in detail and improve research procedures. We have chosen the example of “experiments with human subjects”, which stretches from planning an experiment to storing the collected data in a repository. While we focus on experiments with human subjects, we are convinced that CWFR can be applied to many other experiment-based data generation processes. The main challenge is to identify repeating patterns in existing research practices that can be abstracted to create CWFR. We include detailed examples from different disciplines to demonstrate that CWFR can be implemented without violating specific disciplinary or methodological requirements. We do not claim to be comprehensive in all aspects, since these examples are meant as a proof of concept for CWFR.
Citations: 0
Canonical Workflow for Machine Learning Tasks
IF 3.9, Zone 3, Computer Science
Data Intelligence, Pub Date: 2022-04-01, DOI: 10.1162/dint_a_00124
Christophe Blanchi, B. Gebre, P. Wittenburg
Abstract: There are two large gaps: (1) between the state of workflow technology and the practices of the many labs working with data-driven methods, and (2) between awareness of the FAIR principles and the lack of change in practices during the last 5 years. The CWFR concept was defined to combine two intentions: increasing the use of workflow technology and improving FAIR compliance. In the study described in this paper, we indicate how this could be applied to machine learning, which is now used by almost all research disciplines, with the well-known effect of a severe lack of repeatability and reproducibility. Researchers will only change practices if they can work efficiently and are not burdened with additional tasks. A comprehensive CWFR framework would be an umbrella for all the steps needed to do machine learning on selected data collections and would immediately create comprehensive, FAIR-compliant documentation. Such a framework guides the researcher, and information, once entered, can easily be shared and reused. The many iterations normally required in machine learning can be handled efficiently using CWFR methods. Libraries of components that can be easily orchestrated are probably the way to increase efficiency in repeating research workflows. Such libraries would use FAIR Digital Objects as a common entity to document all actions and to exchange information between steps, without the researcher needing to understand anything about PIDs and FDO details. As the Galaxy project indicates, the availability of supporting tools will be important to let researchers use these methods. Contrary to what the Galaxy framework suggests, however, it would be necessary to include all steps of a machine learning task, including those that require human interaction, and to document all phases with the help of structured FDOs.
Citations: 0
Enabling Canonical Analysis Workflows Documented Data Harmonization on Global Air Quality Data
IF 3.9, Zone 3, Computer Science
Data Intelligence, Pub Date: 2022-04-01, DOI: 10.1162/dint_a_00130
S. Schröder, Eleonora Epp, A. Mozaffari, M. Romberg, Niklas Selke, M. Schultz
Abstract: Data harmonization and documentation of data processing are essential prerequisites for enabling Canonical Analysis Workflows. The Tropospheric Ozone Assessment Report (TOAR) created a terabyte-scale air quality database system, which was recently revised. It contains one of the world's largest collections of near-surface air quality measurements and treats the FAIR data principles as an integral part. A special feature of our data service is the on-demand processing and product generation for several air quality metrics directly from the underlying database. In this paper, we show that the data harmonization needed to establish such online analysis services goes much deeper than the obvious issues of common data formats, variable names, and measurement units. We also explore how generating FAIR Digital Objects (FDOs), combined with automatically generated documentation, may support Canonical Analysis Workflows for air quality and related data.
Citations: 1
Scaling Notebooks as Re-configurable Cloud Workflows
IF 3.9, Zone 3, Computer Science
Data Intelligence, Pub Date: 2022-04-01, DOI: 10.1162/dint_a_00140
Yuandou Wang, Spiros Koulouzis, Riccardo Bianchi, N. Li, Yifang Shi, J. Timmermans, W. Kissling, Zhiming Zhao
Abstract: Literate computing environments, such as Jupyter (i.e., Jupyter Notebooks, JupyterLab, and JupyterHub), have been widely used in scientific studies. They allow users to interactively develop scientific code, test algorithms, and describe the scientific narratives of experiments in an integrated document. To scale up scientific analyses, many existing Jupyter environment architectures encapsulate whole Jupyter notebooks as reproducible units and autoscale them on dedicated remote infrastructures (e.g., high-performance computing and cloud computing environments). These solutions are still limited in several ways. 1) The workflow (or pipeline) is implicit in a notebook. Some steps could be reused generically by different code and executed in parallel, but because of the tight cell structure, all steps in a Jupyter notebook must be executed sequentially, and the core code fragments cannot be reused flexibly. 2) Parallelism and scalability suffer from performance bottlenecks when handling extensive input data and complex computation. In this work, we focus on how to manage the workflow in a notebook seamlessly. We 1) encapsulate the reusable cells as RESTful services and containerize them as portal components, 2) provide a composition tool for describing the workflow logic of those reusable components, and 3) automate execution on remote cloud infrastructure. Empirically, we validate the solution's usability via a use case from the Ecology and Earth Science domain, illustrating the processing of massive Light Detection and Ranging (LiDAR) data. The demonstration and analysis show that our method is feasible but needs further improvement to become a mature approach, especially in integrating distributed workflow scheduling, automatic deployment, and execution.
Citations: 4
Analysis of Pioneering Computable Biomedical Knowledge Repositories and their Emerging Governance Structures
IF 3.9, Zone 3, Computer Science
Data Intelligence, Pub Date: 2022-03-14, DOI: 10.1162/dint_a_00148
P. Amara, M. Conte, Allen J. Flynn, Jodyn E. Platt, Grace Trinidad
Abstract: A growing interest in producing and sharing computable biomedical knowledge artifacts (CBKs) is increasing the demand for repositories that validate, catalog, and provide shared access to CBKs. However, there is a lack of evidence on how best to manage and sustain CBK repositories. In this paper, we present the results of interviews with several pioneering CBK repository owners. These interviews were informed by the Trusted Repositories Audit and Certification (TRAC) framework. Insights gained from these interviews suggest that the organizations operating CBK repositories are somewhat new, that their initial approaches to repository governance are informal, and that achieving economic sustainability for their CBK repositories is a major challenge. To enable a learning health system to make better use of its data intelligence, future approaches to CBK repository management will require enhanced governance and closer adherence to best practice frameworks to meet the needs of myriad biomedical science and health communities. More effort is needed to find sustainable funding models for accessible CBK artifact collections.
Citations: 0
Canonical Workflows in Simulation-based Climate Sciences
IF 3.9, Zone 3, Computer Science
Data Intelligence, Pub Date: 2022-03-07, DOI: 10.1162/dint_a_00127
I. Anders, Karsten Peters-von Gehlen, H. Thiemann
Abstract: In this paper, we present the derivation of Canonical Workflow Modules from current workflows in simulation-based climate science, in support of elaborating a corresponding framework for simulation-based research. We first identified the different users and user groups in simulation-based climate science based on their reasons for using the resources provided at the German Climate Computing Center (DKRZ). What is special here is that DKRZ provides the climate science community with resources such as high-performance computing (HPC), data storage, and specialised services, and it hosts the World Data Center for Climate (WDCC). Users can therefore perform their entire research workflows, up to the publication of the data, on the same infrastructure. Our analysis shows that the resources are used by two primary user types. The first type requires the HPC system to perform resource-intensive simulations and subsequently analyse them. The second type reuses, builds on, and analyses existing data. We then further subdivided these top-level user categories based on their specific goals and analysed the typical, idealised workflows they apply to achieve their respective project goals. We find that, owing to the subdivision and finer granulation of the user groups, the workflows show apparent differences. Nevertheless, similar “Canonical Workflow Modules” can clearly be identified. These modules are “Data and Software (Re)use”, “Compute”, “Data and Software Storing”, “Data and Software Publication”, and “Generating Knowledge”; in their entirety, they form the basis for a Canonical Workflow Framework for Research (CWFR). It is desirable that parts of the workflows in a CWFR act as FDOs, but we view this aspect critically. We also reflect on whether Canonical Workflow Modules derived from analysing current user behaviour will still hold for future systems and work processes.
Citations: 2
Reproducible Research Publication Workflow: A Canonical Workflow Framework and FAIR Digital Object Approach to Quality Research Output
IF 3.9, Zone 3, Computer Science
Data Intelligence, Pub Date: 2022-03-07, DOI: 10.1162/dint_a_00133
Limor Peer, Claudia Biniossek, Dirk Betz, Thu-Mai Christian
Abstract: In this paper, we present the Reproducible Research Publication Workflow (RRPW) as an example of how generic canonical workflows can be applied to a specific context. The RRPW includes the essential steps between submission and final publication of the manuscript and of the research artefacts (e.g., data and code) that underlie the scholarly claims in the manuscript. A key aspect of the RRPW is the inclusion of artefact review and metadata creation as part of the publication workflow. The paper discusses a formalized technical structure around a set of canonical steps, which helps codify and standardize the process for researchers, curators, and publishers. The proposed application of canonical workflows can help improve transparency and reproducibility, increase the FAIR compliance of all research artefacts at all steps, and facilitate better exchange of annotated, machine-readable metadata.
Citations: 1
Using a Workflow Management Platform in Textual Data Management
IF 3.9, Zone 3, Computer Science
Data Intelligence, Pub Date: 2022-03-07, DOI: 10.1162/dint_a_00139
T. Doan, S. Bingert, R. Yahyapour
Abstract: The paper gives a brief introduction to the workflow management platform Flowable and how it is used for textual data management. Flowable is relatively new; its first release was on 13 October 2016. Despite its short time on the market, it seems to have quickly gained attention, with 4.6 thousand stars on GitHub at the time of writing. The focus of our project is to build a platform for large-scale text analysis that draws on many different text resources. So far, we have connected to four different text resources and obtained more than one million works. Some resources are dynamic, meaning that they may add data or modify existing data. It is therefore necessary to keep our copies of both the metadata and the raw data up to date with the resources. In addition, to comply with the FAIR principles, each work is assigned a persistent identifier (PID) and indexed for search. In the last step, we perform some standard analyses on the data to enhance our search engine and to generate a knowledge graph. End users can use our platform to search our data or access the knowledge graph. They can also submit code for their own analyses to the system. The code is executed on a High-Performance Cluster (HPC), and users receive the results later. Here, Flowable can use PIDs to identify and manage digital objects, which facilitates communication with the HPC system. The whole process can thus be expressed as a workflow. A workflow including error handling and notification has been created and deployed. Workflow execution can be triggered manually or at predefined time intervals. According to our evaluation, the Flowable platform proves to be powerful and flexible. Further use of the platform is already planned or implemented in many of our projects.
Citations: 0