Data Intelligence: Latest Articles

Comparative Evaluation and Comprehensive Analysis of Machine Learning Models for Regression Problems
IF 3.9, Zone 3 (Computer Science)
Data Intelligence Pub Date : 2022-07-01 DOI: 10.1162/dint_a_00155
Boran Sekerogiu, Y. K. Ever, Kamil Dimililer, F. Al-turjman
{"title":"Comparative Evaluation and Comprehensive Analysis of Machine Learning Models for Regression Problems","authors":"Boran Sekerogiu, Y. K. Ever, Kamil Dimililer, F. Al-turjman","doi":"10.1162/dint_a_00155","DOIUrl":"https://doi.org/10.1162/dint_a_00155","url":null,"abstract":"Abstract Artificial intelligence and machine learning applications are of significant importance almost in every field of human life to solve problems or support human experts. However, the determination of the machine learning model to achieve a superior result for a particular problem within the wide real-life application areas is still a challenging task for researchers. The success of a model could be affected by several factors such as dataset characteristics, training strategy and model responses. Therefore, a comprehensive analysis is required to determine model ability and the efficiency of the considered strategies. This study implemented ten benchmark machine learning models on seventeen varied datasets. Experiments are performed using four different training strategies 60:40, 70:30, and 80:20 hold-out and five-fold cross-validation techniques. We used three evaluation metrics to evaluate the experimental results: mean squared error, mean absolute error, and coefficient of determination (R2 score). The considered models are analyzed, and each model's advantages, disadvantages, and data dependencies are indicated. As a result of performed excess number of experiments, the deep Long-Short Term Memory (LSTM) neural network outperformed other considered models, namely, decision tree, linear regression, support vector regression with a linear and radial basis function kernels, random forest, gradient boosting, extreme gradient boosting, shallow neural network, and deep neural network. 
It has also been shown that cross-validation has a tremendous impact on the results of the experiments and should be considered for the model evaluation in regression studies where data mining or selection is not performed.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"4 1","pages":"620-652"},"PeriodicalIF":3.9,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42312983","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 6
Fuzzy-Constrained Graph Pattern Matching in Medical Knowledge Graphs
IF 3.9, Zone 3 (Computer Science)
Data Intelligence Pub Date : 2022-07-01 DOI: 10.1162/dint_a_00153
Lei Li, Xun Du, Zan Zhang, Zhenchao Tao
{"title":"Fuzzy-Constrained Graph Pattern Matching in Medical Knowledge Graphs","authors":"Lei Li, Xun Du, Zan Zhang, Zhenchao Tao","doi":"10.1162/dint_a_00153","DOIUrl":"https://doi.org/10.1162/dint_a_00153","url":null,"abstract":"Abstract The research on graph pattern matching (GPM) has attracted a lot of attention. However, most of the research has focused on complex networks, and there are few researches on GPM in the medical field. Hence, with GPM this paper is to make a breast cancer-oriented diagnosis before the surgery. Technically, this paper has firstly made a new definition of GPM, aiming to explore the GPM in the medical field, especially in Medical Knowledge Graphs (MKGs). Then, in the specific matching process, this paper introduces fuzzy calculation, and proposes a multi-threaded bidirectional routing exploration (M-TBRE) algorithm based on depth first search and a two-way routing matching algorithm based on multi-threading. In addition, fuzzy constraints are introduced in the M-TBRE algorithm, which leads to the Fuzzy-M-TBRE algorithm. The experimental results on the two datasets show that compared with existing algorithms, our proposed algorithm is more efficient and effective.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"4 1","pages":"599-619"},"PeriodicalIF":3.9,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"46254545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
Knowledge Representation and Reasoning for Complex Time Expression in Clinical Text
IF 3.9, Zone 3 (Computer Science)
Data Intelligence Pub Date : 2022-07-01 DOI: 10.1162/dint_a_00152
Danyang Hu, Meng Wang, Feng Gao, Fangfang Xu, J. Gu
{"title":"Knowledge Representation and Reasoning for Complex Time Expression in Clinical Text","authors":"Danyang Hu, Meng Wang, Feng Gao, Fangfang Xu, J. Gu","doi":"10.1162/dint_a_00152","DOIUrl":"https://doi.org/10.1162/dint_a_00152","url":null,"abstract":"Abstract Temporal information is pervasive and crucial in medical records and other clinical text, as it formulates the development process of medical conditions and is vital for clinical decision making. However, providing a holistic knowledge representation and reasoning framework for various time expressions in the clinical text is challenging. In order to capture complex temporal semantics in clinical text, we propose a novel Clinical Time Ontology (CTO) as an extension from OWL framework. More specifically, we identified eight time-related problems in clinical text and created 11 core temporal classes to conceptualize the fuzzy time, cyclic time, irregular time, negations and other complex aspects of clinical time. Then, we extended Allen's and TEO's temporal relations and defined the relation concept description between complex and simple time. Simultaneously, we provided a formulaic and graphical presentation of complex time and complex time relationships. We carried out empirical study on the expressiveness and usability of CTO using real-world healthcare datasets. 
Finally, experiment results demonstrate that CTO could faithfully represent and reason over 93% of the temporal expressions, and it can cover a wider range of time-related classes in clinical domain.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"4 1","pages":"573-598"},"PeriodicalIF":3.9,"publicationDate":"2022-07-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"48042330","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 2
The Integration of a Canonical Workflow Framework with an Informatics System for Disease Area Research
IF 3.9, Zone 3 (Computer Science)
Data Intelligence Pub Date : 2022-04-01 DOI: 10.1162/dint_a_00125
V. Navale, Matthew McAuliffe
{"title":"The Integration of a Canonical Workflow Framework with an Informatics System for Disease Area Research","authors":"V. Navale, Matthew McAuliffe","doi":"10.1162/dint_a_00125","DOIUrl":"https://doi.org/10.1162/dint_a_00125","url":null,"abstract":"Abstract A recurring pattern of access to existing databases, data analyses, formulation of new hypotheses, use of an experimental design, institutional review board approvals, data collection, curation, and storage within trusted digital repositories is observable during clinical research work. The workflows that support the repeated nature of these activities can be ascribed as a Canonical Workflow Framework for Research (CWFR). Disease area clinical research is protocol specific, and during data collection, the electronic case report forms can use Common Data Elements (CDEs) that have precisely defined questions and are associated with the specified value(s) as responses. The CDE-based CWFR is integrated with a biomedical research informatics computing system, which consists of a complete stack of technical layers including the Protocol and Form Research Management System. The unique data dictionaries associated with the CWFR for Traumatic Brain Injury and Parkinson's Disease resulted in the development of the Federal Interagency Traumatic Brain Injury and Parkinson's Disease Biomarker systems. Due to a canonical workflow, these two systems can use similar tools, applications, and service modules to create findable, accessible, interoperable, and reusable Digital Objects. The Digital Objects for Traumatic Brain Injury and Parkinson's disease contain all relevant information needed from the time data is collected, validated, and maintained within a Storage Repository for future access. All Traumatic Brain Injury and Parkinson's Disease studies can be shared as Research Objects that can be produced by aggregating related resources as information packages and is findable on the Internet by using unique identifiers. 
Overall, the integration of CWFR with an informatics system has resulted in the reuse of software applications for several National Institutes of Health-supported biomedical research programs.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"4 1","pages":"186-195"},"PeriodicalIF":3.9,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42284006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1
Evaluation of Application Possibilities for Packaging Technologies in Canonical Workflows
IF 3.9, Zone 3 (Computer Science)
Data Intelligence Pub Date : 2022-04-01 DOI: 10.1162/dint_a_00137
T. Jejkal, Sabrine Chelbi, A. Pfeil, P. Wittenburg
{"title":"Evaluation of Application Possibilities for Packaging Technologies in Canonical Workflows","authors":"T. Jejkal, Sabrine Chelbi, A. Pfeil, P. Wittenburg","doi":"10.1162/dint_a_00137","DOIUrl":"https://doi.org/10.1162/dint_a_00137","url":null,"abstract":"Abstract In Canonical Workflow Framework for Research (CWFR) “packages” are relevant in two different directions. In data science, workflows are in general being executed on a set of files which have been aggregated for specific purposes, such as for training a model in deep learning. We call this type of “package” a data collection and its aggregation and metadata description is motivated by research interests. The other type of “packages” relevant for CWFR are supposed to represent workflows in a self-describing and self-contained way for later execution. In this paper, we will review different packaging technologies and investigate their usability in the context of CWFR. For this purpose, we draw on an exemplary use case and show how packaging technologies can support its realization. We conclude that packaging technologies of different flavors help on providing inputs and outputs for workflow steps in a machine-readable way, as well as on representing a workflow and all its artifacts in a self-describing and self-contained way.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"4 1","pages":"372-385"},"PeriodicalIF":3.9,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"47834491","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Galaxy: A Decade of Realising CWFR Concepts
IF 3.9, Zone 3 (Computer Science)
Data Intelligence Pub Date : 2022-04-01 DOI: 10.1162/dint_a_00136
Beatriz Serrano-Solano, A. Fouilloux, Ignacio Eguinoa, Matúš Kalaš, B. Grüning, Frederik Coppens
{"title":"Galaxy: A Decade of Realising CWFR Concepts","authors":"Beatriz Serrano-Solano, A. Fouilloux, Ignacio Eguinoa, Matúš Kalaš, B. Grüning, Frederik Coppens","doi":"10.1162/dint_a_00136","DOIUrl":"https://doi.org/10.1162/dint_a_00136","url":null,"abstract":"Abstract Despite recent encouragement to follow the FAIR principles, the day-to-day research practices have not changed substantially. Due to new developments and the increasing pressure to apply best practices, initiatives to improve the efficiency and reproducibility of scientific workflows are becoming more prevalent. In this article, we discuss the importance of well-annotated tools and the specific requirements to ensure reproducible research with FAIR outputs. We detail how Galaxy, an open-source workflow management system with a web-based interface, has implemented the concepts that are put forward by the Canonical Workflow Framework for Research (CWFR), whilst minimising changes to the practices of scientific communities. Although we showcase concrete applications from two different domains, this approach is generalisable to any domain and particularly useful in interdisciplinary research and science-based applications.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"4 1","pages":"358-371"},"PeriodicalIF":3.9,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"49187666","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 3
Editors’ Note: Special Issue on Canonical Workflow Frameworks for Research
IF 3.9, Zone 3 (Computer Science)
Data Intelligence Pub Date : 2022-04-01 DOI: 10.1162/dint_e_00122
P. Wittenburg, A. Hardisty, Amirpasha Mozzafari, Limor Peer, N. Skvortsov, A. Spinuso, Zhiming Zhao
{"title":"Editors’ Note: Special Issue on Canonical Workflow Frameworks for Research","authors":"P. Wittenburg, A. Hardisty, Amirpasha Mozzafari, Limor Peer, N. Skvortsov, A. Spinuso, Zhiming Zhao","doi":"10.1162/dint_e_00122","DOIUrl":"https://doi.org/10.1162/dint_e_00122","url":null,"abstract":"1Gemeindweg 55, 47533 Kleve, Germany 2Cardiff University, Cardiff, South Glamorgan , CF14 3UX, Wales, UK 3Forschungszentrum Jülich GmbH, 52425 Jülich, Germany 4Institution for Social and Policy Studies, Yale University, New Haven, CT 06520, USA 5Vavilov 44/2, 121351 Moscow, Russia 6Utrechtseweg 297, 3731 GA De Bilt, the Netherlands 7University of Amsterdam, PO-Box 94323, 1090 GH Amsterdam, the Netherlands","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"4 1","pages":"149-154"},"PeriodicalIF":3.9,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"45697513","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Canonical Workflow for Experimental Research
IF 3.9, Zone 3 (Computer Science)
Data Intelligence Pub Date : 2022-04-01 DOI: 10.1162/dint_a_00123
Dirk Betz, Claudia Biniossek, Christophe Blanchi, Felix Henninger, T. Lauer, P. Wieder, P. Wittenburg, M. Zünkeler
{"title":"Canonical Workflow for Experimental Research","authors":"Dirk Betz, Claudia Biniossek, Christophe Blanchi, Felix Henninger, T. Lauer, P. Wieder, P. Wittenburg, M. Zünkeler","doi":"10.1162/dint_a_00123","DOIUrl":"https://doi.org/10.1162/dint_a_00123","url":null,"abstract":"Abstract The overall expectation of introducing Canonical Workflow for Experimental Research and FAIR digital objects (FDOs) can be summarised as reducing the gap between workflow technology and research practices to make experimental work more efficient and improve FAIRness without adding administrative load on the researchers. In this document, we will describe, with the help of an example, how CWFR could work in detail and improve research procedures. We have chosen the example of “experiments with human subjects” which stretches from planning an experiment to storing the collected data in a repository. While we focus on experiments with human subjects, we are convinced that CWFR can be applied to many other data generation processes based on experiments. The main challenge is to identify repeating patterns in existing research practices that can be abstracted to create CWFR. In this document, we will include detailed examples from different disciplines to demonstrate that CWFR can be implemented without violating specific disciplinary or methodological requirements. 
We do not claim to be comprehensive in all aspects, since these examples are meant to prove the concept of CWFR.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"4 1","pages":"155-172"},"PeriodicalIF":3.9,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"42683678","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Canonical Workflow for Machine Learning Tasks
IF 3.9, Zone 3 (Computer Science)
Data Intelligence Pub Date : 2022-04-01 DOI: 10.1162/dint_a_00124
Christophe Blanchi, B. Gebre, P. Wittenburg
{"title":"Canonical Workflow for Machine Learning Tasks","authors":"Christophe Blanchi, B. Gebre, P. Wittenburg","doi":"10.1162/dint_a_00124","DOIUrl":"https://doi.org/10.1162/dint_a_00124","url":null,"abstract":"Abstract There is a huge gap between (1) the state of workflow technology on the one hand and the practices in the many labs working with data driven methods on the other and (2) the awareness of the FAIR principles and the lack of changes in practices during the last 5 years. The CWFR concept has been defined which is meant to combine these two intentions, increasing the use of workflow technology and improving FAIR compliance. In the study described in this paper we indicate how this could be applied to machine learning which is now used by almost all research disciplines with the well-known effects of a huge lack of repeatability and reproducibility. Researchers will only change practices if they can work efficiently and are not loaded with additional tasks. A comprehensive CWFR framework would be an umbrella for all steps that need to be carried out to do machine learning on selected data collections and immediately create a comprehensive and FAIR compliant documentation. The researcher is guided by such a framework and information once entered can easily be shared and reused. The many iterations normally required in machine learning can be dealt with efficiently using CWFR methods. Libraries of components that can be easily orchestrated using FAIR Digital Objects as a common entity to document all actions and to exchange information between steps without the researcher needing to understand anything about PIDs and FDO details is probably the way to increase efficiency in repeating research workflows. As the Galaxy project indicates, the availability of supporting tools will be important to let researchers use these methods. 
Other as the Galaxy framework suggests, however, it would be necessary to include all steps necessary for doing a machine learning task including those that require human interaction and to document all phases with the help of structured FDOs.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"4 1","pages":"173-185"},"PeriodicalIF":3.9,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"41320073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Enabling Canonical Analysis Workflows Documented Data Harmonization on Global Air Quality Data
IF 3.9, Zone 3 (Computer Science)
Data Intelligence Pub Date : 2022-04-01 DOI: 10.1162/dint_a_00130
S. Schröder, Eleonora Epp, A. Mozaffari, M. Romberg, Niklas Selke, M. Schultz
{"title":"Enabling Canonical Analysis Workflows Documented Data Harmonization on Global Air Quality Data","authors":"S. Schröder, Eleonora Epp, A. Mozaffari, M. Romberg, Niklas Selke, M. Schultz","doi":"10.1162/dint_a_00130","DOIUrl":"https://doi.org/10.1162/dint_a_00130","url":null,"abstract":"Abstract Data harmonization and documentation of the data processing are essential prerequisites for enabling Canonical Analysis Workflows. The recently revised Terabyte-scale air quality database system, which the Tropospheric Ozone Assessment Report (TOAR) created, contains one of the world's largest collections of near-surface air quality measurements and considers FAIR data principles as an integral part. A special feature of our data service is the on-demand processing and product generation of several air quality metrics directly from the underlying database. In this paper, we show that the necessary data harmonization for establishing such online analysis services goes much deeper than the obvious issues of common data formats, variable names, and measurement units, and we explore how the generation of FAIR Digital Objects (FDO) in combination with automatically generated documentation may support Canonical Analysis Workflows for air quality and related data.","PeriodicalId":34023,"journal":{"name":"Data Intelligence","volume":"4 1","pages":"259-270"},"PeriodicalIF":3.9,"publicationDate":"2022-04-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"64531481","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 1