Artificial intelligence in the life sciences: latest articles (page 3)

Federated learning for predicting compound mechanism of action based on image-data from cell painting

Artificial intelligence in the life sciences Pub Date : 2024-05-09 DOI: 10.1016/j.ailsci.2024.100098

Li Ju, Andreas Hellander, Ola Spjuth

Abstract: Having access to sufficient data is essential in order to train accurate machine learning models, but much data is not publicly available. In drug discovery this is particularly evident, as much data is withheld at pharmaceutical companies for various reasons. Federated Learning (FL) aims at training a joint model between multiple parties but without disclosing data between the parties. In this work, we leverage Federated Learning to predict compound Mechanism of Action (MoA) using fluorescence image data from cell painting. Our study evaluates the effectiveness and efficiency of FL, comparing to non-collaborative and data-sharing collaborative learning in diverse scenarios. Specifically, we investigate the impact of data heterogeneity across participants on MoA prediction, an essential concern in real-life applications of FL, and demonstrate the benefits for all involved parties. This work highlights the potential of federated learning in multi-institutional collaborative machine learning for drug discovery and assessment of chemicals, offering a promising avenue to overcome data-sharing constraints.

Volume 5, Article 100098. PDF: https://www.sciencedirect.com/science/article/pii/S2667318524000059/pdfft?md5=100e1ed9ac27f95816db906647d11bc0&pid=1-s2.0-S2667318524000059-main.pdf

Citations: 0
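
The abstract of the federated learning entry above describes training a joint model without exchanging data between parties. As a rough illustration only, the sketch below shows one common aggregation step, a sample-size-weighted average of client parameters (FedAvg-style). The abstract does not state which aggregation scheme the authors used, so the function name `federated_average`, the toy parameter vectors, and the site sizes are all assumptions made for this example.

```python
from typing import List, Sequence


def federated_average(client_weights: Sequence[Sequence[float]],
                      client_sizes: Sequence[int]) -> List[float]:
    """Combine client parameter vectors into one global vector.

    Each client's parameters are weighted by its share of the total
    number of local training samples. Only parameters and sample counts
    are passed in; no raw data is exchanged.
    """
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need at least one client and one sample count per client")
    total = sum(client_sizes)
    if total <= 0:
        raise ValueError("total sample count must be positive")
    dim = len(client_weights[0])
    if any(len(w) != dim for w in client_weights):
        raise ValueError("all clients must report vectors of the same length")

    averaged = [0.0] * dim
    for weights, size in zip(client_weights, client_sizes):
        for i, value in enumerate(weights):
            averaged[i] += value * size / total
    return averaged


# Hypothetical example: three sites holding different amounts of local data.
site_params = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
site_sizes = [100, 100, 200]
global_params = federated_average(site_params, site_sizes)
```

In a full federated round, each site would first update its local copy of the model on its own images, and only the resulting parameters would be sent for aggregation. Weighting by sample count is one simple way to handle sites of unequal size; it does not by itself address the data heterogeneity the abstract highlights.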

An integrated approach to predict activators of NRF2 - the transcription factor for oxidative stress response

Artificial intelligence in the life sciences Pub Date : 2024-04-13 DOI: 10.1016/j.ailsci.2024.100097

Yaroslav Chushak, Rebecca A. Clewell

Abstract: A variety of environmental and physiological conditions can cause oxidative stress that damage cellular components such as DNA, proteins and lipids. Oxidative stress is implicated in many human diseases including cancer, cardiovascular diseases, neurological diseases, inflammatory diseases, and aging. The nuclear factor erythroid 2–related factor 2 (NRF2) is a transcriptional factor that plays a key role in the cellular antioxidant defense system as it regulates transcription of antioxidant proteins and detoxifying enzymes. There is an urgent need to identify novel compounds that activate NRF2 and enhance antioxidant defense. We collected data from the high-throughput screening of NRF2 activators and identified molecular fragments (structural alerts) associated with the activation of NRF2. We also developed ten classification models using different types of molecular descriptors and machine learning techniques. Two approaches were used to establish the applicability domain of developed models: the structure-based approach and the distance to model approach. The best performing model that used message passing neural network (MPNN) technique showed accuracy of 87 % for the test set of chemicals within the distance to model of 0.3. The integrative approach using a combination of generated structural alerts and MPNN model was used to screen approved drugs collected in the DrugBank to identify potential NRF2 activators. Out of 2393 screened chemicals 138 compounds were predicted as NRF2 activators by both approaches. Analysis of these compounds showed that some drugs were already known activators of NRF2 while others are potentially novel activators.

Volume 5, Article 100097. PDF: https://www.sciencedirect.com/science/article/pii/S2667318524000047/pdfft?md5=29a2ee24a6813324417f266b95b1e48d&pid=1-s2.0-S2667318524000047-main.pdf

Citations: 0

Artificial intelligence-open science symbiosis in chemoinformatics

Artificial intelligence in the life sciences Pub Date : 2024-03-21 DOI: 10.1016/j.ailsci.2024.100096

Filip Miljković, José L. Medina-Franco

Citations: 0

Rationalism in the face of GPT hypes: Benchmarking the output of large language models against human expert-curated biomedical knowledge graphs

Artificial intelligence in the life sciences Pub Date : 2024-02-01 DOI: 10.1016/j.ailsci.2024.100095

Negin Sadat Babaiha, Sathvik Guru Rao, Jürgen Klein, Bruce Schultz, Marc Jacobs, Martin Hofmann-Apitius

Abstract: Biomedical knowledge graphs (KGs) hold valuable information regarding biomedical entities such as genes, diseases, biological processes, and drugs. KGs have been successfully employed in challenging biomedical areas such as the identification of pathophysiology mechanisms or drug repurposing. The creation of high-quality KGs typically requires labor-intensive multi-database integration or substantial human expert curation, both of which take time and contribute to the workload of data processing and annotation. Therefore, the use of automatic systems for KG building and maintenance is a prerequisite for the wide uptake and utilization of KGs. Technologies supporting the automated generation and updating of KGs typically make use of Natural Language Processing (NLP), which is optimized for extracting implicit triples described in relevant biomedical text sources. At the core of this challenge is how to improve the accuracy and coverage of the information extraction module by utilizing different models and tools. The emergence of pre-trained large language models (LLMs), such as ChatGPT which has grown in popularity dramatically, has revolutionized the field of NLP, making them a potential candidate to be used in text-based graph creation as well. So far, no previous work has investigated the power of LLMs on the generation of cause-and-effect networks and KGs encoded in Biological Expression Language (BEL). In this paper, we present initial studies towards one-shot BEL relation extraction using two different versions of the Generative Pre-trained Transformer (GPT) models and evaluate its performance by comparing the extracted results to a highly accurate, manually curated BEL KG curated by domain experts.

Volume 5, Article 100095. PDF: https://www.sciencedirect.com/science/article/pii/S2667318524000023/pdfft?md5=9137dd2a207653e4d13cb5b99ca17d48&pid=1-s2.0-S2667318524000023-main.pdf

Citations: 0

Origins and progression of the polypharmacology concept in drug discovery

Artificial intelligence in the life sciences Pub Date : 2024-01-03 DOI: 10.1016/j.ailsci.2024.100094

Jürgen Bajorath

Citations: 0

Potential inconsistencies or artifacts in deriving and interpreting deep learning models and key criteria for scientifically sound applications in the life sciences

Artificial intelligence in the life sciences Pub Date : 2023-12-11 DOI: 10.1016/j.ailsci.2023.100093

Jürgen Bajorath

Citations: 0

Yoked learning in molecular data science

Artificial intelligence in the life sciences Pub Date : 2023-12-02 DOI: 10.1016/j.ailsci.2023.100089

Zhixiong Li, Yan Xiang, Yujing Wen, Daniel Reker

Abstract: Active machine learning is an established and increasingly popular experimental design technique where the machine learning model can request additional data to improve the model's predictive performance. It is generally assumed that this data is optimal for the machine learning model since it relies on the model's predictions or model architecture and therefore cannot be transferred to other models. Inspired by research in pedagogy, we here introduce the concept of yoked machine learning where a second machine learning model learns from the data selected by another model. We found that in 48% of the benchmarked combinations, yoked learning performed similar or better than active learning. We analyze distinct cases in which yoked learning can improve active learning performance. In particular, we prototype yoked deep learning (YoDeL) where a classic machine learning model provides data to a deep neural network, thereby mitigating challenges of active deep learning such as slow refitting time per learning iteration and poor performance on small datasets. In summary, we expect the new concept of yoked (deep) learning to provide a competitive option to boost the performance of active learning and benefit from distinct capabilities of multiple machine learning models during data acquisition, training, and deployment.

Volume 5, Article 100089. PDF: https://www.sciencedirect.com/science/article/pii/S2667318523000338/pdfft?md5=798e4cffb7539da96cce07297e51e3de&pid=1-s2.0-S2667318523000338-main.pdf

Citations: 0
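
The yoked learning entry above describes a second model that learns from data selected by another model. The toy sketch below illustrates that idea in one dimension, under loud assumptions: the "teacher" is a simple threshold classifier that queries the unlabeled point nearest its decision boundary, and the "student" is a nearest-centroid classifier trained on whatever the teacher selected. These model choices, the function names, and the data are hypothetical and are not the models benchmarked in the paper.

```python
from typing import Callable, List, Tuple

Point = Tuple[float, int]  # (feature value, class label 0 or 1)


def fit_threshold(labeled: List[Point]) -> float:
    """Teacher: midpoint between the largest class-0 and smallest class-1 example.

    Assumes the labeled set contains at least one example of each class.
    """
    largest_neg = max(x for x, y in labeled if y == 0)
    smallest_pos = min(x for x, y in labeled if y == 1)
    return (largest_neg + smallest_pos) / 2.0


def teacher_driven_selection(pool: List[float],
                             oracle: Callable[[float], int],
                             seed: List[Point],
                             rounds: int) -> List[Point]:
    """Active learning loop driven only by the teacher's uncertainty."""
    labeled = list(seed)
    unlabeled = list(pool)
    for _ in range(rounds):
        if not unlabeled:
            break
        threshold = fit_threshold(labeled)
        # Uncertainty sampling: query the point closest to the teacher's boundary.
        query = min(unlabeled, key=lambda x: abs(x - threshold))
        unlabeled.remove(query)
        labeled.append((query, oracle(query)))
    return labeled


def fit_centroids(labeled: List[Point]) -> Tuple[float, float]:
    """Student: class means for a nearest-centroid classifier."""
    neg = [x for x, y in labeled if y == 0]
    pos = [x for x, y in labeled if y == 1]
    return sum(neg) / len(neg), sum(pos) / len(pos)


def predict(centroids: Tuple[float, float], x: float) -> int:
    neg_mean, pos_mean = centroids
    return int(abs(x - pos_mean) < abs(x - neg_mean))


# Hypothetical data: the true label is 1 when the feature exceeds 0.5.
selected = teacher_driven_selection(
    pool=[0.1, 0.3, 0.45, 0.55, 0.7, 0.9],
    oracle=lambda x: int(x > 0.5),
    seed=[(0.0, 0), (1.0, 1)],
    rounds=3,
)
student = fit_centroids(selected)  # the student never chose any of its own data
```

The design point is that the student is refitted only once, after data acquisition, which mirrors the motivation given in the abstract for pairing a cheap selector with a model that is slow to retrain.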

A supervised machine learning workflow for the reduction of highly dimensional biological data

Artificial intelligence in the life sciences Pub Date : 2023-11-25 DOI: 10.1016/j.ailsci.2023.100090

Linnea K. Andersen, Benjamin J. Reading

Abstract: Recent technological advancements have revolutionized research capabilities across the biological sciences by enabling the collection of large data that provides a broader picture of systems from the cellular to ecosystem level at a more refined resolution. The rapid rate of generating these data has exacerbated bottlenecks in study design and data analysis approaches, especially as conventional methods that incorporate traditional statistical tests and assumptions are not suitable or sufficient for highly dimensional data (i.e., more than 1,000 variables). The application of machine learning techniques in large data analysis is one promising solution that is increasingly popular. However, limitations in expertise such that the results from machine learning models can be interpreted to gain meaningful biological insight pose a great challenge. To address this challenge, a user-friendly machine learning workflow that can be applied to a wide variety of data types to reduce these large data to those variables (attributes) most determinant of experimental and/or observed conditions is provided, as well as a general overview of data analysis and machine learning approaches and considerations thereof. The workflow presented here has been beta-tested with great success and is recommended to be incorporated into analysis pipelines of large data as a standardized approach to reduce data dimensionality. Moreover, the workflow is flexible, and the underlying concepts and steps can be modified to best suit user needs, objectives, and study parameters.

Volume 5, Article 100090. PDF: https://www.sciencedirect.com/science/article/pii/S266731852300034X/pdfft?md5=c41b31c74fb0a867fbb87db01c8f6190&pid=1-s2.0-S266731852300034X-main.pdf

Citations: 0

First-generation themed article collections

Artificial intelligence in the life sciences Pub Date : 2023-11-15 DOI: 10.1016/j.ailsci.2023.100088

Jürgen Bajorath, Steve Gardner, Francesca Grisoni, Carolina Horta Andrade, Johannes Kirchmair, Melissa Landon, José L. Medina-Franco, Filip Miljković, Floriane Montantari, Raquel Rodríguez-Pérez

Citations: 0

Experimental Uncertainty in Training Data for Protein-Ligand Binding Affinity Prediction Models

Artificial intelligence in the life sciences Pub Date : 2023-10-04 DOI: 10.1016/j.ailsci.2023.100087

Carlos A. Hernández-Garrido, Norberto Sánchez-Cruz

Abstract: The accuracy of machine learning models for protein-ligand binding affinity prediction depends on the quality of the experimental data they are trained on. Most of these models are trained and tested on different subsets of the PDBbind database, which is the main source of protein-ligand complexes with annotated binding affinity in the public domain. However, estimating its experimental uncertainty is not straightforward because just a few protein-ligand complexes have more than one measurement associated. In this work, we analyze bioactivity data from ChEMBL to estimate the experimental uncertainty associated with the three binding affinity measures included in the PDBbind (Ki, Kd, and IC50), as well as the effect of combining them. The experimental uncertainty of combining these three affinity measures was characterized by a mean absolute error of 0.78 logarithmic units, a root mean square error of 1.04 and a Pearson correlation coefficient of 0.76. These estimations were contrasted with the performances obtained by state-of-the-art machine learning models for binding affinity prediction, showing that these models tend to be overoptimistic when evaluated on the core set from PDBbind.

Volume 4, Article 100087.

Citations: 0
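
The binding affinity entry above summarizes experimental uncertainty with a mean absolute error, a root mean square error, and a Pearson correlation coefficient. The sketch below shows how those three statistics can be computed for paired measurements of the same protein-ligand complexes. It assumes the measurements are already on a logarithmic scale and paired one-to-one; the function name `agreement_metrics` and the example values are hypothetical and are not taken from ChEMBL or from the paper.

```python
import math
from typing import Sequence, Tuple


def agreement_metrics(first: Sequence[float],
                      second: Sequence[float]) -> Tuple[float, float, float]:
    """Return (MAE, RMSE, Pearson r) for two paired series of measurements."""
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two series of equal length with at least two pairs")
    n = len(first)
    diffs = [a - b for a, b in zip(first, second)]

    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)

    mean_first = sum(first) / n
    mean_second = sum(second) / n
    cov = sum((a - mean_first) * (b - mean_second) for a, b in zip(first, second))
    var_first = sum((a - mean_first) ** 2 for a in first)
    var_second = sum((b - mean_second) ** 2 for b in second)
    if var_first == 0 or var_second == 0:
        raise ValueError("Pearson correlation is undefined for a constant series")
    pearson = cov / math.sqrt(var_first * var_second)
    return mae, rmse, pearson


# Hypothetical log-scale affinities reported twice for the same four complexes.
lab_a = [6.2, 7.5, 8.1, 5.4]
lab_b = [6.9, 7.1, 8.8, 5.0]
mae, rmse, pearson = agreement_metrics(lab_a, lab_b)
```

Because RMSE squares each difference before averaging, it is never smaller than MAE on the same pairs, which is consistent with the 0.78 and 1.04 values quoted in the abstract.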