Machine-Learning Framework to Predict the Performance of Lipid Nanoparticles for Nucleic Acid Delivery

Impact factor: 4.7 | JCR: Q2 (Materials Science, Biomaterials)

ACS Applied Bio Materials. Pub Date: 2025-04-23. DOI: 10.1021/acsabm.4c01716

Gaurav Kumar* and Arezoo M. Ardekani

Volume 8, Issue 5, pages 3717–3727. Full text: https://pubs.acs.org/doi/10.1021/acsabm.4c01716

Citations: 0

Abstract

Lipid nanoparticles (LNPs) are highly effective carriers for gene therapies, including mRNA and siRNA delivery, due to their ability to transport nucleic acids across biological membranes, low cytotoxicity, improved pharmacokinetics, and scalability. A typical approach to formulate LNPs is to establish a quantitative structure–activity relationship (QSAR) between their compositions and in vitro/in vivo activities, which allows for the prediction of activity based on molecular structure. However, developing QSAR for LNPs can be challenging due to the complexity of multicomponent formulations, interactions with biological membranes, stability in physiological environments, and diverse physicochemical properties. To address these challenges, we developed a machine-learning (ML) framework to predict the activity and cell viability of LNPs for nucleic acid delivery. We curated data from 6454 LNP formulations reported across 21 independent studies and implemented 11 different molecular featurization techniques, ranging from descriptors and fingerprints to graph-based representations, alongside six ML algorithms for binary and multiclass classification. Using scaffold-based 5-fold cross-validation, our models achieved classification accuracies exceeding 90% for both activity and cell viability prediction tasks. Among all model-feature combinations, descriptor-based features combined with ensemble models such as balanced random forest and extra trees yielded the highest performance. Through SHAP-based feature attribution and interaction analysis, we identified key physicochemical properties and compositional features driving the LNP performance, highlighting the importance of synergistic effects among multiple molecular features. 
Furthermore, we developed a transfer-learning strategy to bridge in vitro-to-in vivo prediction gaps by incorporating base model predictions along with additional biological attributes, such as the particle size, polydispersity index, and ζ potential. Despite the smaller size and inherent class imbalance of the in vivo data set, the transfer-learning models demonstrated a promising predictive performance, with accuracies exceeding 82%. Our findings underscore the potential of interpretable ML frameworks to guide rational LNP design and provide a scalable approach to QSAR modeling in complex nanomedicine systems.
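The scaffold-based cross-validation mentioned in the abstract can be illustrated with a minimal sketch. It assumes each formulation already carries a scaffold identifier (hypothetical strings here; the abstract does not say how scaffolds were computed), and the function name `scaffold_kfold` and the greedy balancing rule are illustrative choices, not the authors' implementation. The point is only that all samples sharing a scaffold fall into the same fold, so no scaffold appears in both training and test sets.

```python
from collections import defaultdict


def scaffold_kfold(scaffolds, n_splits=5):
    """Yield (train_indices, test_indices) with no scaffold shared between them.

    `scaffolds` holds one scaffold identifier per sample. These identifiers
    are assumed to be precomputed; how they are derived is outside this sketch.
    """
    # Group sample indices by scaffold identifier.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    if len(groups) < n_splits:
        raise ValueError("need at least as many distinct scaffolds as folds")

    folds = [[] for _ in range(n_splits)]
    # Place the largest scaffold groups first, each into the currently
    # smallest fold, so that fold sizes stay roughly balanced.
    for scaffold in sorted(groups, key=lambda s: (-len(groups[s]), s)):
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(groups[scaffold])

    # Each fold serves once as the test set; the rest form the training set.
    for k in range(n_splits):
        test = sorted(folds[k])
        train = sorted(i for j, fold in enumerate(folds) if j != k for i in fold)
        yield train, test


if __name__ == "__main__":
    # Ten hypothetical formulations spread over six hypothetical scaffolds.
    example = ["A", "A", "B", "C", "C", "C", "D", "E", "F", "F"]
    for train_idx, test_idx in scaffold_kfold(example, n_splits=5):
        print("test:", test_idx, "train:", train_idx)
```

Splitting by scaffold rather than by individual sample gives a stricter estimate of generalization, because the model is always evaluated on structural families it has not seen during training.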


Source journal

ACS Applied Bio Materials (Chemistry: Chemistry (all))

CiteScore: 9.40

Self-citation rate: 2.10%

Articles published: 464

Journal description: ACS Applied Bio Materials is an interdisciplinary journal publishing original research covering all aspects of biomaterials and biointerfaces including and beyond the traditional biosensing, biomedical and therapeutic applications. The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrates knowledge in the areas of materials, engineering, physics, bioscience, and chemistry into important bio applications. The journal is specifically interested in work that addresses the relationship between structure and function and assesses the stability and degradation of materials under relevant environmental and biological conditions.