Machine-Learning Framework to Predict the Performance of Lipid Nanoparticles for Nucleic Acid Delivery
Authors: Gaurav Kumar and Arezoo M. Ardekani
Journal: ACS Applied Bio Materials, vol. 8, no. 5, pp. 3717–3727 (published 2025-04-23)
DOI: 10.1021/acsabm.4c01716
URL: https://pubs.acs.org/doi/10.1021/acsabm.4c01716
Journal metrics: impact factor 4.7; JCR Q2 (Materials Science, Biomaterials)
Lipid nanoparticles (LNPs) are highly effective carriers for gene therapies, including mRNA and siRNA delivery, due to their ability to transport nucleic acids across biological membranes, low cytotoxicity, improved pharmacokinetics, and scalability. A typical approach to formulating LNPs is to establish a quantitative structure–activity relationship (QSAR) between their compositions and in vitro/in vivo activities, which allows activity to be predicted from molecular structure. However, developing a QSAR for LNPs can be challenging due to the complexity of multicomponent formulations, interactions with biological membranes, stability in physiological environments, and diverse physicochemical properties. To address these challenges, we developed a machine-learning (ML) framework to predict the activity and cell viability of LNPs for nucleic acid delivery. We curated data from 6454 LNP formulations reported across 21 independent studies and implemented 11 different molecular featurization techniques, ranging from descriptors and fingerprints to graph-based representations, alongside six ML algorithms for binary and multiclass classification. Using scaffold-based 5-fold cross-validation, our models achieved classification accuracies exceeding 90% for both the activity and cell viability prediction tasks. Among all model-feature combinations, descriptor-based features combined with ensemble models such as balanced random forest and extra trees yielded the highest performance. Through SHAP-based feature attribution and interaction analysis, we identified key physicochemical properties and compositional features driving LNP performance, highlighting the importance of synergistic effects among multiple molecular features.
Furthermore, we developed a transfer-learning strategy to bridge in vitro-to-in vivo prediction gaps by incorporating base model predictions along with additional biological attributes, such as particle size, polydispersity index, and ζ potential. Despite the smaller size and inherent class imbalance of the in vivo data set, the transfer-learning models demonstrated promising predictive performance, with accuracies exceeding 82%. Our findings underscore the potential of interpretable ML frameworks to guide rational LNP design and provide a scalable approach to QSAR modeling in complex nanomedicine systems.
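The abstract does not say how the scaffold-based 5-fold cross-validation was implemented, so the following is only a minimal sketch of the general idea: formulations that share a molecular scaffold are kept together in one fold, so a model is never tested on a scaffold it saw during training. The function name `scaffold_kfold`, the greedy fold-balancing rule, and the placeholder scaffold keys are all assumptions made for illustration. In practice the scaffold keys would be computed from the lipid structures with a cheminformatics toolkit, which is omitted here.

```python
from collections import defaultdict


def scaffold_kfold(scaffolds, n_splits=5):
    """Yield (train_idx, test_idx) pairs in which no scaffold key
    appears on both sides of a split.

    `scaffolds` is a list with one scaffold key per sample (hypothetical
    input; computing the keys from structures is out of scope here).
    """
    # Group sample indices by their scaffold key.
    groups = defaultdict(list)
    for idx, scaffold in enumerate(scaffolds):
        groups[scaffold].append(idx)
    if len(groups) < n_splits:
        raise ValueError("need at least as many distinct scaffolds as folds")

    # Greedy balancing (an assumption, not the paper's stated method):
    # place the largest groups first, each into the currently smallest fold.
    folds = [[] for _ in range(n_splits)]
    ordered = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    for _, members in ordered:
        smallest = min(range(n_splits), key=lambda k: len(folds[k]))
        folds[smallest].extend(members)

    # Each fold serves once as the test set; the rest form the training set.
    for k in range(n_splits):
        test_idx = sorted(folds[k])
        train_idx = sorted(
            i for j, fold in enumerate(folds) if j != k for i in fold
        )
        yield train_idx, test_idx


if __name__ == "__main__":
    # Placeholder scaffold keys standing in for real scaffold strings.
    demo = ["A", "A", "B", "C", "C", "C", "D", "E", "F", "F"]
    for train_idx, test_idx in scaffold_kfold(demo, n_splits=5):
        print("train:", train_idx, "test:", test_idx)
```

Because whole scaffold groups are assigned to folds, fold sizes are only approximately equal. Scores from this kind of split tend to be a stricter estimate of generalization to new chemistry than scores from a random split.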
About the journal:
ACS Applied Bio Materials is an interdisciplinary journal publishing original research covering all aspects of biomaterials and biointerfaces including and beyond the traditional biosensing, biomedical and therapeutic applications.
The journal is devoted to reports of new and original experimental and theoretical research of an applied nature that integrates knowledge in the areas of materials, engineering, physics, bioscience, and chemistry into important bio applications. The journal is specifically interested in work that addresses the relationship between structure and function and assesses the stability and degradation of materials under relevant environmental and biological conditions.