GraphSLA: Graph machine learning for predicting small molecule - lncRNA associations
Authors: Ashish Panghalia, Parth Kumar, Vikram Singh
Journal: Artificial intelligence chemistry, volume 3, issue 2, Article 100094
Published: 2025-08-11
DOI: 10.1016/j.aichem.2025.100094
URL: https://www.sciencedirect.com/science/article/pii/S2949747725000119
Long non-coding RNAs (lncRNAs) are increasingly reported to play critical roles in gene expression, the regulation of cellular processes, and the onset and manifestation of various diseases. Recent studies have highlighted the role of small molecules (SMs) in controlling lncRNA function, making SM-lncRNA associations (SLAs) a promising avenue for therapeutic development. In this study, five graph learning models were developed for SLA classification using 3563 curated SLAs among 115 SMs and 2826 lncRNAs. Node2Vec was used to extract contextual features of SMs and lncRNAs from their bipartite association network, while Mol2Vec and Doc2Vec were used to extract molecular features of the SMs and lncRNAs, respectively. The principal components accounting for 95% of the variance in the feature vectors were used to train the five models: Graph Neural Network (GNN), Graph Convolutional Network (GCN), Graph Attention Network (GAT), Graph Sample and Aggregate (GraphSAGE), and Simplified Graph Convolution (SGConv). Among these, GraphSAGE performed best, with an accuracy of 98.0% and an AUC-ROC of 99.4% when evaluated over 10 training epochs. Generalizability studies were also conducted to assess whether the models remain robust, reliable, and practically useful when applied to real-world data. Overall, the reported results outperform previously developed SLA prediction methods. This study underscores the potential of graph-learning methods to capture the intricate associations among SMs and lncRNAs and thereby facilitate the discovery of novel SLAs.
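
The abstract does not give implementation details, so the following is only a minimal sketch of the dimensionality-reduction step it mentions: keeping the fewest principal components that together account for 95% of the variance. It assumes NumPy is available. The function name `pca_keep_variance` is hypothetical, and the random toy matrix merely stands in for the real Node2Vec, Mol2Vec, and Doc2Vec feature vectors, which are not reproduced here.

```python
import numpy as np


def pca_keep_variance(features: np.ndarray, threshold: float = 0.95):
    """Project rows of `features` onto the fewest principal components
    whose cumulative explained variance reaches `threshold`.

    Hypothetical helper for illustration; not the authors' code.
    """
    # Center each column so the SVD describes variance around the mean.
    centered = features - features.mean(axis=0)
    # Rows of `components` are principal directions; squared singular
    # values are proportional to the variance along each direction.
    _, singular_values, components = np.linalg.svd(centered, full_matrices=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained / explained.sum())
    # Index of the first component at which the threshold is reached.
    k = int(np.searchsorted(cumulative, threshold)) + 1
    k = min(k, len(cumulative))  # guard against floating-point shortfall
    return centered @ components[:k].T, cumulative[:k]


# Toy data (assumption): 200 samples whose 20 features are driven by
# 3 latent factors plus a little noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 3))
mixing = rng.normal(size=(3, 20))
toy_features = latent @ mixing + 0.01 * rng.normal(size=(200, 20))

reduced, cumulative = pca_keep_variance(toy_features)
print("components kept:", reduced.shape[1])
print("variance explained:", round(float(cumulative[-1]), 4))
```

In a pipeline like the one described, the reduced matrix would serve as the node features passed to the graph models. Whether the authors applied the reduction per feature type or to a concatenated vector is not stated in the abstract.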