Smita Sahay, Jingran Wen, Daniel R Scoles, Anton Simeonov, Thomas S Dexheimer, Ajit Jadhav, Stephen C Kales, Hongmao Sun, Stefan M Pulst, Julio C Facelli, David E Jones
Biology-Basel, 14(5), published 2025-05-08. DOI: 10.3390/biology14050522. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12108740/pdf/
Identifying Molecular Properties of Ataxin-2 Inhibitors for Spinocerebellar Ataxia Type 2 Utilizing High-Throughput Screening and Machine Learning.
Spinocerebellar ataxia type 2 (SCA2) is an autosomal dominant neurodegenerative disorder marked by cerebellar dysfunction, ataxic gait, and progressive motor impairment. SCA2 is caused by a pathological expansion of CAG repeats in the ataxin-2 (ATXN2) gene, which gives the ataxin-2 protein a toxic gain of function. SCA2 therapeutic efforts are now expanding beyond symptomatic relief to include disease-modifying approaches such as antisense oligonucleotides (ASOs), high-throughput screening (HTS) for small-molecule inhibitors, and gene therapy aimed at reducing ATXN2 expression. In the present study, data mining and machine learning techniques were used to analyze HTS data and identify robust molecular properties of potential ATXN2 inhibitors. Three HTS datasets were selected for analysis: ATXN2 gene expression, CMV promoter expression, and biochemical control (luciferase) gene expression. Compounds showing significant ATXN2 inhibition with minimal impact on the control assays were identified on the basis of effectiveness (E) values (n = 1321). Molecular descriptors for these compounds were calculated using MarvinSketch (n = 82). The molecular descriptor data (MD model) and the experimentally determined screening data (S model) were analyzed separately as well as together (MD-S model). For each of the three models, compounds were independently clustered by structural similarity with the SimpleKMeans algorithm into the optimal number of clusters (n = 26). For each model, the maximum assay response values were analyzed, and E values and total rank values were applied. The S clusters were then subclustered, and the molecular properties of compounds in the top candidate subcluster were compared with those in the bottom candidate subcluster. Six compounds with high ATXN2-inhibiting potential were identified, along with 16 molecular descriptors that were significantly distinctive for those compounds (p < 0.05).
These results are consistent with a quantitative HTS study that identified and validated similar small-molecule compounds, such as cardiac glycosides, which reduce endogenous ATXN2 in a dose-dependent manner. Overall, these findings demonstrate that integrating HTS analysis with data mining and machine learning is a promising approach for discovering the chemical properties of candidate drugs for SCA2.
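SimpleKMeans is the k-means clusterer in the Weka machine-learning toolkit. The following is a minimal plain-Python stand-in, not the study's code and not Weka itself, showing how the three feature sets could be assembled and clustered: MD uses descriptor columns, S uses screening-response columns, and MD-S concatenates the two. Min-max scaling, squared Euclidean distance, and farthest-first seeding are assumptions chosen to keep the sketch small and deterministic. The toy values are invented, and k is passed in directly rather than optimized (the study reports 26 clusters).

```python
def minmax_scale(rows):
    """Rescale every column to [0, 1]; constant columns map to 0."""
    cols = list(zip(*rows))
    lo = [min(c) for c in cols]
    hi = [max(c) for c in cols]
    return [
        [(v - l) / (h - l) if h > l else 0.0 for v, l, h in zip(row, lo, hi)]
        for row in rows
    ]


def sq_dist(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_first(points, k):
    """Deterministic seeding (an assumption): start at the first point, then
    repeatedly add the point whose nearest chosen centroid is farthest away."""
    centroids = [points[0]]
    while len(centroids) < k:
        nxt = max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        centroids.append(nxt)
    return [list(c) for c in centroids]


def kmeans(points, k, max_iter=100):
    """Lloyd's k-means: alternate nearest-centroid assignment and mean update."""
    centroids = farthest_first(points, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda j: sq_dist(p, centroids[j])) for p in points]
        if new_labels == labels:
            break  # no compound changed cluster: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical toy data for four compounds (values invented for illustration):
# two descriptor columns and two screening-response columns per compound.
descriptors = [[1.2, 40.0], [1.0, 42.0], [4.5, 110.0], [4.8, 105.0]]
screening = [[-80.0, -5.0], [-75.0, -8.0], [-10.0, -2.0], [-12.0, -4.0]]

md = minmax_scale(descriptors)
s = minmax_scale(screening)
md_s = [a + b for a, b in zip(md, s)]  # MD-S: concatenate both feature sets

for name, feats in [("MD", md), ("S", s), ("MD-S", md_s)]:
    labels, _ = kmeans(feats, k=2)
    print(name, labels)  # each prints [0, 0, 1, 1] for this toy data
```

Scaling each column before clustering keeps a large-valued descriptor (such as a surface area) from dominating the distance over a small-valued one (such as a logP-like value); the cluster label numbers themselves are arbitrary.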
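The abstract mentions "total rank values" without defining them. One common way to combine several criteria, shown here purely as an assumption, is to rank candidates on each criterion separately and sum the ranks. In this hypothetical sketch, clusters are ranked by mean ATXN2 response (more negative, meaning stronger inhibition, is better) and by mean absolute activity in a control assay (smaller is better). The function names and cluster summaries are invented for illustration.

```python
def average_ranks(values):
    """1-based ascending ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # extend j over a run of tied values
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of positions i+1 .. j+1
        for idx in order[i:j + 1]:
            ranks[idx] = shared
        i = j + 1
    return ranks


def total_rank(clusters):
    """clusters maps a name to (mean ATXN2 response, mean |control response|).

    Both criteria rank ascending, so the strongest inhibition and the weakest
    control-assay activity each earn rank 1. Returns (name, total) pairs
    sorted from best (lowest total) to worst; ties keep insertion order.
    """
    names = list(clusters)
    r_atxn2 = average_ranks([clusters[n][0] for n in names])
    r_control = average_ranks([clusters[n][1] for n in names])
    totals = {n: a + b for n, a, b in zip(names, r_atxn2, r_control)}
    return sorted(totals.items(), key=lambda item: item[1])


# Hypothetical cluster summaries (invented numbers, not from the study).
clusters = {
    "C1": (-85.0, 5.0),
    "C2": (-60.0, 30.0),
    "C3": (-20.0, 4.0),
    "C4": (-90.0, 50.0),
}
print(total_rank(clusters))
# [('C1', 4.0), ('C3', 5.0), ('C4', 5.0), ('C2', 6.0)]
```

Here C4 inhibits ATXN2 most strongly but also perturbs the control assay, so the summed rank prefers C1, which is the kind of selectivity trade-off a combined ranking is meant to capture.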
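The abstract reports descriptors that differ significantly (p < 0.05) between the top and bottom candidate subclusters but does not name the statistical test. As a hedged illustration only, this sketch substitutes a two-sided permutation test on the difference in means of a single descriptor; it makes no normality assumption and needs only the Python standard library. In practice it would be run once per calculated descriptor. The `permutation_test` helper and the descriptor values are hypothetical.

```python
import random
from statistics import mean


def permutation_test(top, bottom, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means.

    Group labels are shuffled n_perm times; the p-value is the share of
    shuffles whose |mean difference| is at least the observed one, with a
    +1 correction so it can never be exactly zero.
    """
    rng = random.Random(seed)  # fixed seed makes the p-value reproducible
    observed = mean(top) - mean(bottom)
    pooled = list(top) + list(bottom)
    n_top = len(top)
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = mean(pooled[:n_top]) - mean(pooled[n_top:])
        if abs(diff) >= abs(observed):
            extreme += 1
    return observed, (extreme + 1) / (n_perm + 1)


# Hypothetical values of one descriptor for six top-subcluster compounds and
# eight bottom-subcluster compounds (invented for illustration).
top = [180.0, 175.0, 190.0, 185.0, 178.0, 182.0]
bottom = [60.0, 75.0, 58.0, 70.0, 66.0, 72.0, 64.0, 69.0]

diff, p = permutation_test(top, bottom)
print(f"mean difference = {diff:.2f}, p = {p:.4f}, significant: {p < 0.05}")
```

A permutation test suits small subclusters like these because it estimates the null distribution directly from the observed values instead of relying on large-sample approximations.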
About the journal:
Biology (ISSN 2079-7737) is an international, peer-reviewed, quick-refereeing open access journal of Biological Science published by MDPI online. It publishes reviews, research papers and communications in all areas of biology and at the interface of related disciplines. Our aim is to encourage scientists to publish their experimental and theoretical results in as much detail as possible. There is no restriction on the length of the papers. The full experimental details must be provided so that the results can be reproduced. Electronic files regarding the full details of the experimental procedure, if unable to be published in a normal way, can be deposited as supplementary material.