GMFOLD: Subgraph matching for high-throughput DNA-aptamer secondary structure classification and machine learning interpretability.

Paolo Climaco, Noelle M Mitchell, Matthew Tyler, Kyungae Yang, Anne M Andrews, Andrea L Bertozzi
Mathematical Biosciences, article 109485. Journal Article. Published 2025-06-27. DOI: 10.1016/j.mbs.2025.109485 (https://doi.org/10.1016/j.mbs.2025.109485)

Abstract

Aptamers are oligonucleotide receptors that bind to their targets with high affinity. Here, we consider aptamers composed of single-stranded DNA that undergo target-binding-induced conformational changes, giving rise to unique secondary and tertiary structures. Given a specific aptamer primary sequence, well-established computational tools (notably mfold) predict the secondary structure via free-energy minimization. While mfold generates secondary structures for individual sequences, aptamer selections generate candidate pools that are too large to be interrogated experimentally, so there is a need for a high-throughput process that can predict thousands of DNA structures in real time for use in an interactive setting. We developed new Python code, GMfold, for high-throughput aptamer secondary structure determination. GMfold uses subgraph matching methods to group aptamer candidates by secondary structure similarity. We also improve an open-source code, SeqFold, to incorporate subgraph matching concepts. We represent each secondary structure as a lowest-energy bipartite subgraph matching of the DNA graph to itself. These new tools enable thousands of DNA sequences to be compared by secondary structure using machine-learning algorithms. This process is advantageous when analyzing sequences that arise from aptamer selections via systematic evolution of ligands by exponential enrichment (SELEX). This work is a building block for future machine-learning-informed DNA-aptamer selection processes that identify aptamers with improved target affinity and selectivity and that advance aptamer biosensors and therapeutics.
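To make the matching view of a secondary structure concrete, the following is a minimal Python sketch and not the GMfold or SeqFold implementation. It assumes that each candidate already has a predicted structure in dot-bracket notation from some folding tool, reads the base pairs as a matching on the sequence's nucleotides, and groups candidates by a simple signature. The function names, the restriction to Watson-Crick pairs, and the stem-length signature are all hypothetical choices made for illustration; no energy minimization is performed here.

```python
from collections import defaultdict

# Assumption: only Watson-Crick pairs count as valid edges. Real energy
# models also score stacking, loops, and other effects not modeled here.
COMPLEMENT = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def pairs_from_dot_bracket(structure):
    """Convert dot-bracket notation into a sorted list of (i, j) base pairs."""
    stack, pairs = [], []
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure")
            pairs.append((stack.pop(), j))
    if stack:
        raise ValueError("unbalanced structure")
    return sorted(pairs)


def is_valid_matching(sequence, pairs):
    """Check that each base is paired at most once and pairs are complementary."""
    used = set()
    for i, j in pairs:
        if i in used or j in used:
            return False
        if (sequence[i], sequence[j]) not in COMPLEMENT:
            return False
        used.update((i, j))
    return True


def stem_signature(pairs):
    """Hypothetical coarse signature: lengths of runs of stacked pairs, 5' to 3'."""
    stems, run, prev = [], 0, None
    for i, j in pairs:
        if prev is not None and i == prev[0] + 1 and j == prev[1] - 1:
            run += 1  # this pair stacks directly on the previous one
        else:
            if run:
                stems.append(run)
            run = 1  # start a new stem
        prev = (i, j)
    if run:
        stems.append(run)
    return tuple(stems)


def group_by_structure(candidates):
    """Group candidate names by stem signature, skipping invalid matchings."""
    groups = defaultdict(list)
    for name, (sequence, structure) in candidates.items():
        pairs = pairs_from_dot_bracket(structure)
        if is_valid_matching(sequence, pairs):
            groups[stem_signature(pairs)].append(name)
    return dict(groups)


# Toy candidates (invented sequences, not data from the paper).
candidates = {
    "apt1": ("GCGCTTTTGCGC", "((((....))))"),
    "apt2": ("GGCCAAAAGGCC", "((((....))))"),
    "apt3": ("GCAAAGCGCAAAGC", "((...))((...))"),
}
print(group_by_structure(candidates))
```

In this toy input the two single hairpins with four stacked pairs land in one group and the double hairpin in another. A signature this coarse ignores loop sizes and energies, so it only illustrates the idea of comparing candidates by structure rather than by raw sequence.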
