GMFOLD: Subgraph matching for high-throughput DNA-aptamer secondary structure classification and machine learning interpretability
Paolo Climaco, Noelle M. Mitchell, Matthew J. Tyler, Kyungae Yang, Anne M. Andrews, Andrea L. Bertozzi
Mathematical Biosciences, volume 387, Article 109485. Published 2025-06-27.
DOI: 10.1016/j.mbs.2025.109485
URL: https://www.sciencedirect.com/science/article/pii/S0025556425001117
Journal impact factor: 1.8 (JCR Q2, Biology)
Citations: 0
Abstract
Aptamers are oligonucleotide receptors that bind to their targets with high affinity. Here, we consider aptamers composed of single-stranded DNA that undergo target-binding-induced conformational changes, giving rise to unique secondary and tertiary structures. Given a specific aptamer primary sequence, well-established computational tools (notably mfold) predict the secondary structure via free-energy minimization. While mfold generates secondary structures for individual sequences, aptamer selections generate candidate pools that are too large to be interrogated experimentally, so there is a need for a high-throughput process that can predict thousands of DNA structures in real time for use in an interactive setting. We developed a new Python code, GMfold, for high-throughput aptamer secondary structure determination. GMfold uses subgraph matching methods to group aptamer candidates by secondary structure similarities. We also improve an open-source code, SeqFold, to incorporate subgraph matching concepts. We represent each secondary structure as a lowest-energy bipartite subgraph matching of the DNA graph to itself. These new tools enable thousands of DNA sequences to be compared based on their secondary structures using machine-learning algorithms. This process is advantageous when analyzing sequences that arise from aptamer selections via systematic evolution of ligands by exponential enrichment (SELEX). This work is a building block for future machine-learning-informed DNA-aptamer selection processes that identify aptamers with improved target affinity and selectivity and that advance aptamer biosensors and therapeutics.
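To make the matching view of a secondary structure concrete, the following is a minimal Python sketch and not GMfold's or SeqFold's actual code or API. It assumes each structure is already available in dot-bracket notation from some folding tool, treats the base pairs as a matching on sequence positions (each base pairs with at most one other base), checks Watson-Crick complementarity only, and groups candidates by a deliberately crude feature, the number of stems. All function names here are hypothetical, and the grouping criterion is a stand-in for the subgraph matching described in the abstract.

```python
from collections import defaultdict

# Assumption: only canonical Watson-Crick DNA pairs are allowed.
WATSON_CRICK = {("A", "T"), ("T", "A"), ("G", "C"), ("C", "G")}


def dot_bracket_to_pairs(structure):
    """Return the set of base pairs (i, j), i < j, encoded by a dot-bracket string."""
    stack, pairs = [], set()
    for j, symbol in enumerate(structure):
        if symbol == "(":
            stack.append(j)
        elif symbol == ")":
            if not stack:
                raise ValueError("unbalanced structure: unmatched ')'")
            pairs.add((stack.pop(), j))
        elif symbol != ".":
            raise ValueError(f"unexpected symbol {symbol!r}")
    if stack:
        raise ValueError("unbalanced structure: unmatched '('")
    return pairs


def is_complementary(sequence, pairs):
    """Check that every paired position joins complementary bases."""
    return all((sequence[i], sequence[j]) in WATSON_CRICK for i, j in pairs)


def count_stems(pairs):
    """Count maximal runs of stacked pairs (i, j), (i + 1, j - 1), ..."""
    # A pair starts a new stem when it is not stacked directly inside another pair.
    return sum(1 for i, j in pairs if (i - 1, j + 1) not in pairs)


def group_by_stem_count(structures):
    """Group candidate names by the number of stems in their structure."""
    groups = defaultdict(list)
    for name, structure in structures.items():
        groups[count_stems(dot_bracket_to_pairs(structure))].append(name)
    return dict(groups)


# Hypothetical candidates: two single hairpins and one double hairpin.
candidates = {
    "apt_a": "((..))",
    "apt_b": "(((...)))",
    "apt_c": "((..))((..))",
}
groups = group_by_stem_count(candidates)
```

Grouping by stem count ignores loop sizes and stem lengths, so it only illustrates the idea of comparing sequences by structure rather than by primary sequence. A richer comparison would operate on the pairing graph itself.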
About the journal:
Mathematical Biosciences publishes work providing new concepts or new understanding of biological systems using mathematical models, or methodological articles likely to find application to multiple biological systems. Papers are expected to present a major research finding of broad significance for the biological sciences, or mathematical biology. Mathematical Biosciences welcomes original research articles, letters, reviews and perspectives.