Beyond sequence: A physics-informed machine learning framework for predicting DNA mutations
M Suárez-Villagrán, N Mitsakos, J H Miller
Computational and Structural Biotechnology Journal, vol. 27, pp. 3985-3992. Published 2025-09-10. DOI: 10.1016/j.csbj.2025.08.033 (https://doi.org/10.1016/j.csbj.2025.08.033)
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12465030/pdf/
This paper investigates how incorporating information from a quantum tight-binding model can enhance the predictive capability of machine learning models for identifying mutation-prone sites in mitochondrial DNA (mtDNA). We employ quantum Hamiltonian techniques and machine learning to explore mutations in mitochondrial DNA's hypervariable segment 1 (HVR1). This region is recognized for its high variability and is frequently used in genealogical DNA testing and research. Our approach considers the local energy associated with each base pair, as well as the interactions among electrons within the DNA chain. For this study, we analyze data from the Mitomap database. Our findings suggest that both the local ionization energies and the context-dependent nature of the base pairs significantly influence the locations of mutations within DNA. Specifically, our machine learning model can extract valuable insights when examining homopolymeric runs, that is, regions where a single base pair repeats multiple times within a sequence.
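To make the two ingredients named in the abstract concrete, here is a minimal sketch (not the authors' implementation) of a nearest-neighbor tight-binding matrix for a short sequence and a detector for homopolymeric runs. The on-site energies, the uniform hopping value and its sign convention, the `min_len` threshold, and all function names are assumptions chosen for illustration; the paper's actual parameters and feature set are not given in the abstract.

```python
from itertools import groupby

# Illustrative on-site (ionization) energies in eV. These are placeholder
# values assumed for this sketch, not values taken from the paper.
ONSITE_EV = {"G": 7.75, "A": 8.24, "C": 8.87, "T": 9.14}
HOPPING_EV = 0.4  # assumed uniform nearest-neighbor coupling (hypothetical)


def tight_binding_matrix(seq, onsite=ONSITE_EV, hopping=HOPPING_EV):
    """Return the tridiagonal nearest-neighbor Hamiltonian as nested lists."""
    n = len(seq)
    h = [[0.0] * n for _ in range(n)]
    for i, base in enumerate(seq):
        h[i][i] = onsite[base]          # local energy of site i
        if i + 1 < n:
            h[i][i + 1] = hopping       # coupling to the next site
            h[i + 1][i] = hopping       # keep the matrix symmetric
    return h


def homopolymer_runs(seq, min_len=3):
    """Return (start, base, length) for each run of at least min_len equal bases."""
    runs, pos = [], 0
    for base, group in groupby(seq):
        length = len(list(group))
        if length >= min_len:
            runs.append((pos, base, length))
        pos += length
    return runs


def site_features(seq, min_len=3):
    """Per-site feature pairs: (on-site energy, length of enclosing run or 0)."""
    run_len = [0] * len(seq)
    for start, _, length in homopolymer_runs(seq, min_len):
        for i in range(start, start + length):
            run_len[i] = length
    return [(ONSITE_EV[base], run_len[i]) for i, base in enumerate(seq)]
```

For example, `homopolymer_runs("ACCCCCTAGGG")` reports a run of five C starting at index 1 and a run of three G starting at index 8. Feature pairs like those from `site_features` could then be passed to any classifier. A fuller treatment would diagonalize the matrix to obtain context-dependent electronic quantities, which this sketch omits.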
About the journal:
Computational and Structural Biotechnology Journal (CSBJ) is an online gold open access journal publishing research articles and reviews after full peer review. All articles are published, without barriers to access, immediately upon acceptance. The journal places a strong emphasis on functional and mechanistic understanding of how molecular components in a biological process work together through the application of computational methods. Structural data may provide such insights, but they are not a prerequisite for publication in the journal. Specific areas of interest include, but are not limited to:
Structure and function of proteins, nucleic acids and other macromolecules
Structure and function of multi-component complexes
Protein folding, processing and degradation
Enzymology
Computational and structural studies of plant systems
Microbial Informatics
Genomics
Proteomics
Metabolomics
Algorithms and Hypothesis in Bioinformatics
Mathematical and Theoretical Biology
Computational Chemistry and Drug Discovery
Microscopy and Molecular Imaging
Nanotechnology
Systems and Synthetic Biology