Machine Learning Recognition of Artificial DNA Sequence with Quantum Tunneling Nanogap Junction

IF 2.9 | Region 2 (Chemistry) | JCR Q3 | CHEMISTRY, PHYSICAL
Milan Kumar Jena, Sneha Mittal, and Biswarup Pathak*
The Journal of Physical Chemistry B, vol. 129, no. 3, pp. 853–865. Published 2025-01-09. DOI: 10.1021/acs.jpcb.4c06270 (https://pubs.acs.org/doi/10.1021/acs.jpcb.4c06270)
Citations: 0

Abstract

Machine Learning Recognition of Artificial DNA Sequence with Quantum Tunneling Nanogap Junction

Artificially synthesized DNA holds significant promise for addressing fundamental biochemical questions and for driving advances in biotechnology, genetics, and DNA digital data storage. Rapid and precise electrical identification of these artificial DNA strands is crucial for their effective application. Herein, we present a comprehensive investigation into the electrical recognition of eight artificially synthesized DNA (xDNA and yDNA) nucleobases using quantum tunneling transport and machine learning (ML) techniques. By embedding these nucleobases within a solid-state nanogap junction, we calculated their fingerprint transmission and current readouts and analyzed the influence of electronic coupling and molecular orbital delocalization on these properties. The trained ML model achieved a predictive basecalling accuracy of up to 100% for the xDNA and 99.80% for the yDNA transmission readout data sets. An ML explainability study revealed that normalized descriptors have a greater impact on nucleobase prediction than the original transmission function and are more effective in disentangling overlapping artificial DNA nucleobase signals. Quaternary classification results showed higher recognition accuracy for xDNA nucleobases than for yDNA nucleobases. Furthermore, precise calling of complementary, purine, and pyrimidine base pair combinations was demonstrated with high sensitivity and F1 score. Our findings reveal the feasibility of highly sensitive and precise electrical recognition of artificial DNA nucleobases, which can transform genetic research and spur advances in genetic data storage, synthetic biology, and diagnostics.
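
The abstract reports basecalling quality in terms of sensitivity and F1 score for a four-class (quaternary) problem. As a rough illustration of how those two metrics are computed per class, the following is a minimal sketch in plain Python. It is not the authors' pipeline: the function name `per_class_scores`, the base labels `xA`, `xC`, `xG`, `xT`, and the toy predictions are all assumptions made up for this example.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (sensitivity, f1)} for a multi-class basecalling result.

    Hypothetical helper for illustration; not taken from the paper.
    """
    labels = sorted(set(y_true) | set(y_pred))
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1      # base called correctly
        else:
            fn[t] += 1      # true base t was missed
            fp[p] += 1      # base p was called where it was not present
    scores = {}
    for lab in labels:
        recall_den = tp[lab] + fn[lab]
        f1_den = 2 * tp[lab] + fp[lab] + fn[lab]
        # Sensitivity (recall) = TP / (TP + FN)
        sensitivity = tp[lab] / recall_den if recall_den else 0.0
        # F1 = 2*TP / (2*TP + FP + FN), the harmonic mean of precision and recall
        f1 = 2 * tp[lab] / f1_den if f1_den else 0.0
        scores[lab] = (sensitivity, f1)
    return scores


# Toy labels only (assumed for illustration, not data from the study).
y_true = ["xA", "xA", "xC", "xC", "xG", "xG", "xT", "xT"]
y_pred = ["xA", "xA", "xC", "xG", "xG", "xG", "xT", "xT"]

scores = per_class_scores(y_true, y_pred)
for base, (sens, f1) in scores.items():
    print(f"{base}: sensitivity={sens:.2f}, F1={f1:.2f}")
```

In this toy case one `xC` read is miscalled as `xG`, which lowers the sensitivity of `xC` and the F1 score of both `xC` and `xG`, while `xA` and `xT` are unaffected.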

Source journal
CiteScore: 5.80
Self-citation rate: 9.10%
Articles published: 965
Review time: 1.6 months
Journal description: An essential criterion for acceptance of research articles in the journal is that they provide new physical insight. Please refer to the New Physical Insights virtual issue on what constitutes new physical insight. Manuscripts that are essentially reporting data or applications of data are, in general, not suitable for publication in JPC B.