HKDE-LACM: a hybrid model for lactic acid bacteria classification via k-mer and DNABERT-2 embedding fusion with cyclic DE-BO optimization

Impact factor 3.7 · Zone 2 (Biology) · JCR Q2 (Biotechnology & Applied Microbiology)
Jie Zou, Weichi Liu, Jinhui Dai, Gaifang Dong
BMC Genomics 26(1): 815. Journal article, published 2025-09-25.
DOI: 10.1186/s12864-025-12009-7 (https://doi.org/10.1186/s12864-025-12009-7)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12465904/pdf/
Citations: 0

Abstract

Background: Lactic acid bacteria (LAB) play vital roles in food production and clinical applications. Accurate classification of LAB strains facilitates their functional development and targeted utilization. Although machine learning and deep learning methods have been widely applied to genome sequence classification, challenges remain in capturing comprehensive feature representations and enhancing model generalizability.

Results: We present HKDE-LACM, a hybrid classification model that integrates high-dimensional k-mer frequency features with contextual embeddings derived from DNABERT-2. To optimize model hyperparameters, we introduce a Cyclic Differential Evolution and Bayesian Optimization with Failure Avoidance (C-DBFA) framework. We conducted 10-fold cross-validation on three LAB datasets and evaluated performance. Experimental results demonstrate that HKDE-LACM outperforms existing methods in terms of both classification accuracy and robustness.

Conclusions: HKDE-LACM overcomes the limitations of traditional k-mer features by incorporating semantic embeddings, thereby enriching the representation of genomic sequences. In addition, the model can automatically identify optimal combinations of feature extractors and classifiers through the C-DBFA optimization framework. These advantages effectively enhance the model's generalization ability, making it a promising tool for genome-based LAB classification and related tasks.
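
As an illustration of the feature-fusion idea described in the Results, the following Python sketch computes normalized k-mer frequencies for a sequence and concatenates them with an embedding vector. It is not the authors' implementation: the function names, the choice of k, and the zero vector standing in for a DNABERT-2 embedding are assumptions made purely for illustration.

```python
from itertools import product

import numpy as np


def kmer_frequencies(seq, k=3):
    """Return the normalized frequency of every A/C/G/T k-mer in seq."""
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    index = {kmer: i for i, kmer in enumerate(kmers)}
    counts = np.zeros(len(kmers), dtype=float)
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        # Windows containing ambiguous bases (for example N) are skipped.
        j = index.get(seq[i:i + k])
        if j is not None:
            counts[j] += 1
    total = counts.sum()
    return counts / total if total > 0 else counts


def fuse_features(kmer_vec, embedding_vec):
    """Concatenate k-mer frequencies with an embedding into one vector."""
    return np.concatenate([kmer_vec, embedding_vec])


seq = "ACGTACGTNACG"
kmer_vec = kmer_frequencies(seq, k=2)   # 4**2 = 16 features
embedding = np.zeros(8)                 # hypothetical stand-in for a DNABERT-2 embedding
fused = fuse_features(kmer_vec, embedding)
print(fused.shape)
```

In practice the fused vector would be passed to a downstream classifier. Plain concatenation is only one possible fusion strategy, and the abstract does not specify which one the paper uses.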

Source journal
BMC Genomics (Biology: Biotechnology & Applied Microbiology)
CiteScore: 7.40
Self-citation rate: 4.50%
Articles published: 769
Review time: 6.4 months
About the journal: BMC Genomics is an open access, peer-reviewed journal that considers articles on all aspects of genome-scale analysis, functional genomics, and proteomics. BMC Genomics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.