iEnhancer-GDM: A Deep Learning Framework Based on Generative Adversarial Network and Multi-head Attention Mechanism to Identify Enhancers and Their Strength.

IF 3.9 · Zone 2 (Biology) · Q1 MATHEMATICAL & COMPUTATIONAL BIOLOGY

Interdisciplinary Sciences: Computational Life Sciences Pub Date : 2025-09-01 Epub Date: 2025-05-07 DOI:10.1007/s12539-025-00703-9

Xiaomei Yang, Meng Liao, Bin Ye, Junfeng Xia, Jianping Zhao

Pages: 662-672 · Publication type: Journal Article

Citations: 0

Abstract

Enhancers are short DNA fragments capable of significantly increasing the frequency of gene transcription. They often exert their effects on target genes over long distances, either in cis or in trans. Identifying enhancers is challenging because of their variable positions and sensitivities. Genetic variants within enhancer regions have been implicated in human diseases, highlighting the critical importance of enhancer identification and strength prediction. Here, we develop a two-layer predictor named iEnhancer-GDM to identify enhancers and predict their strength. To address the limited size of the enhancer training dataset, which can cause model overfitting and low classification accuracy, we introduce a Wasserstein generative adversarial network with gradient penalty (WGAN-GP) to augment the dataset. We employ a dna2vec embedding layer to encode raw DNA sequences into numerical feature representations, and then integrate a multi-scale convolutional neural network, a bidirectional long short-term memory network, and a multi-head attention mechanism for feature representation and classification. Our results validate the effectiveness of WGAN-GP data augmentation. On an independent test dataset, iEnhancer-GDM outperforms existing models, with improvements of 2.45% for enhancer identification and 11.5% for enhancer strength prediction. iEnhancer-GDM advances precise enhancer identification and strength prediction, thereby helping to elucidate the functions of enhancers and their associations in genomics.
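The two-layer design described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names (`kmer_tokens`, `two_layer_predict`), the choice of k = 3, the 0.5 threshold, and the reading of the layers as "enhancer vs. non-enhancer" followed by "strong vs. weak" are all assumptions made for illustration. The two scoring callables stand in for the trained networks (dna2vec embedding, multi-scale CNN, BiLSTM, multi-head attention), which are not reproduced here.

```python
from typing import Callable, List

# Hypothetical type: a scorer maps k-mer tokens to a probability in [0, 1].
Scorer = Callable[[List[str]], float]


def kmer_tokens(seq: str, k: int = 3, stride: int = 1) -> List[str]:
    """Split a DNA sequence into overlapping k-mers.

    A dna2vec-style embedding looks up one vector per k-mer; the value
    k = 3 is an assumption, not a setting taken from the paper.
    """
    seq = seq.upper()
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, stride)]


def two_layer_predict(
    seq: str,
    is_enhancer: Scorer,
    is_strong: Scorer,
    threshold: float = 0.5,  # assumed decision threshold
) -> str:
    """Cascade two binary classifiers.

    Layer 1 decides enhancer vs. non-enhancer; layer 2 runs only on
    predicted enhancers and decides strong vs. weak.
    """
    tokens = kmer_tokens(seq)
    if is_enhancer(tokens) < threshold:
        return "non-enhancer"
    if is_strong(tokens) >= threshold:
        return "strong enhancer"
    return "weak enhancer"


# Usage with placeholder scorers (constants, purely illustrative).
label = two_layer_predict(
    "ACGTACGT",
    is_enhancer=lambda tokens: 0.9,
    is_strong=lambda tokens: 0.2,
)
```

With these placeholder scorers, `label` is `"weak enhancer"`: the first layer passes the sequence on, and the second layer scores it below the threshold. In a real pipeline each scorer would wrap a trained model, and only the training set (not the independent test set) would be augmented with WGAN-GP samples.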


Source journal

Interdisciplinary Sciences: Computational Life Sciences (MATHEMATICAL & COMPUTATIONAL BIOLOGY)

CiteScore

8.60

Self-citation rate

4.20%

Number of articles published

About the journal: Interdisciplinary Sciences: Computational Life Sciences aims to cover the most recent and outstanding developments in interdisciplinary areas of science, focusing especially on computational life sciences, an area developing rapidly at the forefront of scientific research and technology. The journal publishes original papers of significant general interest covering recent research and developments. Articles are published rapidly by taking full advantage of internet technology for online submission and peer review of manuscripts, and by publishing OnlineFirst™ through SpringerLink even before the issue is built or sent to the printer. The editorial board consists of many leading scientists of international reputation, including Luc Montagnier (UNESCO, France), Dennis Salahub (University of Calgary, Canada), and Weitao Yang (Duke University, USA). Prof. Dongqing Wei of Shanghai Jiao Tong University is the editor-in-chief; he has made important contributions to bioinformatics and computational physics and is best known for his ground-breaking work on the theory of ferroelectric liquids. With the help of a team of associate editors and the editorial board, the journal aims to build a sound international reputation.