Diff-SE: A Diffusion-Augmented Contrastive Learning Framework for Super-Enhancer Prediction

IF 5.3 | Zone 2, Chemistry | JCR Q1, CHEMISTRY, MEDICINAL
Haolu Zhou, Yu Han, Yude Bai, Yun Zuo, Wenying He, Fei Guo
Journal of Chemical Information and Modeling, pp. 7789-7799. Journal Article. Published 2025-07-28 (Epub 2025-07-04).
DOI: https://doi.org/10.1021/acs.jcim.5c01005
Citations: 0

Abstract

Diff-SE: A Diffusion-Augmented Contrastive Learning Framework for Super-Enhancer Prediction.

Super-enhancers (SEs) are cis-regulatory elements that play crucial roles in gene expression and are implicated in diseases such as cancer and Alzheimer's. Traditional identification methods rely on ChIP-seq experiments, which are costly and time-consuming. While recent computational approaches have leveraged sequence features for SE prediction, they often suffer from severe class imbalance and poor generalization across species. To address these limitations, we propose Diff-SE, a deep learning framework that integrates diffusion-based data augmentation with contrastive learning. The diffusion module models the continuous distribution of SEs to generate biologically meaningful synthetic positive samples, effectively balancing training data. A contrastive learning strategy is then used to enhance feature representation by maximizing intraclass similarity and interclass separation. Experimental results across eight data sets demonstrate that Diff-SE consistently outperforms the baseline model, achieving 10%-30% improvements in precision (PRE), Matthews correlation coefficient (MCC), and F1-score. Furthermore, Diff-SE exhibits superior generalization in cross-species validation between human and mouse cell lines. The code and data sets are available at https://github.com/15831959673/Diff-SE, enabling further research and applications in SE prediction.
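
The three reported metrics all depend on how false positives are handled, which is why they are sensitive to class imbalance. As a minimal sketch (not the authors' evaluation code), the standard definitions of precision, MCC, and F1 can be computed from binary confusion-matrix counts as follows. The function name `binary_metrics` and the example counts are hypothetical and chosen only for illustration.

```python
import math


def binary_metrics(tp, fp, tn, fn):
    """Return (precision, MCC, F1) from binary confusion-matrix counts.

    Undefined ratios (zero denominators) are reported as 0.0.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    # MCC uses all four cells, so it stays informative under imbalance.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return precision, mcc, f1


# Hypothetical imbalanced test set: 100 SEs and 900 non-SEs.
pre, mcc, f1 = binary_metrics(tp=80, fp=120, tn=780, fn=20)
print(f"PRE={pre:.3f} MCC={mcc:.3f} F1={f1:.3f}")
```

In this made-up case the classifier recovers 80% of the positives, yet precision is only 0.4 because false positives drawn from the much larger negative class outnumber the true positives.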

Source journal
CiteScore: 9.80
Self-citation rate: 10.70%
Articles published: 529
Review time: 1.4 months
About the journal: The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery. Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field. As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.