Diff-SE: A Diffusion-Augmented Contrastive Learning Framework for Super-Enhancer Prediction.
Haolu Zhou, Yu Han, Yude Bai, Yun Zuo, Wenying He, Fei Guo
Journal of Chemical Information and Modeling, pp. 7789-7799. Published 2025-07-28 (online 2025-07-04). DOI: 10.1021/acs.jcim.5c01005
Super-enhancers (SEs) are cis-regulatory elements that play crucial roles in gene expression and are implicated in diseases such as cancer and Alzheimer's disease. Traditional identification methods rely on ChIP-seq experiments, which are costly and time-consuming. While recent computational approaches have leveraged sequence features for SE prediction, they often suffer from severe class imbalance and poor generalization across species. To address these limitations, we propose Diff-SE, a deep learning framework that integrates diffusion-based data augmentation with contrastive learning. The diffusion module models the continuous distribution of SEs to generate biologically meaningful synthetic positive samples, balancing the positive and negative classes in the training data. A contrastive learning strategy then enhances the feature representation by maximizing intraclass similarity and interclass separation. Experimental results across eight data sets demonstrate that Diff-SE consistently outperforms the baseline model, achieving 10%-30% improvements in precision (PRE), Matthews correlation coefficient (MCC), and F1-score. Furthermore, Diff-SE exhibits superior generalization in cross-species validation between human and mouse cell lines. The code and data sets are available at https://github.com/15831959673/Diff-SE, enabling further research and applications in SE prediction.
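The reported gains are in precision (PRE), MCC, and F1-score rather than accuracy, which matters when SEs are a small minority class. The following is a minimal sketch of the standard definitions for binary labels (1 = super-enhancer, 0 = other region); the helper names are illustrative and are not taken from the Diff-SE repository.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count TP, FP, TN, FN for binary labels (1 = SE, 0 = non-SE)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn


def precision(tp, fp):
    # Fraction of predicted SEs that are real SEs; 0.0 if nothing was predicted positive.
    return tp / (tp + fp) if tp + fp else 0.0


def recall(tp, fn):
    # Fraction of real SEs that were recovered.
    return tp / (tp + fn) if tp + fn else 0.0


def f1_score(tp, fp, fn):
    # Harmonic mean of precision and recall.
    p, r = precision(tp, fp), recall(tp, fn)
    return 2 * p * r / (p + r) if p + r else 0.0


def mcc(tp, fp, tn, fn):
    # Matthews correlation coefficient; defined as 0.0 when any marginal is empty.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
tp, fp, tn, fn = confusion_counts(y_true, y_pred)
print(precision(tp, fp), f1_score(tp, fp, fn), mcc(tp, fp, tn, fn))
```

On this 3:7 split, a trivial classifier that labels every region as non-SE still reaches 70% accuracy, yet its precision, F1-score, and MCC are all 0, which is why imbalance-aware metrics are the ones worth reporting.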
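The abstract describes the contrastive objective only as maximizing intraclass similarity and interclass separation. A common way to express that goal is a supervised contrastive (SupCon-style) loss over labeled embeddings. The sketch below assumes that formulation, cosine similarity, and a temperature of 0.1; none of these choices is confirmed by the abstract, and the function names are hypothetical.

```python
import math


def normalize(v):
    # L2-normalize so dot products become cosine similarities (assumes v is non-zero).
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]


def supcon_loss(embeddings, labels, temperature=0.1):
    """Hypothetical supervised contrastive loss: pull same-label embeddings together,
    push different-label embeddings apart."""
    z = [normalize(e) for e in embeddings]
    n = len(z)
    # Pairwise cosine similarities scaled by the temperature.
    sim = [[sum(a * b for a, b in zip(z[i], z[j])) / temperature for j in range(n)]
           for i in range(n)]
    total, anchors = 0.0, 0
    for i in range(n):
        positives = [j for j in range(n) if j != i and labels[j] == labels[i]]
        if not positives:
            continue  # anchors without a same-class partner contribute nothing
        # Log-sum-exp over every sample except the anchor, shifted by the max for stability.
        others = [sim[i][a] for a in range(n) if a != i]
        m = max(others)
        log_denom = m + math.log(sum(math.exp(s - m) for s in others))
        # Average negative log-probability of picking each positive.
        total += -sum(sim[i][p] - log_denom for p in positives) / len(positives)
        anchors += 1
    return total / anchors if anchors else 0.0


labels = [1, 1, 0, 0]
separated = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]]
mixed = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1], [0.1, 0.9]]
print(supcon_loss(separated, labels), supcon_loss(mixed, labels))
```

Embeddings that cluster by class (`separated`) yield a much lower loss than embeddings where same-class points are orthogonal and cross-class points align (`mixed`), which is the behavior the abstract's description calls for.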
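The abstract states that the diffusion module models the continuous distribution of SEs in order to synthesize extra positives. The sketch below assumes a standard DDPM-style setup on continuous SE feature vectors. That setup, the linear noise schedule, and its endpoints (1e-4 to 0.02) are assumptions, since the abstract specifies neither the schedule nor the representation. It shows the closed-form forward noising step used to build training targets, plus the count of synthetic positives needed to balance the classes. Actually generating new samples would additionally require a trained noise-prediction network run in reverse, which is not shown.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    # Assumed DDPM-style linear variance schedule over T steps.
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def alpha_bars(betas):
    # Cumulative product of (1 - beta_t): how much of x0 survives after t steps.
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, abar, rng):
    """Closed-form forward step: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps.
    A denoising network would be trained to predict eps from (x_t, t)."""
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    a = abar[t]
    xt = [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def synthetic_positives_needed(n_pos, n_neg):
    # Hypothetical helper: synthetic SEs required to reach a 1:1 class ratio.
    return max(0, n_neg - n_pos)


betas = linear_beta_schedule(1000)
abar = alpha_bars(betas)
xt, eps = q_sample([1.0, -2.0, 0.5], 500, abar, random.Random(0))
print(abar[0], abar[-1], synthetic_positives_needed(100, 1000))
```

Because `abar` shrinks toward 0 as `t` grows, late steps are close to pure Gaussian noise. This is what lets a reverse (denoising) process start from noise and produce new samples that resemble the training SEs.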
About the journal:
The Journal of Chemical Information and Modeling publishes papers reporting new methodology and/or important applications in the fields of chemical informatics and molecular modeling. Specific topics include the representation and computer-based searching of chemical databases, molecular modeling, computer-aided molecular design of new materials, catalysts, or ligands, development of new computational methods or efficient algorithms for chemical software, and biopharmaceutical chemistry including analyses of biological activity and other issues related to drug discovery.
Astute chemists, computer scientists, and information specialists look to this monthly’s insightful research studies, programming innovations, and software reviews to keep current with advances in this integral, multidisciplinary field.
As a subscriber you’ll stay abreast of database search systems, use of graph theory in chemical problems, substructure search systems, pattern recognition and clustering, analysis of chemical and physical data, molecular modeling, graphics and natural language interfaces, bibliometric and citation analysis, and synthesis design and reactions databases.