{"title":"cfMethylPre: deep transfer learning enhances cancer detection based on circulating cell-free DNA methylation profiling.","authors":"Xuchao Zhang, Jing Chen, Yongtian Wang, Xiaofeng Wang, Jialu Hu, Jiajie Peng, Xuequn Shang, Yanpu Wang, Tao Wang","doi":"10.1093/bib/bbaf303","DOIUrl":"10.1093/bib/bbaf303","url":null,"abstract":"<p><p>Cancer remains a significant global health burden, underscoring the need for innovative diagnostic tools to enable early detection and improve patient outcomes. While circulating cell-free DNA (cfDNA) methylation has emerged as a promising biomarker for noninvasive cancer diagnostics, existing methods often face limitations in handling the high-dimensionality of methylation data, small sample sizes, and a lack of biological interpretability. To address these challenges, we propose cfMethylPre, a novel deep transfer learning framework tailored for cancer detection using cfDNA methylation data. cfMethylPre leverages large language model pretrained embeddings from DNA sequence information and integrates them with methylation profiles to enhance feature representation. The deep transfer learning process involves pretraining on bulk DNA methylation data encompassing 2801 samples across 82 cancer types and normal controls, followed by fine-tuning with cfDNA methylation data. This approach ensures robust adaptation to cfDNA's unique characteristics while improving predictive accuracy. Our model achieved superior predictive accuracy compared with state-of-the-art methods, with a weighted Matthews Correlation Coefficient of 0.926 and a weighted F1-score of 0.942. Through model interpretation and biological experimental validation, we identified three novel breast cancer genes-PCDHA10, PRICKLE2, and PRTG-demonstrating their inhibitory effects on cell proliferation and migration in breast cancer cell lines. 
These findings establish cfMethylPre as a powerful and interpretable tool for cancer diagnostics and biological discovery, paving the way for its application in precision oncology.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12206449/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144526475","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Computational modeling of single-cell dynamics data.","authors":"Wenbo Guo, Zeyu Chen, Jin Gu","doi":"10.1093/bib/bbaf305","DOIUrl":"10.1093/bib/bbaf305","url":null,"abstract":"<p><p>Deciphering the cell dynamics in complex biological systems is of great significance for understanding the mechanisms of life and facilitating disease treatment. Recent advances in single-cell sequencing technologies have enabled the measurement of single-cell characteristics over multiple time points. However, the integration and analysis of these dynamic single-cell data face many challenges and raise new demands for computational methodologies. In this review, we first elaborate these challenges in the context of experimental limitations, data features, and biological discoveries. Then, we provide an overview of the algorithmic advancements across four key tasks: inferring single-cell dynamics, dissecting dynamic mechanisms, predicting future cell fates, and integrating lineage tracing information to characterize cell dynamics. Finally, we discuss that the cutting-edge developments in biological technologies and artificial intelligence algorithms may greatly enhance our ability to explore complex life processes from a spatiotemporal systemic perspective.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12207405/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144526476","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: ScInfeR: an efficient method for annotating cell types and sub-types in single-cell RNA-seq, ATAC-seq, and spatial omics.","authors":"","doi":"10.1093/bib/bbaf337","DOIUrl":"10.1093/bib/bbaf337","url":null,"abstract":"","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12205367/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144526477","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MMsurv: a multimodal multi-instance multi-cancer survival prediction model integrating pathological images, clinical information, and sequencing data.","authors":"Hailong Yang, Jia Wang, Wenyan Wang, Shufang Shi, Lijing Liu, Yuhua Yao, Geng Tian, Peizhen Wang, Jialiang Yang","doi":"10.1093/bib/bbaf209","DOIUrl":"10.1093/bib/bbaf209","url":null,"abstract":"<p><p>Accurate prediction of patient survival rates in cancer treatment is essential for effective therapeutic planning. Unfortunately, current models often underutilize the extensive multimodal data available, affecting confidence in predictions. This study presents MMSurv, an interpretable multimodal deep learning model to predict survival in different types of cancer. MMSurv integrates clinical information, sequencing data, and hematoxylin and eosin-stained whole-slide images (WSIs) to forecast patient survival. Specifically, we segment tumor regions from WSIs into image tiles and employ neural networks to encode each tile into one-dimensional feature vectors. We then optimize clinical features by applying word embedding techniques, inspired by natural language processing, to the clinical data. To better utilize the complementarity of multimodal data, this study proposes a novel fusion method, multimodal fusion method based on compact bilinear pooling and transformer, which integrates bilinear pooling with Transformer architecture. The fused features are then processed through a dual-layer multi-instance learning model to remove prognosis-irrelevant image patches and predict each patient's survival risk. Furthermore, we employ cell segmentation to investigate the cellular composition within the tiles that received high attention from the model, thereby enhancing its interpretive capacity. We evaluate our approach on six cancer types from The Cancer Genome Atlas. 
The results demonstrate that utilizing multimodal data leads to higher predictive accuracy compared to using single-modal image data, with an average C-index increase from 0.6750 to 0.7283. Additionally, we compare our proposed baseline model with state-of-the-art methods using the C-index and five-fold cross-validation approach, revealing a significant average improvement of nearly 10% in our model's performance.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077396/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144075688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multicancer analyses of short tandem repeat variations reveal shared gene regulatory mechanisms.","authors":"Feifei Xia, Max Adriaan Verbiest, Oxana Lundström, Tugce Bilgin Sonay, Michael Baudis, Maria Anisimova","doi":"10.1093/bib/bbaf219","DOIUrl":"10.1093/bib/bbaf219","url":null,"abstract":"<p><p>Short tandem repeats (STRs) have been reported to influence gene expression across various human tissues. While STR variations are enriched in colorectal, stomach, and endometrial cancers, particularly in microsatellite instable tumors, their functional effects and regulatory mechanisms on gene expression remain poorly understood across these cancer types. Here, we leverage whole-exome sequencing and gene expression data to identify STRs for which repeat lengths are associated with the expression of nearby genes (eSTRs) in colorectal, stomach, and endometrial tumors. While most eSTRs are cancer-specific, shared eSTRs across multiple cancers exhibit consistent effects on gene expression. Notably, coding-region eSTRs identified in all three cancer types show positive correlations with nearby gene expression. We further validate the functional effects of eSTRs by demonstrating associations between somatic eSTR mutations and gene expression changes during the transition from normal to tumor tissues, suggesting their potential roles in tumorigenesis. Combined with DNA methylation data, we perform the first quantitative analysis of the interplay between STR variations and DNA methylation in tumors. We identify eSTRs where repeat lengths are associated with methylation levels of nearby CpG sites (meSTRs) and show that >70% of eSTRs are significantly linked to local DNA methylation. Importantly, the effects of meSTRs on DNA methylation remain consistent across cancer types. Overall, our findings enhance the understanding of how functional STR variations influence gene expression and DNA methylation. 
Our study highlights shared regulatory mechanisms of STRs across multiple cancers, offering a foundation for future research into their broader implications in tumor biology.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12096010/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144118817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advances in multi-trait genomic prediction approaches: classification, comparative analysis, and perspectives.","authors":"Alain J Mbebi, Facundo Mercado, David Hobby, Hao Tong, Zoran Nikoloski","doi":"10.1093/bib/bbaf211","DOIUrl":"10.1093/bib/bbaf211","url":null,"abstract":"<p><p>Traits in any organism are not independent, but show considerable integration, observed in a form of couplings and trade-offs. Therefore, improvement in one trait may affect other traits, often in undesired direction. To account for this problem, crop breeding increasingly relies on multi-trait genomic prediction (MT-GP) approaches that leverage the availability of genetic markers from different populations along with advances in high-throughput precision phenotyping. While significant progress has been made to jointly model multiple traits using a variety of statistical and machine learning approaches, there is no systematic comparison of advantages and shortcomings of the existing classes of MT-GP models. Here, we fill this knowledge gap by first classifying the existing MT-GP models and briefly summarizing their general principles, modeling assumptions, and potential limitations. We then perform an extensive comparative analysis with 10 traits measured in an Oryza sativa diversity panel using cross-validation scenarios relevant in breeding practice. 
Finally, we discuss directions that can enable the building of next generation MT-GP models in addressing pressing challenges in crop breeding.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070487/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143961401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-omics analysis of Helicobacter pylori-associated gastric cancer identifies hub genes as a novel therapeutic biomarker.","authors":"Sara H Mohamed, Mohamed Hamed, Hussain A Alamoudi, Zayd Jastaniah, Fadhl M Alakwaa, Asmaa Reda","doi":"10.1093/bib/bbaf241","DOIUrl":"10.1093/bib/bbaf241","url":null,"abstract":"<p><p>Helicobacter pylori infection is one of the most common gastric pathogens; however, the molecular mechanisms driving its progression to gastric cancer remain poorly understood. This study aimed to identify the key transcriptomic drivers and therapeutic targets of H. pylori-associated gastric cancer through an integrative transcriptomic analysis. This analysis integrates microarray and RNA-seq datasets to identify significant differentially expressed genes (DEGs) involved in the progression of H. pylori-associated gastric cancer. In addition to independent analyses, data were integrated using ComBat to detect consistent expression patterns of hub genes. This approach revealed distinct clustering patterns and stage-specific transcriptional changes in common DEGs across disease progression, including H. pylori infection, gastritis, atrophy, and gastric cancer. Genes such as TPX2, MKI67, EXO1, and CTHRC1 exhibited progressive upregulation from infection to cancer, highlighting involvement in cell cycle regulation, DNA repair, and extracellular matrix remodeling. These findings provide insights into molecular shifts linking inflammation-driven infection to malignancy. Furthermore, network analysis identified hub genes, including CXCL1, CCL20, IL12B, and STAT4, which are enriched in immune pathways such as chemotaxis, leukocyte migration, and cytokine signaling. This emphasizes their role in immune dysregulation and tumor development. Expression profiling demonstrated the upregulation of hub genes in gastric cancer and stage-specific changes correlating with disease progression. 
Finally, drug-gene interaction analysis identified therapeutic opportunities, with hub genes interacting with approved drugs like abatacept and zoledronic acid, as well as developmental drugs such as adjuvant and relapladib. These findings highlight the key role of these hub genes as biomarkers and therapeutic targets, providing a foundation for advancing precision medicine in H. pylori-associated gastric cancer. Overall, this study paves the way for advancing precision medicine in H. pylori-associated gastric cancer by providing insights into the development of early detection biomarkers, risk stratification, and targeted therapies. This supports the clinical translation of precision medicine strategies in H. pylori-associated gastric cancer.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123523/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144186602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale information retrieval and correction of noisy pharmacogenomic datasets through residual thresholded deep matrix factorization.","authors":"Zhiyue Tom Hu, Yaodong Yu, Ruoqiao Chen, Shan-Ju Yeh, Bin Chen, Haiyan Huang","doi":"10.1093/bib/bbaf226","DOIUrl":"10.1093/bib/bbaf226","url":null,"abstract":"<p><p>Pharmacogenomics studies are attracting an increasing amount of interest from researchers in precision medicine. The advances in high-throughput experiments and multiplexed approaches allow the large-scale quantification of drug sensitivities in molecularly characterized cancer cell lines (CCLs), resulting in a number of open drug sensitivity datasets for drug biomarker discovery. However, a significant inconsistency in drug sensitivity values among these datasets has been noted. Such inconsistency indicates the presence of substantial noise, subsequently hindering downstream analyses. To address the noise in drug sensitivity data, we introduce a robust and scalable deep learning framework, Residual Thresholded Deep Matrix Factorization (RT-DMF). This method takes a single drug sensitivity data matrix as its sole input and outputs a corrected and imputed matrix. Deep matrix factorization (DMF) excels at uncovering subtle patterns, due to its minimal reliance on data structure assumptions. This attribute significantly boosts DMF's ability to identify complex hidden patterns among nuisance effects in the data, thereby facilitating the detection of signals that are therapeutically relevant. Furthermore, RT-DMF incorporates an iterative residual thresholding procedure, which plays a crucial role in retaining signals more likely to hold therapeutic importance. 
Validation using simulated datasets and real pharmacogenomics datasets demonstrates the effectiveness of our approach in correcting noise and imputing missing data in drug sensitivity datasets (open-source package available at https://github.com/tomwhoooo/rtdmf).</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12106859/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144149295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Enhancing LncRNA-miRNA interaction prediction with multimodal contrastive representation learning.","authors":"Zhixia Teng, Zhaowen Tian, Murong Zhou, Guohua Wang, Zhen Tian, Yuming Zhao","doi":"10.1093/bib/bbaf281","DOIUrl":"10.1093/bib/bbaf281","url":null,"abstract":"<p><p>Interactions between long non-coding RNAs (lncRNAs) and microRNAs (miRNAs) play an important role in the development of complex human diseases by collaboratively regulating gene transcription and expression. Therefore, identifying lncRNA-miRNA interactions (LMIs) is essential for diagnosing and treating complex human diseases. Because identifying LMIs with wet experiments is time-consuming and labor-intensive, some computational methods have been developed to infer LMIs. However, these approaches excel at utilizing single-modal information but struggle to integrate multimodal data from lncRNAs and miRNAs, which is essential for uncovering complex patterns in LMIs, ultimately limiting their performance. Therefore, this article proposes a novel multimodal contrastive representation learning model (MCRLMI) for LMI predictions. The model fully integrates multi-source similarity information and sequence encodings of lncRNAs and miRNAs. It leverages a graph convolutional network (GCN) and a Transformer to capture local neighborhood structural features and long-distance dependencies, respectively, enabling the collaborative modeling of structural and semantic information. Subsequently, to effectively integrate multimodal characteristics with encoded information, a multichannel attention mechanism and contrastive learning are introduced to fuse the extracted features. Finally, a Kolmogorov-Arnold Network (KAN) is trained with the optimized embeddings to predict LMIs. Extensive experiments show that the proposed MCRLMI consistently outperforms existing methods. 
Moreover, case studies further validate the potential of MCRLMI to identify novel LMIs in practical applications.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12199918/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144309542","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Generative prediction of real-world prevalent SARS-CoV-2 mutation with in silico virus evolution.","authors":"Xudong Liu, Zhiwei Nie, Haorui Si, Xurui Shen, Yutian Liu, Xiansong Huang, Tianyi Dong, Fan Xu, Zhixiang Ren, Peng Zhou, Jie Chen","doi":"10.1093/bib/bbaf276","DOIUrl":"10.1093/bib/bbaf276","url":null,"abstract":"<p><p>Predicting the mutation prevalence trends of emerging viruses in the real world is an efficient means to update vaccines or drugs in advance. It is crucial to develop a computational method for the prediction of real-world prevalent SARS-CoV-2 mutations considering the impact of multiple selective pressures within and between hosts. Here, a deep-learning generative framework for real-world prevalent SARS-CoV-2 mutation prediction, named ViralForesight, is developed on top of protein language models and in silico virus evolution. Through the paradigm of host-to-herd in silico virus evolution, ViralForesight reproduced previous real-world prevalent SARS-CoV-2 mutations for multiple lineages with superior performance. More importantly, ViralForesight correctly predicted the future prevalent mutations that dominated the COVID-19 pandemic in the real world more than half a year in advance with in vitro experimental validation. 
Overall, ViralForesight demonstrates a proactive approach to the prevention of emerging viral infections, accelerating the process of discovering future prevalent mutations with the power of generative deep learning.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12204194/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144324526","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}