{"title":"GKNnet: an relational graph convolutional network-based method with knowledge-augmented activation layer for microbial structural variation detection.","authors":"Fengyi Guo, Yuanbo Li, Hongyuan Zhao, Xiaogang Liu, Jian Mao, Dongna Ma, Shuangping Liu","doi":"10.1093/bib/bbaf200","DOIUrl":"https://doi.org/10.1093/bib/bbaf200","url":null,"abstract":"<p><p>Structural variants (SVs) in microbial genomes play a critical role in phenotypic changes, environmental adaptation, and species evolution, with deletion variations particularly closely linked to phenotypic traits. Therefore, accurate and comprehensive identification of deletion variations is essential. Although long-read sequencing technology can detect more SVs, its high error rate introduces substantial noise, leading to high false-positive and low recall rates in existing SV detection algorithms. This paper presents an SV detection method based on graph convolutional networks (GCNs). The model first represents node features through a heterogeneous graph, leveraging the GCN to precisely identify variant regions. Additionally, a knowledge-augmented activation layer (KANLayer) with a learnable activation function is introduced to reduce noise around variant regions, thereby improving model precision and reducing false positives. A clustering algorithm then aggregates multiple overlapping regions near the variant center into a single accurate SV interval, further enhancing recall. 
Validation on both simulated and real datasets demonstrates that our method achieves superior F1 scores compared to benchmark methods (cuteSV, Sniffles, Svim, and Pbsv), highlighting its advantage and robustness in SV detection and offering an innovative solution for microbial genome structural variation research.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12052243/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143954334","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Evaluation of out-of-distribution detection methods for data shifts in single-cell transcriptomics.","authors":"Lauren Theunissen, Thomas Mortier, Yvan Saeys, Willem Waegeman","doi":"10.1093/bib/bbaf239","DOIUrl":"10.1093/bib/bbaf239","url":null,"abstract":"<p><p>Automatic cell-type annotation methods assign cell-type labels to new, unlabeled datasets by leveraging relationships from a reference RNA-seq atlas. However, new datasets may include labels absent from the reference dataset or exhibit feature distributions that diverge from it. These scenarios can significantly affect the reliability of cell type predictions, a factor often overlooked in current automatic annotation methods. The field of out-of-distribution (OOD) detection, primarily focused on computer vision, addresses the identification of instances that differ from the training distribution. Therefore, the implementation of OOD methods in the context of novel cell type annotation and data shift detection for single-cell transcriptomics may enhance annotation accuracy and trustworthiness. We evaluate six OOD detection methods: LogitNorm, MC dropout, Deep Ensembles, Energy-based OOD, Deep NN, and Posterior networks, for their annotation and OOD detection performance in both synthetic and real-life application settings. We show that OOD detection methods can accurately identify novel cell types and demonstrate potential to detect significant data shifts in non-integrated datasets. 
Moreover, we find that integration of the OOD datasets does not interfere with OOD detection of novel cell types.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121363/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144172781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"LightCTL: lightweight contrastive TCR-pMHC specificity learning with context-aware prompt.","authors":"Fei Ye, Mao Chen, Yixuan Huang, Ruihao Zhang, Xuqi Li, Xiuyuan Wang, Sanyang Han, Lan Ma, Xiao Liu","doi":"10.1093/bib/bbaf246","DOIUrl":"10.1093/bib/bbaf246","url":null,"abstract":"<p><p>Identification of T cell receptor (TCR) specificities for antigens from large-scale single-cell or bulk TCR repertoire data plays a vital role in disease diagnosis and immunotherapy. In silico prediction models have emerged in recent years. However, the generalizability and transferability of current computational models remain significant hurdles in accurately predicting TCR-pMHC binding specificity, primarily due to the limited availability of experimental data and the vast diversity of TCR sequences. In this paper, we propose a lightweight contrastive TCR-pMHC learning method with context-aware prompts, named LightCTL, to infer TCR-pMHC binding specificity. For each TCR and peptide-MHC sequence, we utilize a TCR encoding module and a pMHC encoding module to transform them into latent representations. Specifically, we introduce a contrastive TCR-pMHC learning paradigm to enhance the generalization ability of TCR-pMHC binding specificity prediction by learning the matching relationship between TCR-pMHC and MHC-peptide. We fuse the TCR and pMHC latent representations and employ a novel context-aware prompt module to consider the varying importance of different feature maps. Compared with existing methods, LightCTL substantially improves the accuracy of predicting TCR-pMHC binding specificity. Moreover, comparative experiments across eight independent datasets demonstrate the generalization ability of LightCTL, showing superior performance for predicting unknown TCR-pMHC pairs. 
Finally, we assess LightCTL's efficacy across different TCR sequence lengths and distinct unseen epitopes, as well as estimate cytomegalovirus-specific TCR diversity and clone frequency from peripheral TCR repertoire data. Overall, our findings highlight LightCTL as a versatile analytical method for advancing novel T-cell therapies and identifying novel biomarkers for disease diagnosis.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121355/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144172783","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: Bioinformatics in Russia: history and present-day landscape.","authors":"","doi":"10.1093/bib/bbaf180","DOIUrl":"10.1093/bib/bbaf180","url":null,"abstract":"","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12101723/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144131837","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mutual information stacking method for prediction of the growth traits in pigs.","authors":"Ruilin Su, Binyang Huang, Junyan Tan, Zhencai Shen, Ping Zhong, Jianfeng Liu","doi":"10.1093/bib/bbaf231","DOIUrl":"10.1093/bib/bbaf231","url":null,"abstract":"<p><p>Genomic prediction is a crucial technique for phenotype estimation, with the genomic best linear unbiased prediction (GBLUP) being the most widely adopted method. Yet, GBLUP falls short in capturing the intricate nonlinear relationships between genomic data and phenotypes. Given its ability to more effectively capture nonlinear genetic effects, machine learning (ML) has become increasingly appealing in genomic prediction. However, almost all GBLUP and ML methods use all single nucleotide polymorphism (SNP) data for prediction, ignoring the fact that only a subset of SNPs is effective. This not only consumes computation time but also lowers prediction accuracy. This paper therefore proposes a mutual information stacking method (MISM). First, mutual information is used to select the SNPs that have an effect and remove redundant SNPs. Then, we construct a stacking model that can capture both linear and nonlinear relationships between SNPs and phenotypes to improve prediction accuracy. To assess the effectiveness of MISM, we compared its performance on pig growth traits with GBLUP and other ML methods. 
The statistical analysis results indicated that MISM outperformed other ML models and GBLUP.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12104626/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144141454","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"High-dimensional mediation analysis for longitudinal mediators and survival outcomes.","authors":"Lili Liu, Haixiang Zhang, Yinan Zheng, Tao Gao, Cheng Zheng, Kai Zhang, Lifang Hou, Lei Liu","doi":"10.1093/bib/bbaf206","DOIUrl":"10.1093/bib/bbaf206","url":null,"abstract":"<p><p>Mediation analysis with high-dimensional mediators is crucial for identifying epigenetic pathways linking environmental exposures to health outcomes. However, high-dimensional mediation analysis methods for longitudinal mediators and a survival outcome remain underdeveloped. This study fills that gap by introducing a method that captures mediation effects over time using multivariate, longitudinally measured time-varying mediators. Our approach uses a longitudinal mixed effects model to examine the relationship between the exposure and the mediating process. We connect the mediating process to the survival outcome using a Cox proportional hazards model with time-varying mediators. To handle high-dimensional data, we first employ a mediation-based sure independence screening method for dimension reduction. A Lasso inference procedure is further utilized to identify significant time-varying mediators. We adopt a joint significance test to accurately control the family-wise error rate in testing high-dimensional mediation hypotheses. 
Simulation studies and an analysis of the Coronary Artery Risk Development in Young Adults Study demonstrate the utility and validity of our method.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12066418/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143969975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel splicing graph allows a direct comparison between exon-based and splice junction-based approaches to alternative splicing detection.","authors":"Jelard Aquino, Daniel Witoslawski, Steve Park, Jessica Holder, Amei Amei, Mira V Han","doi":"10.1093/bib/bbaf204","DOIUrl":"10.1093/bib/bbaf204","url":null,"abstract":"<p><p>There are primarily two computational approaches to alternative splicing (AS) detection using short reads: splice junction-based and exon-based approaches. Despite their shared goal of addressing the same biological problem, these approaches have not been reconciled before. We devised a novel graph structure and algorithm aimed at mapping between the exonic parts and splicing events detected by the two different methods. Through simulations, we demonstrated disparities in sensitivity and specificity between splice junction-based and exon-based methods. When applied to empirical data, there were large discrepancies in the results, suggesting that the methods are complementary. With the discrepancies localized to individual events and exonic parts, we were able to gain insights into the strengths and weaknesses inherent in each approach. Finally, we integrated the results to generate a comprehensive list of both common and unique AS events detected by both methodologies.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12062524/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143959545","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SVHunter: long-read-based structural variation detection through the transformer model.","authors":"Runtian Gao, Heng Hu, Zhongjun Jiang, Shuqi Cao, Guohua Wang, Yuming Zhao, Tao Jiang","doi":"10.1093/bib/bbaf203","DOIUrl":"10.1093/bib/bbaf203","url":null,"abstract":"<p><p>Structural variations (SVs) are genomic rearrangements larger than 50 bp that are widely present in the human genome and are associated with various complex diseases. Existing long-read-based SV detection tools often rely on fixed rules or heuristic algorithms, which can oversimplify the complexity of SV signatures. Therefore, these methods usually lack flexibility and cannot fully capture SV signals, leading to reduced accuracy and robustness. To address these issues, we propose SVHunter, a transformer-based method for long-read SV detection. SVHunter combines convolutional neural networks and transformers to capture both local and global SV signatures, enabling accurate identification of SVs. Additionally, SVHunter employs the mean shift clustering algorithm, which dynamically adjusts bandwidth parameters to accommodate different types of SVs without requiring a preset number of clusters, thus allowing precise breakpoint clustering. Validation across multiple sequencing platforms and datasets demonstrates that SVHunter excels at detecting various types of SVs, with a notable reduction in the false discovery rate. 
This highlights its considerable potential for both research and clinical applications.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12062572/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143980874","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Robustness and resilience of computational deconvolution methods for bulk RNA sequencing data.","authors":"Su Xu, Duan Chen, Xue Wang, Shaoyu Li","doi":"10.1093/bib/bbaf264","DOIUrl":"10.1093/bib/bbaf264","url":null,"abstract":"<p><p>This study benchmarks the robustness and resilience of computational deconvolution methods for estimating cell-type proportions in bulk tissues, with a focus on comparing reference-based and reference-free methods. Robustness is evaluated by generating in silico pseudo-bulk tissue RNA sequencing data from cell-level gene expression profiles derived from four different tissue types, with simulated cellular composition at varying levels of heterogeneity. To assess resilience, we intentionally alter single-cell RNA profiles to create pseudo-bulk tissue RNA-seq data. Deconvolution estimates are compared with ground truth using Pearson's correlation coefficient, root mean squared deviation, and mean absolute deviation. The results show that reference-based methods are more robust when reliable reference data are available, whereas reference-free methods excel in scenarios lacking suitable reference data. Furthermore, variations in cell-level transcriptomic profiles and cell composition have emerged as critical factors influencing the performance of deconvolution methods. 
This study provides significant insights into the factors affecting bulk tissue deconvolution performance, which are essential for guiding users and advancing the development of more powerful and reliable algorithms in the future.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12159287/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144274196","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"EvANI benchmarking workflow for evolutionary distance estimation.","authors":"Sina Majidian, Stephen Hwang, Mohsen Zakeri, Ben Langmead","doi":"10.1093/bib/bbaf267","DOIUrl":"10.1093/bib/bbaf267","url":null,"abstract":"<p><p>Advances in long-read sequencing technology have led to a rapid increase in high-quality genome assemblies. These make it possible to compare genome sequences across the Tree of Life, deepening our understanding of evolutionary relationships. Average nucleotide identity (ANI) is a metric for estimating the genetic similarity between two genomes, usually calculated as the mean identity of their shared genomic regions. These regions are typically found with genome aligners like Basic Local Alignment Search Tool (BLAST) or MUMmer. ANI has been applied to species delineation, building guide trees, and searching large sequence databases. Since computing ANI via genome alignment is computationally expensive, the field has increasingly turned to sketch-based approaches that use assumptions and heuristics to speed this up. We propose a suite of simulated and real benchmark datasets, together with a rank-correlation-based metric, to study how these assumptions and heuristics impact distance estimates. We call this evaluation framework EvANI. With EvANI, we show that ANIb is the ANI estimation algorithm that best captures tree distance, though it is also the least efficient. We show that k-mer-based approaches are extremely efficient and have consistently strong accuracy. We also show that some clades have inter-sequence distances that are best computed using multiple values of $k$, e.g. $k=10$ and $k=19$ for Chlamydiales. 
Finally, we highlight that approaches based on maximal exact matches may represent an advantageous compromise, achieving an intermediate level of computational efficiency while avoiding over-reliance on a single fixed k-mer length.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12159288/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144274194","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}