{"title":"De-motif sampling: an approach to decompose hierarchical motifs with applications in T cell recognition.","authors":"Xinyi Tang, Ran Liu","doi":"10.1093/bib/bbaf221","DOIUrl":"10.1093/bib/bbaf221","url":null,"abstract":"<p><p>T cell immune recognition requires the interactions among antigen peptides, Major Histocompatibility Complex (MHC) molecules, and T cell receptors (TCRs). While research into the interactions between MHC and peptides is well established, the specific preferences of TCRs for peptides remain less understood. This gap largely stems from the requirement that antigen peptides must be bound to MHC and presented on the cell surface prior to recognition by TCRs. Typically, motifs related to TCR recognition are influenced by MHC characteristics, limiting the direct identification of TCR-specific motifs. To address this challenge, this study introduces a Bayesian method designed to decompose hierarchical motifs independently of MHC constraints. This model, rigorously tested through comprehensive simulation experiments and applied to real data, establishes a clear hierarchical structure for motifs related to T cell recognition.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12082833/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144076073","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"TARGET-SL: precision essential gene prediction using driver prioritisation and synthetic lethality.","authors":"Rhys Gillman, Matt A Field, Ulf Schmitz, Lionel Hebbard","doi":"10.1093/bib/bbaf255","DOIUrl":"10.1093/bib/bbaf255","url":null,"abstract":"<p><p>The ability to identify patient-specific vulnerabilities to guide cancer treatments is a vital area of research. However, predictive bioinformatics tools are difficult to translate into clinical applications due to a lack of in vitro and in vivo validation. While the increasing number of personalised driver prioritisation algorithms (PDPAs) report powerful patient-specific information, the results do not easily translate into treatment strategies. Critical in addressing this gap is the ability to meaningfully benchmark and validate PDPA predictions. To address this, we developed Tumour-specific Algorithm for Ranking GEnetic Targets via Synthetic Lethality (TARGET-SL), which utilises PDPA predictions to produce a ranked list of predicted essential genes that can be validated in vitro and in vivo. This framework employs a novel strategy to benchmark PDPAs, by comparing predictions with ground truth gene essentiality data from large-scale CRISPR-knockout and drug sensitivity screens. Importantly TARGET-SL identifies vulnerabilities that are more exclusive to individual tumours than predictions based on canonical driver genes. 
We further find that TARGET-SL is better at identifying sample-specific vulnerabilities than other similar tools.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12145226/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144246529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scATD: a high-throughput and interpretable framework for single-cell cancer drug resistance prediction and biomarker identification.","authors":"Murong Zhou, Zeyu Luo, Yu-Hang Yin, Qiaoming Liu, Guohua Wang, Yuming Zhao","doi":"10.1093/bib/bbaf268","DOIUrl":"10.1093/bib/bbaf268","url":null,"abstract":"<p><p>Transfer learning has been widely applied to drug sensitivity prediction based on single-cell RNA sequencing, leveraging knowledge from large datasets of cancer cell lines or other sources to improve the prediction of drug responses. However, previous studies require model fine-tuning for different patient single-cell datasets, limiting their ability to meet the clinical need for high-throughput rapid prediction. In this research, we introduce the single-cell Adaptive Transfer and Distillation model (scATD), a transfer learning framework leveraging large language models for high-throughput drug sensitivity prediction. Based on different large language models (scFoundation and Geneformer) and transfer strategies, scATD includes three distinct sub-models: scATD-sf, scATD-gf, and scATD-sf-dist. scATD-sf and scATD-gf employ an important bidirectional style transfer to enable predictions for new patients without model parameter training. Additionally, scATD-sf-dist uses knowledge distillation from large models to enhance prediction performance, improve efficiency, and reduce resource requirements. Benchmarking across more diverse datasets demonstrates scATD's superior accuracy, generalization and efficiency. Besides, by rigorously selecting reference background samples for feature attribution algorithms, scATD also provides more meaningful insights into the relationship between gene expression and drug resistance mechanisms. 
This makes scATD more interpretable for addressing critical challenges in precision oncology.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12159290/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144274197","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MMsurv: a multimodal multi-instance multi-cancer survival prediction model integrating pathological images, clinical information, and sequencing data.","authors":"Hailong Yang, Jia Wang, Wenyan Wang, Shufang Shi, Lijing Liu, Yuhua Yao, Geng Tian, Peizhen Wang, Jialiang Yang","doi":"10.1093/bib/bbaf209","DOIUrl":"10.1093/bib/bbaf209","url":null,"abstract":"<p><p>Accurate prediction of patient survival rates in cancer treatment is essential for effective therapeutic planning. Unfortunately, current models often underutilize the extensive multimodal data available, affecting confidence in predictions. This study presents MMSurv, an interpretable multimodal deep learning model to predict survival in different types of cancer. MMSurv integrates clinical information, sequencing data, and hematoxylin and eosin-stained whole-slide images (WSIs) to forecast patient survival. Specifically, we segment tumor regions from WSIs into image tiles and employ neural networks to encode each tile into one-dimensional feature vectors. We then optimize clinical features by applying word embedding techniques, inspired by natural language processing, to the clinical data. To better utilize the complementarity of multimodal data, this study proposes a novel fusion method, multimodal fusion method based on compact bilinear pooling and transformer, which integrates bilinear pooling with Transformer architecture. The fused features are then processed through a dual-layer multi-instance learning model to remove prognosis-irrelevant image patches and predict each patient's survival risk. Furthermore, we employ cell segmentation to investigate the cellular composition within the tiles that received high attention from the model, thereby enhancing its interpretive capacity. We evaluate our approach on six cancer types from The Cancer Genome Atlas. 
The results demonstrate that utilizing multimodal data leads to higher predictive accuracy compared to using single-modal image data, with an average C-index increase from 0.6750 to 0.7283. Additionally, we compare our proposed baseline model with state-of-the-art methods using the C-index and five-fold cross-validation approach, revealing a significant average improvement of nearly 10% in our model's performance.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12077396/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144075688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multicancer analyses of short tandem repeat variations reveal shared gene regulatory mechanisms.","authors":"Feifei Xia, Max Adriaan Verbiest, Oxana Lundström, Tugce Bilgin Sonay, Michael Baudis, Maria Anisimova","doi":"10.1093/bib/bbaf219","DOIUrl":"10.1093/bib/bbaf219","url":null,"abstract":"<p><p>Short tandem repeats (STRs) have been reported to influence gene expression across various human tissues. While STR variations are enriched in colorectal, stomach, and endometrial cancers, particularly in microsatellite instable tumors, their functional effects and regulatory mechanisms on gene expression remain poorly understood across these cancer types. Here, we leverage whole-exome sequencing and gene expression data to identify STRs for which repeat lengths are associated with the expression of nearby genes (eSTRs) in colorectal, stomach, and endometrial tumors. While most eSTRs are cancer-specific, shared eSTRs across multiple cancers exhibit consistent effects on gene expression. Notably, coding-region eSTRs identified in all three cancer types show positive correlations with nearby gene expression. We further validate the functional effects of eSTRs by demonstrating associations between somatic eSTR mutations and gene expression changes during the transition from normal to tumor tissues, suggesting their potential roles in tumorigenesis. Combined with DNA methylation data, we perform the first quantitative analysis of the interplay between STR variations and DNA methylation in tumors. We identify eSTRs where repeat lengths are associated with methylation levels of nearby CpG sites (meSTRs) and show that >70% of eSTRs are significantly linked to local DNA methylation. Importantly, the effects of meSTRs on DNA methylation remain consistent across cancer types. Overall, our findings enhance the understanding of how functional STR variations influence gene expression and DNA methylation. 
Our study highlights shared regulatory mechanisms of STRs across multiple cancers, offering a foundation for future research into their broader implications in tumor biology.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12096010/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144118817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Advances in multi-trait genomic prediction approaches: classification, comparative analysis, and perspectives.","authors":"Alain J Mbebi, Facundo Mercado, David Hobby, Hao Tong, Zoran Nikoloski","doi":"10.1093/bib/bbaf211","DOIUrl":"10.1093/bib/bbaf211","url":null,"abstract":"<p><p>Traits in any organism are not independent, but show considerable integration, observed in the form of couplings and trade-offs. Therefore, improvement in one trait may affect other traits, often in an undesired direction. To account for this problem, crop breeding increasingly relies on multi-trait genomic prediction (MT-GP) approaches that leverage the availability of genetic markers from different populations along with advances in high-throughput precision phenotyping. While significant progress has been made to jointly model multiple traits using a variety of statistical and machine learning approaches, there is no systematic comparison of advantages and shortcomings of the existing classes of MT-GP models. Here, we fill this knowledge gap by first classifying the existing MT-GP models and briefly summarizing their general principles, modeling assumptions, and potential limitations. We then perform an extensive comparative analysis with 10 traits measured in an Oryza sativa diversity panel using cross-validation scenarios relevant in breeding practice. 
Finally, we discuss directions that can enable the building of next generation MT-GP models in addressing pressing challenges in crop breeding.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070487/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143961401","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-omics analysis of Helicobacter pylori-associated gastric cancer identifies hub genes as a novel therapeutic biomarker.","authors":"Sara H Mohamed, Mohamed Hamed, Hussain A Alamoudi, Zayd Jastaniah, Fadhl M Alakwaa, Asmaa Reda","doi":"10.1093/bib/bbaf241","DOIUrl":"10.1093/bib/bbaf241","url":null,"abstract":"<p><p>Helicobacter pylori infection is one of the most common gastric pathogens; however, the molecular mechanisms driving its progression to gastric cancer remain poorly understood. This study aimed to identify the key transcriptomic drivers and therapeutic targets of H. pylori-associated gastric cancer through an integrative transcriptomic analysis. This analysis integrates microarray and RNA-seq datasets to identify significant differentially expressed genes (DEGs) involved in the progression of H. pylori-associated gastric cancer. In addition to independent analyses, data were integrated using ComBat to detect consistent expression patterns of hub genes. This approach revealed distinct clustering patterns and stage-specific transcriptional changes in common DEGs across disease progression, including H. pylori infection, gastritis, atrophy, and gastric cancer. Genes such as TPX2, MKI67, EXO1, and CTHRC1 exhibited progressive upregulation from infection to cancer, highlighting involvement in cell cycle regulation, DNA repair, and extracellular matrix remodeling. These findings provide insights into molecular shifts linking inflammation-driven infection to malignancy. Furthermore, network analysis identified hub genes, including CXCL1, CCL20, IL12B, and STAT4, which are enriched in immune pathways such as chemotaxis, leukocyte migration, and cytokine signaling. This emphasizes their role in immune dysregulation and tumor development. Expression profiling demonstrated the upregulation of hub genes in gastric cancer and stage-specific changes correlating with disease progression. 
Finally, drug-gene interaction analysis identified therapeutic opportunities, with hub genes interacting with approved drugs like abatacept and zoledronic acid, as well as developmental drugs such as adjuvant and relapladib. These findings highlight the key role of these hub genes as biomarkers and therapeutic targets, providing a foundation for advancing precision medicine in H. pylori-associated gastric cancer. Overall, this study paves the way for advancing precision medicine in H. pylori-associated gastric cancer by providing insights into the development of early detection biomarkers, risk stratification, and targeted therapies. This supports the clinical translation of precision medicine strategies in H. pylori-associated gastric cancer.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12123523/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144186602","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Alpha_Mesh_Swc: automatic and robust surface mesh generation from the skeleton description of brain cells.","authors":"Alex McSweeney-Davis, Chengran Fang, Emmanuel Caruyer, Anne Kerbrat, Jing-Rebecca Li","doi":"10.1093/bib/bbaf258","DOIUrl":"10.1093/bib/bbaf258","url":null,"abstract":"<p><p>In recent years, there has been a significant increase in publicly available skeleton descriptions of real brain cells from laboratories all over the world. In theory, this should make it possible to perform large-scale realistic simulations on brain cells. However, currently there is still a gap between the skeleton descriptions and high-quality simulation-ready surface and volume meshes of brain cells. We propose and implement a tool called Alpha_Mesh_Swc (AMS) to automatically and efficiently generate triangular surface meshes that are optimized for finite element simulations. We use an Alpha Wrapping method with an offset parameter on component surface meshes to efficiently generate a global watertight mesh. Then mesh simplification and re-meshing are used to produce an optimal surface mesh. Our methodology limits the number of surface triangles while preserving geometrical accuracy, permits cutting and gluing of cell components, is robust to imperfect skeleton descriptions, and allows mixed cell descriptions (surface meshes combined with skeletons). We compared the robustness, performance and accuracy of AMS against existing tools and found significant improvement in terms of mesh accuracy. We show that, on average, we can fully automatically generate a brain cell (neuron or glia) surface mesh in a couple of minutes on a laptop computer, resulting in a simplified surface mesh with only around 10k nodes. The resulting meshes were used to perform diffusion MRI simulations in neurons and microglia. 
The code and a number of sample brain cell surface meshes have been made publicly available.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12146268/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144246483","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Gated-GPS: enhancing protein-protein interaction site prediction with scalable learning and imbalance-aware optimization.","authors":"Xin Gao, Hanqun Cao, Jinpeng Li, Jiezhong Qiu, Guangyong Chen, Pheng-Ann Heng","doi":"10.1093/bib/bbaf248","DOIUrl":"10.1093/bib/bbaf248","url":null,"abstract":"<p><p>In protein-protein interaction site (PPIS) prediction, existing machine learning models struggle with small datasets, limiting their predictive accuracy for unseen proteins. Additionally, class imbalance in protein complexes, where binding residues constitute a small fraction of all residues, hinders model performance. To address these challenges, we constructed a training dataset 9× larger than previous benchmarks by filtering the latest protein-protein complex data, improving diversity and generalization. We propose Gated-GPS, a Graph Transformer model with a novel gating mechanism designed to effectively leverage this expanded dataset. Additionally, we integrate cross-entropy loss with Tversky Loss to adjust sensitivity to positive and negative samples, mitigating class imbalance by emphasizing underrepresented binding residues. Experimental results show that Gated-GPS outperforms state-of-the-art (SOTA) models across four test sets. Notably, on the UBTest dataset, designed to evaluate generalization on unbounded proteins, our method improves MCC and AUPRC by 18.5% and 21.4%, respectively, over the previous SOTA. 
In a case study of snake venom toxin-protein interactions, our model accurately identified interaction sites, demonstrating its potential for therapeutic design and advancing the understanding of complex protein interactions.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12133684/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144214975","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Large-scale information retrieval and correction of noisy pharmacogenomic datasets through residual thresholded deep matrix factorization.","authors":"Zhiyue Tom Hu, Yaodong Yu, Ruoqiao Chen, Shan-Ju Yeh, Bin Chen, Haiyan Huang","doi":"10.1093/bib/bbaf226","DOIUrl":"10.1093/bib/bbaf226","url":null,"abstract":"<p><p>Pharmacogenomics studies are attracting an increasing amount of interest from researchers in precision medicine. The advances in high-throughput experiments and multiplexed approaches allow the large-scale quantification of drug sensitivities in molecularly characterized cancer cell lines (CCLs), resulting in a number of open drug sensitivity datasets for drug biomarker discovery. However, a significant inconsistency in drug sensitivity values among these datasets has been noted. Such inconsistency indicates the presence of substantial noise, subsequently hindering downstream analyses. To address the noise in drug sensitivity data, we introduce a robust and scalable deep learning framework, Residual Thresholded Deep Matrix Factorization (RT-DMF). This method takes a single drug sensitivity data matrix as its sole input and outputs a corrected and imputed matrix. Deep matrix factorization (DMF) excels at uncovering subtle patterns, due to its minimal reliance on data structure assumptions. This attribute significantly boosts DMF's ability to identify complex hidden patterns among nuisance effects in the data, thereby facilitating the detection of signals that are therapeutically relevant. Furthermore, RT-DMF incorporates an iterative residual thresholding procedure, which plays a crucial role in retaining signals more likely to hold therapeutic importance. 
Validation using simulated datasets and real pharmacogenomics datasets demonstrates the effectiveness of our approach in correcting noise and imputing missing data in drug sensitivity datasets (open-source package available at https://github.com/tomwhoooo/rtdmf).</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 3","pages":""},"PeriodicalIF":6.8,"publicationDate":"2025-05-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12106859/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144149295","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}