Long Xu, Qiang Yang, Weihe Dong, Xiaokun Li, Kuanquan Wang, Suyu Dong, Xianyu Zhang, Tiansong Yang, Gongning Luo, Xingyu Liao, Xin Gao, Guohua Wang
{"title":"Meta learning for mutant HLA class I epitope immunogenicity prediction to accelerate cancer clinical immunotherapy.","authors":"Long Xu, Qiang Yang, Weihe Dong, Xiaokun Li, Kuanquan Wang, Suyu Dong, Xianyu Zhang, Tiansong Yang, Gongning Luo, Xingyu Liao, Xin Gao, Guohua Wang","doi":"10.1093/bib/bbae625","DOIUrl":"10.1093/bib/bbae625","url":null,"abstract":"<p><p>Accurate prediction of binding between human leukocyte antigen (HLA) class I molecules and antigenic peptide segments is a challenging task and a key bottleneck in personalized immunotherapy for cancer. Although existing prediction tools have demonstrated significant results using established datasets, most can only predict the binding affinity of antigenic peptides to HLA and do not enable the immunogenic interpretation of new antigenic epitopes. This limitation results from the training data for the computational models relying heavily on a large amount of peptide-HLA (pHLA) eluting ligand data, in which most of the candidate epitopes lack immunogenicity. Here, we propose an adaptive immunogenicity prediction model, named MHLAPre, which is trained on the large-scale MS-derived HLA I eluted ligandome (mostly presented by epitopes) that are immunogenic. Allele-specific and pan-allelic prediction models are also provided for endogenous peptide presentation. Using a meta-learning strategy, MHLAPre rapidly assessed HLA class I peptide affinities across the whole pHLA pairs and accurately identified tumor-associated endogenous antigens. During the process of adaptive immune response of T-cells, pHLA-specific binding in the antigen presentation is only a pre-task for CD8+ T-cell recognition. The key factor in activating the immune response is the interaction between pHLA complexes and T-cell receptors (TCRs). Therefore, we performed transfer learning on the pHLA model using the pHLA-TCR dataset. 
In the pHLA binding task, MHLAPre demonstrated significant improvement in identifying neoepitope immunogenicity compared with five state-of-the-art models, demonstrating its effectiveness and robustness. After transfer learning on the pHLA-TCR data, MHLAPre also exhibited relatively superior performance in revealing the mechanism of immunotherapy. MHLAPre is a powerful tool to identify neoepitopes that can interact with TCRs and induce immune responses. We believe that the proposed method will greatly contribute to clinical immunotherapy, such as anti-tumor immunity, tumor-specific T-cell engineering, and personalized tumor vaccines.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11630330/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142827301","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tong Lu, Wei Guo, Wei Guo, Wangyang Meng, Tianyi Han, Zizhen Guo, Chengqiang Li, Shugeng Gao, Youqiong Ye, Hecheng Li
{"title":"A novel computational model ITHCS for enhanced prognostic risk stratification in ESCC by correcting for intratumor heterogeneity.","authors":"Tong Lu, Wei Guo, Wei Guo, Wangyang Meng, Tianyi Han, Zizhen Guo, Chengqiang Li, Shugeng Gao, Youqiong Ye, Hecheng Li","doi":"10.1093/bib/bbae631","DOIUrl":"10.1093/bib/bbae631","url":null,"abstract":"<p><p>Intratumor heterogeneity significantly challenges the accuracy of existing prognostic models for esophageal squamous cell carcinoma (ESCC) by introducing biases related to the varied genetic and molecular landscapes within tumors. Traditional models, relying on single-sample, single-region bulk RNA sequencing, fall short of capturing the complexity of intratumor heterogeneity. To fill this gap, we developed a computational model for intratumor heterogeneity corrected signature (ITHCS) by employing both multiregion bulk and single-cell RNA sequencing to pinpoint genes that exhibit consistent expression patterns across different tumor regions but vary significantly among patients. Utilizing these genes, we applied multiple machine-learning algorithms for sophisticated feature selection and model construction. The ITHCS model significantly outperforms existing prognostic indicators in accuracy and generalizability, markedly reducing sampling biases caused by intratumor heterogeneity. This improvement is especially notable in the prognostic assessment of early-stage ESCC patients, where the model exhibits exceptional predictive power. 
Additionally, we found that the risk score based on ITHCS may be associated with epithelial-mesenchymal transition characteristics, indicating that high-risk patients may exhibit a diminished response to immunotherapy.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11652613/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142845781","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Willem Stock, Coralie Rousseau, Glen Dierickx, Sofie D'hondt, Luz Amadei Martínez, Simon M Dittami, Luna M van der Loos, Olivier De Clerck
{"title":"Breaking free from references: a consensus-based approach for community profiling with long amplicon nanopore data.","authors":"Willem Stock, Coralie Rousseau, Glen Dierickx, Sofie D'hondt, Luz Amadei Martínez, Simon M Dittami, Luna M van der Loos, Olivier De Clerck","doi":"10.1093/bib/bbae642","DOIUrl":"10.1093/bib/bbae642","url":null,"abstract":"<p><p>Third-generation sequencing platforms, such as Oxford Nanopore Technology (ONT), have made it possible to characterize communities through the sequencing of long amplicons. While this theoretically allows for an increased taxonomic resolution compared to short-read sequencing platforms such as Illumina, the high error rate remains problematic for accurately identifying the community members present within a sample. Here, we present and validate CONCOMPRA, a tool that allows the detection of closely related strains within a community by drafting and mapping to consensus sequences. We show that CONCOMPRA outperforms several other tools for profiling bacterial communities using full-length 16S rRNA gene sequencing. Since CONCOMPRA does not rely on a sequence database for profiling communities, it is applicable to systems and amplicons for which little to no reference data exists. Our validation test shows that the amplification of long PCR products is likely to produce chimeric byproducts that inflate alpha diversity and skew community structure, stressing the importance of chimera detection. 
CONCOMPRA is available on GitHub (https://github.com/willem-stock/CONCOMPRA).</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11647271/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142827374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Mingzhi Yuan, Ao Shen, Yingfan Ma, Jie Du, Bohan An, Manning Wang
{"title":"ProteinF3S: boosting enzyme function prediction by fusing protein sequence, structure, and surface.","authors":"Mingzhi Yuan, Ao Shen, Yingfan Ma, Jie Du, Bohan An, Manning Wang","doi":"10.1093/bib/bbae695","DOIUrl":"10.1093/bib/bbae695","url":null,"abstract":"<p><p>Proteins can be represented in different data forms, including sequence, structure, and surface, each of which has unique advantages and certain limitations. It is promising to fuse the complementary information among them. In this work, we propose a framework called ProteinF3S for enzyme function prediction that fuses the complementary information across protein sequence, structure, and surface. To achieve more effective fusion, we propose a multi-scale bidirectional fusion strategy between protein structure and surface, in which the hierarchical features of a surface encoder and a structure encoder interact with each other bidirectionally. Based on these interactions, more distinctive features can be obtained. After that, we achieve further fusion by concatenating the sequence features with the features containing structure and surface information, so that better performance can be achieved. To validate our method, we conduct extensive experiments on tasks including enzyme reaction classification and enzyme commission number prediction. 
Our method achieves new state-of-the-art performance and shows that fusing different forms of data is effective in enzyme function prediction.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11697223/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142920885","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A framework of multi-view machine learning for biological spectral unmixing of fluorophores with overlapping excitation and emission spectra.","authors":"Ruogu Wang, Yunlong Feng, Alex M Valm","doi":"10.1093/bib/bbaf005","DOIUrl":"10.1093/bib/bbaf005","url":null,"abstract":"<p><p>The accuracy of assigning fluorophore identity and abundance, known as spectral unmixing, in biological fluorescence microscopy images remains a significant challenge due to the substantial overlap in emission spectra among fluorophores. In traditional laser scanning confocal spectral microscopy, fluorophore information is acquired by recording emission spectra with a single combination of discrete excitation wavelengths. However, organic fluorophores possess characteristic excitation spectra in addition to their unique emission spectral signatures. In this paper, we propose a generalized multi-view machine learning approach that leverages both excitation and emission spectra to significantly improve the accuracy in differentiating multiple highly overlapping fluorophores in a single image. By recording emission spectra of the same field with multiple combinations of excitation wavelengths, we obtain data representing different views of the underlying fluorophore distribution in the sample. We then propose a multi-view machine learning framework that allows for the flexible incorporation of noise information and abundance constraints, enabling the extraction of spectral signatures from reference images and efficient recovery of corresponding abundances in unknown mixed images. Numerical experiments on simulated image data demonstrate the method's efficacy in improving accuracy, allowing for the discrimination of 100 fluorophores with highly overlapping spectra. 
Furthermore, validation on images of mixtures of fluorescently labeled Escherichia coli highlights the power of the proposed multi-view strategy in discriminating fluorophores with spectral overlap in real biological images.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11726699/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142969714","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junhang Cao, Jun Zhang, Qiyuan Yu, Junkai Ji, Jianqiang Li, Shan He, Zexuan Zhu
{"title":"TG-CDDPM: text-guided antimicrobial peptides generation based on conditional denoising diffusion probabilistic model.","authors":"Junhang Cao, Jun Zhang, Qiyuan Yu, Junkai Ji, Jianqiang Li, Shan He, Zexuan Zhu","doi":"10.1093/bib/bbae644","DOIUrl":"10.1093/bib/bbae644","url":null,"abstract":"<p><p>Antimicrobial peptides (AMPs) have emerged as a promising substitute for antibiotics thanks to their broader range of activities, lower likelihood of drug resistance, and low toxicity. Traditional biochemical methods for AMP discovery are costly and inefficient. Deep generative models, including the long short-term memory model, variational autoencoder model, and generative adversarial model, have been widely introduced to expedite AMP discovery. However, these models tend to suffer from a lack of diversity in the AMPs they generate. The denoising diffusion probabilistic model serves as a good candidate for solving this issue. We proposed a three-stage Text-Guided Conditional Denoising Diffusion Probabilistic Model (TG-CDDPM) to generate novel and homologous AMPs. In the first two stages, contrastive learning and inferring models are crafted to create better conditions for guiding AMP generation, respectively. In the last stage, a pre-trained conditional denoising diffusion probabilistic model is leveraged to enrich the peptide knowledge and fine-tuned to learn feature representation in downstream. TG-CDDPM was compared to the state-of-the-art generative models for AMP generation, and it demonstrated competitive or better performance with the assistance of text description as supervised information. 
The membrane penetration capabilities of the identified candidate AMPs by TG-CDDPM were also validated through molecular weight dynamics experiments.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11637771/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142817187","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Multi-source biological knowledge-guided hypergraph spatiotemporal subnetwork embedding for protein complex identification.","authors":"Shilong Wang, Hai Cui, Yanchen Qu, Yijia Zhang","doi":"10.1093/bib/bbae718","DOIUrl":"10.1093/bib/bbae718","url":null,"abstract":"<p><p>Identifying biologically significant protein complexes from protein-protein interaction (PPI) networks and understanding their roles are essential for elucidating protein functions, life processes, and disease mechanisms. Current methods typically rely on static PPI networks and model PPI data as pairwise relationships, which presents several limitations. Firstly, static PPI networks do not adequately represent the scopes and temporal dynamics of protein interactions. Secondly, a large amount of available biological resources have not been fully integrated. Moreover, PPIs in biological systems are not merely one-to-one relationships but involve higher order non-pairwise interactions. To alleviate these issues, we propose HGST, a multi-source biological knowledge-guided hypergraph spatiotemporal subnetwork (subnet) embedding method for identifying biologically significant protein complexes from PPI networks. HGST initially constructs spatiotemporal PPI subnets using the scopes and temporal dynamics of proteins derived from multi-source biological knowledge, treating them as dynamic networks through fine-grained spatiotemporal partitioning. The spatiotemporal subnets are then transformed into hypergraphs, which model higher order non-pairwise relationships via hypergraph embedding. Simultaneously, fine-grained amino acid sequence features and coarse-grained gene ontology attributes are introduced for multi-dimensional feature fusion. Finally, protein complexes are identified from the reweighted subnets based on fused feature representations using the core-attachment strategy. Evaluations on four real PPI datasets demonstrate that HGST achieves competitive performance. 
Furthermore, a series of biological analyses confirm the high biological significance of the complexes identified by HGST. The source code is available at https://github.com/qifen37/HGST.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11735048/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143000354","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"scGO: interpretable deep neural network for cell status annotation and disease diagnosis.","authors":"You Wu, Pengfei Xu, Liyuan Wang, Shuai Liu, Yingnan Hou, Hui Lu, Peng Hu, Xiaofei Li, Xiang Yu","doi":"10.1093/bib/bbaf018","DOIUrl":"10.1093/bib/bbaf018","url":null,"abstract":"<p><p>Machine learning has emerged as a transformative tool for elucidating cellular heterogeneity in single-cell RNA sequencing. However, a significant challenge lies in the \"black box\" nature of deep learning models, which obscures the decision-making process and limits interpretability in cell status annotation. In this study, we introduced scGO, a Gene Ontology (GO)-inspired deep learning framework designed to provide interpretable cell status annotation for scRNA-seq data. scGO employs sparse neural networks to leverage the intrinsic biological relationships among genes, transcription factors, and GO terms, significantly augmenting interpretability and reducing computational cost. scGO outperforms state-of-the-art methods in the precise characterization of cell subtypes across diverse datasets. Our extensive experimentation across a spectrum of scRNA-seq datasets underscored the remarkable efficacy of scGO in disease diagnosis, prediction of developmental stages, and evaluation of disease severity and cellular senescence status. Furthermore, we incorporated in silico individual gene manipulations into the scGO model, introducing an additional layer for discovering therapeutic targets. 
Our results provide an interpretable model for accurately annotating cell status, capturing latent biological knowledge, and informing clinical practice.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11737892/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143000433","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Classification-based pathway analysis using GPNet with novel P-value computation.","authors":"Hao Lu, Mostafa Rezapour, Haseebullah Baha, Muhammad Khalid Khan Niazi, Aarthi Narayanan, Metin Nafi Gurcan","doi":"10.1093/bib/bbaf039","DOIUrl":"10.1093/bib/bbaf039","url":null,"abstract":"<p><p>Pathway analysis plays a critical role in bioinformatics, enabling researchers to identify biological pathways associated with various conditions by analyzing gene expression data. However, the rise of large, multi-center datasets has highlighted limitations in traditional methods like Over-Representation Analysis (ORA) and Functional Class Scoring (FCS), which struggle with low signal-to-noise ratios (SNR) and large sample sizes. To tackle these challenges, we use a deep learning-based classification method, Gene PointNet (GPNet), and a novel $P$-value computation approach leveraging the confusion matrix to address pathway analysis tasks. We validated our method's effectiveness through a comparative study using a simulated dataset and RNA-Seq data from The Cancer Genome Atlas breast cancer dataset. Our method was benchmarked against traditional techniques (ORA, FCS), shallow machine learning models (logistic regression, support vector machine), and deep learning approaches (DeepHisCom, PASNet). The results demonstrate that GPNet outperforms these methods in low-SNR, large-sample datasets, where it remains robust and reliable, significantly reducing Type I error and improving power. This makes our method well suited for pathway analysis in large, multi-center studies. 
The code can be found at https://github.com/haolu123/GPNet_pathway.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11775473/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143063819","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Correction to: HHOMR: a hybrid high-order moment residual model for miRNA-disease association prediction.","authors":"","doi":"10.1093/bib/bbae684","DOIUrl":"10.1093/bib/bbae684","url":null,"abstract":"","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":"26 1","pages":""},"PeriodicalIF":6.8,"publicationDate":"2024-11-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11649758/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142833836","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}