Haochen Ning, Ian Boyes, Ibrahim Numanagić, Michael Rott, Li Xing, Xuekui Zhang
{"title":"Diagnostics of viral infections using high-throughput genome sequencing data.","authors":"Haochen Ning, Ian Boyes, Ibrahim Numanagić, Michael Rott, Li Xing, Xuekui Zhang","doi":"10.1093/bib/bbae501","DOIUrl":"https://doi.org/10.1093/bib/bbae501","url":null,"abstract":"<p><p>Plant viral infections cause significant economic losses, totalling $350 billion USD in 2021. With no treatment for virus-infected plants, accurate and efficient diagnosis is crucial to preventing and controlling these diseases. High-throughput sequencing (HTS) enables cost-efficient identification of known and unknown viruses. However, existing diagnostic pipelines face challenges. First, many methods depend on subjectively chosen parameter values, undermining their robustness across various data sources. Second, artifacts (e.g. false peaks) in the mapped sequence data can lead to incorrect diagnostic results. While some methods require manual or subjective verification to address these artifacts, others overlook them entirely, affecting the overall method performance and leading to imprecise or labour-intensive outcomes. To address these challenges, we introduce IIMI, a new automated analysis pipeline using machine learning to diagnose infections from 1583 plant viruses with HTS data. It adopts a data-driven approach for parameter selection, reducing subjectivity, and automatically filters out regions affected by artifacts, thus improving accuracy. Testing with in-house and published data shows IIMI's superiority over existing methods. Besides a prediction model, IIMI also provides resources on plant virus genomes, including annotations of regions prone to artifacts. 
The method is available as an R package (iimi) on CRAN and will integrate with the web application www.virtool.ca, enhancing accessibility and user convenience.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483527/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142486027","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HMPA: a pioneering framework for the noncanonical peptidome from discovery to functional insights.","authors":"Xinwan Su, Chengyu Shi, Fangzhou Liu, Manman Tan, Ying Wang, Linyu Zhu, Yu Chen, Meng Yu, Xinyi Wang, Jian Liu, Yang Liu, Weiqiang Lin, Zhaoyuan Fang, Qiang Sun, Tianhua Zhou, Aifu Lin","doi":"10.1093/bib/bbae510","DOIUrl":"https://doi.org/10.1093/bib/bbae510","url":null,"abstract":"<p><p>Advancements in peptidomics have revealed numerous small open reading frames with coding potential and shown that some of the resulting micropeptides are closely related to human cancer. However, the systematic analysis and integration from sequence to structure and function remains largely undeveloped. Here, as a solution, we built a workflow for the collection and analysis of proteomic data, transcriptomic data, and clinical outcomes for cancer-associated micropeptides using publicly available datasets from large cohorts. We initially identified 19 586 novel micropeptides by reanalyzing proteomic profile data from 3753 samples across 8 cancer types. Further quantitative analysis of these micropeptides, along with associated clinical data, identified 3065 that were dysregulated in cancer, with 370 of them showing a strong association with prognosis. Moreover, we employed a deep learning framework to construct a micropeptide-protein interaction network for further bioinformatics analysis, revealing that micropeptides are involved in multiple biological processes as bioactive molecules. Taken together, our atlas provides a benchmark for high-throughput prediction and functional exploration of micropeptides, providing new insights into their biological mechanisms in cancer. 
The HMPA is freely available at http://hmpa.zju.edu.cn.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483136/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142458374","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Matthijs Vynck, Wim Trypsteen, Olivier Thas, Jo Vandesompele, Ward De Spiegelaere
{"title":"Digital PCR threshold robustness analysis and optimization using dipcensR.","authors":"Matthijs Vynck, Wim Trypsteen, Olivier Thas, Jo Vandesompele, Ward De Spiegelaere","doi":"10.1093/bib/bbae507","DOIUrl":"https://doi.org/10.1093/bib/bbae507","url":null,"abstract":"<p><p>Digital polymerase chain reaction (dPCR) is a best-in-class molecular biology technique for the accurate and precise quantification of nucleic acids. The recent maturation of dPCR technology allows the quantification of up to thousands of targeted nucleic acids per instrument per day. A key step in the dPCR data analysis workflow is the classification of partitions into two classes based on their partition intensities: partitions either containing or lacking target nucleic acids of interest. Much effort has been invested in the design and tailoring of automated dPCR partition classification procedures, and such procedures will be increasingly important as the technology ventures into high-throughput applications. However, automated partition classification is not fail-safe, and evaluation of its accuracy is highly advised. This accuracy evaluation is a manual endeavor and is becoming a bottleneck for high-throughput dPCR applications. Here, we introduce dipcensR, the first data-analysis procedure that automates the assessment of any linear partition classifier's partition classification accuracy, offering potentially substantial efficiency gains. dipcensR is based on a robustness evaluation of said partition classification and flags classifications with low robustness as needing review. Additionally, dipcensR's robustness analysis underpins (optional) automatic optimization of partition classification to achieve maximal robustness. 
A freely available R implementation supports dipcensR's use.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11472245/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142458371","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Deep learning model for protein multi-label subcellular localization and function prediction based on multi-task collaborative training.","authors":"Peihao Bai, Guanghui Li, Jiawei Luo, Cheng Liang","doi":"10.1093/bib/bbae568","DOIUrl":"https://doi.org/10.1093/bib/bbae568","url":null,"abstract":"<p><p>The functional study of proteins is a critical task in modern biology, playing a pivotal role in understanding the mechanisms of pathogenesis, developing new drugs, and discovering novel drug targets. However, existing computational models for subcellular localization face significant challenges, such as reliance on known Gene Ontology (GO) annotation databases or overlooking the relationship between GO annotations and subcellular localization. To address these issues, we propose DeepMTC, an end-to-end deep learning-based multi-task collaborative training model. DeepMTC integrates the interrelationship between subcellular localization and the functional annotation of proteins, leveraging multi-task collaborative training to eliminate dependence on known GO databases. This strategy gives DeepMTC a distinct advantage in predicting newly discovered proteins without prior functional annotations. First, DeepMTC leverages a high-accuracy pre-trained language model to obtain the 3D structure and sequence features of proteins. Additionally, it employs a graph transformer module to encode protein sequence features, addressing the problem of long-range dependencies in graph neural networks. Finally, DeepMTC uses a functional cross-attention mechanism to efficiently combine upstream learned functional features to perform the subcellular localization task. The experimental results demonstrate that DeepMTC outperforms state-of-the-art models in both protein function prediction and subcellular localization. 
Moreover, interpretability experiments revealed that DeepMTC can accurately identify the key residues and functional domains of proteins, confirming its superior performance. The code and dataset of DeepMTC are freely available at https://github.com/ghli16/DeepMTC.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142567262","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sébastien De Landtsheer, Apurva Badkas, Dagmar Kulms, Thomas Sauter
{"title":"Model ensembling as a tool to form interpretable multi-omic predictors of cancer pharmacosensitivity.","authors":"Sébastien De Landtsheer, Apurva Badkas, Dagmar Kulms, Thomas Sauter","doi":"10.1093/bib/bbae567","DOIUrl":"https://doi.org/10.1093/bib/bbae567","url":null,"abstract":"<p><p>Stratification of patients diagnosed with cancer has become a major goal in personalized oncology. One important aspect is the accurate prediction of the response to various drugs. It is expected that the molecular characteristics of the cancer cells contain enough information to retrieve specific signatures, allowing for accurate predictions based solely on these multi-omic data. Ideally, these predictions should be explainable to clinicians, so that they can be integrated into patient care. We propose a machine-learning framework based on ensemble learning to integrate multi-omic data and predict sensitivity to an array of commonly used and experimental compounds, including chemotoxic compounds and targeted kinase inhibitors. We trained a set of classifiers on the different parts of our dataset to produce omic-specific signatures, then trained a random forest classifier on these signatures to predict drug responsiveness. We used the Cancer Cell Line Encyclopedia dataset, comprising multi-omic and drug sensitivity measurements for hundreds of cell lines, to build the predictive models, and validated the results using nested cross-validation. Our results show good performance for several compounds (area under the receiver operating characteristic curve >79%) across the most frequent cancer types. Furthermore, the simplicity of our approach allows us to examine which omic layers have greater importance in the models and to identify new putative markers of drug responsiveness. 
We propose several models based on small subsets of transcriptional markers with the potential to become useful tools in personalized oncology, paving the way for clinicians to use the molecular characteristics of the tumors to predict sensitivity to therapeutic compounds.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142567268","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"PathMethy: an interpretable AI framework for cancer origin tracing based on DNA methylation.","authors":"Jiajing Xie, Yuhang Song, Hailong Zheng, Shijie Luo, Ying Chen, Chen Zhang, Rongshan Yu, Mengsha Tong","doi":"10.1093/bib/bbae497","DOIUrl":"https://doi.org/10.1093/bib/bbae497","url":null,"abstract":"<p><p>Despite advanced diagnostics, 3%-5% of cases remain classified as cancer of unknown primary (CUP). DNA methylation, an important epigenetic feature, is essential for determining the origin of metastatic tumors. We presented PathMethy, a novel Transformer model integrated with functional categories and crosstalk of pathways, to accurately trace the origin of tumors in CUP samples based on DNA methylation. PathMethy outperformed seven competing methods in F1-score across nine cancer datasets and accurately predicted the molecular subtypes within nine primary tumor types. It not only excelled at tracing the origins of both primary and metastatic tumors but also demonstrated a high degree of agreement with previously diagnosed sites in cases of CUP. PathMethy provided biological insights by highlighting key pathways, functional categories, and their interactions. Using functional categories of pathways, we gained a global understanding of biological processes. 
For broader access, a user-friendly web server for researchers and clinicians is available at https://cup.pathmethy.com.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11467402/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142399351","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Mediation analysis in longitudinal study with high-dimensional methylation mediators.","authors":"Yidan Cui, Qingmin Lin, Xin Yuan, Fan Jiang, Shiyang Ma, Zhangsheng Yu","doi":"10.1093/bib/bbae496","DOIUrl":"https://doi.org/10.1093/bib/bbae496","url":null,"abstract":"<p><p>Mediation analysis has been widely utilized to identify potential pathways connecting exposures and outcomes. However, there remains a lack of analytical methods for high-dimensional mediation analysis in longitudinal data. To tackle this concern, we propose an effective and novel approach with variable selection and indirect effect (IE) assessment based on both the linear mixed-effect model and the generalized estimating equation. Initially, we employ sure independence screening to reduce the dimension of candidate mediators. Subsequently, we implement the Sobel test with the Bonferroni correction for IE hypothesis testing. Through extensive simulation studies, we demonstrate the performance of our proposed procedure with a higher F$_{1}$ score (0.8056 and 0.9983 at sample sizes of 150 and 500, respectively) compared with the linear method (0.7779 and 0.9642 at the same sample sizes), along with more accurate parameter estimation and a significantly lower false discovery rate. 
Moreover, we apply our methodology to explore the mediation mechanisms involving over 730 000 DNA methylation sites with potential effects between the paternal body mass index (BMI) and offspring growing BMI in the Shanghai sleeping birth cohort data, leading to the identification of two previously undiscovered mediating CpG sites.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11479716/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142458376","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Sheng Liu, Hye Seung Nam, Ziyu Zeng, Xuehong Deng, Elnaz Pashaei, Yong Zang, Lei Yang, Chenglong Li, Jiaoti Huang, Michael K Wendt, Xin Lu, Rong Huang, Jun Wan
{"title":"CDHu40: a novel marker gene set of neuroendocrine prostate cancer.","authors":"Sheng Liu, Hye Seung Nam, Ziyu Zeng, Xuehong Deng, Elnaz Pashaei, Yong Zang, Lei Yang, Chenglong Li, Jiaoti Huang, Michael K Wendt, Xin Lu, Rong Huang, Jun Wan","doi":"10.1093/bib/bbae471","DOIUrl":"https://doi.org/10.1093/bib/bbae471","url":null,"abstract":"<p><p>Prostate cancer (PCa) is the most prevalent cancer affecting American men. Castration-resistant prostate cancer (CRPC) can emerge during hormone therapy for PCa, manifesting with elevated serum prostate-specific antigen levels, continued disease progression, and/or metastasis to new sites, resulting in a poor prognosis. A subset of CRPC patients shows a neuroendocrine (NE) phenotype, signifying reduced or no reliance on androgen receptor signaling and a particularly unfavorable prognosis. In this study, we incorporated computational approaches based on both gene expression profiles and protein-protein interaction networks. We identified 500 potential marker genes, which are significantly enriched in cell cycle and neuronal processes. The top 40 candidates, collectively named CDHu40, demonstrated superior performance in distinguishing NE PCa (NEPC) and non-NEPC samples based on gene expression profiles. CDHu40 outperformed most of the other published marker sets, excelling particularly at the prognostic level. Notably, some marker genes in CDHu40, absent in the other marker sets, have been reported to be associated with NEPC in the literature, such as DDC, FOLH1, BEX1, MAST1, and CACNA1A. Importantly, elevated CDHu40 scores derived from our predictive model showed a robust correlation with unfavorable survival outcomes in patients, indicating the potential of the CDHu40 score as a promising indicator for predicting the survival prognosis of those patients with the NE phenotype. 
Motif enrichment analysis of the top candidates suggests that REST and E2F6 may serve as key regulators of NEPC progression.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11422505/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142341934","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Junxin Li, Linbu Liao, Chao Zhang, Kaifang Huang, Pengfei Zhang, John Z H Zhang, Xiaochun Wan, Haiping Zhang
{"title":"Development and experimental validation of computational methods for human antibody affinity enhancement.","authors":"Junxin Li, Linbu Liao, Chao Zhang, Kaifang Huang, Pengfei Zhang, John Z H Zhang, Xiaochun Wan, Haiping Zhang","doi":"10.1093/bib/bbae488","DOIUrl":"https://doi.org/10.1093/bib/bbae488","url":null,"abstract":"<p><p>High affinity is crucial for the efficacy and specificity of antibodies. Because they involve high-throughput screens, biological experiments for antibody affinity maturation are time-consuming and have a low success rate. Precise computational-assisted antibody design promises to accelerate this process, but there is still a lack of effective computational methods capable of pinpointing beneficial mutations within the complementarity-determining region (CDR) of antibodies. Moreover, random mutations often lead to challenges in antibody expression and immunogenicity. In this study, to enhance the affinity of a human antibody against avian influenza virus, a CDR library was constructed and evolutionary information was acquired through sequence alignment to restrict the mutation positions and types. Concurrently, a statistical potential methodology was developed based on amino acid interactions between antibodies and antigens to calculate potential affinity-enhanced antibodies, which were further subjected to molecular dynamics simulations. Subsequently, experimental validation confirmed that a point mutation enhancing affinity 2.5-fold was obtained from 10 designs, resulting in an antibody affinity of 2 nM. A predictive model for antibody-antigen interactions based on the binding interface was also developed, achieving an Area Under the Curve (AUC) of 0.83 and a precision of 0.89 on the test set. Lastly, a novel approach involving combinations of affinity-enhancing mutations and an iterative mutation optimization scheme similar to the Monte Carlo method were proposed. 
This study presents computational methods that rapidly and accurately enhance antibody affinity, addressing issues related to antibody expression and immunogenicity.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11446602/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142364431","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Reconstructing tumor clonal heterogeneity and evolutionary relationships based on tumor DNA sequencing data.","authors":"Zhen Wang, Yanhua Fang, Ruoyu Wang, Liwen Kong, Shanshan Liang, Shuai Tao","doi":"10.1093/bib/bbae516","DOIUrl":"https://doi.org/10.1093/bib/bbae516","url":null,"abstract":"<p><p>The heterogeneity of tumor clones drives the selection and evolution of distinct tumor cell populations, resulting in an intricate and dynamic tumor evolution process. While tumor bulk DNA sequencing helps elucidate intratumor heterogeneity, challenges such as the misidentification of mutation multiplicity due to copy number variations and uncertainties in the reconstruction process hinder the accurate inference of tumor evolution. In this study, we introduce a novel approach, REconstructing Tumor Clonal Heterogeneity and Evolutionary Relationships (RETCHER), which characterizes more realistic cancer cell fractions by accurately identifying mutation multiplicity while considering uncertainty during the reconstruction process and the credibility and reasonableness of subclone clustering. This method comprehensively and accurately infers multiple forms of tumor clonal heterogeneity and phylogenetic relationships. RETCHER outperforms existing methods on simulated data and infers clearer subclone structures and evolutionary relationships in real multisample sequencing data from five tumor types. By precisely analysing the complex clonal heterogeneity within tumors, RETCHER provides a new approach to tumor evolution research and offers scientific evidence for developing precise and personalized treatment strategies. This approach is expected to play a significant role in tumor evolution research, clinical diagnosis, and treatment. 
RETCHER is available for free at https://github.com/zlsys3/RETCHER.</p>","PeriodicalId":9209,"journal":{"name":"Briefings in bioinformatics","volume":null,"pages":null},"PeriodicalIF":6.8,"publicationDate":"2024-09-23","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11483135/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"142458391","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}