Line Sandvad Nielsen, Anders Gorm Pedersen, Ole Winther, Henrik Nielsen
{"title":"NetStart 2.0: prediction of eukaryotic translation initiation sites using a protein language model.","authors":"Line Sandvad Nielsen, Anders Gorm Pedersen, Ole Winther, Henrik Nielsen","doi":"10.1186/s12859-025-06220-2","DOIUrl":"10.1186/s12859-025-06220-2","url":null,"abstract":"<p><strong>Background: </strong>Accurate identification of translation initiation sites is essential for the proper translation of mRNA into functional proteins. In eukaryotes, the choice of the translation initiation site is influenced by multiple factors, including its proximity to the 5′ end and the local start codon context. Translation initiation sites mark the transition from non-coding to coding regions. This fact motivates the expectation that the upstream sequence, if translated, would assemble a nonsensical order of amino acids, while the downstream sequence would correspond to the structured beginning of a protein. This distinction suggests potential for predicting translation initiation sites using a protein language model.</p><p><strong>Results: </strong>We present NetStart 2.0, a deep learning-based model that integrates the ESM-2 protein language model with the local sequence context to predict translation initiation sites across a broad range of eukaryotic species. NetStart 2.0 was trained as a single model across multiple species, and despite the broad phylogenetic diversity represented in the training data, it consistently relied on features marking the transition from non-coding to coding regions.</p><p><strong>Conclusion: </strong>By leveraging \"protein-ness\", NetStart 2.0 achieves state-of-the-art performance in predicting translation initiation sites across a diverse range of eukaryotic species. This success underscores the potential of protein language models to bridge transcript- and peptide-level information in complex biological prediction tasks. 
The NetStart 2.0 webserver is available at: https://services.healthtech.dtu.dk/services/NetStart-2.0/ .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"216"},"PeriodicalIF":3.3,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12366053/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144881993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GeneRiskCalc: a web-based tool for genetic risk association analysis in case-control studies.","authors":"Amrit Sudershan, Kuljeet Singh, Parvinder Kumar","doi":"10.1186/s12859-025-06207-z","DOIUrl":"10.1186/s12859-025-06207-z","url":null,"abstract":"<p><strong>Background: </strong>Genetic association studies play a pivotal role in identifying disease-associated variants, but researchers face challenges in performing essential calculations like Hardy-Weinberg equilibrium testing, odds ratios, and confidence intervals due to reliance on manual methods or multiple software tools. We aimed to develop GeneRiskCalc, an integrated web-based platform that simplifies genetic association analysis by automating Hardy-Weinberg equilibrium (HWE) assessment, odds ratio and confidence interval calculation, and visual data presentation in case-control studies. Using an HTML/CSS/JavaScript framework, we developed online software with three core functionalities: (1) automated HWE evaluation, (2) odds ratio with 95% confidence interval computation with statistical validation, and (3) dynamic Forest Plot generation for data visualization. The tool was designed with an intuitive interface to minimize prerequisite statistical expertise.</p><p><strong>Results: </strong>The tool, named the Genetic Risk Association Calculator (GeneRiskCalc), demonstrated high computational accuracy in HWE testing (χ<sup>2</sup> validation) and association metrics (odds ratio and confidence interval). The results were cross-validated against established statistical methods, confirming their reliability. Furthermore, the integrated Forest Plotter enabled immediate visualization of effect sizes across multiple genetic models, facilitating a comprehensive interpretation of genetic associations.</p><p><strong>Conclusion: </strong>By integrating essential analytical steps into a single platform, GeneRiskCalc streamlines genetic epidemiology workflows, addressing key challenges in data analysis. 
Its user-friendly interface enhances accessibility, promotes reproducibility, and accelerates research in genetic association studies. The tool is freely available at GeneRiskCalc ( https://sites.google.com/view/GeneRiskCalc/home?authuser=0 ).</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"213"},"PeriodicalIF":3.3,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12363000/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144881992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
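The two calculations named in the GeneRiskCalc abstract (HWE chi-square testing and an odds ratio with a 95% confidence interval) can be sketched with textbook formulas. This is a minimal illustration only, not the tool's actual code (the abstract says the tool is built on HTML/CSS/JavaScript). The function names `hwe_chi_square` and `odds_ratio_ci`, the Wald interval, and the 2x2 table layout are assumptions made for this example.

```python
import math


def hwe_chi_square(n_aa, n_ab, n_bb):
    """Chi-square statistic (1 degree of freedom) for Hardy-Weinberg equilibrium.

    Takes observed genotype counts (AA, AB, BB). Assumes both alleles are
    present, so that no expected count is zero.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)  # frequency of allele A
    q = 1.0 - p                      # frequency of allele B
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    observed = (n_aa, n_ab, n_bb)
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Wald confidence interval (z=1.96 gives about 95%).

    Assumed 2x2 layout: a = exposed cases, b = unexposed cases,
    c = exposed controls, d = unexposed controls. All cells must be nonzero.
    """
    odds_ratio = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se_log_or)
    upper = math.exp(math.log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper
```

Genotype counts of 25/50/25 match the Hardy-Weinberg expectation exactly, so the statistic is zero. For a table with 20 exposed and 80 unexposed cases against 10 exposed and 90 unexposed controls, the odds ratio is 2.25 and the interval spans 1, so that association would not be significant at the 5% level.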
Rachel Bowen-James, Weilin Wu, Marie Wong-Erasmus, Julian M W Quinn, Chelsea Mayoh, Mark J Cowley
{"title":"consHLA: a next generation sequencing consensus-based HLA typing workflow.","authors":"Rachel Bowen-James, Weilin Wu, Marie Wong-Erasmus, Julian M W Quinn, Chelsea Mayoh, Mark J Cowley","doi":"10.1186/s12859-025-06223-z","DOIUrl":"10.1186/s12859-025-06223-z","url":null,"abstract":"<p><strong>Background: </strong>Human Leukocyte Antigens (HLA) play central roles in histocompatibility and immune system functions, including antigen presentation. Accurate typing of Class I and II HLA genes is crucial for transplant tissue matching, characterising autoimmune diseases and informing cancer immunotherapy. Clinical serology and PCR-based testing are the gold standards for HLA typing, but offer only single-field resolution (e.g., HLA-A*11). Whole genome sequencing (WGS) and RNA sequencing (RNA-seq) can achieve higher, three-field resolution (e.g., HLA-A*11:01:01), although some HLA genes can be challenging to type from sequencing data. With the increasing use of germline WGS, tumour WGS and tumour RNA-seq in cancer patient care, there is an opportunity to combine these three dataset types to improve HLA typing accuracy and confidence, and to identify clinically relevant HLA type changes in tumours. To achieve this, we developed consHLA, a tool that employs this consensus HLA typing approach.</p><p><strong>Results: </strong>We obtained matched germline and tumour WGS and RNA-seq data from 86 high-risk paediatric cancer patients (76 brain cancers, 10 leukaemias) from the ZERO Childhood Cancer precision medicine program. We examined 10 HLA typing packages, selecting HLA-HD to develop our consHLA workflow as HLA-HD can employ all three dataset types, analysing both Class I and II HLA genes at three-field resolution. Using consHLA we achieved 97.9% concordance with gold standard HLA test results. We observed 90.5% allele consistency across the three NGS inputs. 
Typing inconsistencies in at least one of 12 clinically relevant HLA genes were observed in 29 of the brain tumour cases. 32% of these had clinically relevant explanations. To assist clinically, we implemented consHLA as a fully automated workflow producing a clinician-friendly HLA-typing report.</p><p><strong>Conclusions: </strong>To leverage cancer patient germline and tumour WGS and tumour RNA-seq data we developed an automated workflow, consHLA, that produces consensus typing of HLA genes in a clinically relevant timeframe. This workflow provides higher resolution patient HLA-typing than current gold standard approaches, identifies HLA alterations arising in patient tumours and generates clear, simple reports.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"215"},"PeriodicalIF":3.3,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12363109/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144881991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Autoencoders with shared and specific embeddings for multi-omics data integration.","authors":"Chao Wang, Michael J O'Connell","doi":"10.1186/s12859-025-06245-7","DOIUrl":"10.1186/s12859-025-06245-7","url":null,"abstract":"<p><strong>Background: </strong>In cancer research, different levels of high-dimensional data are often collected for the same subjects. Effective integration of these data by considering the shared and specific information from each data source can help us better understand different types of cancer.</p><p><strong>Results: </strong>In this study we propose a novel autoencoder (AE) structure with explicitly defined orthogonal loss between the shared and specific embeddings to integrate different data sources. We compare our model with previously proposed AE structures based on simulated data and real cancer data from The Cancer Genome Atlas. Using simulations with different proportions of differentially expressed genes, we compare the performance of AE methods for subsequent classification tasks. We also compare the model performance with a commonly used dimension reduction method, joint and individual variance explained (JIVE). In terms of reconstruction loss, our proposed AE models with orthogonal constraints perform slightly better. All AE models achieve higher classification accuracy than the original features, demonstrating the usefulness of the embeddings extracted by the model.</p><p><strong>Conclusions: </strong>We show that the proposed models have consistently high classification accuracy on both training and testing sets. 
In comparison, the recently proposed MOCSS model that imposes an orthogonality penalty in the post-processing step has lower classification accuracy that is on par with JIVE.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"214"},"PeriodicalIF":3.3,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362917/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144881990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
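The abstract above refers to an orthogonal loss between shared and specific embeddings without giving its form. One common way to express such a penalty is the squared Frobenius norm of the cross-product between the two embedding matrices. The sketch below assumes that form; it is not necessarily the loss used in the paper, and the name `orthogonality_penalty` is hypothetical.

```python
def orthogonality_penalty(shared, specific):
    """Squared Frobenius norm of shared^T @ specific.

    Both inputs are lists of rows (one row per sample) with the same number
    of rows. The result is zero exactly when every shared dimension is
    orthogonal, across samples, to every specific dimension.
    """
    n_shared_dims = len(shared[0])
    n_specific_dims = len(specific[0])
    total = 0.0
    for i in range(n_shared_dims):
        for j in range(n_specific_dims):
            # Dot product, over samples, of shared column i and specific column j
            dot = sum(s_row[i] * p_row[j] for s_row, p_row in zip(shared, specific))
            total += dot * dot
    return total
```

In training, a term like this would typically be added to the reconstruction loss with a weighting coefficient, so that the shared and specific embeddings are pushed to capture non-overlapping information.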
Tanya Golubchik, Lucie Abeler-Dörner, Matthew Hall, Chris Wymant, David Bonsall, George Macintyre-Cockett, Laura Thomson, Jared M Baeten, Connie L Celum, Ronald M Galiwango, Barry Kosloff, Mohammed Limbada, Andrew Mujugira, Nelly R Mugo, Astrid Gall, François Blanquart, Margreet Bakker, Daniela Bezemer, Swee Hoe Ong, Jan Albert, Norbert Bannert, Jacques Fellay, Barbara Gunsenheimer-Bartmeyer, Huldrych F Günthard, Pia Kivelä, Roger D Kouyos, Laurence Meyer, Kholoud Porter, Ard van Sighem, Mark van der Valk, Ben Berkhout, Paul Kellam, Marion Cornelissen, Peter Reiss, Helen Ayles, David N Burns, Sarah Fidler, Mary Kate Grabowski, Richard Hayes, Joshua T Herbeck, Joseph Kagaayi, Pontiano Kaleebu, Jairam R Lingappa, Deogratius Ssemwanga, Susan H Eshleman, Myron S Cohen, Oliver Ratmann, Oliver Laeyendecker, Christophe Fraser
{"title":"HIV-phyloTSI: subtype-independent estimation of time since HIV-1 infection for cross-sectional measures of population incidence using deep sequence data.","authors":"Tanya Golubchik, Lucie Abeler-Dörner, Matthew Hall, Chris Wymant, David Bonsall, George Macintyre-Cockett, Laura Thomson, Jared M Baeten, Connie L Celum, Ronald M Galiwango, Barry Kosloff, Mohammed Limbada, Andrew Mujugira, Nelly R Mugo, Astrid Gall, François Blanquart, Margreet Bakker, Daniela Bezemer, Swee Hoe Ong, Jan Albert, Norbert Bannert, Jacques Fellay, Barbara Gunsenheimer-Bartmeyer, Huldrych F Günthard, Pia Kivelä, Roger D Kouyos, Laurence Meyer, Kholoud Porter, Ard van Sighem, Mark van der Valk, Ben Berkhout, Paul Kellam, Marion Cornelissen, Peter Reiss, Helen Ayles, David N Burns, Sarah Fidler, Mary Kate Grabowski, Richard Hayes, Joshua T Herbeck, Joseph Kagaayi, Pontiano Kaleebu, Jairam R Lingappa, Deogratius Ssemwanga, Susan H Eshleman, Myron S Cohen, Oliver Ratmann, Oliver Laeyendecker, Christophe Fraser","doi":"10.1186/s12859-025-06189-y","DOIUrl":"10.1186/s12859-025-06189-y","url":null,"abstract":"<p><strong>Background: </strong>Estimating the time since HIV infection (TSI) at population level is essential for tracking changes in the global HIV epidemic. Most methods for determining TSI give a binary classification of infections as recent or non-recent within a window of several months, and cannot assess the cumulative impact of an intervention.</p><p><strong>Results: </strong>We developed a Random Forest Regression model, HIV-phyloTSI, which combines measures of within-host diversity and divergence to generate continuous TSI estimates directly from viral deep-sequencing data, with no need for additional variables. HIV-phyloTSI provides a continuous measure of TSI up to 9 years, with a mean absolute error of less than 12 months overall and less than 5 months for infections with a TSI of up to a year. 
It performs equally well for all major HIV subtypes based on data from African and European cohorts.</p><p><strong>Conclusions: </strong>We demonstrate how HIV-phyloTSI can be used for incidence estimates on a population level.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"212"},"PeriodicalIF":3.3,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12351810/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144854398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Tobias Olenyi, Constantin Carl, Tobias Senoner, Ivan Koludarov, Burkhard Rost
{"title":"FlatProt: 2D visualization eases protein structure comparison.","authors":"Tobias Olenyi, Constantin Carl, Tobias Senoner, Ivan Koludarov, Burkhard Rost","doi":"10.1186/s12859-025-06233-x","DOIUrl":"10.1186/s12859-025-06233-x","url":null,"abstract":"<p><strong>Background: </strong>Understanding and comparing three-dimensional (3D) structures of proteins can advance bioinformatics, molecular biology, and drug discovery. While 3D models offer detailed insights, comparing multiple structures simultaneously remains challenging, especially on two-dimensional (2D) displays. Existing 2D visualization tools lack standardized approaches for pipelined inspection of large protein sets, limiting their utility in large-scale pre-filtering.</p><p><strong>Results: </strong>We introduce FlatProt, a tool designed to complement 3D viewers by enabling standardized 2D visualization of individual protein structures or large sets thereof. By including Foldseek-based family rotation alignment or an inertia-based fallback, FlatProt creates consistent and scalable visual representations for user-defined protein structures. It supports domain-aware decomposition, family-level overlays, and lightweight visual abstraction of secondary structures. FlatProt processes proteins efficiently, as showcased on a subset of the human proteome.</p><p><strong>Conclusion: </strong>FlatProt provides clear, consistent, user-friendly visualizations that support rapid, comparative inspection of protein structures at scale. 
By bridging the gap between interactive 3D tools and static visual summaries, it enables users to explore conserved features, detect outliers, and prioritize structures for further analysis.</p><p><strong>Availability: </strong>GitHub ( https://github.com/t03i/FlatProt ); Zenodo ( https://doi.org/10.5281/zenodo.15697296 ).</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"210"},"PeriodicalIF":3.3,"publicationDate":"2025-08-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12344939/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144844329","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Wei Zhang, Zeqi Xu, Ruochen Yu, Mingfeng Jiang, Qi Dai
{"title":"DualGCN-GE: integration of spatiotemporal representations from whole-blood expression data with dual-view graph convolution network to identify Parkinson's disease subtypes.","authors":"Wei Zhang, Zeqi Xu, Ruochen Yu, Mingfeng Jiang, Qi Dai","doi":"10.1186/s12859-025-06181-6","DOIUrl":"10.1186/s12859-025-06181-6","url":null,"abstract":"<p><strong>Background: </strong>As a typical neurodegenerative disorder, Parkinson's disease (PD) is characterized by significant clinical and progression heterogeneity. Based on gene expression data, reliable detection of PACE subtypes in Parkinson's disease (PD-PACE) plays a crucial role in addressing the heterogeneity of this disease. Established machine learning approaches generally adopt single-view learning schemes and employ temporal features underlying RNA sequencing data. Topological features, which are associated with gene graphs and cell graphs, were disregarded in previous works. In fact, Parkinson-specific gene graphs (PGG) could act as topological features to capture structural changes of molecular networks.</p><p><strong>Results: </strong>Under the framework of dual-view graph learning, this study proposes a DualGCN-GE method to identify multiple PD-PACE subtypes from whole-blood expression data, with regard to progression velocity. The DualGCN-GE method uses a dual-view graph convolution network (GCN) to integrate temporal and topological features underlying whole-blood expression data, thus detecting PD-PACE subtypes. Experimental analysis of three benchmark datasets validated the effectiveness and advantage of the DualGCN-GE method in the disease subtype detection task.</p><p><strong>Conclusion: </strong>For gene expression data of human blood samples, topological features encode unique information that is absent from temporal features. 
Using a collaborative fusion strategy, spatio-temporal representations extracted from whole blood expression data have improved accuracy and reliability in detecting PD-PACE subtypes.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"208"},"PeriodicalIF":3.3,"publicationDate":"2025-08-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12341084/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144833847","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}