{"title":"fuseMLR: an R package for integrative prediction modeling of multi-omics data.","authors":"Césaire J K Fouodo, Marina Bleskina, Silke Szymczak","doi":"10.1186/s12859-025-06248-4","DOIUrl":"https://doi.org/10.1186/s12859-025-06248-4","url":null,"abstract":"","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"221"},"PeriodicalIF":3.3,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12382258/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941568","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CalciumNetExploreR: an R package for network analysis of calcium imaging data.","authors":"Simone Lenci, Dirk Sieger","doi":"10.1186/s12859-025-06206-0","DOIUrl":"https://doi.org/10.1186/s12859-025-06206-0","url":null,"abstract":"<p><strong>Background: </strong>Analyzing calcium imaging data to understand complex functional networks can be challenging, often requiring multiple tools, custom scripts, and some coding expertise. To address these challenges, we present CalciumNetExploreR (CNER), an R package designed to streamline and standardize the analysis of time-series data from neuronal populations.</p><p><strong>Results: </strong>CNER integrates essential steps (normalization, binarization, population activity visualization, network construction, degree distribution analysis, principal component analysis, power spectral density evaluation, and event frequency calculations) into a single, cohesive pipeline. This comprehensive approach enables users to efficiently extract and compare network metrics, including clustering coefficients, global efficiency, community structures, and principal component variances. By offering a flexible and customizable framework, CNER simplifies the examination of functional connectivity and network topology, effectively providing the means to characterize a cellular functional network or analogous structures in other modalities.</p><p><strong>Conclusion: </strong>Designed as a user-friendly package, CNER allows both experimental and computational neuroscientists to incorporate robust statistical and graphical analyses into their workflows without extensive coding knowledge. 
By unifying key analytical components into one pipeline, CNER reduces barriers associated with large-scale data analyses, ultimately facilitating deeper insights into the functional organization and dynamic properties of neuronal networks across diverse recording techniques.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"220"},"PeriodicalIF":3.3,"publicationDate":"2025-08-26","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12379452/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941573","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GeneSetCluster 2.0: a comprehensive toolset for summarizing and integrating gene-sets analysis.","authors":"Asier Ortega-Legarreta, Alberto Maillo, Daniel Mouzo, Ana Rosa López-Pérez, Lara Kular, Majid Pahlevan Kakhki, Maja Jagodic, Jesper Tegner, Vincenzo Lagani, Ewoud Ewing, David Gomez-Cabrero","doi":"10.1186/s12859-025-06249-3","DOIUrl":"https://doi.org/10.1186/s12859-025-06249-3","url":null,"abstract":"","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"219"},"PeriodicalIF":3.3,"publicationDate":"2025-08-21","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12372222/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941506","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"SV-MeCa: an XGBoost-based meta-caller approach for structural variant calling from short-read data.","authors":"Rudel Christian Nkouamedjo Fankep, Arda Söylev, Anna-Lena Kobiela, Jochen Blom, Corinna Ernst, Susanne Motameny","doi":"10.1186/s12859-025-06246-6","DOIUrl":"https://doi.org/10.1186/s12859-025-06246-6","url":null,"abstract":"<p><strong>Background: </strong>Calling structural variants (SVs), i.e., genomic alterations of ≥50 bp, from whole-genome short-read data remains challenging, as existing callers are known to lack accuracy and robustness. Therefore, meta-caller approaches combining the results of multiple standalone tools into a consensus set of reported SV calls are widely used. Here, SV-MeCa (Structural Variant Meta-Caller) is presented, the first SV meta-caller incorporating variant-specific quality metrics from individual VCF outputs, rather than relying solely on the number and combination of tools supporting consensus SV calls. In addition, SV-MeCa offers a suitable score to rank obtained consensus SV calls according to evidence of representing true positive calls, i.e., real-world variants.</p><p><strong>Results: </strong>SV-MeCa applies seven standalone SV callers and merges resulting deletion and insertion calls into a union VCF file using SURVIVOR. For each entry in the SURVIVOR-generated consensus, caller-specific quality measures are extracted from corresponding standalone VCF files, and serve as input for a deletion- or insertion-specific XGBoost decision tree classifier, which was previously trained on the HG002 SV benchmark data provided by the Genome in a Bottle consortium. The SV-MeCa XGBoost models assign each (consensus) SV call a probability of representing a true positive call, which can be used for ranking the final output according to evidence. 
The performance of SV-MeCa and four previously published meta-caller approaches was evaluated based on autosomal SV calls in samples curated by the Human Genome Structural Variation Consortium, Phase 2. With regard to F<sub>1</sub> scores, which were 0.58 on average for deletions and 0.42 on average for insertions, SV-MeCa outperformed the other meta-callers. With regard to precision, only ConsensuSV achieved higher values (0.97 versus 0.64 on average for deletions, 0.75 versus 0.53 on average for insertions), and with regard to recall, SV-MeCa was outperformed exclusively by Meta-SV for deletions (0.55 versus 0.53).</p><p><strong>Conclusions: </strong>SV-MeCa, publicly available at https://github.com/ccfboc-bioinformatics/SV-MeCa , outperforms existing SV meta-caller approaches by taking variant-specific quality measures into account. Moreover, due to the XGBoost prediction probabilities serving as scores, the output of SV-MeCa can be continuously adjusted to user needs in terms of sensitivity and precision.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"218"},"PeriodicalIF":3.3,"publicationDate":"2025-08-20","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12366149/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144941509","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"NetStart 2.0: prediction of eukaryotic translation initiation sites using a protein language model.","authors":"Line Sandvad Nielsen, Anders Gorm Pedersen, Ole Winther, Henrik Nielsen","doi":"10.1186/s12859-025-06220-2","DOIUrl":"10.1186/s12859-025-06220-2","url":null,"abstract":"<p><strong>Background: </strong>Accurate identification of translation initiation sites is essential for the proper translation of mRNA into functional proteins. In eukaryotes, the choice of the translation initiation site is influenced by multiple factors, including its proximity to the 5′ end and the local start codon context. Translation initiation sites mark the transition from non-coding to coding regions. This fact motivates the expectation that the upstream sequence, if translated, would assemble a nonsensical order of amino acids, while the downstream sequence would correspond to the structured beginning of a protein. This distinction suggests potential for predicting translation initiation sites using a protein language model.</p><p><strong>Results: </strong>We present NetStart 2.0, a deep learning-based model that integrates the ESM-2 protein language model with the local sequence context to predict translation initiation sites across a broad range of eukaryotic species. NetStart 2.0 was trained as a single model across multiple species, and despite the broad phylogenetic diversity represented in the training data, it consistently relied on features marking the transition from non-coding to coding regions.</p><p><strong>Conclusion: </strong>By leveraging \"protein-ness\", NetStart 2.0 achieves state-of-the-art performance in predicting translation initiation sites across a diverse range of eukaryotic species. This success underscores the potential of protein language models to bridge transcript- and peptide-level information in complex biological prediction tasks. 
The NetStart 2.0 webserver is available at: https://services.healthtech.dtu.dk/services/NetStart-2.0/ .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"216"},"PeriodicalIF":3.3,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12366053/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144881993","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"GeneRiskCalc: a web-based tool for genetic risk association analysis in case-control studies.","authors":"Amrit Sudershan, Kuljeet Singh, Parvinder Kumar","doi":"10.1186/s12859-025-06207-z","DOIUrl":"10.1186/s12859-025-06207-z","url":null,"abstract":"<p><strong>Background: </strong>Genetic association studies play a pivotal role in identifying disease-associated variants, but researchers face challenges in performing essential calculations like Hardy-Weinberg equilibrium testing, odds ratios, and confidence intervals due to reliance on manual methods or multiple software tools. We aimed to develop GeneRiskCalc, an integrated web-based platform that simplifies genetic association analysis by automating Hardy-Weinberg equilibrium assessment, odds ratios with confidence interval calculation, and visual data presentation in case-control studies. Using an HTML/CSS/JavaScript framework, we developed online software with three core functionalities: (1) automated HWE evaluation, (2) odds ratio with 95% confidence interval computation with statistical validation, and (3) dynamic Forest Plot generation for data visualization. The tool was designed with an intuitive interface to minimize prerequisite statistical expertise.</p><p><strong>Results: </strong>The tool, named the Genetic Risk Association Calculator (GeneRiskCalc), demonstrated high computational accuracy in HWE testing (χ<sup>2</sup> validation) and association metrics (odds ratio and confidence interval). The results were cross-validated against established statistical methods, confirming their reliability. Furthermore, the integrated Forest Plotter enabled immediate visualization of effect sizes across multiple genetic models, facilitating a comprehensive interpretation of genetic associations.</p><p><strong>Conclusion: </strong>By integrating essential analytical steps into a single platform, GeneRiskCalc streamlines genetic epidemiology workflows, addressing key challenges in data analysis. 
Its user-friendly interface enhances accessibility, promotes reproducibility, and accelerates research in genetic association studies. The tool is freely available at GeneRiskCalc ( https://sites.google.com/view/GeneRiskCalc/home?authuser=0 ).</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"213"},"PeriodicalIF":3.3,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12363000/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144881992","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"consHLA: a next generation sequencing consensus-based HLA typing workflow.","authors":"Rachel Bowen-James, Weilin Wu, Marie Wong-Erasmus, Julian M W Quinn, Chelsea Mayoh, Mark J Cowley","doi":"10.1186/s12859-025-06223-z","DOIUrl":"10.1186/s12859-025-06223-z","url":null,"abstract":"<p><strong>Background: </strong>Human Leukocyte Antigens (HLA) play central roles in histocompatibility and immune system functions, including antigen presentation. Accurate typing of Class I and II HLA genes is crucial for transplant tissue matching, characterising autoimmune diseases and informing cancer immunotherapy. Clinical serology and PCR-based testing are the gold standards for HLA typing, but offer only single-field resolution (e.g., HLA-A*11). Whole genome sequencing (WGS) and RNA sequencing (RNA-seq) can achieve higher, three-field resolution (e.g., HLA-A*11:01:01), although some HLA genes can be challenging to type from sequencing data. With the increasing use of germline WGS, tumour WGS and tumour RNA-seq in cancer patient care, there is an opportunity to combine these three dataset types to improve HLA typing accuracy and confidence, and to identify clinically relevant HLA type changes in tumours. To achieve this, we developed consHLA, a tool that employs this consensus HLA typing approach.</p><p><strong>Results: </strong>We obtained matched germline and tumour WGS and RNA-seq data from 86 high-risk paediatric cancer patients (76 brain cancers, 10 leukaemias) from the ZERO Childhood Cancer precision medicine program. We examined 10 HLA typing packages, selecting HLA-HD to develop our consHLA workflow as HLA-HD can employ all three dataset types, analysing both Class I and II HLA genes at three-field resolution. Using consHLA we achieved 97.9% concordance with gold standard HLA test results. We observed 90.5% allele consistency across the three NGS inputs. 
Typing inconsistencies in at least one of 12 clinically relevant HLA genes were observed in 29 of the brain tumour cases. 32% of these had clinically relevant explanations. To assist clinically, we implemented consHLA as a fully automated workflow producing a clinician-friendly HLA-typing report.</p><p><strong>Conclusions: </strong>To leverage cancer patient germline and tumour WGS and tumour RNA-seq data we developed an automated workflow, consHLA, that produces consensus typing of HLA genes in a clinically relevant timeframe. This workflow provides higher resolution patient HLA-typing than current gold standard approaches, identifies HLA alterations arising in patient tumours and generates clear, simple reports.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"215"},"PeriodicalIF":3.3,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12363109/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144881991","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Autoencoders with shared and specific embeddings for multi-omics data integration.","authors":"Chao Wang, Michael J O'Connell","doi":"10.1186/s12859-025-06245-7","DOIUrl":"10.1186/s12859-025-06245-7","url":null,"abstract":"<p><strong>Background: </strong>In cancer research, different levels of high-dimensional data are often collected for the same subjects. Effective integration of these data by considering the shared and specific information from each data source can help us better understand different types of cancer.</p><p><strong>Results: </strong>In this study we propose a novel autoencoder (AE) structure with explicitly defined orthogonal loss between the shared and specific embeddings to integrate different data sources. We compare our model with previously proposed AE structures based on simulated data and real cancer data from The Cancer Genome Atlas. Using simulations with different proportions of differentially expressed genes, we compare the performance of AE methods for subsequent classification tasks. We also compare the model performance with a commonly used dimension reduction method, joint and individual variance explained (JIVE). In terms of reconstruction loss, our proposed AE models with orthogonal constraints have a slightly better reconstruction loss. All AE models achieve higher classification accuracy than the original features, demonstrating the usefulness of the embeddings extracted by the model.</p><p><strong>Conclusions: </strong>We show that the proposed models have consistently high classification accuracy on both training and testing sets. 
In comparison, the recently proposed MOCSS model that imposes an orthogonality penalty in the post-processing step has lower classification accuracy that is on par with JIVE.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"214"},"PeriodicalIF":3.3,"publicationDate":"2025-08-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12362917/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144881990","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"HIV-phyloTSI: subtype-independent estimation of time since HIV-1 infection for cross-sectional measures of population incidence using deep sequence data.","authors":"Tanya Golubchik, Lucie Abeler-Dörner, Matthew Hall, Chris Wymant, David Bonsall, George Macintyre-Cockett, Laura Thomson, Jared M Baeten, Connie L Celum, Ronald M Galiwango, Barry Kosloff, Mohammed Limbada, Andrew Mujugira, Nelly R Mugo, Astrid Gall, François Blanquart, Margreet Bakker, Daniela Bezemer, Swee Hoe Ong, Jan Albert, Norbert Bannert, Jacques Fellay, Barbara Gunsenheimer-Bartmeyer, Huldrych F Günthard, Pia Kivelä, Roger D Kouyos, Laurence Meyer, Kholoud Porter, Ard van Sighem, Mark van der Valk, Ben Berkhout, Paul Kellam, Marion Cornelissen, Peter Reiss, Helen Ayles, David N Burns, Sarah Fidler, Mary Kate Grabowski, Richard Hayes, Joshua T Herbeck, Joseph Kagaayi, Pontiano Kaleebu, Jairam R Lingappa, Deogratius Ssemwanga, Susan H Eshleman, Myron S Cohen, Oliver Ratmann, Oliver Laeyendecker, Christophe Fraser","doi":"10.1186/s12859-025-06189-y","DOIUrl":"10.1186/s12859-025-06189-y","url":null,"abstract":"<p><strong>Background: </strong>Estimating the time since HIV infection (TSI) at population level is essential for tracking changes in the global HIV epidemic. Most methods for determining TSI give a binary classification of infections as recent or non-recent within a window of several months, and cannot assess the cumulative impact of an intervention.</p><p><strong>Results: </strong>We developed a Random Forest Regression model, HIV-phyloTSI, which combines measures of within-host diversity and divergence to generate continuous TSI estimates directly from viral deep-sequencing data, with no need for additional variables. HIV-phyloTSI provides a continuous measure of TSI up to 9 years, with a mean absolute error of less than 12 months overall and less than 5 months for infections with a TSI of up to a year. 
It performs equally well for all major HIV subtypes based on data from African and European cohorts.</p><p><strong>Conclusions: </strong>We demonstrate how HIV-phyloTSI can be used for incidence estimates on a population level.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"212"},"PeriodicalIF":3.3,"publicationDate":"2025-08-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12351810/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144854398","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}