Bioinformatics Advances | Pub Date: 2025-07-21 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf176
Altaf Barelvi, Oliver Anderson, Anna Ritz
GRPhIN: graphlet characterization of regulatory and physical interaction networks.
Motivation: Graphs are powerful tools for modeling and analyzing molecular interaction networks. Graphs typically represent either undirected physical interactions or directed regulatory relationships, which can obscure a particular protein's functional context. Graphlets can describe local topologies and patterns within graphs, and combining physical and regulatory interactions offers new graphlet configurations that can provide biological insights.
Results: We present GRPhIN, a tool for characterizing graphlets and protein roles within graphlets in mixed physical and regulatory interaction networks. We describe the graphlets of mixed networks in Bacillus subtilis, Caenorhabditis elegans, Drosophila melanogaster, Danio rerio, and Saccharomyces cerevisiae and examine local topologies of proteins and subnetworks related to the oxidative stress response pathway. We found a number of graphlets that were abundant in all species, specific node positions (orbits) within graphlets that were overrepresented in stress-associated proteins, and rarely occurring graphlets that were overrepresented in oxidative stress subnetworks. These results showcase the potential of graphlets in mixed physical and regulatory interaction networks to reveal patterns that a single interaction type cannot.
Availability and implementation: GRPhIN is available at https://github.com/Reed-CompBio/grphin.
Bioinformatics Advances 5(1): vbaf176. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12317317/pdf/
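To make the idea of a mixed-network graphlet concrete, here is a minimal brute-force sketch, not GRPhIN's implementation, that counts connected three-node induced subgraphs in a network with undirected physical edges and directed regulatory edges. Each node pair is encoded by three flags (physical edge, forward regulation, reverse regulation), and a triad's key is the smallest encoding over all six node orderings, so isomorphic triads share a key. The function names and the encoding are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations, permutations


def pair_label(u, v, physical, regulatory):
    """Flags for the u-v pair: (physical edge, u regulates v, v regulates u)."""
    return (
        frozenset((u, v)) in physical,
        (u, v) in regulatory,
        (v, u) in regulatory,
    )


def canonical_triad(trio, physical, regulatory):
    """Permutation-invariant key: the minimum pair encoding over all orderings."""
    keys = []
    for a, b, c in permutations(trio):
        keys.append((
            pair_label(a, b, physical, regulatory),
            pair_label(a, c, physical, regulatory),
            pair_label(b, c, physical, regulatory),
        ))
    return min(keys)


def count_mixed_triads(physical_edges, regulatory_edges):
    """Count connected 3-node graphlets by mixed edge-type pattern (O(n^3) sketch)."""
    physical = {frozenset(e) for e in physical_edges}
    regulatory = set(regulatory_edges)
    nodes = sorted({n for e in physical_edges for n in e}
                   | {n for e in regulatory_edges for n in e})
    counts = Counter()
    for trio in combinations(nodes, 3):
        linked = sum(any(pair_label(u, v, physical, regulatory))
                     for u, v in combinations(trio, 2))
        # A 3-node induced subgraph is connected iff at least two pairs are linked.
        if linked >= 2:
            counts[canonical_triad(trio, physical, regulatory)] += 1
    return counts


# Two triads with the same pattern: a physical edge beside a regulatory edge.
counts = count_mixed_triads([("A", "B"), ("X", "Y")], [("B", "C"), ("Y", "Z")])
print(len(counts), sum(counts.values()))  # 1 2
```

Because direction is part of the pair flags, a regulator controlling two targets and two regulators converging on one target receive different keys, which is the kind of distinction a purely undirected graphlet count would lose.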
Bioinformatics Advances | Pub Date: 2025-07-18 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf168
Danilo Silva, Monika Moir, Marcel Dunaiski, Natalia Blanco, Fati Murtala-Ibrahim, Cheryl Baxter, Tulio de Oliveira, Joicymara S Xavier
Review of open-source software for developing heterogeneous data management systems for bioinformatics applications.
Summary: In a world where data drive effective decision-making, bioinformatics and health science researchers often have difficulty managing data efficiently. In these fields, data are typically diverse in format and subject. Consequently, challenges in storing, tracking, and responsibly sharing valuable data have become increasingly evident over the past decades. To address these complexities, some approaches have leveraged standard strategies, such as non-relational databases and data warehouses. However, these approaches often fall short of the flexibility and scalability that complex projects require. While the data lake paradigm has emerged to offer flexibility and handle large volumes of diverse data, it lacks robust data governance and organization. The data lakehouse is a new paradigm that combines the flexibility of a data lake with the governance of a data warehouse, offering a promising solution for managing heterogeneous data in bioinformatics. However, the lakehouse model remains largely unexplored in bioinformatics and receives limited discussion in the current literature. In this study, we review strategies and tools for developing a data lakehouse infrastructure tailored to bioinformatics research. We summarize key concepts and assess available open-source and commercial solutions for managing data in bioinformatics.
Availability and implementation: Not applicable.
Bioinformatics Advances 5(1): vbaf168. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12321290/pdf/
Bioinformatics Advances | Pub Date: 2025-07-17 | DOI: 10.1093/bioadv/vbaf167
Wenxing Hu, Haotian Zhang, Yu H Sun, Shaolong Cao, Jake Gagnon, Yuka Moroishi, Yirui Chen, Zhengyu Ouyang, Baohong Zhang
ScaleSC: a superfast and scalable single-cell RNA-seq data analysis pipeline powered by GPU.
Summary: The rise of large-scale single-cell RNA-seq data has made slow processing a major bottleneck. Leveraging advances in Graphics Processing Unit (GPU) computing ecosystems, such as CuPy and Compute Unified Device Architecture (CUDA), and building on the Scanpy and Rapids-singlecell packages, we developed ScaleSC, a GPU-accelerated solution for large-scale single-cell data processing. ScaleSC delivers over a 20× speedup through GPU computing and significantly improves scalability: by overcoming the memory bottleneck, it handles datasets of 10-20 million cells with over 1000 batches on a single A100 card, far surpassing Rapids-singlecell's capacity of only 1 million cells without multi-GPU support. We also resolved discrepancies between GPU and Central Processing Unit (CPU) algorithm implementations to ensure consistency. In addition to core optimizations, we developed novel tools for marker gene identification and cluster merging, with seamlessly integrated GPU-optimized implementations. ScaleSC shares a similar syntax with Scanpy, which lowers the learning curve for users already familiar with Scanpy workflows.
Availability and implementation: The ScaleSC package (https://github.com/interactivereport/ScaleSC) promises significant benefits for the single-cell RNA-seq computational community.
Bioinformatics Advances 5(1): vbaf167. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12321287/pdf/
Bioinformatics Advances | Pub Date: 2025-07-16 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf172
Hangjia Zhao, Michael Baudis
pgxRpi: an R/Bioconductor package for user-friendly access to the Beacon v2 API.
Motivation: The Beacon v2 specification, established by the Global Alliance for Genomics and Health (GA4GH), consists of a standardized framework and data models for genomic and phenotypic data discovery. By enabling secure, federated data sharing, it fosters interoperability across genomic resources. Progenetix, a Beacon v2 reference implementation, exemplifies the specification's potential for large-scale genomic data integration, offering open access to genomic mutation data across diverse cancer types.
Results: We present pgxRpi, an open-source R/Bioconductor package that provides a streamlined interface to the Progenetix Beacon v2 REST API, facilitating efficient and flexible genomic data retrieval. Beyond data access, pgxRpi offers integrated visualization and analysis functions that enable users to explore, interpret, and process queried data effectively. Leveraging the flexibility of the Beacon v2 standard, pgxRpi extends beyond Progenetix, supporting interoperable data access across multiple Beacon-enabled resources and thereby enhancing data-driven discovery in genomics.
Availability and implementation: pgxRpi is freely available under the Artistic-2.0 license from Bioconductor (https://doi.org/10.18129/B9.bioc.pgxRpi), with actively maintained source code on GitHub (https://github.com/progenetix/pgxRpi). Comprehensive usage instructions and example workflows are provided in the package vignettes, available at https://github.com/progenetix/pgxRpi/tree/devel/vignettes.
Bioinformatics Advances 5(1): vbaf172. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12321294/pdf/
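pgxRpi itself is an R package; the kind of REST request it wraps can still be sketched in a language-neutral way. The Python snippet below only assembles a Beacon v2-style GET URL for an entry type such as biosamples, with a comma-separated filters parameter and an optional limit. The base URL https://progenetix.org/beacon, the example filter term NCIT:C3058, and the exact query-parameter layout are assumptions based on common Beacon v2 usage, not on the pgxRpi documentation; check the Progenetix and Beacon v2 documentation before relying on them. In R, the package's own functions described in its vignettes are the intended interface.

```python
from urllib.parse import urlencode


def build_beacon_url(base_url, entry_type, filters, limit=None):
    """Assemble a Beacon v2-style GET query URL (endpoint layout is an assumption).

    base_url: root of a Beacon v2 service, e.g. an assumed Progenetix endpoint
    entry_type: Beacon entry type such as "biosamples" or "individuals"
    filters: ontology-style filter terms, e.g. ["NCIT:C3058"]
    limit: optional maximum number of records to request
    """
    params = {"filters": ",".join(filters)}
    if limit is not None:
        params["limit"] = limit
    # safe=":," keeps CURIE colons and the comma separator unescaped.
    query = urlencode(params, safe=":,")
    return f"{base_url.rstrip('/')}/{entry_type}?{query}"


url = build_beacon_url("https://progenetix.org/beacon", "biosamples",
                       ["NCIT:C3058"], limit=10)
print(url)  # https://progenetix.org/beacon/biosamples?filters=NCIT:C3058&limit=10
```

The snippet deliberately stops at URL construction; sending the request and parsing the JSON response would depend on the response schema of the specific Beacon instance.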
Bioinformatics Advances | Pub Date: 2025-07-15 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf171
Ermes Filomena, Ernesto Picardi, Graziano Pesole, Anna Maria D'Erchia
IsoPrimer: a pipeline for designing isoform-aware primer pairs for comprehensive gene expression quantification.
Motivation: Eukaryotic genes can perform different functions by generating multiple transcripts through alternative splicing. Accurately quantifying gene expression in specific conditions is important for functional assessment and requires PCR primer pairs designed to target all expressed alternative transcripts, a complex and error-prone task when performed manually.
Results: To address this task efficiently, we developed a pipeline, called IsoPrimer, that designs PCR primer pairs targeting the specific set of expressed splicing variants of the genes of interest, for use in quantitative PCR, e.g. in RNA-seq validation experiments. Based on the expression levels of splicing variants derived from an RNA-seq dataset, IsoPrimer can: (i) identify the most expressed gene isoforms; (ii) design primer pairs overlapping exon-exon junctions common to the expressed variants; and (iii) verify the specificity of the designed primer pairs.
Availability and implementation: IsoPrimer is available for download from https://github.com/BioinfoUNIBA/IsoPrimer.
Bioinformatics Advances 5(1): vbaf171. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12311343/pdf/
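Steps (i) and (ii) of the IsoPrimer workflow can be sketched with plain data structures. The snippet below is a simplification, not IsoPrimer's code: it treats each transcript as an ordered list of exon IDs, keeps isoforms whose share of the gene's total expression reaches a hypothetical min_fraction cutoff, and intersects their exon-exon junctions. A primer spanning one of the shared junctions would be expected to amplify every retained isoform. Exon IDs, TPM values, and the 10% cutoff are illustrative assumptions; real primer design also needs sequence-level checks such as melting temperature and the specificity search in step (iii).

```python
def expressed_isoforms(tpm, min_fraction=0.1):
    """Isoforms contributing at least min_fraction of the gene's total expression."""
    total = sum(tpm.values())
    if total == 0:
        return set()
    return {iso for iso, value in tpm.items() if value / total >= min_fraction}


def junctions(exons):
    """Exon-exon junctions of one transcript, as (upstream, downstream) exon pairs."""
    return set(zip(exons, exons[1:]))


def shared_junctions(structures, isoforms):
    """Junctions present in every selected isoform (candidate primer sites)."""
    junction_sets = [junctions(structures[iso]) for iso in isoforms]
    return set.intersection(*junction_sets) if junction_sets else set()


# Hypothetical gene with three annotated transcripts.
structures = {
    "T1": ["E1", "E2", "E3"],
    "T2": ["E1", "E2", "E4"],
    "T3": ["E1", "E3"],
}
tpm = {"T1": 50.0, "T2": 45.0, "T3": 5.0}
selected = expressed_isoforms(tpm)  # T1 and T2; T3 is below the 10% cutoff
print(sorted(shared_junctions(structures, selected)))  # [('E1', 'E2')]
```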
Bioinformatics Advances | Pub Date: 2025-07-14 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf170
Muhammad Tahir, Sheela Ramanna, Qian Liu
Hybrid representation learning for human m⁶A modifications with chromosome-level generalizability.
Motivation: N⁶-methyladenosine (m⁶A) is the most abundant internal modification in eukaryotic mRNA and plays essential roles in post-transcriptional gene regulation. While several deep learning approaches have been proposed to predict m⁶A sites, most suffer from limited chromosome-level generalizability because they are evaluated on randomly split datasets.
Results: In this study, we propose two novel hybrid deep learning models, Hybrid Model and Hybrid Deep Model, that integrate local sequence features (k-mers) and contextual embeddings via convolutional neural networks to improve predictive performance and generalization. We evaluate these models using both a Random-Split strategy and a more biologically realistic Leave-One-Chromosome-Out setting to ensure robustness across genomic regions. Our proposed models outperform the state-of-the-art m6A-TCPred model across all key evaluation metrics. Hybrid Deep Model achieves the highest accuracy under Random-Split, while Hybrid Model demonstrates superior generalization under Leave-One-Chromosome-Out, indicating that deep global representations may overfit in chromosome-independent settings. These findings underscore the importance of rigorous validation strategies and offer insights into designing robust m⁶A predictors.
Availability and implementation: Source code and datasets are available at https://github.com/malikmtahir/LOCO-m6A.
Bioinformatics Advances 5(1): vbaf170. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12288952/pdf/
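The Random-Split and Leave-One-Chromosome-Out settings differ only in how examples are grouped into folds. A minimal sketch, assuming each candidate site carries a chromosome label: the generator below holds out every site from one chromosome per fold, so no test chromosome is seen during training, and a normalized k-mer frequency vector stands in for the local sequence features. The RNA alphabet (ACGU), k=3, and the function names are assumptions; the published models additionally learn contextual embeddings with CNNs, which are not reproduced here.

```python
from collections import Counter
from itertools import product


def kmer_frequencies(seq, k=3, alphabet="ACGU"):
    """Normalized counts of every possible k-mer, in a fixed column order.

    Windows containing other letters (e.g. N) count toward the total
    but have no column of their own.
    """
    vocab = ["".join(p) for p in product(alphabet, repeat=k)]
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    counts = Counter(windows)
    total = max(len(windows), 1)
    return [counts[kmer] / total for kmer in vocab]


def leave_one_chromosome_out(chromosomes):
    """Yield (held_out, train_idx, test_idx), holding out one chromosome per fold."""
    for held_out in sorted(set(chromosomes)):
        test_idx = [i for i, c in enumerate(chromosomes) if c == held_out]
        train_idx = [i for i, c in enumerate(chromosomes) if c != held_out]
        yield held_out, train_idx, test_idx


chroms = ["chr1", "chr2", "chr1", "chr3"]
for held_out, train_idx, test_idx in leave_one_chromosome_out(chroms):
    print(held_out, train_idx, test_idx)
```

scikit-learn's LeaveOneGroupOut, with the chromosome labels passed as groups, produces the same partition of indices and plugs directly into its cross-validation utilities.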
Bioinformatics Advances | Pub Date: 2025-07-12 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf166
Jorge Lázaro, Arin Wongprommoon, Jorge Júlvez, Stephen G Oliver
Enhancing genome-scale metabolic models with kinetic data: resolving growth and citramalate production trade-offs in Escherichia coli.
Summary: Metabolic models are valuable tools for analyzing and predicting cellular features such as growth, gene essentiality, and product formation. Among the various types of metabolic models, two prominent categories are constraint-based models and kinetic models. Constraint-based models typically represent a large subset of an organism's metabolic reactions and incorporate reaction stoichiometry, gene regulation, and constant flux bounds. However, their analyses are restricted to steady-state conditions, making it difficult to optimize competing objective functions. In contrast, kinetic models offer detailed kinetic information but are limited to a smaller subset of metabolic reactions, providing precise predictions for only a fraction of an organism's metabolism. To address these limitations, we proposed a hybrid approach that integrates the two modeling frameworks by redefining the flux bounds of genome-scale constraint-based models using kinetic data. We applied this method to the constraint-based model of Escherichia coli, examining both its wild-type form and a genetically modified strain engineered for citramalate production. Our results demonstrate that the enriched model achieves more realistic reaction flux boundaries. Furthermore, by fixing the growth rate to a value derived from kinetic information, we resolved a flux bifurcation between growth and citramalate production in the modified strain, enabling accurate predictions of citramalate production rates.
Availability and implementation: The Python code generated for this work is available at https://github.com/jlazaroibanezz/citrabounds.
Bioinformatics Advances 5(1): vbaf166. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12341681/pdf/
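One way kinetic data can tighten a constraint-based model is to cap a reaction's forward flux with a rate law evaluated at measured or simulated concentrations. The sketch below uses an irreversible Michaelis-Menten rate, v = kcat·[E]·[S]/(Km + [S]), as that cap, intersects it with the model's existing (lower, upper) bounds, and optionally pins a growth reaction to a single value, echoing the growth-rate fix described in the abstract. The rate law, the reaction IDs R1, R2, and BIOMASS, and all numbers are hypothetical, not the authors' procedure or E. coli parameters; units must match the model's flux units before use.

```python
def michaelis_menten_cap(kcat, enzyme, substrate, km):
    """Irreversible Michaelis-Menten rate v = kcat * [E] * [S] / (Km + [S])."""
    return kcat * enzyme * substrate / (km + substrate)


def apply_kinetic_caps(bounds, caps, growth_rxn=None, growth_rate=None):
    """Return new (lb, ub) bounds with forward flux capped at the kinetic rate.

    bounds: {reaction_id: (lb, ub)}; caps: {reaction_id: max forward flux}.
    Optionally fixes growth_rxn to exactly growth_rate (lb == ub).
    """
    new_bounds = {}
    for rxn, (lb, ub) in bounds.items():
        cap = caps.get(rxn)
        if cap is not None:
            ub = min(ub, cap)
            if lb > ub:
                raise ValueError(f"kinetic cap makes {rxn} infeasible: {lb} > {ub}")
        new_bounds[rxn] = (lb, ub)
    if growth_rxn is not None and growth_rate is not None:
        new_bounds[growth_rxn] = (growth_rate, growth_rate)
    return new_bounds


# Hypothetical reactions and parameters.
bounds = {"R1": (0.0, 1000.0), "R2": (0.0, 1000.0), "BIOMASS": (0.0, 1000.0)}
caps = {"R1": michaelis_menten_cap(kcat=10.0, enzyme=2.0, substrate=1.0, km=1.0)}
print(apply_kinetic_caps(bounds, caps, growth_rxn="BIOMASS", growth_rate=0.5))
# {'R1': (0.0, 10.0), 'R2': (0.0, 1000.0), 'BIOMASS': (0.5, 0.5)}
```

With COBRApy, the resulting tuples could be assigned to each reaction's bounds attribute before calling model.optimize(); fixing biomass this way turns growth into a constraint, so the optimizer can then be pointed at product formation instead.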
Bioinformatics Advances | Pub Date: 2025-07-10 | DOI: 10.1093/bioadv/vbaf135
Sharon Natasha Cox, Angelo Sante Varvara, Graziano Pesole
MitSorter: a standalone tool for accurate discrimination of mtDNA and NuMT ONT reads based on differential methylation.
Motivation: Accurately differentiating mitochondrial DNA (mtDNA) from nuclear mitochondrial DNA segments (NuMTs) is a critical challenge in studies of mitochondrial disorders. Mapping the mtDNA mutation spectrum and quantifying heteroplasmy with next-generation sequencing are complex tasks, mostly because of NuMT contamination in data analysis.
Results: Here, we present a novel, easy-to-use standalone command-line tool that reliably discriminates Oxford Nanopore Technologies (ONT) long reads originating from mtDNA from those originating from NuMTs, based on the known lack of CpG methylation in human mtDNA. MitSorter aligns the reads to the mitochondrial genome, incorporating base modification calls directly from raw POD5 files. The resulting BAM file is then partitioned into two separate BAM files: one containing unmethylated reads and the other containing methylated reads. We show that MitSorter analysis can provide a more accurate landscape of the mtDNA mutation profile. We describe the tool's features, computational framework, validation approach, and potential applications in other genomic research areas.
Availability and implementation: Source code and documentation are available at https://github.com/asvarvara/MitSorter.
Bioinformatics Advances 5(1): vbaf135. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12275464/pdf/
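The partitioning idea behind MitSorter can be sketched independently of file formats. Assuming per-read CpG methylation probabilities have already been extracted (for example, from the MM/ML base-modification tags that modified-base calling writes into the BAM), a read is labeled methylated when a sufficient fraction of its CpG calls look methylated; because human mtDNA lacks CpG methylation, methylated reads are the NuMT candidates. The thresholds, and the extra "undetermined" bin for reads with too few CpG calls, are hypothetical choices rather than MitSorter's parameters; MitSorter itself writes two BAM files.

```python
def classify_read(cpg_probs, call_threshold=0.5, min_methylated_fraction=0.2,
                  min_calls=3):
    """Label one read from its per-CpG methylation probabilities (0 to 1)."""
    if len(cpg_probs) < min_calls:
        return "undetermined"
    methylated_calls = sum(p >= call_threshold for p in cpg_probs)
    if methylated_calls / len(cpg_probs) >= min_methylated_fraction:
        return "methylated"    # candidate NuMT read
    return "unmethylated"      # consistent with mtDNA origin


def partition_reads(reads, **thresholds):
    """Split {read_id: [probabilities]} into methylated/unmethylated/undetermined."""
    bins = {"methylated": [], "unmethylated": [], "undetermined": []}
    for read_id, probs in reads.items():
        bins[classify_read(probs, **thresholds)].append(read_id)
    return bins


reads = {
    "read_1": [0.92, 0.85, 0.70, 0.10],  # mostly methylated CpGs
    "read_2": [0.05, 0.10, 0.02, 0.08],  # no methylated CpGs
    "read_3": [0.90],                    # too few CpG calls to decide
}
print(partition_reads(reads))
# {'methylated': ['read_1'], 'unmethylated': ['read_2'], 'undetermined': ['read_3']}
```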
Bioinformatics Advances | Pub Date: 2025-07-09 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf165
Liviu-Iulian Rotaru, Marius Surleac
PeGAS: a versatile bioinformatics pipeline for antimicrobial resistance, virulence and pangenome analysis.
Motivation: Antimicrobial resistance is increasingly recognized as one of the most significant global health threats, with profound implications for human, animal, and environmental health. Genome analysis is a valuable tool that provides accurate and reproducible results, advancing knowledge of antimicrobial resistance diagnosis, therapeutics, surveillance, transmission, and evolution. However, because of the increasing complexity of bacterial genome analysis and the computational power that genomic approaches require, there is a continuous need for comprehensive, user-friendly data analysis tools. We developed the Pangenome and Genomic Analysis Suite (PeGAS) to address some of these challenges by offering an all-in-one pipeline that performs a range of analyses.
Results: PeGAS integrates key genomic analyses for bacterial whole-genome sequencing data, including prediction of antimicrobial resistance profiles sorted by antibiotic category, virulence factor (VF) detection, and plasmid replicon assignment. The pipeline also performs pangenome analysis, multilocus sequence typing, and genome assembly quality control (reporting statistics such as GC content, contig length, and number of contigs, as well as deviation from specified GC thresholds), providing a comprehensive genomic overview. PeGAS can also resume seamlessly after sporadic interruptions during long or resource-intensive runs.
Availability and implementation: PeGAS is available at https://github.com/liviurotiul/PeGAS.
Bioinformatics Advances 5(1): vbaf165. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12308278/pdf/
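The assembly quality-control statistics listed for PeGAS are straightforward to compute from contig sequences. A minimal sketch, not PeGAS's code: it reports the number of contigs, total and longest contig length, overall GC fraction, and whether GC falls outside an expected range. The default gc_range of 0.3 to 0.7 is a placeholder assumption, since appropriate thresholds depend on the species being assembled.

```python
def assembly_qc(contigs, gc_range=(0.3, 0.7)):
    """Basic statistics for a list of contig sequences.

    N bases count toward length, so they slightly lower the GC fraction.
    """
    lengths = [len(seq) for seq in contigs]
    total = sum(lengths)
    gc = sum(seq.upper().count("G") + seq.upper().count("C") for seq in contigs)
    gc_fraction = gc / total if total else 0.0
    low, high = gc_range
    return {
        "n_contigs": len(contigs),
        "total_length": total,
        "longest_contig": max(lengths, default=0),
        "gc_fraction": gc_fraction,
        "gc_outside_range": not (low <= gc_fraction <= high),
    }


print(assembly_qc(["ATGC", "GGCC", "AT"]))
# {'n_contigs': 3, 'total_length': 10, 'longest_contig': 4, 'gc_fraction': 0.6, 'gc_outside_range': False}
```

Pooling GC over the whole assembly rather than averaging per-contig values keeps short contigs from skewing the estimate.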
Bioinformatics Advances | Pub Date: 2025-07-08 | eCollection Date: 2025-01-01 | DOI: 10.1093/bioadv/vbaf154
Nicolas Kubista, Danielle Braun, Giovanni Parmigiani
The penetrance R package for estimation of age-specific risk in family-based studies.
Motivation: Reliable tools and software for estimating penetrance (age-specific risk among those who carry a genetic variant) are critical to improving clinical decision making and risk assessment for hereditary syndromes. However, there is a lack of easily usable software that implements a Bayesian approach to penetrance estimation in family-based studies.
Results: We introduce penetrance, an open-source R package available on CRAN, to estimate age-specific penetrance using family-history pedigree data. The package uses a Bayesian estimation approach, allowing prior knowledge to be incorporated by specifying priors for the parameters of the carrier distribution. It also includes options to impute missing ages during estimation, addressing incomplete age information, which is not uncommon in pedigree datasets. Our open-source software provides a flexible and user-friendly tool for researchers to estimate penetrance in complex family-based studies, facilitating improved genetic risk assessment in hereditary syndromes.
Availability and implementation: The penetrance package is freely available on CRAN. Source code and documentation are available at https://github.com/nicokubi/penetrance.
Bioinformatics Advances 5(1): vbaf154. PMC full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12270257/pdf/
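To illustrate what a Bayesian, age-specific penetrance estimate looks like, here is a deliberately simplified sketch that is not the penetrance package's model. It assumes carrier status and ages are known, splits age into bands, places a Beta prior on each band's hazard (the probability that a still-unaffected carrier is diagnosed within that band), and combines posterior-mean hazards into a cumulative curve F(t_k) = 1 - Π_{j≤k}(1 - h_j). The package itself works from family-history pedigree data and can impute missing ages, none of which is modeled here; the band counts and the Beta(1, 1) prior are hypothetical.

```python
def hazard_posteriors(events, at_risk, prior=(1.0, 1.0)):
    """Beta(a, b) posterior for each age band's hazard.

    events[k]: carriers first diagnosed within band k
    at_risk[k]: carriers entering band k still undiagnosed
    """
    a0, b0 = prior
    posteriors = []
    for d, n in zip(events, at_risk):
        if not 0 <= d <= n:
            raise ValueError("need 0 <= events <= at_risk in every band")
        posteriors.append((a0 + d, b0 + n - d))
    return posteriors


def cumulative_penetrance(hazards):
    """F(t_k) = 1 - prod_{j<=k} (1 - h_j): risk of diagnosis by the end of band k."""
    survival, curve = 1.0, []
    for h in hazards:
        survival *= 1.0 - h
        curve.append(1.0 - survival)
    return curve


# Hypothetical counts for two age bands.
posts = hazard_posteriors(events=[1, 3], at_risk=[10, 9])
mean_hazards = [a / (a + b) for a, b in posts]  # plug-in point estimates
print([round(f, 3) for f in cumulative_penetrance(mean_hazards)])  # [0.167, 0.47]
```

Plugging posterior-mean hazards into the product gives a point estimate, not the posterior mean of F; a full Bayesian summary would instead draw hazards from each Beta posterior and compute the curve per draw.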