{"title":"DAPCy: a Python package for the discriminant analysis of principal components method for population genetic analyses.","authors":"Alejandro Correa Rojo, Pieter Moris, Hanne Meuwissen, Pieter Monsieurs, Dirk Valkenborg","doi":"10.1093/bioadv/vbaf143","DOIUrl":"10.1093/bioadv/vbaf143","url":null,"abstract":"<p><strong>Summary: </strong>The Discriminant Analysis of Principal Components method is a pivotal tool in population genetics, combining principal component analysis and linear discriminant analysis to assess the genetic structure of populations using genetic markers, focusing on the description of variation between genetic clusters. Despite its utility, the original R implementation in the adegenet package faces computational challenges with large genomic datasets. To address these limitations, we introduce DAPCy, a Python package leveraging the scikit-learn library to enhance the method's scalability and efficiency. DAPCy supports large datasets by utilizing compressed sparse matrices and truncated singular value decomposition for dimensionality reduction, coupled with training-test cross-validation for robust model evaluation. It also includes modules for <i>de novo</i> genetic clustering and extensive visualization and reporting capabilities. Compared to the original R implementation, DAPCy can process genomic datasets with thousands of samples and features in less computational time and with reduced memory usage. To show DAPCy's computational capabilities, we benchmarked it with the R implementation using the <i>Plasmodium falciparum</i> dataset from MalariaGEN and the 1000 Genomes Project.</p><p><strong>Availability and implementation: </strong>DAPCy can be installed as a Python package through pip. Source code is available on https://gitlab.com/uhasselt-bioinfo/dapcy. 
Documentation and a tutorial can be found on https://uhasselt-bioinfo.gitlab.io/dapcy/.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf143"},"PeriodicalIF":2.4,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12237503/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144593035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-06-18; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf145
Yaniv Swiel, Jean-Tristan Brandenburg, Mahtaab Hayat, Wenlong Carl Chen, Mitchell A Cox, Scott Hazelhurst
{"title":"FPGA acceleration of GWAS permutation testing.","authors":"Yaniv Swiel, Jean-Tristan Brandenburg, Mahtaab Hayat, Wenlong Carl Chen, Mitchell A Cox, Scott Hazelhurst","doi":"10.1093/bioadv/vbaf145","DOIUrl":"10.1093/bioadv/vbaf145","url":null,"abstract":"<p>Genome-wide association studies (GWASs) analyse genetic variation across many individuals to identify single-nucleotide polymorphisms (SNPs) associated with complex traits. They typically include millions of SNPs from thousands of individuals, creating a multiple testing problem where the probability of false associations increases with the number of SNPs tested. While permutation testing provides accurate control of false positive rates, it is computationally expensive and slow for large datasets. This research presents an FPGA-based tool designed for cloud deployment on AWS EC2 instances that significantly accelerates GWAS permutation testing for continuous phenotypes. The tool implements two algorithms: maxT and adaptive permutation testing. Performance comparisons using a breast cancer dataset (13.7 million SNPs from 3652 individuals) showed large speedups over PLINK running on 40 CPU cores. For 1000 maxT permutations, the FPGA tool completed analysis in 22 min versus PLINK's 7 days. For 100 million adaptive permutations, FPGA required 325 min compared to PLINK's 8.5 days. The tool handled 700 million adaptive permutations in 33 h, a workload that would require over a month for CPU-based analysis. 
The FPGA solution provides accessible, order-of-magnitude performance improvements without requiring FPGA expertise or dedicated cluster access.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf145"},"PeriodicalIF":2.8,"publicationDate":"2025-06-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12237511/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144593036","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
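The maxT procedure named in the abstract above can be illustrated with a small CPU-only sketch. This is not the FPGA tool's code or PLINK's implementation; the function names, the toy data layout (one list of genotype values per SNP), and the use of the absolute Pearson correlation as the test statistic are all assumptions made for illustration. For a single-SNP linear regression on a continuous phenotype, the absolute t statistic is a monotone function of the absolute correlation, so ranking by either gives the same maxT p-values.

```python
import random
from statistics import mean


def correlation(x, y):
    """Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant vector carries no association signal
    return sxy / (sxx * syy) ** 0.5


def max_t_permutation(genotypes, phenotype, n_perm=1000, seed=0):
    """Family-wise adjusted p-values via maxT permutation (illustrative).

    genotypes: list of per-SNP genotype lists (hypothetical layout).
    phenotype: list of continuous trait values, one per individual.
    """
    rng = random.Random(seed)
    observed = [abs(correlation(g, phenotype)) for g in genotypes]
    exceed = [0] * len(genotypes)
    shuffled = list(phenotype)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the genotype-phenotype link
        # Largest statistic over ALL SNPs for this permutation.
        perm_max = max(abs(correlation(g, shuffled)) for g in genotypes)
        for j, obs in enumerate(observed):
            if perm_max >= obs:
                exceed[j] += 1
    # Add-one correction keeps every p-value strictly positive.
    return [(count + 1) / (n_perm + 1) for count in exceed]
```

Each permutation requires a full pass over every SNP, which is why the cost grows with both the number of SNPs and the number of permutations, and why the workload is a natural target for hardware acceleration.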
Bioinformatics advances; Pub Date: 2025-06-16; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf129
Gabriel Tiago Galdino, Thomas DesCôteaux, Natalia Teruel, Rafael Najmanovich
{"title":"NRGSuite-Qt: a PyMOL plugin for high-throughput virtual screening, molecular docking, normal-mode analysis, the study of molecular interactions, and the detection of binding-site similarities.","authors":"Gabriel Tiago Galdino, Thomas DesCôteaux, Natalia Teruel, Rafael Najmanovich","doi":"10.1093/bioadv/vbaf129","DOIUrl":"10.1093/bioadv/vbaf129","url":null,"abstract":"<p><strong>Summary: </strong>We introduce NRGSuite-Qt, a PyMOL plugin that provides a comprehensive toolkit for macromolecular cavity detection, virtual screening, small-molecule docking, normal mode analysis, analyses of molecular interactions, and detection of binding-site similarities. This complete redesign of the original NRGSuite (restricted to cavity detection and small-molecule docking) integrates five new functionalities: protein-protein and protein-ligand interaction analysis using Surfaces, ultra-massive virtual screening with NRGRank, binding-site similarity detection with IsoMIF, normal mode analysis using NRGTEN, and mutational studies through integration with the Modeler Suite. By merging these advanced tools into a cohesive platform, NRGSuite-Qt simplifies visualization and streamlines complex workflows within a single interface. Additionally, we benchmark a newer version of the Elastic Network Contact Model (ENCoM) normal mode analysis method, utilizing the same 40 atom-type pairwise interaction matrix that is used in all other software. This version outperforms the default model in multiple benchmarking tests.</p><p><strong>Availability and implementation: </strong>The installation guide and tutorial are available at https://nrg-qt.readthedocs.io/en/latest/index.html. 
NRGSuite-Qt is implemented in Python.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf129"},"PeriodicalIF":2.4,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12177131/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144334505","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-06-16; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf096
Bill Gates Happi Happi, Geraud Fokou Pelap, Danai Symeonidou, Pierre Larmande
{"title":"GRU-SCANET: unleashing the power of GRU-based sinusoidal capture network for precision-driven named entity recognition.","authors":"Bill Gates Happi Happi, Geraud Fokou Pelap, Danai Symeonidou, Pierre Larmande","doi":"10.1093/bioadv/vbaf096","DOIUrl":"10.1093/bioadv/vbaf096","url":null,"abstract":"<p><strong>Motivation: </strong>Pre-trained Language Models (PLMs) have achieved remarkable performance across various natural language processing tasks. However, they encounter challenges in biomedical named entity recognition (NER), such as high computational costs and the need for complex fine-tuning. These limitations hinder the efficient recognition of biological entities, especially within specialized corpora. To address these issues, we introduce GRU-SCANET (Gated Recurrent Unit-based Sinusoidal Capture Network), a novel architecture that directly models the relationship between input tokens and entity classes. Our approach offers a computationally efficient alternative for extracting biological entities by capturing contextual dependencies within biomedical texts.</p><p><strong>Results: </strong>GRU-SCANET combines positional encoding, bidirectional GRUs (BiGRUs), an attention-based encoder, and a conditional random field (CRF) decoder to achieve high precision in entity labeling. This design effectively mitigates the challenges posed by unbalanced data across multiple corpora. Our model consistently outperforms leading benchmarks, achieving better performance than BioBERT (8/8 evaluations), PubMedBERT (5/5 evaluations), and the previous state-of-the-art (SOTA) models (8/8 evaluations), including Bern2 (5/5 evaluations). 
These results highlight the strength of our approach in capturing token-entity relationships more effectively than existing methods, advancing the state of biomedical NER.</p><p><strong>Availability and implementation: </strong>https://github.com/ANR-DIG-AI/GRU-SCANET.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf096"},"PeriodicalIF":2.4,"publicationDate":"2025-06-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12198495/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144509791","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
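The GRU-SCANET abstract lists positional encoding as the first stage of the architecture. The abstract does not give the exact formulation, so the sketch below shows the standard sinusoidal encoding from the Transformer literature as an assumption about what such a stage might look like; the function name, the `base` parameter, and the interleaved sine/cosine layout are illustrative choices, not taken from the GRU-SCANET code.

```python
import math


def sinusoidal_positional_encoding(seq_len, d_model, base=10000.0):
    """Return a seq_len x d_model table of sinusoidal position features.

    Even columns hold sin(pos / base**(i / d_model)) and the following
    odd column holds the matching cosine (standard Transformer layout).
    """
    if d_model % 2 != 0:
        raise ValueError("d_model must be even")
    table = []
    for pos in range(seq_len):
        row = []
        for i in range(0, d_model, 2):
            angle = pos / (base ** (i / d_model))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table
```

In a tagger of this kind, a table like this would typically be added to the token embeddings before they reach the recurrent layers, giving each token a deterministic signal about its position without any learned parameters.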
Bioinformatics advances; Pub Date: 2025-06-12; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf141
Enock Niyonkuru, J Harry Caufield, Leigh C Carmody, Michael A Gargano, Sabrina Toro, Patricia L Whetzel, Hannah Blau, Mauricio Soto Gomez, Elena Casiraghi, Leonardo Chimirri, Justin T Reese, Giorgio Valentini, Melissa A Haendel, Christopher J Mungall, Peter N Robinson
{"title":"Leveraging generative AI to assist biocuration of medical actions for rare disease.","authors":"Enock Niyonkuru, J Harry Caufield, Leigh C Carmody, Michael A Gargano, Sabrina Toro, Patricia L Whetzel, Hannah Blau, Mauricio Soto Gomez, Elena Casiraghi, Leonardo Chimirri, Justin T Reese, Giorgio Valentini, Melissa A Haendel, Christopher J Mungall, Peter N Robinson","doi":"10.1093/bioadv/vbaf141","DOIUrl":"10.1093/bioadv/vbaf141","url":null,"abstract":"<p><strong>Motivation: </strong>Structured representations of clinical data can support computational analysis of individuals and cohorts, and ontologies representing disease entities and phenotypic abnormalities are now commonly used for translational research. The Medical Action Ontology (MAxO) provides a computational representation of treatments and other actions taken for clinical management. Currently, manual biocuration is used to annotate MAxO terms to rare diseases. However, it is challenging to scale manual curation to comprehensively capture information about medical actions for the more than 10 000 rare diseases.</p><p><strong>Results: </strong>We present AutoMAxO, a semi-automated workflow that leverages Large Language Models (LLMs) to streamline MAxO biocuration. AutoMAxO first uses LLMs to retrieve candidate curations from abstracts of relevant publications. Next, the candidate curations are matched to ontology terms from MAxO, Human Phenotype Ontology (HPO), and MONDO disease ontology via a combination of LLMs and post-processing techniques. Finally, the matched terms are presented in a structured form to a human curator for approval. 
We used this approach to process abstracts related to 37 rare genetic diseases and identified 958 novel treatment annotations that were transferred to the MAxO annotation dataset.</p><p><strong>Availability and implementation: </strong>AutoMAxO is a Python package freely available at https://github.com/monarch-initiative/automaxo.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf141"},"PeriodicalIF":2.4,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12228962/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144577034","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A comprehensive analysis of the inherited lncRNA and circRNA repertoire of zebrafish.","authors":"Dheeraj Chandra Joshi, Aakanksha Kadam, Chetana Sachidanandan, Beena Pillai","doi":"10.1093/bioadv/vbaf139","DOIUrl":"10.1093/bioadv/vbaf139","url":null,"abstract":"<p><strong>Motivation: </strong>Inherited non-coding RNAs can be the third major component of epigenetic information transfer from one generation to the next. Here, we present a comprehensive resource of lncRNAs and circular RNAs that are inherited, compiled from meta-analysis of zebrafish transcriptomics data and comparative genomics with mouse and human. Maternal and paternal inheritance of mRNA into the zygote is accepted to be an important regulator of embryonic development as well as adult characteristics. Although inheritance of certain specific miRNAs is known, other non-coding RNA inheritance remains less explored.</p><p><strong>Results: </strong>We performed a comprehensive analysis of the inherited lncRNAs and circRNAs in zebrafish. We discovered that nearly 20% of all known lncRNA and 7% of circRNAs are inherited. Many of these lncRNAs are conserved in mammals, and are expressed widely in adult tissues of zebrafish. The male and female gametes carry a highly similar pool of inherited lncRNAs, with only a few sperm/oocyte specific transcripts. The majority of inherited circRNAs originate from genes important for fertilization and can potentially regulate translational processes. 
Contrary to general belief, the inherited lncRNAs and circRNAs do not undergo degradation en masse coincidental to zygotic genomic activation, suggesting that these RNAs may have more sustained roles in development.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf139"},"PeriodicalIF":2.4,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12267137/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144661137","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-06-12; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf119
Wei Han, Xuemei Zhang, Qingzhen Zhang, Zhe Zhou
{"title":"Next-generation sequencing-based tools or nanopore-based tools: which is more suitable for short tandem repeats genotyping of nanopore sequencing?","authors":"Wei Han, Xuemei Zhang, Qingzhen Zhang, Zhe Zhou","doi":"10.1093/bioadv/vbaf119","DOIUrl":"10.1093/bioadv/vbaf119","url":null,"abstract":"<p><strong>Motivation: </strong>Short tandem repeats (STRs) are widely recognized as critical genetic markers for individual identification. Nanopore sequencing technology holds promise as an effective tool for onsite STR detection owing to its portability. Initially, low sequencing quality led to the development of various genotyping tools specifically tailored for nanopore data. However, recent advancements in nanopore sequencing quality suggest that tools designed for next-generation sequencing (NGS) may be more suitable for analyzing nanopore data than those specifically developed for nanopore sequencing.</p><p><strong>Results: </strong>We selected two sequencing platforms, MinION Mk1C and PolySeqOne, to generate sequencing data from 61 unrelated individual samples. Samples were amplified using a custom NanoSTR panel that included 31 autosomal STRs (A-STRs) and 31 Y chromosomal STRs (Y-STRs). Sequencing data were analyzed using four distinct tools: NASTRA, STRspy, STRinNGS, and STRait Razor. Our findings indicated that STRinNGS showed greater accuracy for both A-STRs and Y-STRs, enabling the accurate detection of a broad range of STRs. Compared with STRinNGS, NASTRA exhibited greater STR depth and featured more non-integer stutters. Therefore, in practical applications, STRinNGS demonstrates high reliability in genotyping.</p><p><strong>Availability and implementation: </strong>NASTRA, STRspy, STRinNGS, and STRait Razor can be accessed via the following links: https://github.com/renzilin/NASTRA, https://github.com/unique379r/strspy, https://bitbucket.org/rirgabiss/strinngs/src/master, and https://github.com/Ahhgust/STRaitRazor, respectively. 
The commands used during processing are available on request from the corresponding author.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf119"},"PeriodicalIF":2.4,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12167636/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144303789","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-06-10; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf137
Ali Khodabandeh Yalabadi, Mehdi Yazdani-Jahromi, Ozlem Ozmen Garibay
{"title":"BoKDiff: best-of-K diffusion alignment for target-specific 3D molecule generation.","authors":"Ali Khodabandeh Yalabadi, Mehdi Yazdani-Jahromi, Ozlem Ozmen Garibay","doi":"10.1093/bioadv/vbaf137","DOIUrl":"10.1093/bioadv/vbaf137","url":null,"abstract":"<p><strong>Motivation: </strong>Structure-based drug design (SBDD) leverages the 3D structure of target proteins to guide therapeutic development. While generative models like diffusion models and geometric deep learning show promise in ligand design, challenges such as limited protein-ligand data and poor alignment reduce their effectiveness. We introduce BoKDiff, a domain-adapted framework inspired by alignment strategies in large language and vision models that combines multi-objective optimization with Best-of-K alignment to enhance ligand generation.</p><p><strong>Results: </strong>Built on DecompDiff, BoKDiff generates diverse ligands and ranks them using a weighted score based on QED, SA, and docking metrics. To overcome alignment issues, we reposition each ligand's center of mass to match its docking pose, enabling more accurate sub-component extraction. We further incorporate a Best-of-N (BoN) sampling strategy to select optimal candidates without model fine-tuning. BoN achieves QED > 0.6, SA > 0.75, and over 35% success rate. BoKDiff outperforms prior models on the CrossDocked2020 dataset with an average docking score of -8.58 and 26% valid molecule generation rate. 
This is the first study to integrate Best-of-K alignment and BoN sampling into SBDD, demonstrating their potential for practical, high-quality ligand design.</p><p><strong>Availability and implementation: </strong>Code is available at https://github.com/khodabandeh-ali/BoKDiff.git.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf137"},"PeriodicalIF":2.4,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12228967/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144577033","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
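The BoKDiff abstract describes ranking generated ligands by a weighted score over QED, SA, and docking metrics and then keeping the best candidates. A minimal sketch of that selection step follows. The `Candidate` class, the equal default weights, and the way the docking score is rescaled to the unit interval (`dock_scale`) are hypothetical choices made for illustration; the abstract does not state the weights or normalization the authors use.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    qed: float      # drug-likeness in [0, 1], higher is better
    sa: float       # synthetic accessibility scaled to [0, 1], higher is better
    docking: float  # docking score, more negative is better


def weighted_score(c, w_qed=1.0, w_sa=1.0, w_dock=1.0, dock_scale=12.0):
    """Combine the three metrics into one number (assumed weighting)."""
    # Flip the sign of the docking score and clip it into [0, 1] so that
    # all three terms point in the same "higher is better" direction.
    dock_term = min(max(-c.docking / dock_scale, 0.0), 1.0)
    return w_qed * c.qed + w_sa * c.sa + w_dock * dock_term


def best_of_k(candidates, **weights):
    """Return the highest-scoring candidate among K generated samples."""
    if not candidates:
        raise ValueError("need at least one candidate")
    return max(candidates, key=lambda c: weighted_score(c, **weights))
```

Because selection happens after sampling, changing the weights changes which ligand is kept without retraining or fine-tuning the generator, which matches the abstract's point about selecting candidates without model fine-tuning.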
Bioinformatics advances; Pub Date: 2025-06-10; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf134
Sina Barazandeh, Furkan Ozden, Ahmet Hincer, Urartu Ozgur Safak Seker, A Ercument Cicek
{"title":"UTRGAN: learning to generate 5' UTR sequences for optimized translation efficiency and gene expression.","authors":"Sina Barazandeh, Furkan Ozden, Ahmet Hincer, Urartu Ozgur Safak Seker, A Ercument Cicek","doi":"10.1093/bioadv/vbaf134","DOIUrl":"10.1093/bioadv/vbaf134","url":null,"abstract":"<p><strong>Motivation: </strong>The 5' untranslated region (5' UTR) of mRNA is crucial for the molecule's translatability and stability, making it essential for designing synthetic biological circuits for high and stable protein expression. Several UTR sequences are patented and widely used in laboratories. This paper presents UTRGAN, a Generative Adversarial Network (GAN)-based model for generating 5' UTR sequences, coupled with an optimization procedure to ensure high expression for target gene sequences or high ribosome load and translation efficiency.</p><p><strong>Results: </strong>The model generates sequences mimicking various properties of natural UTR sequences and optimizes them to achieve (i) up to five-fold higher average predicted expression on target genes, (ii) up to two-fold higher predicted mean ribosome load, and (iii) a 34-fold higher average predicted translation efficiency compared to initial UTR sequences. UTRGAN-generated sequences also exhibit higher similarity to known regulatory motifs in regions such as internal ribosome entry sites, upstream open reading frames, G-quadruplexes, and Kozak and initiation start codon regions. <i>In vitro</i> experiments show that the UTR sequences designed by UTRGAN result in a higher translation rate for the human TNF-α protein compared to the human Beta Globin 5' UTR, a UTR with high production capacity.</p><p><strong>Availability and implementation: </strong>The source code, including the model implementation and the optimization procedure, is released at http://github.com/ciceklab/UTRGAN. 
The dataset was downloaded from the UTRdb 2.0 database and is available within the GitHub repository.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf134"},"PeriodicalIF":2.4,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12228966/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144577035","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Bioinformatics advances; Pub Date: 2025-06-10; eCollection Date: 2025-01-01; DOI: 10.1093/bioadv/vbaf126
Katarzyna Górczak, Tomasz Burzykowski, Jürgen Claesen
{"title":"A hierarchical negative-binomial model for analysis of correlated sequencing data: practical implementations.","authors":"Katarzyna Górczak, Tomasz Burzykowski, Jürgen Claesen","doi":"10.1093/bioadv/vbaf126","DOIUrl":"10.1093/bioadv/vbaf126","url":null,"abstract":"<p>High-throughput techniques for biological and (bio)medical sciences often result in read counts used in downstream analysis. Nowadays, complex experimental designs in combination with these high-throughput methods are regularly applied and lead to correlated count data measured from matched samples or taken from the same subject under multiple treatment conditions. Additionally, as is common with biological data, the variance is often larger than the mean, leading to overdispersed count data. Hierarchical models have been proposed to analyze overdispersed, correlated data from paired, longitudinal, or clustered experiments. We consider a hierarchical negative-binomial model with normally distributed random effects to account for the within- and between-sample correlation. We focus on different software implementations to allow the use of the model in practice.</p>","PeriodicalId":72368,"journal":{"name":"Bioinformatics advances","volume":"5 1","pages":"vbaf126"},"PeriodicalIF":2.4,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12256765/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144638817","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
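The model class named in the abstract above, a negative-binomial model with normally distributed random effects, can be written out in a generic form. The abstract does not specify the exact structure, so the notation below is an assumption: a single subject-level random intercept $b_i$, a log link, and the common mean-dispersion (NB2) parametrization with dispersion $\phi$. The authors' model may include additional random-effect terms.

```latex
% Generic sketch (assumed notation): count Y_{ij} for subject i, observation j
\begin{aligned}
Y_{ij} \mid b_i &\sim \mathrm{NB}(\mu_{ij}, \phi), \\
\log \mu_{ij} &= \mathbf{x}_{ij}^{\top} \boldsymbol{\beta} + b_i, \\
b_i &\sim \mathcal{N}(0, \sigma_b^{2}), \\
\operatorname{Var}(Y_{ij} \mid b_i) &= \mu_{ij} + \phi\, \mu_{ij}^{2}.
\end{aligned}
```

In this form the term $\phi\,\mu_{ij}^{2}$ captures the overdispersion (variance exceeding the mean), while the shared intercept $b_i$ induces correlation among counts taken from the same subject.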