{"title":"Estimation of mosaic loss of Y chromosome cell fraction with genotyping arrays lacking coverage in the pseudoautosomal region.","authors":"Weiyin Zhou, Wen-Yi Huang, Neal D Freedman, Mitchell Machiela","doi":"10.1186/s12859-025-06076-6","DOIUrl":"10.1186/s12859-025-06076-6","url":null,"abstract":"<p><strong>Background: </strong>Mosaic loss of the Y chromosome (mLOY) in circulating leukocytes is the most frequently detected age-related chromosomal mosaic event in men. Current mLOY detection approaches use genotyping arrays and employ a phase-based approach that identifies B allele frequency (BAF) deviations in the pseudo-autosomal region (PAR) shared between the X and Y chromosome. As some widely used genotyping arrays lack sufficient probe coverage of the PAR, methods for accurately measuring mLOY utilizing the median log<sub>2</sub> R ratio across the male-specific region of Y chromosome (mLRR_Y) are needed for detecting mLOY on these platforms.</p><p><strong>Results: </strong>We derived a formula from mLRR_Y to estimate the cellular fraction (CF) of cells with Y loss and validated the approach, finding high alignment with the CF estimation from female data and lab-generated qPCR data (R<sup>2</sup> = 0.98). 
Additionally, we compared the correlation between phase-based BAF and mLRR_Y methods for CF estimation, achieving a high correlation with R<sup>2</sup> > 0.80.</p><p><strong>Conclusion: </strong>Although mLRR_Y is a noisier metric for mosaic chromosomal alteration detection relative to BAF, we demonstrate mLRR_Y across non-PAR variants can accurately estimate mLOY CF, especially for high CF mLOY.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"60"},"PeriodicalIF":2.9,"publicationDate":"2025-02-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11837314/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143456626","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"BHCox: Bayesian heredity-constrained Cox proportional hazards models for detecting gene-environment interactions.","authors":"Na Sun, Qiang Han, Yu Wang, Mengtong Sun, Ziqing Sun, Hongpeng Sun, Yueping Shen","doi":"10.1186/s12859-025-06077-5","DOIUrl":"10.1186/s12859-025-06077-5","url":null,"abstract":"<p><strong>Background: </strong>Gene-environment (G × E) interactions play a critical role in understanding the etiology of diseases and exploring the factors that affect disease prognosis. There are several challenges in detecting G × E interactions for censored survival outcomes, such as the high dimensionality, complexity of environmental effects, and specificity of survival analysis. The effect heredity, which incorporates the dependence of the main effects and interactions in the analysis, has been widely applied in the study of interaction detection. However, it has not yet been applied to Bayesian Cox proportional hazards models for detecting interactions for censored survival outcomes.</p><p><strong>Results: </strong>In this study, we propose Bayesian heredity-constrained Cox proportional hazards (BHCox) models with novel spike-and-slab and regularized horseshoe priors that incorporate effect heredity to identify and estimate the main and interaction effects. The no-U-turn sampler (NUTS) algorithm, which has been implemented in the R package brms, was used to fit the proposed model. Extensive simulations were performed to evaluate and compare our proposed approaches with other alternative models. The simulation studies illustrated that BHCox models outperform other alternative models. 
We applied the proposed method to real data of non-small-cell lung cancer (NSCLC) and identified biologically plausible G × smoking interactions associated with the prognosis of patients with NSCLC.</p><p><strong>Conclusions: </strong>In summary, BHCox can be used to detect the main effects and interactions and thus have significant implications for the discovery of high-dimensional interactions in censored survival outcome data.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"58"},"PeriodicalIF":2.9,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834309/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143448003","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A Dirichlet-multinomial mixed model for determining differential abundance of mutational signatures.","authors":"Lena Morrill Gavarró, Dominique-Laurent Couturier, Florian Markowetz","doi":"10.1186/s12859-025-06055-x","DOIUrl":"10.1186/s12859-025-06055-x","url":null,"abstract":"<p><strong>Background: </strong>Mutational processes of diverse origin leave their imprints in the genome during tumour evolution. These imprints are called mutational signatures and they have been characterised for point mutations, structural variants and copy number changes. Each signature has an exposure, or abundance, per sample, which indicates how much a process has contributed to the overall genomic change. Mutational processes are not static, and a better understanding of their dynamics is key to characterise tumour evolution and identify cancer cell vulnerabilities that can be exploited during treatment. However, the structure of the data typically collected in this context makes it difficult to test whether signature exposures differ between conditions or time-points when comparing groups of samples. In general, the data consists of multivariate count mutational data (e.g. signature exposures) with two observations per patient, each reflecting a group.</p><p><strong>Results: </strong>We propose a mixed-effects Dirichlet-multinomial model: within-patient correlations are taken into account with random effects, possible correlations between signatures by making such random effects multivariate, and a group-specific dispersion parameter can deal with particularities of the groups. Moreover, the model is flexible in its fixed-effects structure, so that the two-group comparison can be generalised to several groups, or to a regression setting. We apply our approach to characterise differences of mutational processes between clonal and subclonal mutations across 23 cancer types of the PCAWG cohort. 
We find ubiquitous differential abundance of clonal and subclonal signatures across cancer types, and higher dispersion of signatures in the subclonal group, indicating higher variability between patients at subclonal level, possibly due to the presence of different clones with distinct active mutational processes.</p><p><strong>Conclusions: </strong>Mutational signature analysis is an expanding field and we envision our framework to be used widely to detect global changes in mutational process activity. Our methodology is available in the R package CompSign and offers an ample toolkit for the analysis and visualisation of differential abundance of compositional data such as, but not restricted to, mutational signatures.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"59"},"PeriodicalIF":2.9,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11837616/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143447956","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"MFCADTI: improving drug-target interaction prediction by integrating multiple feature through cross attention mechanism.","authors":"Na Quan, Shicheng Ma, Kai Zhao, Xuehua Bi, Linlin Zhang","doi":"10.1186/s12859-025-06075-7","DOIUrl":"10.1186/s12859-025-06075-7","url":null,"abstract":"<p><p>Accurately identifying potential drug-target interactions (DTIs) is a critical step in drug discovery. Multiple heterogeneous biological data provide abundant features for DTI prediction. Many computational methods have been proposed based on these data. However, most of these methods extract features either from sequences or from networks, utilizing only one aspect of the characteristics of drugs and targets and neglecting the complementary information between these two types of features. In fact, integrating different types of features will provide more valuable information for DTI prediction. In this article, we propose a novel method to improve the predictive capability for DTIs, named MFCADTI, by integrating multi-source features through cross-attention mechanisms. The method extracts network topological features from the heterogeneous network and attribute features from sequences of drugs and targets. Considering the complementarity and heterogeneity between network and attribute features, cross-attention mechanisms are used to integrate the network and attribute features of drugs and targets. To capture the correlations between drugs and targets, cross-attention is used to learn the interaction features of each drug-target pair. We evaluate MFCADTI on two datasets and experimental results demonstrate a significant improvement in the performance of MFCADTI compared to state-of-the-art methods. Finally, case studies illustrate that MFCADTI is an effective DTI prediction method that provides valuable guidance for drug development. 
The data and source code used in this study are available at: https://github.com/Dejavun/MFCADTI .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"57"},"PeriodicalIF":2.9,"publicationDate":"2025-02-18","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834641/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143448006","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"A novel weighted pseudo-labeling framework based on matrix factorization for adverse drug reaction prediction.","authors":"Junheng Chen, Fangfang Han, Mingxiu He, Yiyang Shi, Yongming Cai","doi":"10.1186/s12859-025-06053-z","DOIUrl":"10.1186/s12859-025-06053-z","url":null,"abstract":"<p><p>Adverse drug reactions (ADRs) are among the global public health events that seriously endanger human life and cause high economic burdens. Therefore, predicting the possibility of their occurrence and taking early and effective response measures is of great significance. Constructing a correlation matrix between drugs and their adverse reactions, followed by effective correlation data mining, is one of the current strategies to predict ADRs using accessible public data. Since the number of known ADRs in real-world data is far less than the number of their unknown counterparts, the drug-ADR association matrix is very sparse, which greatly affects the classification performance of machine learning methods. To effectively address the problem of sparsity, we proposed a novel weighted pseudo-labeling framework that mines potential unknown drug-ADR pairs by integrating multiple weighted matrix factorization (MF) models and treating them as pseudo-labeled drug-ADR pairs. Pseudo-labeled data is added to the training set, and the MF model is fine-tuned to improve the classification performance. To prevent overfitting to easily found pseudo-labels and improve the quality of pseudo-labels, a novel weighting approach for pseudo-labels was adopted. This paper reproduces the baselines under the same experimental conditions to evaluate the performance of the proposed method on sparse data from the Side Effect Resource (SIDER) database. Experimental results showed that our method outperformed other baselines in the Area Under Precision-Recall and F1-scores and still maintained the best performance in sparser scenarios. 
Furthermore, we conducted a case study, and the results showed that our proposed framework efficiently predicted ADRs in the real world.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"54"},"PeriodicalIF":2.9,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11831795/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143439536","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Identifying risk factors for Alzheimer's disease from multivariate longitudinal clinical data using temporal pattern mining.","authors":"Annette Spooner, Gelareh Mohammadi, Perminder S Sachdev, Henry Brodaty, Arcot Sowmya","doi":"10.1186/s12859-024-06018-8","DOIUrl":"10.1186/s12859-024-06018-8","url":null,"abstract":"<p><strong>Background: </strong>Patient data contain a wealth of information that could aid in understanding the onset and progression of disease. However, the task of modelling clinical data, which consist of multiple heterogeneous time series of different lengths, measured at different time intervals, is a complex one. A growing body of research has applied temporal pattern mining to this problem to identify common patterns in clinical attributes over time. However, the vast majority of these algorithms use techniques that are not ideally suited to clinical data. We present an efficient and scalable framework designed specifically for temporal pattern mining of real-world clinical data. Our framework combines temporal abstraction, an extended version of the efficient pattern-growth algorithm, TPMiner, the concepts of relative risk and the odds ratio to identify interesting and high-risk patterns and multiprocessing to improve computational efficiency. A complete set of cut-off values for discretisation and interpretation of the data is provided and is applicable to studies on ageing populations in general. We name this framework Clinical Temporal Pattern Mining or C-TPM.</p><p><strong>Results: </strong>The framework is applied to data from two real-world studies of Alzheimer's disease (AD). The patterns discovered were predictive of AD in survival analysis models with a Concordance index of up to 0.87 and contain clinically relevant variables. 
A visualisation module provides a clear picture of the discovered patterns for ease of interpretability.</p><p><strong>Conclusions: </strong>The framework provides an effective and scalable method of modelling multivariate, longitudinal clinical data and can identify patterns in uncommon diseases and those that progress slowly over time. It is generalisable to clinical data from other medical domains as well as non-clinical data.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"56"},"PeriodicalIF":2.9,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834509/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143439662","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Harnessing pre-trained models for accurate prediction of protein-ligand binding affinity.","authors":"Jiashan Li, Xinqi Gong","doi":"10.1186/s12859-025-06064-w","DOIUrl":"10.1186/s12859-025-06064-w","url":null,"abstract":"<p><strong>Background: </strong>The binding between proteins and ligands plays a crucial role in the field of drug discovery. However, this area currently faces numerous challenges. On one hand, existing methods are constrained by the limited availability of labeled data, often performing inadequately when addressing complex protein-ligand interactions. On the other hand, many models struggle to effectively capture the flexible variations and relative spatial relationships between proteins and ligands. These issues not only significantly hinder the advancement of protein-ligand binding research but also adversely affect the accuracy and efficiency of drug discovery. Therefore, in response to these challenges, our study aims to enhance predictive capabilities through innovative approaches, providing more reliable support for drug discovery efforts.</p><p><strong>Methods: </strong>This study leverages a pre-trained model with spatial awareness to enhance the prediction of protein-ligand binding affinity. By perturbing the structures of small molecules in a manner consistent with physical constraints and employing self-supervised tasks, we improve the representation of small molecule structures, allowing for better adaptation to affinity predictions. Meanwhile, our approach enables the identification of potential binding sites on proteins.</p><p><strong>Results: </strong>Our model demonstrates a significantly higher correlation coefficient in binding affinity predictions. Extensive evaluation on the PDBBind v2019 refined set, CASF, and Merck FEP benchmarks confirms the model's robustness and strong generalization across diverse datasets. 
Additionally, the model achieves over 95% in classification ROC for binding site identification, underscoring its high accuracy in pinpointing protein-ligand interaction regions.</p><p><strong>Conclusion: </strong>This research presents a novel approach that not only enhances the accuracy of binding affinity predictions but also facilitates the identification of binding sites, showcasing the potential of pre-trained models in computational drug design. Data and code are available at https://github.com/MIALAB-RUC/SableBind .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"55"},"PeriodicalIF":2.9,"publicationDate":"2025-02-17","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11834573/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143439487","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CamlTree: a streamlined software for phylogenetic analysis of viral and mitochondrial genomes.","authors":"Peng Sun, Yu Yang, Mengjie Yuan, Qin Tang","doi":"10.1186/s12859-025-06034-2","DOIUrl":"10.1186/s12859-025-06034-2","url":null,"abstract":"<p><strong>Background: </strong>Over the past decade, the continuous and rapid advances in bioinformatics have led to an increasingly common use of molecular sequence comparison for phylogenetic analysis. However, the use of multi-software and cross-platform strategies has increased the complexity of phylogenetic tree estimation. Therefore, the development and application of streamlined phylogenetic analysis tools are growing in significance in the field of biology. Particularly for genomes with relatively short sequences, there is a lack of simple and integrative tools for phylogenetic analysis.</p><p><strong>Results: </strong>In this study, we present CamlTree (Concatenated alignments maximum-likelihood tree), a user-friendly desktop software designed to simplify phylogenetic analysis for viral and mitochondrial genomes, ultimately facilitating related research. CamlTree provides a workflow including gene concatenation (or coalescence), sequence alignment, alignment optimization, and the estimation of phylogenetic trees using both maximum-likelihood (ML) and Bayesian inference (BI) methods. CamlTree was written in TypeScript and developed using the Electron framework. It offers a primarily user-friendly interface based on the React framework.</p><p><strong>Conclusions: </strong>CamlTree software has been released for the Windows OS. It integrates several popular analysis tools to optimize and simplify the process of estimating polygenic phylogenetic trees. The establishment of software can assist researchers in reducing their workload and enhancing data processing efficiency, enabling them to expedite their research progress. 
The software, along with a detailed user manual, is available at https://github.com/BioCrossCoder/camltree .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"53"},"PeriodicalIF":2.9,"publicationDate":"2025-02-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11829546/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143424463","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"Using individual barcodes to increase quantification power of massively parallel reporter assays.","authors":"Pia Keukeleire, Jonathan D Rosen, Angelina Göbel-Knapp, Kilian Salomon, Max Schubach, Martin Kircher","doi":"10.1186/s12859-025-06065-9","DOIUrl":"10.1186/s12859-025-06065-9","url":null,"abstract":"<p><strong>Background: </strong>Massively parallel reporter assays (MPRAs) are an experimental technology for measuring the activity of thousands of candidate regulatory sequences or their variants in parallel, where the activity of individual sequences is measured from pools of sequence-tagged reporter genes. Activity is derived from the ratio of transcribed RNA to input DNA counts of associated tag sequences in each reporter construct, so-called barcodes. Recently, tools specifically designed to analyze MPRA data were developed that attempt to model the count data, accounting for its inherent variation. Of these tools, MPRAnalyze and mpralm are most widely used. MPRAnalyze models barcode counts to estimate the transcription rate of each sequence. While it has increased statistical power and robustness against outliers compared to mpralm, it is slow and has a high false discovery rate. Mpralm, a tool built on the R package Limma, estimates log fold-changes between different sequences. As opposed to MPRAnalyze, it is fast and has a low false discovery rate but is susceptible to outliers and has less statistical power.</p><p><strong>Results: </strong>We propose BCalm, an MPRA analysis framework aimed at addressing the limitations of the existing tools. BCalm is an adaptation of mpralm, but models individual barcode counts instead of aggregating counts per sequence. Leaving out the aggregation step increases statistical power and improves robustness to outliers, while being fast and precise. 
We show the improved performance over existing methods on both simulated MPRA data and a lentiviral MPRA library of 166,508 target sequences, including 82,258 allelic variants. Further, BCalm adds functionality beyond the existing mpralm package, such as preparing count input files from MPRAsnakeflow, as well as an option to test for sequences with enhancing or repressing activity. Its built-in plotting functionalities allow for easy interpretation of the results.</p><p><strong>Conclusions: </strong>With BCalm, we provide a new tool for analyzing MPRA data which is robust and accurate on real MPRA datasets. The package is available at https://github.com/kircherlab/BCalm .</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"52"},"PeriodicalIF":2.9,"publicationDate":"2025-02-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11827149/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143413312","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
{"title":"CoMIT: a bioinformatic pipeline for risk-based prediction of COVID-19 test inclusivity.","authors":"Diane M Walker, Wendy A Smith, Lia Gale, Jacob T Wolff, Connor P Healy, Hannah F Van Hollebeke, Ashlie Stephenson, Marianne Kim","doi":"10.1186/s12859-025-06046-y","DOIUrl":"10.1186/s12859-025-06046-y","url":null,"abstract":"<p><strong>Background: </strong>The global Coronavirus Disease 2019 (COVID-19) pandemic highlighted the need to quickly diagnose infections to identify and prevent viral spread in the population. In response to the pandemic, BioFire Defense leveraged its PCR-based \"lab-in-a-pouch\" technology for expedited development of the BioFire® COVID-19 Test, a novel in vitro diagnostic detecting SARS-CoV-2 nucleic acid in human samples. Following clearance of an in vitro diagnostic device, regulatory bodies such as the U.S. Food and Drug Administration (FDA) require regular post market surveillance to monitor test performance against viral lineages circulating in the field, using predictive in silico inclusivity evaluations. Exponential increases in the number of sequences deposited in bioinformatic repositories such as GISAID, during the pandemic, impeded progress in meeting these post market requirements. In response, BioFire Defense developed a new bioinformatic tool to overcome scalability problems and the loss of accuracy encountered with the standard inclusivity method.</p><p><strong>Results: </strong>The Coronavirus Monitoring for Inclusivity Tool (CoMIT) uses the Variant Sorter Algorithm to sidestep multiple sequence alignments, a significant barrier inherent in the standard inclusivity method. The implementation of CoMIT and its Variant Sorter Algorithm are described. Automated summary tables and visualizations from a typical inclusivity evaluation are presented. 
We report our approach to filter and display relevant information in the pipeline outputs using risk factors tied to test performance.</p><p><strong>Conclusions: </strong>BioFire Defense has developed CoMIT, an automated bioinformatic pipeline for efficient processing and reporting of variant inclusivity from the GISAID EpiCoV™ repository. This tool ensures continuous and comprehensive post market evaluations of BioFire COVID-19 Test performance even from datasets large enough to impede standard inclusivity analyses. CoMIT's low computational space complexity and modular code allow this tool to be generalized for inclusivity monitoring of multianalyte or single analyte tests with complex assay designs and/or highly variable targets. CoMIT's databasing capabilities and metadata handling hold the potential for new investigations to improve readiness for future outbreaks.</p>","PeriodicalId":8958,"journal":{"name":"BMC Bioinformatics","volume":"26 1","pages":"51"},"PeriodicalIF":2.9,"publicationDate":"2025-02-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11817761/pdf/","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"143405551","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":3,"RegionCategory":"生物学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"OA","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}