BMC Bioinformatics: Latest Articles (Page 7)

MAFcounter: an efficient tool for counting the occurrences of k-mers in MAF files.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-05-30 DOI: 10.1186/s12859-025-06172-7

Michail Patsakis, Kimonas Provatas, Aris Karatzikos, Charalampos Koilakos, Ioannis Mouratidis, Ilias Georgakopoulos-Soares

Abstract: Motivation: With the rapid expansion of large-scale biological datasets, DNA and protein sequence alignments have become essential for comparative genomics and proteomics. These alignments facilitate the exploration of sequence similarity patterns, providing valuable insights into sequence conservation, evolutionary relationships and for functional analyses. Typically, sequence alignments are stored in formats such as the Multiple Alignment Format (MAF). Counting k-mer occurrences is a crucial task in many computational biology applications, but currently, there is no algorithm designed for k-mer counting in alignment files. Results: We have developed MAFcounter, the first k-mer counter dedicated to alignment files. MAFcounter is multithreaded, fast, and memory efficient, enabling k-mer counting in DNA and protein sequence alignment files with a wide variety of features for k-mer analysis. Availability: MAFcounter is released under GPL license as a suite of binary C++ applications and is available at: https://github.com/Georgakopoulos-Soares-lab/MAFcounter .

Journal Article. BMC Bioinformatics, volume 26(1), 142. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12125892/pdf/
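
To make the task in the abstract above concrete, here is a minimal Python sketch of k-mer counting over the sequence (`s`) lines of a MAF file. It is an illustration only and not MAFcounter's algorithm: the function name `count_kmers_in_maf`, the choice to strip gap characters before extracting k-mers, and the restriction to the DNA alphabet are all assumptions made for this example.

```python
from collections import Counter


def count_kmers_in_maf(maf_text: str, k: int) -> Counter:
    """Count DNA k-mers over every aligned sequence ('s' line) in MAF text."""
    counts = Counter()
    for line in maf_text.splitlines():
        fields = line.split()
        # A MAF sequence line has 7 fields:
        # s  src  start  size  strand  srcSize  text
        if len(fields) != 7 or fields[0] != "s":
            continue
        # Assumption: alignment gaps ('-') are removed before counting.
        seq = fields[6].replace("-", "").upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if set(kmer) <= set("ACGT"):  # skip k-mers with N or other symbols
                counts[kmer] += 1
    return counts


# Hypothetical two-sequence alignment block, made up for this example.
example = """##maf version=1
a score=10
s hg38.chr1 100 8 + 1000 ACG-TACGT
s mm10.chr2 200 8 + 2000 ACGTTA-CG
"""

counts = count_kmers_in_maf(example, 3)
print(counts.most_common(2))
```

A single-threaded dictionary counter like this is only practical for small files; the abstract's emphasis on multithreading and memory efficiency reflects what is needed at genome scale.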

Citations: 0

RSCUcaller: an R package for analyzing differences in relative synonymous codon usage (RSCU).

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-05-29 DOI: 10.1186/s12859-025-06166-5

Mateusz Maździarz, Sebastian Zając, Łukasz Paukszto, Jakub Sawicki

Abstract: Background: Synonymous codon usage bias, a significant factor in gene expression and genome evolution, was extensively studied in genomics and molecular biology. Although the genetic code is universal, significant variations in synonymous codon usage have been observed among and within organisms. This bias was linked to various factors, including gene expression levels, tRNA abundance, protein structure, and environmental adaptation. Relative Synonymous Codon Usage (RSCU), a normalized measure, was used to quantify this bias. By analyzing RSCU values, researchers uncovered patterns and trends related to the underlying mechanisms driving codon usage bias. Results: We present an R package named RSCUcaller designed for the analysis of coding nucleotide sequences at the level of relative synonymous codon usage (RSCU). The package enables both visualization of data and the performance of advanced statistical analyses. RSCUcaller accepts as input a multi-fasta file containing coding sequences (CDS) and an accompanying description table. Alternatively, the user may provide separate fasta files for each sequence along with the corresponding table. The program merges the provided sequences and calculates RSCU values for each. Implemented visualization features include creating heatmaps and dendrograms based on these heatmaps. Furthermore, the package allows for the presentation of data in the form of histograms. The calculated RSCU values can also be used to create matrices that can be subjected to further analysis by the user. RSCUcaller offers the functionality of correlation analysis between any two organisms. Additionally, to compare the frequency of amino acid occurrence between different groups of sequences, statistical tests have been implemented. Conclusions: RSCUcaller enabled comparative RSCU analysis between coding sequences of different organisms or individuals of the same species. It facilitated visualization and statistical analysis among codons and user-defined groups. The RSCUcaller package is available at https://github.com/Mordziarz/RSCUcaller under the GPL-3 license.

Journal Article. BMC Bioinformatics, volume 26(1), 141. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121200/pdf/
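
The RSCU measure named in the abstract above is conventionally defined as a codon's observed count divided by the mean count of all codons that encode the same amino acid, so a value of 1 means no bias. The Python sketch below computes it for a single coding sequence under the standard genetic code. It is not taken from RSCUcaller (which is an R package); the function name `rscu` and the decision to skip stop codons and unused amino acids are assumptions for illustration.

```python
from collections import Counter, defaultdict
from itertools import product

# Standard genetic code, codons enumerated in TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict:
    """Return {codon: RSCU} for every amino acid that occurs in the CDS."""
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group synonymous codons by amino acid, leaving out stop codons.
    families = defaultdict(list)
    for codon, aa in CODON_TABLE.items():
        if aa != "*":
            families[aa].append(codon)

    values = {}
    for family in families.values():
        total = sum(counts[c] for c in family)
        if total == 0:
            continue  # amino acid absent from this sequence
        for c in family:
            # observed count / (family total / number of synonymous codons)
            values[c] = counts[c] * len(family) / total
    return values


# Toy sequence: four alanine codons (GCT, GCT, GCC, GCA).
values = rscu("GCTGCTGCCGCA")
print(values)
```

In the toy sequence GCT is used twice as often as expected under uniform usage, so its RSCU is 2.0, while the unused GCG gets 0.0.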

Citations: 0

Valsci: an open-source, self-hostable literature review utility for automated large-batch scientific claim verification using large language models.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-05-28 DOI: 10.1186/s12859-025-06159-4

Brice Edelman, Jeffrey Skolnick

Abstract: Background: The exponential growth of scientific publications poses a formidable challenge for researchers seeking to validate emerging hypotheses or synthesize existing evidence. In this paper, we introduce Valsci, an open-source, self-hostable utility that automates large-batch scientific claim verification using any OpenAI-compatible large language model. Valsci unites retrieval-augmented generation with structured bibliometric scoring and chain-of-thought prompting, enabling users to efficiently search, evaluate, and summarize evidence from the Semantic Scholar database and other academic sources. Unlike conventional standalone LLMs, which often suffer from hallucinations and unreliable citations, Valsci grounds its analyses in verifiable published findings. A guided prompt-flow approach is employed to generate query expansions, retrieve relevant excerpts, and synthesize coherent, evidence-based reports. Results: Preliminary evaluations across claims from the SciFact benchmark dataset reveal that Valsci significantly outperforms base GPT-4o outputs in citation hallucination rate while maintaining a low misclassification rate. The system is highly scalable, processing hundreds of claims per hour through asynchronous parallelization. Conclusions: By providing an open and transparent platform for large-batch literature verification, Valsci substantially lowers the barrier to comprehensive evidence-based reviews and fosters a more reproducible research ecosystem.

Journal Article. BMC Bioinformatics, volume 26(1), 140. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12121171/pdf/

Citations: 0

Utilizing RNA-seq data in monotone iterative generalized linear model to elevate prior knowledge quality of the circRNA-miRNA-mRNA regulatory axis.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-05-27 DOI: 10.1186/s12859-025-06161-w

Alikhan Anuarbekov, Jiří Kléma

Abstract: Background: Current experimental data on RNA interactions remain limited, particularly for non-coding RNAs, many of which have only recently been discovered and operate within complex regulatory networks. Researchers often rely on in-silico interaction detection algorithms, such as TargetScan, which are based on biochemical sequence alignment. However, these algorithms have limited performance. RNA-seq expression data can provide valuable insights into regulatory networks, especially for understudied interactions such as circRNA-miRNA-mRNA. By integrating RNA-seq data with prior interaction networks obtained experimentally or through in-silico predictions, researchers can discover novel interactions, validate existing ones, and improve interaction prediction accuracy. Results: This paper introduces Pi-GMIFS, an extension of the generalized monotone incremental forward stagewise (GMIFS) regression algorithm that incorporates prior knowledge. The algorithm first estimates prior response values through a prior-only regression, interpolates between these prior values and the original data, and then applies the GMIFS method. Our experimental results on circRNA-miRNA-mRNA regulatory interaction networks demonstrate that Pi-GMIFS consistently enhances precision and recall in RNA interaction prediction by leveraging implicit information from bulk RNA-seq expression data, outperforming the initial prior knowledge. Conclusion: Pi-GMIFS is a robust algorithm for inferring acyclic interaction networks when the variable ordering is known. Its effectiveness was confirmed through extensive experimental validation. We proved that RNA-seq data of a representative size help infer previously unknown interactions available in TarBase v9 and improve the quality of circRNA disease annotation.

Journal Article. BMC Bioinformatics, volume 26(1), 139. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12117772/pdf/

Citations: 0

FunFEA: an R package for fungal functional enrichment analysis.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-05-27 DOI: 10.1186/s12859-025-06164-7

Julien Charest, Paul Loebenstein, Robert L Mach, Astrid R Mach-Aigner

Abstract: Background: The functional annotation of fungal genomes is critical for understanding their biological processes and ecological roles. While existing tools support functional enrichment analysis from publicly available annotations of well-established model organisms, few are tailored to the specific needs of the fungal research community. Furthermore, many tools struggle with processing functional annotations of novel species, for which no publicly available functional annotations are yet available. Results: FunFEA is an R package designed for functional enrichment analysis of fungal genomes. It supports COG/KOG (Clusters of Orthologous Genes), GO (Gene Ontology), and KEGG (Kyoto Encyclopedia of Genes and Genomes) annotations, and generates background frequency models from publicly available annotations for overrepresentation analysis, within a set of experimentally defined genes or proteins. Additionally, FunFEA can process eggNOG-mapper annotations, thus enabling functional enrichment analysis of novel genomes. The package offers a suite of tools for generation of background frequency models, functional enrichment analysis, as well as visualization of enriched functional categories. On release, the package includes precomputed models for 65 commonly used fungal strains in academic research and strains listed on the WHO fungal priority pathogens list. Conclusions: FunFEA fills a critical need for a specialized tool in fungal genomics, providing valuable insights into fungal biology. Additionally, its ability to process eggNOG-mapper annotations makes it an essential resource for researchers, helping to drive further exploration of fungal functional diversity and pathways and derive biological insights from novel genomes.

Journal Article. BMC Bioinformatics, volume 26(1), 138. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12117765/pdf/
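
Overrepresentation analysis against a background frequency model, as mentioned in the abstract above, is commonly carried out with a one-sided hypergeometric test. The abstract does not say which test FunFEA uses, so the Python sketch below is the textbook version only; the function name `enrichment_p_value` and the toy numbers are hypothetical.

```python
from math import comb


def enrichment_p_value(k: int, n: int, K: int, N: int) -> float:
    """One-sided hypergeometric tail P(X >= k).

    N: genes in the background genome
    K: background genes annotated with the category
    n: genes in the experimentally defined set
    k: genes in that set annotated with the category
    """
    upper = min(n, K)
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Toy numbers: 10 background genes, 4 in the category,
# a gene set of size 3 with 2 hits in the category.
p = enrichment_p_value(k=2, n=3, K=4, N=10)
print(round(p, 4))
```

Real analyses test many categories at once, so the resulting p-values would normally be adjusted for multiple testing before any category is reported as enriched.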

Citations: 0

DRaCOon: a novel algorithm for pathway-level differential co-expression analysis in transcriptomics.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-05-26 DOI: 10.1186/s12859-025-06162-9

Fernando M Delgado-Chaves, Ferdinand Spurny, Tanja Laske, Mhaned Oubounyt, Jan Baumbach

Abstract: Understanding the molecular mechanisms underlying diseases is crucial for more precise, personalized medicine. Pathway-level differential co-expression analysis, a powerful approach for transcriptomics, identifies condition-specific changes in gene-gene interaction networks, offering targeted insights. However, a key challenge is the lack of robust methods and benchmarks specifically for evaluating algorithms' ability to identify disrupted gene-gene associations across conditions. We introduce DRaCOoN (Differential Regulatory and Co-expression Networks), a Python package and web tool for pathway-level differential co-expression analysis. DRaCOoN uniquely integrates multiple association and differential metrics, with a novel, computationally efficient permutation test for significance assessment. Crucially, DRaCOoN also provides a benchmarking framework for comprehensive method evaluation. Extensive benchmarking on simulated data and three real-world datasets (bone healing, colorectal cancer, and head/neck carcinoma) showed that DRaCOoN, particularly with an entropy-based association measure and the s differential metric, consistently outperforms eight other methods. It remains highly accurate in balanced datasets, robust to varying gene perturbation levels, and identifies biologically relevant regulatory changes. Furthermore, DRaCOoN serves as both a powerful tool and a benchmarking framework for elucidating disease mechanisms from transcriptomics data, advancing precision medicine by uncovering critical gene regulatory alterations.

Journal Article. BMC Bioinformatics, volume 26(1), 137. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107744/pdf/
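
For readers unfamiliar with differential co-expression, the sketch below shows the generic idea for one gene pair: compute the association in each condition, take the difference, and assess significance by shuffling condition labels. This is the plain label-permutation scheme with Pearson correlation, not DRaCOoN's own permutation test or its entropy-based measure, which the abstract describes as novel; all function names and the toy data are assumptions.

```python
import random
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either variable is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    denom = sqrt(sxx * syy)
    return sxy / denom if denom > 0 else 0.0


def differential_correlation_test(pairs_a, pairs_b, n_perm=1000, seed=0):
    """Permutation test on |r_A - r_B| for one gene pair.

    pairs_a, pairs_b: lists of (gene1_expr, gene2_expr), one tuple per sample.
    Returns (observed difference, permutation p-value).
    """
    def corr(pairs):
        xs, ys = zip(*pairs)
        return pearson(xs, ys)

    observed = abs(corr(pairs_a) - corr(pairs_b))
    rng = random.Random(seed)
    pooled = list(pairs_a) + list(pairs_b)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)  # reassign samples to conditions at random
        perm_a, perm_b = pooled[:len(pairs_a)], pooled[len(pairs_a):]
        if abs(corr(perm_a) - corr(perm_b)) >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (hits + 1) / (n_perm + 1)


# Toy data: the two genes are positively coupled in condition A
# and negatively coupled in condition B.
cond_a = [(1, 1.1), (2, 2.3), (3, 2.9), (4, 4.2), (5, 5.1), (6, 5.8)]
cond_b = [(1.5, 6.1), (2.5, 4.9), (3.5, 4.1), (4.5, 3.2), (5.5, 1.9), (6.5, 0.7)]
observed, p_value = differential_correlation_test(cond_a, cond_b)
print(observed, p_value)
```

Running this for every gene pair in a pathway is expensive, which is presumably why the abstract highlights a computationally efficient permutation test.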

Citations: 0

TaxaCal: enhancing species-level profiling accuracy of 16S amplicon data.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-05-26 DOI: 10.1186/s12859-025-06156-7

Qingrong Shen, Xiaoqian Fan, Yangyang Sun, Hao Gao, Xiaoquan Su

Abstract: Background: 16S rRNA amplicon sequencing is a widely used method for microbiome composition analysis due to its cost-effectiveness and lower data requirements compared to metagenomic whole-genome sequencing (WGS). However, inherent limitations in 16S-based approach often lead to profiling discrepancies, particularly at the species level, compromising the accuracy and reliability of findings. Results: To address this issue, we present TaxaCal (Taxonomic Calibrator), a machine learning algorithm designed to calibrate species-level taxonomy profiles in 16S amplicon data using a two-tier correction strategy. Validation on in-house produced and public datasets shows that TaxaCal effectively reduces biases in amplicon sequencing, mitigating discrepancies between microbial profiles derived from 16S and WGS. Moreover, TaxaCal enables seamless cross-platform comparisons between these two sequencing approaches, significantly improving disease detection in 16S-based microbiome data. Conclusions: Therefore, TaxaCal offers a cost-effective solution for generating high-resolution microbiome species profiles that closely align with WGS results, enhancing the utility of 16S-based profiling in microbiome research. As microbiome-based diagnostics continue to evolve, TaxaCal has the potential to be a crucial tool in advancing the utility of 16S sequencing in clinical and research settings.

Journal Article. BMC Bioinformatics, volume 26(1), 136. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12107961/pdf/

Citations: 0

CRISP: correlation-refined image segmentation process.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-05-26 DOI: 10.1186/s12859-025-06150-z

Jennifer K Briggs, Erli Jin, Matthew J Merrins, Richard K P Benninger

Abstract: Background: Calcium imaging enables real-time recording of cellular activity across various biological contexts. To assess the activity of individual cells, researchers must segment images into the individual cells. While intensity-based threshold algorithms allow for automatic image segmentation in sparsely packed tissues, they perform poorly in densely packed organs such as cardiomyocytes or the pancreatic islet. To study these tissues, investigators typically manually outline the cells based on visual inspection. This manual cell masking introduces potential user error. To address this error, we developed the Correlation-Refined Image Segmentation Process (CRISP). CRISP utilizes interpixel correlations to refine user drawn cell masks (cell mask refinement) or automatically masks cells by identifying the largest circle that captures only pixels within the cell (semi-minor axis identification). Results: CRISP cell mask refinement had an area under the receiver operating curve of 0.835, indicating good model performance on the training data set. CRISP had 77% accuracy when testing on a separate data set, which came from a different mouse model imaged with a different microscope than the training data set. CRISP cell mask refinement significantly improved the accuracy of functional network analysis compared to non-CRISP refined cell masks. CRISP automated semi-minor axis identification had an area under the receiver operating curve of 0.989, indicating strong model performance. Conclusions: Inaccurate cell masking can result in inaccurate scientific interpretations of calcium images. Utilizing interpixel correlations, we developed two transparent algorithms that can be used for image segmentation in densely packed tissues. These algorithms may allow for more accurate and reproducible cell masking.

Journal Article. BMC Bioinformatics, volume 26(1), 135. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12105354/pdf/

Citations: 0

Gene2role: a role-based gene embedding method for comparative analysis of signed gene regulatory networks.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-05-24 DOI: 10.1186/s12859-025-06128-x

Xin Zeng, Shu Liu, Bowen Liu, Weihang Zhang, Wanzhe Xu, Fujio Toriumi, Kenta Nakai

Abstract: Background: Understanding the dynamics of gene regulatory networks (GRNs) across various cellular states is crucial for deciphering the underlying mechanisms governing cell behavior and functionality. However, current comparative analytical methods, which often focus on simple topological information such as the degree of genes, are limited in their ability to fully capture the similarities and differences among the complex GRNs. Results: We present Gene2role, a gene embedding approach that leverages multi-hop topological information from genes within signed GRNs. Initially, we demonstrated the effectiveness of Gene2role in capturing the intricate topological nuances of genes using GRNs inferred from four distinct data sources. Then, applying Gene2role to integrated GRNs allowed us to identify genes with significant topological changes across cell types or states, offering a fresh perspective beyond traditional differential gene expression analyses. Additionally, we quantified the stability of gene modules between two cellular states by measuring the changes in the gene embeddings within these modules. Conclusions: Our method augments the existing toolkit for probing the dynamic regulatory landscape, thereby opening new avenues for understanding gene behavior and interaction patterns across cellular transitions.

Journal Article. BMC Bioinformatics, volume 26(1), 134. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12103023/pdf/

Citations: 0

KF-NIPT: K-mer and fetal fraction-based estimation of chromosomal anomaly from NIPT data.

IF 2.9 | Zone 3 (Biology)

BMC Bioinformatics Pub Date : 2025-05-22 DOI: 10.1186/s12859-025-06127-y

Dongin Kim, Ji Yeon Sohn, Jin Hee Cho, Ji-Hye Choi, Gwi-Young Oh, Hyun Goo Woo

Abstract: Background: Non-Invasive Prenatal Testing (NIPT) is a technique that allows pregnant women to screen for chromosomal abnormalities in their developing fetus without the need for invasive procedures like amniocentesis or chorionic villus sampling. However, current methods that detect anomalies from maternal cell-free DNA (cfDNA) by calculating z-scores from sequence read counts face challenges with false positives and negatives. To address these challenges, we aimed to develop a novel NIPT algorithm named KF-NIPT, which is derived from the initials of k-mer and fetal fraction used in its development, with the goal of significantly improving accuracy. Results: We developed KF-NIPT, a new algorithm that estimates chromosomal anomalies by calculating k-mer-based sequence depth and fetal fraction from whole genome sequencing (WGS) data. Moreover, we implemented a modified preprocessing pipeline for the WGS data, correcting the biases of the genomic mapping quality and the GC contents. The performance of our method was evaluated using publicly available NIPT data. We could demonstrate that our method has better accuracy and sensitivity compared to those of the previous methods. Conclusions: We found that using k-mer and fetal fraction reduces errors in NIPT and have integrated this into a pipeline, showing that the traditional read count-based z-score method can be improved. KF-NIPT is implemented in the R and Python environment. The source code is available at https://github.com/eastbrain/KF-NIPT . KF-NIPT has been tested on an Ubuntu Linux-64 server and on Linux-64 under Windows using WSL (Windows Subsystem for Linux).

Journal Article. BMC Bioinformatics, volume 26(1), 133. PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12100778/pdf/
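
The "traditional read count-based z-score method" that the abstract above sets out to improve can be sketched in a few lines of Python: the share of reads mapping to a chromosome in the test sample is compared with the mean and standard deviation of that share in a euploid reference panel. This is the baseline approach only, not KF-NIPT's k-mer and fetal-fraction method; the function names, the read counts, and the z > 3 cut-off are assumptions for illustration.

```python
from statistics import mean, stdev


def read_fraction(counts: dict, chrom: str) -> float:
    """Fraction of all mapped reads that fall on one chromosome."""
    return counts[chrom] / sum(counts.values())


def chromosome_z_score(test_fraction: float, reference_fractions: list) -> float:
    """Z-score of the test sample against a euploid reference panel."""
    mu = mean(reference_fractions)
    sigma = stdev(reference_fractions)  # sample standard deviation
    return (test_fraction - mu) / sigma


# Hypothetical chr21 read fractions from five euploid reference samples.
reference = [0.0130, 0.0131, 0.0129, 0.0130, 0.0130]

# Hypothetical read counts for the test sample (all other chromosomes pooled).
test_counts = {"chr21": 134, "other": 9866}

z = chromosome_z_score(read_fraction(test_counts, "chr21"), reference)
flagged = z > 3  # assumed screening threshold, not taken from the paper
print(round(z, 2), flagged)
```

Because the z-score depends only on read counts, low fetal fraction or mapping and GC biases can shift it in either direction, which is the source of the false positives and negatives the abstract mentions.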

Citations: 0