Bioinformatics Advances: latest articles

A unified hypothesis-free feature extraction framework for diverse epigenomic data.
IF 2.4
Bioinformatics advances Pub Date : 2025-03-08 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf013
Ali Tuğrul Balcı, Maria Chikina
Motivation: Epigenetic assays using next-generation sequencing have furthered our understanding of the functional genomic regions and the mechanisms of gene regulation. However, a single assay produces billions of data points, with limited information about the biological process due to numerous sources of technical and biological noise. To draw biological conclusions, numerous specialized algorithms have been proposed to summarize the data into higher-order patterns, such as peak calling and the discovery of differentially methylated regions. The key principle underlying these approaches is the search for locally consistent patterns.
Results: We propose L0 segmentation as a universal framework for extracting locally coherent signals for diverse epigenetic sources. L0 serves to compress the input signal by approximating it as a piecewise constant. We implement a highly scalable L0 segmentation with additional loss functions designed for sequencing epigenetic data types, including Poisson loss for single tracks and binomial loss for methylation/coverage data. We show that the L0 segmentation approach retains the salient features of the data yet can identify subtle features, such as transcription end sites, missed by other analytic approaches.
Availability and implementation: Our approach is implemented as an R package "l01segmentation" with a C++ backend. Available at https://github.com/boooooogey/l01segmentation.
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11897706/pdf/
Citations: 0
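The piecewise-constant idea in this abstract can be illustrated with a small sketch. It is not the l01segmentation package's algorithm (the abstract describes a highly scalable C++ implementation); it is a plain O(n^2) dynamic program that minimizes a Poisson negative log-likelihood plus a fixed cost per segment. The function names and the `penalty` parameter are hypothetical and chosen only for illustration.

```python
import math

def poisson_cost(total, length):
    """Poisson negative log-likelihood of one segment at its best constant
    rate (total / length), with terms that do not depend on the rate dropped."""
    if total == 0:
        return 0.0
    return total - total * math.log(total / length)

def l0_segment(counts, penalty):
    """Split counts into constant-rate segments; each segment costs `penalty`.

    Returns a list of (start, end, mean) with end exclusive.
    """
    n = len(counts)
    prefix = [0]
    for c in counts:
        prefix.append(prefix[-1] + c)

    best = [0.0] + [math.inf] * n  # best[j]: minimal cost of counts[:j]
    last = [0] * (n + 1)           # last[j]: start of the final segment
    for j in range(1, n + 1):
        for i in range(j):
            cost = best[i] + poisson_cost(prefix[j] - prefix[i], j - i) + penalty
            if cost < best[j]:
                best[j], last[j] = cost, i

    # Walk back through the stored segment starts.
    segments, j = [], n
    while j > 0:
        i = last[j]
        segments.append((i, j, (prefix[j] - prefix[i]) / (j - i)))
        j = i
    return segments[::-1]

counts = [1, 0, 2, 1, 0, 20, 22, 19, 21, 1, 0, 1]
for start, end, mean in l0_segment(counts, penalty=5.0):
    print(start, end, round(mean, 2))
```

A larger `penalty` yields fewer, longer segments; a smaller one follows the raw counts more closely.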
Assessing genome conservation on pangenome graphs with PanSel.
IF 2.4
Bioinformatics advances Pub Date : 2025-03-05 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf018
Matthias Zytnicki
Motivation: With more and more telomere-to-telomere genomes assembled, pangenomes make it possible to capture the genomic diversity of a species. Because they introduce fewer biases, pangenomes, represented as graphs, tend to supplant the usual linear representation of a reference genome, augmented with variations. However, this major change requires new tools adapted to this data structure. Among the numerous questions that can be addressed to a pangenome graph is the search for conserved or divergent genes.
Results: In this article, we present a new tool, named PanSel, which computes a conservation score for each segment of the genome, and finds genomic regions that are significantly conserved, or divergent. PanSel can be used on prokaryotes and eukaryotes, with a sequence identity not less than 98%.
Availability and implementation: PanSel, written in C++11 with no dependency, is available at https://github.com/mzytnicki/pansel.
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11908644/pdf/
Citations: 0
Welly: a web-tool for visualizing growth curves from microplate data.
IF 2.4
Bioinformatics advances Pub Date : 2025-03-04 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf038
Felix Meier, Tom Williams, Ian Paulsen
Summary: Welly is a web-based tool designed to simplify the visualization and analysis of growth curves from 96- and 384-well plates, addressing the limitations of existing commercial and coding-based solutions. Users can upload plate reader data in CSV or Excel format and easily select sample names and replicates; Welly then generates interactive growth curves displaying the mean and standard deviation of triplicates. Additional features include heat map visualizations of maximum values, downloadable interactive graphs, publication-quality figures, and statistics files containing the area under the curve and maximum growth rate of replicates.
Availability and implementation: Welly is freely available at https://synbioexplorer.pythonanywhere.com, providing an easy-to-use interface accessible to all. All the code is publicly available at the GitHub repository https://github.com/SynBioExplorer/Welly under the MIT license. The website will remain freely accessible for at least 2 years post publication, likely longer.
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11908640/pdf/
Citations: 0
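The summary statistics named in this abstract (mean and standard deviation of triplicates, area under the curve, maximum growth rate) can be sketched in a few lines. This is not Welly's code, and the abstract does not say how Welly defines the growth rate; the sketch assumes a trapezoidal area and the steepest slope between consecutive time points. All function names are hypothetical.

```python
import statistics

def mean_and_sd(replicates):
    """Per-time-point mean and sample standard deviation across replicate curves."""
    columns = list(zip(*replicates))
    means = [statistics.mean(col) for col in columns]
    sds = [statistics.stdev(col) for col in columns]
    return means, sds

def area_under_curve(times, values):
    """Trapezoidal area under a growth curve."""
    return sum((t1 - t0) * (v0 + v1) / 2
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))

def max_growth_rate(times, values):
    """Steepest slope between consecutive time points (assumed definition)."""
    return max((v1 - v0) / (t1 - t0)
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))

times = [0.0, 1.0, 2.0, 3.0]  # hours
triplicate = [
    [0.10, 0.20, 0.40, 0.50],  # optical density, replicate 1
    [0.12, 0.22, 0.42, 0.52],  # replicate 2
    [0.08, 0.18, 0.38, 0.48],  # replicate 3
]
means, sds = mean_and_sd(triplicate)
print(means, sds)
print(area_under_curve(times, means), max_growth_rate(times, means))
```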
A twin-tower model using MRI and gene for prediction on brain tumor patients' response to therapy.
IF 2.4
Bioinformatics advances Pub Date : 2025-03-04 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf041
Qiyuan Lyu, Fumie Costen
Motivation: Glioma is the most prevalent and aggressive primary brain tumor, with a poor prognosis and a high mortality rate. Standard treatment of surgery, radiation, and chemotherapy may not be effective for some patients as they suffer from a stable progression of disease after treatment. Hence, it is crucial to predict the patient's response to therapy as a guide for the treatment plan. In this paper, we propose a multimodal model based on both magnetic resonance imaging and genomic data. As the dataset consists mostly of single-modality samples with only a small proportion of multi-modality samples, we propose a twin-tower architecture to solve the unimodal dominance issue and fully use the single-modality data.
Results: The proposed architecture comprises an image encoder and a gene encoder trained on the single-modality samples for feature extraction, along with a classification head trained on multi-modality samples. In this way, all the single-modality samples can be beneficial to the whole model, and the need for multi-modality samples is diminished. The proposed model outperforms the comparison methods across all metrics, achieving an accuracy of 85% on the cross-validation. The ablation experiment comparing the proposed architecture with single-modality models reflects the effectiveness of the proposed twin-tower architecture.
Availability and implementation: The proposed model exhibits excellent scalability and can accommodate the integration of additional modalities without the requirement of too many multi-modality samples.
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12070387/pdf/
Citations: 0
PIPLOM: prediction of exogenous peptide loading on major histocompatibility complex class I molecules.
IF 2.4
Bioinformatics advances Pub Date : 2025-03-03 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf037
Florian Schmidt, Kanxing Wu, Lorenz Gerber, Florence Chioh Wen Jing, Daniel Pedrosa, Glenn Wong Choon Lim, Melissa Wirawan, Christine Eng, Katja Fink, Daniel T MacLeod, Michael Fehlings, Andreas Wilm
Summary: The exogenous, i.e. in vitro, loading of peptides onto major histocompatibility complex (MHC) class I molecules is a key step in many immunology-related experimental workflows. Here, we provide a machine learning solution, PIPLOM, which is specifically tailored to predict whether peptides can be loaded exogenously onto an MHC class I molecule. Benchmarking on 38 unseen epitopes with in-house ELISA (enzyme-linked immunosorbent assay) experiments showed that PIPLOM outperforms well-established methods such as NETMHCpan-4.0 or MHCflurry, which are commonly used for the related task of predicting epitope HLA (human leukocyte antigen) haplotype specificity.
Availability and implementation: Source code and data are available as Zenodo package 10.5281/zenodo.13771214. PIPLOM is available as a web service at https://piplom.immunoscape.com/.
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11904885/pdf/
Citations: 0
Pandora: a tool to estimate dimensionality reduction stability of genotype data.
IF 2.4
Bioinformatics advances Pub Date : 2025-03-03 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf040
Julia Haag, Alexander I Jordan, Alexandros Stamatakis
Motivation: Genotype datasets typically contain a large number of single-nucleotide polymorphisms for a comparatively small number of individuals. To identify similarities between individuals and to infer an individual's origin or membership to a population, dimensionality reduction techniques are routinely deployed. However, inherent (technical) difficulties such as missing or noisy data need to be accounted for when analyzing a lower dimensional representation of genotype data, and the intrinsic uncertainty of such analyses should be reported in all studies. However, to date, there exists no stability assessment technique for genotype data that can estimate this uncertainty.
Results: Here, we present Pandora, a stability estimation framework for genotype data based on bootstrapping. Pandora computes an overall score to quantify the stability of the entire embedding, infers per-individual support values, and also deploys a k-means clustering approach to assess the uncertainty of assignments to potential cultural groups. Using published empirical and simulated datasets, we demonstrate the usage and utility of Pandora for studies that rely on dimensionality reduction techniques.
Availability and implementation: Pandora is available on GitHub: https://github.com/tschuelia/Pandora.
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11955236/pdf/
Citations: 0
scLTNN: an innovative tool for automatically visualizing single-cell trajectories.
IF 2.4
Bioinformatics advances Pub Date : 2025-02-26 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf033
Cencan Xing, Zehua Zeng, Lei Hu, Jianing Kang, Shah Roshan, Yuanyan Xiong, Hongwu Du, Tongbiao Zhao
Motivation: Cellular state identification and trajectory inference enable the computational simulation of cell fate dynamics using single-cell RNA sequencing data. However, existing methods for constructing cell fate trajectories demand substantial computational resources or prior knowledge of the developmental process.
Results: Here, based on the discovery of the consistent expression distribution of highly variable genes, we create a new tool named scRNA-seq latent time neural network (scLTNN) by combining an artificial neural network with a distribution model. This innovative tool is pre-trained and capable of automatically inferring the origin and terminal state of cells, and accurately illustrating the developmental trajectory of cells with minimal use of computational resources and time. We implement scLTNN on human bone marrow cells, mouse pancreatic endocrine lineage, and axial mesoderm lineage of zebrafish embryo, accurately reconstructing their respective cell fate trajectories. Our scLTNN tool provides a straightforward and efficient method for illustrating cell fate trajectories, applicable across various species without the need for prior knowledge of the biological process.
Availability and implementation: https://github.com/Starlitnightly/scLTNN.
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11889453/pdf/
Citations: 0
Optimizing gene selection and module identification via ontology-based scoring and deep learning.
IF 2.4
Bioinformatics advances Pub Date : 2025-02-26 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf034
Boutaina Ettetuani, Rajaa Chahboune, Ahmed Moussa
Motivation: Understanding gene interactions and their biological significance is a key challenge in computational biology. The complexity of biological systems, coupled with high-dimensional omics data, necessitates robust methods for gene selection and interaction analysis. Traditional statistical techniques often struggle with the hierarchical nature of gene ontology (GO) terms, leading to redundancy and limited interpretability. Meanwhile, deep learning models require biologically meaningful input to enhance their predictive power.
Results: We present an integrated framework that enhances gene selection and uncovers gene interactions by combining a novel statistical algorithm with a deep neural network model. The statistical algorithm ranks differentially expressed genes by correlating their expression scores with the semantic similarity of their biological context, utilizing GO information to align genes with known pathways. The deep neural network then identifies interaction modules by integrating genes from different clusters based on regulatory pathway data. This model effectively navigates the hierarchical complexity of GO terms structured as directed acyclic graphs, employing a feed-forward architecture optimized via back-propagation. Our results demonstrate improved gene selection accuracy and enhanced discovery of biologically relevant interactions, providing valuable insights into complex disease mechanisms.
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12073971/pdf/
Citations: 0
A new ensemble learning method stratified sampling blending optimizes conventional blending and improves prediction performance.
IF 2.4
Bioinformatics advances Pub Date : 2025-02-22 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf002
Na Miao, Mengke Yang, Pingping Han, Jiakun Qiao, Zhaoxuan Che, Fangjun Xu, Xiangyu Dai, Mengjin Zhu
Motivation: Ensemble learning, as a powerful machine learning method, improves overall prediction performance by combining the prediction results of multiple base models. Blending, a popular ensemble learning method, trains multiple base models and feeds their predictions into a meta model, which is trained to produce the final prediction. However, conventional blending divides the training set by simple random sampling, which causes bias and large variance, thus affecting the stability and accuracy of prediction performance. In this study, we propose a new algorithm of stratified sampling blending (ssBlending), which addresses the algorithm instability of conventional blending caused by the random partition of the training set, further improving the prediction accuracy.
Results: We used multiple genotype data sets from different species including animal (pig), plant (loblolly pine), and microorganism (yeast) to test the prediction performance of ssBlending. The across-species multi-dataset verification results reveal that ssBlending is superior to conventional blending in terms of prediction accuracy and stability. In addition, we optimized the training set sampling rate (BestH) to facilitate the practical application of the ssBlending algorithm. In summary, this study proposes a completely new algorithm combining a stratification strategy with conventional blending, which provides more options for ensemble learning in various fields.
Availability and implementation: https://figshare.com/s/23122a18dc8a35f12ff6.
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11908643/pdf/
Citations: 0
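The difference between a simple random split and a stratified split of the training set can be sketched as follows. The abstract does not state how ssBlending defines its strata, so this sketch assumes equal-sized bins of the sorted target value; `stratified_holdout`, `n_strata`, and `holdout_rate` are hypothetical names, and the code covers only the partitioning step, not the base or meta models.

```python
import random

def stratified_holdout(y, n_strata, holdout_rate, seed=0):
    """Split sample indices into (base, holdout) so that every stratum of the
    sorted target contributes the same fraction to the holdout set."""
    rng = random.Random(seed)
    order = sorted(range(len(y)), key=lambda i: y[i])
    base, holdout = [], []
    for s in range(n_strata):
        start = s * len(order) // n_strata
        stop = (s + 1) * len(order) // n_strata
        stratum = order[start:stop]
        rng.shuffle(stratum)
        k = round(len(stratum) * holdout_rate)
        holdout.extend(stratum[:k])  # later used to train the meta model
        base.extend(stratum[k:])     # used to train the base models
    return sorted(base), sorted(holdout)

phenotype = [float(i % 17) + 0.01 * i for i in range(100)]
base_idx, holdout_idx = stratified_holdout(phenotype, n_strata=5, holdout_rate=0.2)
print(len(base_idx), len(holdout_idx))
```

In a blending workflow, the base models would be fitted on `base_idx` and their predictions on `holdout_idx` would become the meta model's training features; because each stratum is represented equally, the holdout set covers the full range of the target regardless of the random seed.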
XeroGraph: enhancing data integrity in the presence of missing values with statistical and predictive analysis.
IF 2.4
Bioinformatics advances Pub Date : 2025-02-21 eCollection Date: 2025-01-01 DOI: 10.1093/bioadv/vbaf035
Laila Mousafi Alasal, Emma U Hammarlund, Kenneth J Pienta, Lars Rönnstrand, Julhash U Kazi
Motivation: Missing data present a pervasive challenge in data analysis, potentially biasing outcomes and undermining conclusions if not addressed properly. Missing data are commonly classified into Missing Completely at Random (MCAR), Missing at Random (MAR), and Missing Not at Random (MNAR). While MCAR poses a minimal risk of data distortion, both MAR and MNAR can seriously affect the results of subsequent analyses. Therefore, it is important to know the type of missing data and appropriately handle them.
Results: To facilitate efficient handling of missing data, we introduce a Python package named XeroGraph that is designed to evaluate data quality, categorize the nature of missingness, and guide imputation decisions. By comparing how various imputation methods influence underlying distributions, XeroGraph provides a systematic framework that supports more accurate and transparent analyses. Through its comprehensive preliminary assessments and user-friendly interface, this package facilitates the selection of optimal strategies tailored to the specific missing data mechanisms present in a dataset. In doing so, XeroGraph may significantly improve the validity and reproducibility of research findings, making it a valuable tool for professionals in data-intensive fields.
Availability and implementation: XeroGraph is compatible with all operating systems and requires Python version 3.9 or higher. It can be freely downloaded from PyPI (https://pypi.org/project/XeroGraph). The source code is accessible on GitHub (https://github.com/kazilab/XeroGraph), and comprehensive documentation is available at Read the Docs (https://xerograph.readthedocs.io). This software is distributed under the Apache License 2.0.
Full-text PDF (PMC): https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11889451/pdf/
Citations: 0
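The three missingness mechanisms named in this abstract can be made concrete with a small simulation. This sketch does not use the XeroGraph API; it relies only on the Python standard library, and the drop probabilities and variable names are illustrative assumptions. A value y is hidden independently of the data (MCAR), depending on an always-observed covariate x (MAR), or depending on y itself (MNAR), and the mean of the remaining values is compared with the true mean near 0.

```python
import random

def simulate(n=20000, seed=1):
    """Generate (x, y) pairs where y = x + noise, so x and y are correlated."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        rows.append((x, x + rng.gauss(0, 1)))
    return rows

def observed_mean(rows, drop_probability, rng):
    """Mean of y over rows that survive the given missingness mechanism."""
    kept = [y for x, y in rows if rng.random() >= drop_probability(x, y)]
    return sum(kept) / len(kept)

rows = simulate()
rng = random.Random(2)
mcar = observed_mean(rows, lambda x, y: 0.3, rng)                    # ignores the data
mar = observed_mean(rows, lambda x, y: 0.6 if x > 0 else 0.0, rng)   # depends on observed x
mnar = observed_mean(rows, lambda x, y: 0.6 if y > 0 else 0.0, rng)  # depends on hidden y
print(round(mcar, 3), round(mar, 3), round(mnar, 3))
```

Under MCAR the observed mean stays close to the true mean, while under MAR and MNAR it is pulled downward. The MAR bias can in principle be corrected by conditioning on x, whereas the MNAR bias cannot be recovered from the observed data alone.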