Bioinformatics (Oxford, England): Latest Articles

Securing diagonal integration of multimodal single-cell data against ambiguous mapping.
Bioinformatics (Oxford, England) Pub Date : 2025-06-14 DOI: 10.1093/bioinformatics/btaf345
Han Zhou, Kai Cao, Yang Young Lu
{"title":"Securing diagonal integration of multimodal single-cell data against ambiguous mapping.","authors":"Han Zhou, Kai Cao, Yang Young Lu","doi":"10.1093/bioinformatics/btaf345","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf345","url":null,"abstract":"<p><strong>Motivation: </strong>Recent advances in single-cell multimodal omics technologies enable the exploration of cellular systems at unprecedented resolution, leading to the rapid generation of multimodal datasets that require sophisticated integration methods. Diagonal integration has emerged as a flexible solution for integrating heterogeneous single-cell data without relying on shared cells or features. However, the absence of anchoring elements introduces the risk of artificial integrations, where cells across modalities are incorrectly aligned due to ambiguous mapping.</p><p><strong>Results: </strong>To address this challenge, we propose SONATA, a novel diagnostic method designed to detect potential artificial integrations resulting from ambiguous mappings in diagonal data integration. SONATA identifies ambiguous alignments by quantifying cell-cell ambiguity within the data manifold, ensuring that biologically meaningful integrations are distinguished from spurious ones. It is worth noting that SONATA is not designed to replace any existing pipelines for diagonal data integration; instead, SONATA works simply as an add-on to an existing pipeline for achieving more reliable integration. Through a comprehensive evaluation on both simulated and real multimodal single-cell datasets, we observe that artificial integrations in diagonal data integration are widespread yet surprisingly overlooked, occurring across all mainstream diagonal integration methods. We demonstrate SONATA's ability to safeguard against misleading integrations and provide actionable insights into potential integration failures across mainstream methods. 
Our approach offers a robust framework for ensuring the reliability and interpretability of multimodal single-cell data integration.</p><p><strong>Availability and implementation: </strong>The source code is available at (https://github.com/batmen-lab/SONATA).</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-14","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144295502","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Differentiable Graph Clustering with Structural Grouping for Single-cell RNA-seq Data.
Bioinformatics (Oxford, England) Pub Date : 2025-06-13 DOI: 10.1093/bioinformatics/btaf347
Xiaoqiang Yan, Shike Du, Quan Zou, Zhen Tian
{"title":"Differentiable Graph Clustering with Structural Grouping for Single-cell RNA-seq Data.","authors":"Xiaoqiang Yan, Shike Du, Quan Zou, Zhen Tian","doi":"10.1093/bioinformatics/btaf347","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf347","url":null,"abstract":"<p><strong>Motivation: </strong>Clustering cells into subpopulations is one of the most crucial tasks in single-cell RNA sequencing (scRNA-seq) data analysis, which provides support for biological research at cellular level. With the development of graph neural networks, deep graph clustering approaches have achieved excellent performance by modelling the topological relationships between cells. However, existing approaches rely on cell node and its neighbors to obtain the cell feature representation, which ignore the graph cluster structure hidden in scRNA-seq data. Besides, how to bridge the heterogeneous gap between cell node feature and its structural information remains a highly challenging problem.</p><p><strong>Results: </strong>Here, we propose a novel differentiable graph clustering with structural grouping (DGCSG) for scRNA-seq data, which incorporates graph cluster information into deep graph clustering model by designing a differentiable clustering mechanism to learn clustering-friendly representation. Firstly, an interactive module is devised to dynamically transfer node representations learned by autoencoder (AE) to graph attention autoencoder (GATE) in layer-by-layer manner. Then, to characterize graph cluster information, a differentiable clustering mechanism is proposed to transform K-way normalized cuts from a discrete optimization problem into differentiable learning objective through spectral relaxation, which jointly optimizes the graph attention autoencoder by allocating more attention scores to nodes in the same graph cluster. Finally, a decoupled self-supervised optimization is proposed, which guides the representation learning of AE and GATE in the interactive module. 
Extensive evaluations on 14 scRNA-seq benchmarks verify the superiority of DGCSG compared with state-of-the-art baselines.</p><p><strong>Availability: </strong>https://github.com/Xiaoqiang-Yan/DGCSG.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144287528","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
StripePy: fast and robust characterization of architectural stripes.
Bioinformatics (Oxford, England) Pub Date : 2025-06-13 DOI: 10.1093/bioinformatics/btaf351
Andrea Raffo, Roberto Rossini, Jonas Paulsen
{"title":"StripePy: fast and robust characterization of architectural stripes.","authors":"Andrea Raffo, Roberto Rossini, Jonas Paulsen","doi":"10.1093/bioinformatics/btaf351","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf351","url":null,"abstract":"<p><strong>Motivation: </strong>Architectural stripes in Hi-C and related data are crucial for gene regulation, development, and DNA repair. Despite their importance, few tools exist for automatic stripe detection.</p><p><strong>Results: </strong>We introduce StripePy, which leverages computational geometry methods to identify and analyze architectural stripes in contact maps from Chromosome Conformation Capture experiments like Hi-C and Micro-C. StripePy outperforms existing tools, as shown through tests on various datasets and a newly developed simulated benchmark, StripeBench, providing a valuable resource for the community.</p><p><strong>Availability and implementation: </strong>StripePy is released to the public as an open source, MIT-licensed Python application. StripePy source code is hosted on GitHub at https://github.com/paulsengroup/StripePy and is archived on Zenodo. StripePy can be easily installed from source or PyPI using pip and from Bioconda using conda. 
Containerized versions of StripePy are regularly published on DockerHub.</p><p><strong>Supplementary information: </strong>Supplementary data are provided as a separate file.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144287533","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
HAP-SAMPLE2: Data-based Resampling for Association Studies with Admixture.
Bioinformatics (Oxford, England) Pub Date : 2025-06-13 DOI: 10.1093/bioinformatics/btaf333
George Sun, Bryan W Ting, Fred A Wright, Yi-Hui Zhou
{"title":"HAP-SAMPLE2: Data-based Resampling for Association Studies with Admixture.","authors":"George Sun, Bryan W Ting, Fred A Wright, Yi-Hui Zhou","doi":"10.1093/bioinformatics/btaf333","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf333","url":null,"abstract":"<p><strong>Motivation: </strong>HAP-SAMPLE2 extends the functionality of the original HAP-SAMPLE tool for simulating genotype-phenotype data, now with features to handle population admixture and rare variant analysis. It allows users to define parameters such as disease prevalence and allele effect sizes for both common and rare variant simulations.</p><p><strong>Application: </strong>HAP-SAMPLE2 provides an efficient means for simulating complex datasets, suitable for large-scale projects like the 1000 Genomes Project. Its capabilities for population admixture allow users to create admixed populations or preserve substructures, while introducing novel variation through artificial recombination. Additionally, the tool supports burden testing for rare variants using fixed and Madsen-Browning weighting schemes.</p><p><strong>Availability: </strong>The software, along with a detailed vignette, is available on GitHub: https://github.com/M3dical/HAPSAMPLE2.</p><p><strong>Supplementary information: </strong>A supplemental material file and software vignette are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144287530","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Gene Spatial Integration: enhancing spatial transcriptomics analysis via deep learning and batch effect mitigation.
Bioinformatics (Oxford, England) Pub Date : 2025-06-13 DOI: 10.1093/bioinformatics/btaf350
Rian Pratama, Jason Hilton, J Michael Cherry, Giltae Song
{"title":"Gene Spatial Integration: enhancing spatial transcriptomics analysis via deep learning and batch effect mitigation.","authors":"Rian Pratama, Jason Hilton, J Michael Cherry, Giltae Song","doi":"10.1093/bioinformatics/btaf350","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf350","url":null,"abstract":"<p><strong>Motivation: </strong>Spatial transcriptomics (ST) is a groundbreaking technique for studying the correlation between cellular organization within a tissue and their physiological and pathological properties. Every facet of spatial information, including cell/spot proximity, distribution, and dimensionality, is significant. Most methods lean heavily on proximity for ST analysis, each resulting in useful insights but still leaving other aspects untapped. In addition, samples procured at different times, different donors, and by different technologies introduce a batch effects problem that hinders the statistical approach employed by most analysis tools. Addressing these challenges, we have developed a deep learning method for analyzing integrated multiple ST data, focusing on the distribution aspect. Furthermore, our method aims to leverage single-cell analysis tools.</p><p><strong>Results: </strong>Our study introduces Gene Spatial Integration (GSI), a data integration pipeline utilizing representation learning approach to extract spatial distribution of genes into the same feature space as gene expression features. We employ Autoencoder network to extract spatial embedding, facilitating the projection of spatial features into gene expression feature space. Our approach allows for seamless integration of multiple samples with minimum detriment, increasing the performance of the ST data analysis tool. We show application of our method on human DLPFC dataset. 
Our method consistently improves the performance of the clustering of Seurat tools, with the most significant increase observed in sample 151673, almost doubling the ARI score from 0.225 to 0.405. We also combine our pipeline with the clustering of GraphST, achieving a significantly higher ARI score in sample 151672 from 0.614 to 0.795. This result reveals the potential of gene distribution spatial aspect, also emphasizes the impact of integration and batch effect removal in developing a refined analysis in understanding tissue characteristics.</p><p><strong>Availability: </strong>Implementation of GSI is accessible at https://github.com/Riandanis/Spatial_Integration_GSI.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144287529","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
PhyClone: Accurate Bayesian reconstruction of cancer phylogenies from bulk sequencing.
Bioinformatics (Oxford, England) Pub Date : 2025-06-13 DOI: 10.1093/bioinformatics/btaf344
Emilia Hurtado, Alexandre Bouchard-Côté, Andrew Roth
{"title":"PhyClone: Accurate Bayesian reconstruction of cancer phylogenies from bulk sequencing.","authors":"Emilia Hurtado, Alexandre Bouchard-Côté, Andrew Roth","doi":"10.1093/bioinformatics/btaf344","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf344","url":null,"abstract":"<p><strong>Motivation: </strong>Cancer is driven by somatic mutations that result in the expansion of genomically distinct sub-populations of cells called clones. Identifying the clonal composition of tumours and understanding the evolutionary relationships between clones is a crucial task in cancer genomics. Bulk DNA sequencing is commonly used for studying the clonal composition of tumours, but it is challenging to infer the genetic relationship between different clones due to the mixture of different cell populations.</p><p><strong>Results: </strong>In this work, we introduce a new probabilistic model called PhyClone that can infer clonal phylogenies from bulk sequencing data. We demonstrate the performance of PhyClone on simulated and real-world datasets and show that it outperforms previous methods in terms of accuracy and sample scalability.</p><p><strong>Availability and implementation: </strong>Source code is available on Github at: https://github.com/Roth-Lab/PhyClone under the GPL v3.0 license.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144287531","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
ProtMamba: a homology-aware but alignment-free protein state space model.
Bioinformatics (Oxford, England) Pub Date : 2025-06-13 DOI: 10.1093/bioinformatics/btaf348
Damiano Sgarbossa, Cyril Malbranke, Anne-Florence Bitbol
{"title":"ProtMamba: a homology-aware but alignment-free protein state space model.","authors":"Damiano Sgarbossa, Cyril Malbranke, Anne-Florence Bitbol","doi":"10.1093/bioinformatics/btaf348","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf348","url":null,"abstract":"<p><strong>Motivation: </strong>Protein language models are enabling advances in elucidating the sequence-to-function mapping, and have important applications in protein design. Models based on multiple sequence alignments efficiently capture the evolutionary information in homologous protein sequences, but multiple sequence alignment construction is imperfect.</p><p><strong>Results: </strong>We present ProtMamba, a homology-aware but alignment-free protein language model based on the Mamba architecture. In contrast with attention-based models, ProtMamba efficiently handles very long context, comprising hundreds of protein sequences. It is also computationally efficient. We train ProtMamba on a large dataset of concatenated homologous sequences, using two GPUs. We combine autoregressive modeling and masked language modeling through a fill-in-the-middle training objective. This makes the model adapted to various protein design applications. We demonstrate ProtMamba's usefulness for sequence generation, motif inpainting, fitness prediction, and modeling intrinsically disordered regions. For homolog-conditioned sequence generation, ProtMamba outperforms state-of-the-art models. 
ProtMamba's competitive performance, despite its relatively small size, sheds light on the importance of long-context conditioning.</p><p><strong>Availability: </strong>A Python implementation of ProtMamba is freely available in our GitHub repository: https://github.com/Bitbol-Lab/ProtMamba-ssm and archived at https://doi.org/10.5281/zenodo.15584634.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-13","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144287532","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Measurement and classification of bold-shy behaviours in medaka fish.
Bioinformatics (Oxford, England) Pub Date : 2025-06-12 DOI: 10.1093/bioinformatics/btaf342
Saul Pierotti, Ian Brettell, Tomas Fitzgerald, Cathrin Herder, Narendar Aadepu, Christian Pylatiuk, Joachim Wittbrodt, Ewan Birney, Felix Loosli
{"title":"Measurement and classification of bold-shy behaviours in medaka fish.","authors":"Saul Pierotti, Ian Brettell, Tomas Fitzgerald, Cathrin Herder, Narendar Aadepu, Christian Pylatiuk, Joachim Wittbrodt, Ewan Birney, Felix Loosli","doi":"10.1093/bioinformatics/btaf342","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf342","url":null,"abstract":"<p><strong>Motivation: </strong>Boldness-shyness is considered a fundamental axis of behavioural variation in humans and other species, with obvious adaptive causes and evolutionary implications. Besides an individual's own genetics, this phenotype is also affected by the genetic makeup of peers in the individual's social environment. To identify genetic determinants of variation along the bold-shy behavioural axis, a reliable experimental and analytical setup able to highlight direct and indirect genetic effects is needed.</p><p><strong>Results: </strong>We describe a custom assay designed to detect bold-shy behaviours in medaka fish, combining an open-field and novel-object component. We use this assay to explore direct and social genetic effects on the behaviours of 307 pairs of fish from five inbred medaka strains. Applying a Hidden Markov Model (HMM) to classify behavioural modes, we find that direct genetic effects influence the proportions of time the five strains spent in slow-moving states, explaining up to 29.7% of the variance in time spent in those states. We also found that an individual's behaviour is influenced by the genetics of its tank partner, explaining up to 8.64% of the variance in the time spent in slow-moving states. Our behavioural assay in combination with the HMM analysis is applicable to follow-up genetic linkage studies of genetic variants involved in direct behavioural effects and indirect social genetic effects. 
A suitable genetic resource for such studies, the Medaka Inbred Kiyosu-Karlsruhe panel (MIKK) has recently been established.</p><p><strong>Availability: </strong>The code associated with this work is available on GitHub (https://github.com/birneylab/medaka_behaviour_pilot) and Software Heritage (swh:1:dir:c9abec1c5d62d22e43c9e97d995c56261784d9ab). Experimental data has been uploaded to the EBI Bioimage Archive (https://doi.org/10.6019/S-BIAD1421).</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144277040","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
FuNTB: A functional network clustering tool for the analysis of genome-wide genetic variants in Mycobacterium tuberculosis.
Bioinformatics (Oxford, England) Pub Date : 2025-06-11 DOI: 10.1093/bioinformatics/btaf341
Ramos-García Axel A, Mejía-Ponce Paulina M, Sélem-Mojica Nelly, Santos-Díaz Alejandro, Martínez-Ledesma Emmanuel, Licona-Cassani Cuauhtémoc
{"title":"FuNTB: A functional network clustering tool for the analysis of genome-wide genetic variants in Mycobacterium tuberculosis.","authors":"Ramos-García Axel A, Mejía-Ponce Paulina M, Sélem-Mojica Nelly, Santos-Díaz Alejandro, Martínez-Ledesma Emmanuel, Licona-Cassani Cuauhtémoc","doi":"10.1093/bioinformatics/btaf341","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf341","url":null,"abstract":"<p><strong>Motivation: </strong>Tuberculosis (TB), caused by Mycobacterium tuberculosis (Mtb), still claims around 1.25 million lives each year (WHO, 2020). The growing threat of drug resistance-often driven by single‑nucleotide polymorphisms (SNPs) in Mtb genomes (Brimacombe, et al. (2007)) underscores the need for high‑quality genomic data and powerful bioinformatics tools. We present FuNTB, a Python‑based pipeline that detects non‑synonymous SNPs in Mtb and builds functional network clusters to reveal genotype-phenotype relationships.</p><p><strong>Results: </strong>FuNTB profiles non‑synonymous SNPs at the gene level across user‑defined phenotypes, pinpointing both shared and unique mutations. It uses annotated Variant Call Format (VCF) files or MTBseq outputs and merges them with clinical metadata to produce network‑XML files compatible with Cytoscape and Gephi. 
When applied to the CRyPTIC Mtb collection, FuNTB rapidly recovered established resistance genes and proposed novel candidates, validating its utility for mapping genotype-phenotype associations.</p><p><strong>Availability: </strong>FuNTB is implemented in Python 3.8+ and is freely available under the MIT license at https://doi.org/10.5281/zenodo.15399917.</p><p><strong>Supplementary information: </strong>Supplementary data are available at Bioinformatics online.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-11","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144277039","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0
Sequence analysis and decoding with extra low-quality reads for DNA data storage.
Bioinformatics (Oxford, England) Pub Date : 2025-06-10 DOI: 10.1093/bioinformatics/btaf335
Jiyeon Park, Ha Hyeon Jeon, Jeong Wook Lee, Hosung Park
{"title":"Sequence analysis and decoding with extra low-quality reads for DNA data storage.","authors":"Jiyeon Park, Ha Hyeon Jeon, Jeong Wook Lee, Hosung Park","doi":"10.1093/bioinformatics/btaf335","DOIUrl":"https://doi.org/10.1093/bioinformatics/btaf335","url":null,"abstract":"<p><strong>Motivation: </strong>Error detection/correction codes play an important role to reduce writing and/or reading costs in DNA data storage. Sequence analysis algorithms also make a crucial effect on error correction but have been executed independently from the decoding of error correction codes. In conventional sequence analysis, low-quality reads are usually discarded. For DNA data storage, low-quality reads can be constructively used to sequence analysis with the assistance of error detection/correction codes.</p><p><strong>Results: </strong>We obtained the low-quality reads which failed to pass the chastity filter in Illumina NGS sequencing. We confirmed the effectiveness of the extra low-quality reads by providing error statistics and performing decoding with them. We proposed a sequence clustering algorithm for various length reads and a consensus algorithm based on probabilistic majority and error detection to efficiently exploit the extra reads. 
The proposed methods reduced the reading cost by 6.83% on average and up to 19.67% while maintaining the writing cost.</p><p><strong>Availability and implementation: </strong>https://github.com/PParkJy/SAD-DNAstorage (10.5281/zenodo.15571858).</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":0.0,"publicationDate":"2025-06-10","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144268161","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":0,"RegionCategory":"","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Citations: 0