Enhancing RNA-seq analysis by addressing all co-existing biases using a self-benchmarking approach with 2D structural insights.

Impact factor 6.8; Zone 2 (Biology); Q1, Biochemical Research Methods
Qiang Su, Yi Long, Deming Gou, Junmin Quan, Qizhou Lian
Briefings in Bioinformatics. Journal Article. Published 2024-09-23.
DOI: 10.1093/bib/bbae532 (https://doi.org/10.1093/bib/bbae532)
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11491153/pdf/

Abstract

We introduce the minimum free energy-based Gaussian Self-Benchmarking (MFE-GSB) framework, designed to correct the many biases inherent in RNA-seq data. Central to the methodology is the MFE concept, which supports a Gaussian distribution model tailored to mitigate all co-existing biases within a k-mer counting scheme. The MFE-GSB framework operates on a dual-model system that compares modeling data with a uniform k-mer distribution against the real, observed sequencing data, which have nonuniform k-mer distributions. The framework applies a Gaussian function, guided by predetermined parameters (mean and standard deviation, SD) derived from the modeling data, to fit unknown sequencing data. This dual comparison allows k-mer abundances to be predicted across MFE categories, enabling simultaneous correction of biases at the single k-mer level. Validation with both engineered RNA constructs and human tissue RNA samples demonstrates its wide-ranging efficacy and applicability.
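The fitting step described in the abstract might be pictured with the following minimal sketch. It is not the authors' implementation: it assumes that k-mer counts have already been summed per MFE category, that the Gaussian mean and SD are fixed in advance, and that only the amplitude is fitted by least squares. All function names, input values, and the amplitude-only fit are hypothetical choices made for illustration.

```python
import math


def gaussian(x, mean, sd):
    """Unnormalized Gaussian shape with peak height 1 at x == mean."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2)


def fit_amplitude(mfe_values, observed, mean, sd):
    """Least-squares amplitude A for observed ~ A * gaussian(mfe).

    The mean and SD are held fixed (assumed to come from modeling data),
    so the best-fit amplitude has a closed form.
    """
    shape = [gaussian(x, mean, sd) for x in mfe_values]
    numerator = sum(s * y for s, y in zip(shape, observed))
    denominator = sum(s * s for s in shape)
    return numerator / denominator


def predict_abundances(mfe_values, observed, mean, sd):
    """Return the Gaussian-predicted abundance for each MFE category."""
    amplitude = fit_amplitude(mfe_values, observed, mean, sd)
    return [amplitude * gaussian(x, mean, sd) for x in mfe_values]


# Hypothetical inputs: MFE categories (kcal/mol) and summed k-mer counts per category.
mfe_categories = [-4.0, -3.0, -2.0, -1.0, 0.0]
observed_counts = [12.0, 55.0, 90.0, 70.0, 8.0]
# Hypothetical mean and SD, standing in for values derived from uniform modeling data.
model_mean, model_sd = -2.0, 1.0

predicted = predict_abundances(mfe_categories, observed_counts, model_mean, model_sd)
for mfe, obs, pred in zip(mfe_categories, observed_counts, predicted):
    print(f"MFE {mfe:5.1f}: observed {obs:6.1f}, predicted {pred:6.1f}")
```

In this sketch, the gap between the observed and predicted values in each category is what a correction step would act on. How the published framework assigns k-mers to MFE categories and applies the correction is not specified in the abstract.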

Source journal
Briefings in Bioinformatics (Biology: Biochemical Research Methods)
CiteScore: 13.20
Self-citation rate: 13.70%
Articles published: 549
Review time: 6 months
Journal description: Briefings in Bioinformatics is an international journal serving as a platform for researchers and educators in the life sciences. It also appeals to mathematicians, statisticians, and computer scientists applying their expertise to biological challenges. The journal focuses on reviews tailored for users of databases and analytical tools in contemporary genetics, molecular and systems biology. It stands out by offering practical assistance and guidance to non-specialists in computerized methodologies. Covering a wide range from introductory concepts to specific protocols and analyses, the papers address bacterial, plant, fungal, animal, and human data. The journal's detailed subject areas include genetic studies of phenotypes and genotypes, mapping, DNA sequencing, expression profiling, gene expression studies, microarrays, alignment methods, protein profiles and HMMs, lipids, metabolic and signaling pathways, structure determination and function prediction, phylogenetic studies, and education and training.