Integration of Bulk RNA-seq Pipeline Metrics for Assessing Low-Quality Samples.

Samuel Hamilton, Gaurav Gadhvi, Tyler Therron, Deborah R Winter
Journal: Research Square. Publication date: 2025-07-03. Publication type: Journal Article.
DOI: https://doi.org/10.21203/rs.3.rs-6976695/v1
PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC12236924/pdf/
Citations: 0

Abstract

Background: With the rise of RNA-seq as an essential and ubiquitous tool for biomedical research, the need for guidelines on quality control (QC) is pressing. Specifically, data remain limited as to which technical metrics are most informative in identifying low-quality samples.

Results: Here, we addressed this issue by developing the Quality Control Diagnostic Renderer (QC-DR), software designed to simultaneously visualize a comprehensive panel of QC metrics generated by an RNA-seq pipeline and flag samples with aberrant values when compared to a reference dataset. As an example, we applied QC-DR to the Successful Clinical Response in Pneumonia Therapy (SCRIPT) dataset, a large clinical RNA-seq dataset of sequenced alveolar macrophages (n = 252). Next, we used this dataset to assess relationships between a variety of QC metrics and sample quality. The pipeline QC metrics most highly correlated with sample quality were % and # Uniquely Aligned Reads, % rRNA reads, # Detected Genes, and our newly developed metric, Area Under the Gene Body Coverage Curve (AUC-GBC), while experimental QC metrics derived from the lab were not significantly correlated. We then trained a set of machine learning models on the SCRIPT dataset to evaluate the relative contribution of QC metrics to sample quality prediction. Our model performs well when tested on an independent dataset despite differences in the distribution of QC metrics.

Conclusions: Our results support the conclusion that any individual QC metric is limited in its predictive value and suggest approaches based on the integration of multiple metrics with QC thresholds. In summary, our work provides new insights, practical guidance, and novel QC software which can be used to improve the methodological rigor of RNA-seq studies.
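
The abstract names two ideas that a short example may make concrete: an area-under-the-curve summary of gene body coverage, and flagging samples whose metrics are aberrant relative to a reference dataset. The Python sketch below is only an illustration under stated assumptions and is not the QC-DR implementation. The abstract does not say how AUC-GBC is normalized or how aberrant values are defined, so the max-scaling, the trapezoidal rule, the interquartile-range rule, and every name used here (`auc_gene_body_coverage`, `reference_bounds`, `flag_samples`, `pct_uniquely_aligned`) are hypothetical choices.

```python
from statistics import quantiles


def auc_gene_body_coverage(coverage):
    """Trapezoidal area under a gene body coverage curve.

    `coverage` holds read coverage at evenly spaced positions running from
    the 5' end (x = 0) to the 3' end (x = 1) of the gene body. Values are
    scaled by their maximum (an assumption), so a perfectly flat curve
    gives 1.0 and uneven or degraded coverage gives a smaller value.
    """
    if len(coverage) < 2:
        raise ValueError("need at least two coverage points")
    peak = max(coverage)
    if peak <= 0:
        return 0.0
    scaled = [c / peak for c in coverage]
    step = 1.0 / (len(scaled) - 1)
    return sum((a + b) / 2 * step for a, b in zip(scaled, scaled[1:]))


def reference_bounds(values, k=1.5):
    """Lower and upper limits from the reference distribution.

    Uses the quartiles widened by k times the interquartile range. This
    rule is an assumption, not the rule used by QC-DR.
    """
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_samples(samples, reference, k=1.5):
    """Return {sample_id: [metric names outside the reference limits]}.

    samples:   {sample_id: {metric_name: value}}
    reference: {metric_name: [values from the reference dataset]}
    """
    flags = {}
    for metric, ref_values in reference.items():
        low, high = reference_bounds(ref_values, k)
        for sample_id, metrics in samples.items():
            value = metrics.get(metric)
            if value is not None and not (low <= value <= high):
                flags.setdefault(sample_id, []).append(metric)
    return flags


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    reference = {"pct_uniquely_aligned": [80, 82, 84, 86, 88, 90, 85, 83]}
    samples = {
        "s1": {"pct_uniquely_aligned": 85},
        "s2": {"pct_uniquely_aligned": 40},
    }
    print(flag_samples(samples, reference))
    print(auc_gene_body_coverage([1, 4, 5, 5, 5, 4, 2]))
```

A sample flagged on one metric is not necessarily low quality. In line with the abstract's conclusion that any individual metric has limited predictive value, the list of flagged metrics per sample is better treated as input to a combined judgment than as a verdict by itself.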
