A flexible framework for minimal biomarker signature discovery from clinical omics studies without library size normalisation.

PLOS Digital Health. Pub Date: 2025-03-26. eCollection Date: 2025-03-01. DOI: 10.1371/journal.pdig.0000780

Daniel Rawlinson, Chenxi Zhou, Myrsini Kaforou, Kim-Anh Lê Cao, Lachlan J M Coin



Abstract



Application of transcriptomics, proteomics and metabolomics technologies to clinical cohorts has uncovered a variety of signatures for predicting disease. Many of these signatures require the full 'omics data for evaluation on unseen samples, either explicitly or implicitly through library size normalisation. Translation to low-cost point-of-care tests requires development of signatures which measure as few analytes as possible without relying on direct measurement of library size. To achieve this, we have developed a feature selection method (Forward Selection-Partial Least Squares) which generates minimal disease signatures from high-dimensional omics datasets with applicability to continuous, binary or multi-class outcomes. Through extensive benchmarking, we show that FS-PLS has comparable performance to commonly used signature discovery methods while delivering signatures which are an order of magnitude smaller. We show that FS-PLS can be used to select features predictive of library size, and that these features can be used to normalize unseen samples, meaning that the features in the complete model can be measured in isolation for making new predictions. By enabling discovery of small, high-performance signatures, FS-PLS addresses an important impediment for the further development of precision medical care.
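The idea of building a small signature one feature at a time can be illustrated with a generic greedy forward selection on residuals (a matching-pursuit-style procedure). This is a minimal sketch for a continuous outcome only, and it is not the authors' FS-PLS implementation: the function name `forward_select`, the parameters `max_features` and `min_gain`, and the stopping rule are all assumptions made for illustration.

```python
import random


def mean(values):
    return sum(values) / len(values)


def forward_select(columns, y, max_features=5, min_gain=0.01):
    """Greedy forward selection on residuals (illustrative, not FS-PLS).

    columns: list of feature columns, each a list of floats with len(y) entries.
    Returns the indices of the selected features, in selection order.
    """
    y_mean = mean(y)
    residual = [v - y_mean for v in y]
    total_ss = sum(r * r for r in residual)
    if total_ss == 0:
        return []  # constant outcome: nothing to explain

    # Centre each feature once so a slope-only fit on the residual suffices.
    centred = []
    for col in columns:
        m = mean(col)
        centred.append([v - m for v in col])

    selected = []
    while len(selected) < max_features:
        best_j, best_gain, best_slope = None, 0.0, 0.0
        for j, col in enumerate(centred):
            if j in selected:
                continue
            sxx = sum(v * v for v in col)
            if sxx == 0:
                continue  # constant feature carries no information
            sxy = sum(v * r for v, r in zip(col, residual))
            gain = sxy * sxy / sxx  # drop in residual sum of squares
            if gain > best_gain:
                best_j, best_gain, best_slope = j, gain, sxy / sxx
        # Assumed stopping rule: stop when the best candidate explains
        # less than min_gain of the total sum of squares.
        if best_j is None or best_gain / total_ss < min_gain:
            break
        selected.append(best_j)
        residual = [r - best_slope * v
                    for r, v in zip(residual, centred[best_j])]
    return selected


# Synthetic example: only features 4 and 17 drive the outcome.
rng = random.Random(0)
n_samples, n_features = 200, 30
columns = [[rng.gauss(0, 1) for _ in range(n_samples)]
           for _ in range(n_features)]
y = [3 * columns[4][i] - 2 * columns[17][i] + rng.gauss(0, 0.1)
     for i in range(n_samples)]
print(forward_select(columns, y))
```

Because selection stops as soon as the best remaining feature adds little, the returned signature stays small even when many candidate features are available. Binary or multi-class outcomes, which the abstract says the published method supports, would need a different fitting step than the least-squares slope used here.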
