Conditional similarity triplets enable covariate-informed representations of single-cell data.

Impact factor 2.9; CAS Zone 3 (Biology); JCR Q2 (Biochemical Research Methods)
Chi-Jane Chen, Haidong Yi, Natalie Stanley
BMC Bioinformatics 26(1):45. Journal Article. Published 2025-02-09. DOI: 10.1186/s12859-025-06069-5 (https://doi.org/10.1186/s12859-025-06069-5). Full-text PDF: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC11807331/pdf/

Abstract

Background: Single-cell technologies enable comprehensive profiling of diverse immune cell types by measuring multiple genes or proteins in each individual cell. To translate immune signatures assayed from blood or tissue into powerful diagnostics, machine learning approaches are often used to compute immunological summaries, or per-sample featurizations, which serve as inputs to models of outcomes of interest. Current supervised learning approaches for computing per-sample representations are trained only to predict a single outcome accurately and do not take into account the additional clinical features or covariates that are likely to be measured for each sample.

Results: Here, we introduce a novel approach for incorporating measured covariates when optimizing model parameters, so that the resulting per-sample encodings accurately reflect both immune signatures and additional clinical information. Our method, CytoCoSet, is a set-based encoding method for learning per-sample featurizations. Its loss function includes an additional triplet term that penalizes samples with similar covariates for having disparate per-sample embeddings.
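The idea of a triplet term added to a prediction loss can be sketched as follows. This is a minimal illustration under stated assumptions, not CytoCoSet's implementation: the function names (`covariate_triplets`, `triplet_penalty`, `combined_loss`), the nearest/farthest rule for picking triplets from a single numeric covariate, the squared Euclidean distance, the hinge margin, and the `weight` parameter are all hypothetical choices made for this example.

```python
import math


def sq_dist(u, v):
    """Squared Euclidean distance between two embedding vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))


def covariate_triplets(covariates):
    """Build (anchor, positive, negative) index triplets from one covariate.

    Assumed rule, for illustration only: for each anchor sample, the positive
    is the other sample with the closest covariate value and the negative is
    the other sample with the farthest covariate value.
    """
    n = len(covariates)
    triplets = []
    for i in range(n):
        others = [j for j in range(n) if j != i]
        if len(others) < 2:
            continue
        pos = min(others, key=lambda j: abs(covariates[j] - covariates[i]))
        neg = max(others, key=lambda j: abs(covariates[j] - covariates[i]))
        if pos != neg:  # skip degenerate anchors where all gaps are tied
            triplets.append((i, pos, neg))
    return triplets


def triplet_penalty(embeddings, triplets, margin=1.0):
    """Mean hinge triplet loss: max(0, d(a, p) - d(a, n) + margin)."""
    if not triplets:
        return 0.0
    total = 0.0
    for a, p, n in triplets:
        d_pos = sq_dist(embeddings[a], embeddings[p])
        d_neg = sq_dist(embeddings[a], embeddings[n])
        total += max(0.0, d_pos - d_neg + margin)
    return total / len(triplets)


def combined_loss(y_true, y_prob, embeddings, triplets, weight=0.5, margin=1.0):
    """Binary cross-entropy on the outcome plus a weighted triplet penalty."""
    eps = 1e-7
    bce = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        bce -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    bce /= len(y_true)
    return bce + weight * triplet_penalty(embeddings, triplets, margin)
```

In this sketch, per-sample embeddings that place covariate-similar samples close together receive a smaller penalty than embeddings that scatter them, and setting `weight=0.0` recovers an outcome-only loss of the kind the Background describes.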

Conclusions: Overall, incorporating clinical covariates enables the learning of encodings for each individual sample that ultimately improve prediction of clinical outcome. This integration of disparate information enables more robust predictions of clinical phenotypes and holds significant potential for enhancing diagnostic and treatment strategies.

Source journal: BMC Bioinformatics (Biology, Biochemical Research Methods)
CiteScore: 5.70
Self-citation rate: 3.30%
Articles published: 506
Review time: 4.3 months
About the journal: BMC Bioinformatics is an open access, peer-reviewed journal that considers articles on all aspects of the development, testing and novel application of computational and statistical methods for the modeling and analysis of all kinds of biological data, as well as other areas of computational biology. BMC Bioinformatics is part of the BMC series which publishes subject-specific journals focused on the needs of individual research communities across all areas of biology and medicine. We offer an efficient, fair and friendly peer review service, and are committed to publishing all sound science, provided that there is some advance in knowledge presented by the work.