Self-supervised Representation Learning on Gene Expression Data.

Impact factor: 5.4
Kevin Dradjat, Massinissa Hamidi, Pierre Bartet, Blaise Hanczar
Journal: Bioinformatics (Oxford, England)
Publication date: 2025-09-30
Publication type: Journal Article
DOI: 10.1093/bioinformatics/btaf533 (https://doi.org/10.1093/bioinformatics/btaf533)

Abstract

Motivation: Predicting phenotypes from gene expression data is a crucial task in biomedical research, enabling insights into disease mechanisms, drug responses, and personalized medicine. Traditional machine learning and deep learning rely on supervised learning, which requires large quantities of labeled data that are costly and time-consuming to obtain in the case of gene expression data. Self-supervised learning has recently emerged as a promising approach to overcome these limitations by extracting information directly from the structure of unlabeled data.

Results: In this study, we investigate the application of state-of-the-art self-supervised learning methods to bulk gene expression data for phenotype prediction. We selected three self-supervised methods, each based on a different approach, to assess their ability to exploit the inherent structure of the data and to generate high-quality representations that can be used for downstream predictive tasks. Using several publicly available gene expression datasets, we demonstrate that the selected methods can effectively capture complex information and improve phenotype prediction accuracy. The results show that self-supervised learning methods can outperform traditional supervised models while also reducing the dependency on annotated data. We provide a comprehensive analysis of the performance of each method, highlighting its strengths and limitations, and give recommendations for using these methods depending on the case under study. Finally, we outline future research directions to enhance the application of self-supervised learning in gene expression data analysis. This study is the first work to apply self-supervised learning to bulk RNA-Seq data.

Availability: The code and results are available at https://github.com/kdradjat/ssrl-rnaseq.

Supplementary information: Supplementary data are available at Bioinformatics online.
