Privacy-preserving federated unsupervised domain adaptation with application to age prediction from DNA methylation data.

Journal impact factor: 5.4
Cem Ata Baykara, Ali Burak Ünal, Nico Pfeifer, Mete Akgün
Bioinformatics (Oxford, England), journal article, published 2025-10-02. DOI: 10.1093/bioinformatics/btaf465 (https://doi.org/10.1093/bioinformatics/btaf465)
Citations: 0

Abstract

Motivation: Generalizing machine learning models across small, high-dimensional, and heterogeneous biological datasets remains a critical challenge. Variations in data collection and population differences cause domain shifts, while privacy constraints restrict data sharing. Existing federated domain adaptation (FDA) approaches primarily rely on deep learning and focus on classification tasks, making them unsuitable for privacy-sensitive, small-scale regression problems in biomedical research. We introduce a privacy-preserving federated method for unsupervised domain adaptation in regression that enables robust learning across distributed, high-dimensional datasets while maintaining full data privacy.

Results: Our method is the first to enable distributed training of Gaussian processes for domain adaptation, ensuring complete privacy through randomized encoding and secure aggregation. Unlike deep learning-based FDA approaches, our method is specifically designed for small-scale, high-dimensional biological data, overcoming prior limitations in scalability and generalization. We evaluate our approach on age prediction from DNA methylation data, demonstrating that it achieves performance comparable to non-private state-of-the-art methods while fully preserving data privacy. This work enables secure and effective cross-institutional collaboration in biomedical research without requiring raw data sharing.
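The abstract names two privacy mechanisms, randomized encoding and secure aggregation, without describing them. The sketch below illustrates generic textbook versions of both under our own assumptions; it is not the FREDA implementation (see the repository for that). We assume the encoding is a secret random orthogonal rotation shared by the data holders, which preserves Euclidean distances, so a distance-based kernel such as the RBF kernel commonly used in Gaussian processes is unchanged. We assume secure aggregation uses pairwise additive masks that cancel in the sum. The function names `random_orthogonal`, `rbf_kernel`, and `mask_local_values` and all data are hypothetical, and this is not a vetted privacy mechanism.

```python
import numpy as np


def random_orthogonal(d, rng):
    """Sample a random d x d orthogonal matrix (QR of a Gaussian matrix)."""
    q, r = np.linalg.qr(rng.standard_normal((d, d)))
    # Flip column signs so the diagonal of r is positive (Haar-distributed Q).
    return q * np.sign(np.diag(r))


def rbf_kernel(a, b, lengthscale=1.0):
    """Squared-exponential (RBF) kernel; depends on inputs only via distances."""
    sq = (a**2).sum(axis=1)[:, None] + (b**2).sum(axis=1)[None, :] - 2.0 * a @ b.T
    return np.exp(-0.5 * np.maximum(sq, 0.0) / lengthscale**2)


def mask_local_values(values, rng):
    """Pairwise additive masking: for each pair (i, j), party i adds a mask m
    and party j subtracts the same m, so all masks cancel in the total."""
    masked = [np.asarray(v, dtype=float).copy() for v in values]
    for i in range(len(masked)):
        for j in range(i + 1, len(masked)):
            # Stand-in for a mask derived from a key shared by parties i and j.
            m = rng.standard_normal(masked[i].shape)
            masked[i] += m
            masked[j] -= m
    return masked


rng = np.random.default_rng(0)
d = 50  # toy feature count (e.g. CpG sites); real methylation data is far wider
x_site_a = rng.standard_normal((5, d))  # hypothetical samples held by site A
x_site_b = rng.standard_normal((4, d))  # hypothetical samples held by site B
Q = random_orthogonal(d, rng)  # hypothetical secret shared by the data holders

# Distances, and hence the RBF kernel, are unchanged by the rotation.
k_plain = rbf_kernel(x_site_a, x_site_b, lengthscale=10.0)
k_encoded = rbf_kernel(x_site_a @ Q, x_site_b @ Q, lengthscale=10.0)
print(np.allclose(k_plain, k_encoded))  # True

# The aggregator sees only masked vectors, yet their sum equals the true sum
# (up to floating-point rounding).
local_stats = [rng.standard_normal(3) for _ in range(4)]
masked = mask_local_values(local_stats, rng)
print(np.allclose(sum(masked), sum(local_stats)))  # True
```

Floating-point masks are used here only for readability; practical secure aggregation protocols typically work in modular integer arithmetic with masks derived from pairwise key agreement, which makes the cancellation exact.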

Availability and implementation: The source code for our method is available at https://github.com/mdppml/FREDA. Supplementary data are available at Bioinformatics online.
