Privacy-preserving federated unsupervised domain adaptation with application to age prediction from DNA methylation data.

Impact factor (IF): 5.4

Bioinformatics (Oxford, England) | Pub Date: 2025-10-02 | DOI: 10.1093/bioinformatics/btaf465

Cem Ata Baykara, Ali Burak Ünal, Nico Pfeifer, Mete Akgün


Citations: 0

Abstract


Privacy-preserving federated unsupervised domain adaptation with application to age prediction from DNA methylation data.

Motivation: Generalizing machine learning models across small, high-dimensional, and heterogeneous biological datasets remains a critical challenge due to domain shifts caused by variations in data collection, population differences, and privacy constraints that restrict data sharing. Existing federated domain adaptation (FDA) approaches primarily rely on deep learning and focus on classification tasks, making them unsuitable for privacy-sensitive, small-scale regression problems in biomedical research. We introduce a privacy-preserving federated method for unsupervised domain adaptation in regression, enabling robust learning across distributed, high-dimensional datasets while maintaining full data privacy.

Results: Our method is the first to enable distributed training of Gaussian processes for domain adaptation, ensuring complete privacy through randomized encoding and secure aggregation. Unlike deep learning-based FDA approaches, our method is specifically designed for small-scale, high-dimensional biological data, overcoming prior limitations in scalability and generalization. We evaluate our approach on age prediction from DNA methylation data, demonstrating that it achieves performance comparable to non-private state-of-the-art methods while fully preserving data privacy. This work enables secure and effective cross-institutional collaboration in biomedical research without requiring raw data sharing.
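The abstract names secure aggregation as one of the two privacy mechanisms but does not describe the protocol. As a rough illustration of the general idea only, and not the authors' implementation, the sketch below uses pairwise additive masks: each pair of clients shares a random array that one adds and the other subtracts, so the server sees only masked values, yet the masks cancel in the sum. The function names `pairwise_masks` and `secure_sum` are hypothetical, a single seeded generator stands in for the pairwise key agreement a real protocol would need, and floating-point masks are used for brevity where practical schemes typically work over a finite field.

```python
import numpy as np


def pairwise_masks(n_clients, shape, seed=0):
    """Return one mask per client such that all masks sum to zero.

    For each pair (i, j) with i < j, a shared random array is added to
    client i's mask and subtracted from client j's mask.
    Assumption: one seeded generator simulates the pairwise shared secrets.
    """
    rng = np.random.default_rng(seed)
    masks = [np.zeros(shape) for _ in range(n_clients)]
    for i in range(n_clients):
        for j in range(i + 1, n_clients):
            shared = rng.normal(size=shape)
            masks[i] += shared
            masks[j] -= shared
    return masks


def secure_sum(local_updates, seed=0):
    """Aggregate client arrays so that only masked values are exposed.

    Returns the aggregate and the list of masked per-client values
    (the latter is what an aggregating server would observe).
    """
    masks = pairwise_masks(len(local_updates), local_updates[0].shape, seed)
    masked = [u + m for u, m in zip(local_updates, masks)]
    total = sum(masked)  # masks cancel, leaving the sum of the raw updates
    return total, masked


if __name__ == "__main__":
    # Three hypothetical clients, each holding a local statistic vector.
    updates = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
    total, masked = secure_sum(updates)
    print("aggregate:", total)
    print("first masked update:", masked[0])
```

In this toy setting the aggregate matches the plain sum up to floating-point rounding, while each individual masked array is unrelated to the client's raw values. How the published method combines such aggregation with randomized encoding and Gaussian process training is documented in the paper and the linked repository, not here.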

Availability and implementation: The source code for our method is available at https://github.com/mdppml/FREDA.

Supplementary data are available at Bioinformatics online.


Source journal

Bioinformatics (Oxford, England)

Self-citation rate

0.00%

Publication count