Cem Ata Baykara, Ali Burak Ünal, Nico Pfeifer, Mete Akgün
{"title":"隐私保护联邦无监督域自适应在DNA甲基化数据年龄预测中的应用。","authors":"Cem Ata Baykara, Ali Burak Ünal, Nico Pfeifer, Mete Akgün","doi":"10.1093/bioinformatics/btaf465","DOIUrl":null,"url":null,"abstract":"<p><strong>Motivation: </strong>Generalizing machine learning models across small, high-dimensional, and heterogeneous biological datasets remains a critical challenge due to domain shifts caused by variations in data collection, population differences, and privacy constraints that restrict data sharing. Existing federated domain adaptation (FDA) approaches primarily rely on deep learning and focus on classification tasks, making them unsuitable for privacy-sensitive, small-scale regression problems in biomedical research. We introduce a privacy-preserving federated method for unsupervised domain adaptation in regression, enabling robust learning across distributed, high-dimensional datasets while maintaining full data privacy.</p><p><strong>Results: </strong>Our method is the first to enable distributed training of Gaussian processes for domain adaptation, ensuring complete privacy through randomized encoding and secure aggregation. Unlike deep learning-based FDA approaches, our method is specifically designed for small-scale, high-dimensional biological data, overcoming prior limitations in scalability and generalization. We evaluate our approach on age prediction from DNA methylation data, demonstrating that it achieves performance comparable to non-private state-of-the-art methods while fully preserving data privacy. This work enables secure and effective cross-institutional collaboration in biomedical research without requiring raw data sharing.</p><p><strong>Availability and implementation: </strong>The source code for our method is available at https://github.com/mdppml/FREDA.</p>","PeriodicalId":93899,"journal":{"name":"Bioinformatics (Oxford, England)","volume":" ","pages":""},"PeriodicalIF":5.4000,"publicationDate":"2025-10-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Privacy-preserving federated unsupervised domain adaptation with application to age prediction from DNA methylation data.\",\"authors\":\"Cem Ata Baykara, Ali Burak Ünal, Nico Pfeifer, Mete Akgün\",\"doi\":\"10.1093/bioinformatics/btaf465\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"<p><strong>Motivation: </strong>Generalizing machine learning models across small, high-dimensional, and heterogeneous biological datasets remains a critical challenge due to domain shifts caused by variations in data collection, population differences, and privacy constraints that restrict data sharing. Existing federated domain adaptation (FDA) approaches primarily rely on deep learning and focus on classification tasks, making them unsuitable for privacy-sensitive, small-scale regression problems in biomedical research. We introduce a privacy-preserving federated method for unsupervised domain adaptation in regression, enabling robust learning across distributed, high-dimensional datasets while maintaining full data privacy.</p><p><strong>Results: </strong>Our method is the first to enable distributed training of Gaussian processes for domain adaptation, ensuring complete privacy through randomized encoding and secure aggregation. Unlike deep learning-based FDA approaches, our method is specifically designed for small-scale, high-dimensional biological data, overcoming prior limitations in scalability and generalization. We evaluate our approach on age prediction from DNA methylation data, demonstrating that it achieves performance comparable to non-private state-of-the-art methods while fully preserving data privacy. This work enables secure and effective cross-institutional collaboration in biomedical research without requiring raw data sharing.</p><p><strong>Availability and implementation: </strong>The source code for our method is available at https://github.com/mdppml/FREDA.</p>\",\"PeriodicalId\":93899,\"journal\":{\"name\":\"Bioinformatics (Oxford, England)\",\"volume\":\" \",\"pages\":\"\"},\"PeriodicalIF\":5.4000,\"publicationDate\":\"2025-10-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Bioinformatics (Oxford, England)\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1093/bioinformatics/btaf465\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Bioinformatics (Oxford, England)","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1093/bioinformatics/btaf465","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Privacy-preserving federated unsupervised domain adaptation with application to age prediction from DNA methylation data.
Motivation: Generalizing machine learning models across small, high-dimensional, and heterogeneous biological datasets remains a critical challenge due to domain shifts caused by variations in data collection, population differences, and privacy constraints that restrict data sharing. Existing federated domain adaptation (FDA) approaches primarily rely on deep learning and focus on classification tasks, making them unsuitable for privacy-sensitive, small-scale regression problems in biomedical research. We introduce a privacy-preserving federated method for unsupervised domain adaptation in regression, enabling robust learning across distributed, high-dimensional datasets while maintaining full data privacy.
Results: Our method is the first to enable distributed training of Gaussian processes for domain adaptation, ensuring complete privacy through randomized encoding and secure aggregation. Unlike deep learning-based FDA approaches, our method is specifically designed for small-scale, high-dimensional biological data, overcoming prior limitations in scalability and generalization. We evaluate our approach on age prediction from DNA methylation data, demonstrating that it achieves performance comparable to non-private state-of-the-art methods while fully preserving data privacy. This work enables secure and effective cross-institutional collaboration in biomedical research without requiring raw data sharing.
Availability and implementation: The source code for our method is available at https://github.com/mdppml/FREDA.