Machine Learning Pipeline for Reusing Pretrained Models

M. Alshehhi, Di Wang
{"title":"重用预训练模型的机器学习管道","authors":"M. Alshehhi, Di Wang","doi":"10.1145/3415958.3433054","DOIUrl":null,"url":null,"abstract":"Machine learning methods have proven to be effective in analyzing vast amounts of data in various formats to obtain patterns, detect trends, gain insight, and predict outcomes based on historical data. However, training models from scratch across various real-world applications is costly in terms of both time and data consumption. Model adaptation (Domain Adaptation) is a promising methodology to tackle this problem. It can reuse the knowledge embedded in an existing model to train another model. However, model adaptation is a challenging task due to dataset bias or domain shift. In addition, data access from both the original (source) domain and the destination (target) domain is often an issue in the real world, due to data privacy and cost issues (gathering additional data may cost money). Several domain adaptation algorithms and methodologies have introduced in recent years; they reuse trained models from one source domain for a different but related target domain. Many existing domain adaptation approaches aim at modifying the trained model structure or adjusting the latent space of the target domain using data from the source domain. Domain adaptation techniques can be evaluated over several criteria, namely, accuracy, knowledge transfer, training time, and budget. In this paper, we start from the notion that in many real-world scenarios, the owner of the trained model restricts access to the model structure and the source dataset. To solve this problem, we propose a methodology to efficiently select data from the target domain (minimizing consumption of target domain data) to adapt the existing model without accessing the source domain, while still achieving acceptable accuracy. Our approach is designed for supervised and semi-supervised learning and extendable to unsupervised learning.","PeriodicalId":198419,"journal":{"name":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","volume":null,"pages":null},"PeriodicalIF":0.0000,"publicationDate":"2020-11-02","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"3","resultStr":"{\"title\":\"Machine Learning Pipeline for Reusing Pretrained Models\",\"authors\":\"M. Alshehhi, Di Wang\",\"doi\":\"10.1145/3415958.3433054\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Machine learning methods have proven to be effective in analyzing vast amounts of data in various formats to obtain patterns, detect trends, gain insight, and predict outcomes based on historical data. However, training models from scratch across various real-world applications is costly in terms of both time and data consumption. Model adaptation (Domain Adaptation) is a promising methodology to tackle this problem. It can reuse the knowledge embedded in an existing model to train another model. However, model adaptation is a challenging task due to dataset bias or domain shift. In addition, data access from both the original (source) domain and the destination (target) domain is often an issue in the real world, due to data privacy and cost issues (gathering additional data may cost money). Several domain adaptation algorithms and methodologies have introduced in recent years; they reuse trained models from one source domain for a different but related target domain. 
Many existing domain adaptation approaches aim at modifying the trained model structure or adjusting the latent space of the target domain using data from the source domain. Domain adaptation techniques can be evaluated over several criteria, namely, accuracy, knowledge transfer, training time, and budget. In this paper, we start from the notion that in many real-world scenarios, the owner of the trained model restricts access to the model structure and the source dataset. To solve this problem, we propose a methodology to efficiently select data from the target domain (minimizing consumption of target domain data) to adapt the existing model without accessing the source domain, while still achieving acceptable accuracy. Our approach is designed for supervised and semi-supervised learning and extendable to unsupervised learning.\",\"PeriodicalId\":198419,\"journal\":{\"name\":\"Proceedings of the 12th International Conference on Management of Digital EcoSystems\",\"volume\":null,\"pages\":null},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2020-11-02\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"3\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the 12th International Conference on Management of Digital EcoSystems\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3415958.3433054\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the 12th International Conference on Management of Digital EcoSystems","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3415958.3433054","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Citations: 3

Abstract

Machine learning methods have proven to be effective in analyzing vast amounts of data in various formats to obtain patterns, detect trends, gain insight, and predict outcomes based on historical data. However, training models from scratch across various real-world applications is costly in terms of both time and data consumption. Model adaptation (domain adaptation) is a promising methodology to tackle this problem: it can reuse the knowledge embedded in an existing model to train another model. However, model adaptation is a challenging task due to dataset bias or domain shift. In addition, access to data from both the original (source) domain and the destination (target) domain is often an issue in the real world, due to data privacy and cost concerns (gathering additional data may cost money). Several domain adaptation algorithms and methodologies have been introduced in recent years; they reuse trained models from one source domain for a different but related target domain. Many existing domain adaptation approaches aim at modifying the trained model structure or adjusting the latent space of the target domain using data from the source domain. Domain adaptation techniques can be evaluated against several criteria, namely accuracy, knowledge transfer, training time, and budget. In this paper, we start from the observation that in many real-world scenarios, the owner of the trained model restricts access to the model structure and the source dataset. To solve this problem, we propose a methodology to efficiently select data from the target domain (minimizing consumption of target-domain data) to adapt the existing model without accessing the source domain, while still achieving acceptable accuracy. Our approach is designed for supervised and semi-supervised learning and is extendable to unsupervised learning.
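
The abstract does not spell out the selection or adaptation mechanics, so the following is only a minimal illustrative sketch of the setting it describes: reusing a pretrained classifier on a related target domain without touching the source data, while spending only a small labelling budget on target samples. It assumes a PyTorch/torchvision image classifier whose label space is unchanged between domains; the names `select_by_entropy`, `adapt`, and `budget`, and the entropy-based selection rule itself, are assumptions made for illustration, not the authors' method.

```python
# Hypothetical sketch of source-free model adaptation with a limited
# target-data budget; not the pipeline proposed in the paper.
import torch
import torch.nn.functional as F
from torch.utils.data import DataLoader, Subset
from torchvision import models


def select_by_entropy(model, target_dataset, budget, device="cpu"):
    """Rank target samples by predictive entropy under the pretrained model
    and return the indices of the `budget` most uncertain ones, i.e. the
    samples judged most worth labelling for adaptation."""
    model.eval()
    scores = []
    loader = DataLoader(target_dataset, batch_size=64, shuffle=False)
    with torch.no_grad():
        for x, _ in loader:
            probs = F.softmax(model(x.to(device)), dim=1)
            entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=1)
            scores.append(entropy.cpu())
    scores = torch.cat(scores)
    return torch.topk(scores, k=budget).indices.tolist()


def adapt(model, target_dataset, budget=200, epochs=5, device="cpu"):
    """Adapt a pretrained classifier using only `budget` labelled target
    samples and no access to the source dataset."""
    model.to(device)
    # Treat the pretrained backbone as read-only; only the classification
    # head is updated, mirroring the restricted-access setting.
    for p in model.parameters():
        p.requires_grad = False
    for p in model.fc.parameters():
        p.requires_grad = True

    chosen = select_by_entropy(model, target_dataset, budget, device)
    loader = DataLoader(Subset(target_dataset, chosen), batch_size=32, shuffle=True)
    optimizer = torch.optim.Adam(model.fc.parameters(), lr=1e-3)

    model.train()
    for _ in range(epochs):
        for x, y in loader:
            optimizer.zero_grad()
            loss = F.cross_entropy(model(x.to(device)), y.to(device))
            loss.backward()
            optimizer.step()
    return model


# Example usage: an ImageNet-pretrained ResNet-18 stands in for the
# "existing model"; `target_dataset` would be any labelled dataset drawn
# from the shifted target domain (same label space).
pretrained = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
# adapted = adapt(pretrained, target_dataset, budget=200)
```

In this sketch, `budget` plays the role of the target-data consumption constraint mentioned in the abstract: only the most uncertain target samples are labelled and used to update the frozen model's head.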