Enhancing Knowledge Reusability: A Distributed Multitask Machine Learning Approach
Authors: Qianyu Long; Christos Anagnostopoulos; Kostas Kolomvatsos
Journal: IEEE Transactions on Emerging Topics in Computing, vol. 13, no. 1, pp. 207-221
Publication date: 2024-03-06
DOI: 10.1109/TETC.2024.3390811
URL: https://ieeexplore.ieee.org/document/10521461/
Impact factor: 5.1 · JCR: Q1 (Computer Science, Information Systems)
In the era of the Internet of Things, data is growing faster than current predictive analytics and processing capabilities can handle. Because similar data and analytics tasks are potentially redundant, it is imperative to extract patterns from both distributed data and predictive models so that existing schemes can be reused efficiently in distributed computing environments. Doing so is expected to avoid building and maintaining duplicate predictive models. The fundamental challenge, however, is detecting reusable tasks and tuning models so that their predictive capacity improves when they are reused. We introduce a two-phase Distributed Multi-task Machine Learning (DMtL) framework to address this challenge. In the first phase, similar tasks are identified and efficiently grouped according to performance meta-features of locally trained models, using Partial Learning Curves (PLC). In the second phase, we leverage the PLC-driven DMtL paradigm to boost the performance of candidate reusable models for each group of tasks in distributed computing environments. We provide a thorough analysis of our framework along with a comparative assessment against relevant approaches and prior work from the literature. Our experimental results demonstrate the feasibility of the PLC-driven DMtL method in terms of adaptability and reusability of existing knowledge in distributed computing systems.
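To make the first phase more concrete, the sketch below groups tasks whose partial learning curves look alike. It is only an illustration under stated assumptions, not the authors' algorithm: the abstract does not specify the meta-features or the grouping rule, so the Euclidean distance, the greedy threshold grouping, the `threshold` value, and the node names and loss values are all hypothetical.

```python
from math import sqrt


def curve_distance(a, b):
    """Euclidean distance between two equal-length partial learning curves."""
    if len(a) != len(b):
        raise ValueError("curves must have the same number of points")
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def group_tasks(curves, threshold):
    """Greedily group tasks by curve similarity (an assumed, simplified rule).

    `curves` maps a task name to its partial learning curve, here a list of
    validation losses recorded over the first few training rounds. A task
    joins the first group whose representative (its first member) lies
    within `threshold`; otherwise it starts a new group.
    """
    groups = []  # each group is a list of task names
    for name, curve in curves.items():
        for group in groups:
            if curve_distance(curve, curves[group[0]]) <= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


# Hypothetical curves: loss after 1, 2, 3 and 4 short training rounds.
curves = {
    "node_a": [0.90, 0.60, 0.45, 0.40],
    "node_b": [0.88, 0.62, 0.47, 0.41],
    "node_c": [0.95, 0.93, 0.92, 0.91],
}
groups = group_tasks(curves, threshold=0.1)
print(groups)  # [['node_a', 'node_b'], ['node_c']]
```

In this toy run, `node_a` and `node_b` learn at a similar pace and end up in one group, so a single model could be a candidate for reuse across both, while `node_c` barely improves and is kept apart. Only a short prefix of each curve is needed, which is the appeal of using partial rather than full learning curves.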
About the journal:
IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Examples of emerging topics in computing include IT for Green, synthetic and organic computing structures and systems, advanced analytics, social/occupational computing, location-based/client computer systems, morphic computer design, electronic game systems, and health-care IT.