Enhancing Knowledge Reusability: A Distributed Multitask Machine Learning Approach

IF 5.1 · CAS Quartile 2 (Computer Science) · JCR Q1 · COMPUTER SCIENCE, INFORMATION SYSTEMS
Qianyu Long;Christos Anagnostopoulos;Kostas Kolomvatsos
{"title":"Enhancing Knowledge Reusability: A Distributed Multitask Machine Learning Approach","authors":"Qianyu Long;Christos Anagnostopoulos;Kostas Kolomvatsos","doi":"10.1109/TETC.2024.3390811","DOIUrl":null,"url":null,"abstract":"In the era of the Internet of Things, the unprecedented growth of data surpasses current predictive analytics and processing capabilities. Due to the potential redundancy of similar data and analytics tasks, it is imperative to extract patterns from distributed data <italic>and</i> predictive models so that existing schemes can be efficiently reused in distributed computing environments. This is expected to avoid building and maintaining <italic>reduplicative</i> predictive models. The fundamental challenge, however, is the detection of reusable tasks and tuning models in order to improve predictive capacity while being reused. We introduce a two-phase Distributed Multi-task Machine Learning (DMtL) framework coping with this challenge. In the first phase, similar tasks are identified and efficiently grouped together according to locally trained models’ performance meta-features, using Partial Learning Curves (PLC). In the subsequent phase, we leverage the PLC-driven DMtL paradigm to boost the performance of candidate reusable models per group of tasks in distributed computing environments. We provide a thorough analysis of our framework along with a comparative assessment against relevant approaches and prior work found in the respective literature. Our experimental results showcase the feasibility of the PLC-driven DMtL method in terms of adaptability and reusability of existing knowledge in distributed computing systems.","PeriodicalId":13156,"journal":{"name":"IEEE Transactions on Emerging Topics in Computing","volume":"13 1","pages":"207-221"},"PeriodicalIF":5.1000,"publicationDate":"2024-03-06","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"IEEE Transactions on Emerging Topics in Computing","FirstCategoryId":"94","ListUrlMain":"https://ieeexplore.ieee.org/document/10521461/","RegionNum":2,"RegionCategory":"计算机科学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"Q1","JCRName":"COMPUTER SCIENCE, INFORMATION SYSTEMS","Score":null,"Total":0}
Citations: 0

Abstract

In the era of the Internet of Things, the unprecedented growth of data surpasses current predictive analytics and processing capabilities. Due to the potential redundancy of similar data and analytics tasks, it is imperative to extract patterns from distributed data and predictive models so that existing schemes can be efficiently reused in distributed computing environments. This is expected to avoid building and maintaining reduplicative predictive models. The fundamental challenge, however, is the detection of reusable tasks and tuning models in order to improve predictive capacity while being reused. We introduce a two-phase Distributed Multi-task Machine Learning (DMtL) framework coping with this challenge. In the first phase, similar tasks are identified and efficiently grouped together according to locally trained models’ performance meta-features, using Partial Learning Curves (PLC). In the subsequent phase, we leverage the PLC-driven DMtL paradigm to boost the performance of candidate reusable models per group of tasks in distributed computing environments. We provide a thorough analysis of our framework along with a comparative assessment against relevant approaches and prior work found in the respective literature. Our experimental results showcase the feasibility of the PLC-driven DMtL method in terms of adaptability and reusability of existing knowledge in distributed computing systems.
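To make the two-phase idea in the abstract concrete, the sketch below illustrates (under stated assumptions) how tasks might be grouped by partial-learning-curve meta-features and how one reusable candidate model could be nominated per group. This is not the authors' implementation: the data, the choice of k-means (via scikit-learn), and the names plc_features and candidate_per_group are all illustrative assumptions.

```python
# Hypothetical sketch of the PLC-driven grouping idea, not the paper's code.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Synthetic PLC meta-features: for each of 12 tasks, validation accuracy
# recorded at 10%, 20%, 30%, and 40% of local training (a partial curve).
plc_features = np.clip(
    rng.uniform(0.5, 0.7, size=(12, 1))
    + np.cumsum(rng.uniform(0.0, 0.08, size=(12, 4)), axis=1),
    0.0, 1.0,
)

# Phase 1 (illustrative): group tasks whose partial curves look alike.
groups = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(plc_features)

# Phase 2 (illustrative): within each group, nominate the task whose curve
# ends highest as the donor of a reusable model; the other tasks in the
# group would adapt that model instead of training from scratch.
candidate_per_group = {
    g: int(np.argmax(np.where(groups == g, plc_features[:, -1], -np.inf)))
    for g in np.unique(groups)
}
print("task groups:", groups.tolist())
print("candidate donor task per group:", candidate_per_group)
```

In a distributed setting, only the short PLC meta-feature vectors would need to be exchanged for grouping, which is the kind of reuse-before-retraining the framework targets.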
Source journal
IEEE Transactions on Emerging Topics in Computing (Computer Science - Computer Science, miscellaneous)
CiteScore: 12.10
Self-citation rate: 5.10%
Articles published per year: 113
Journal description: IEEE Transactions on Emerging Topics in Computing publishes papers on emerging aspects of computer science, computing technology, and computing applications not currently covered by other IEEE Computer Society Transactions. Some examples of emerging topics in computing include: IT for Green, synthetic and organic computing structures and systems, advanced analytics, social/occupational computing, location-based/client computer systems, morphic computer design, electronic game systems, and health-care IT.