无监督终身学习课程

Proceedings of the Web Conference 2021 Pub Date : 2021-04-19 DOI:10.1145/3442381.3449839

Yi He, Sheng Chen, Baijun Wu, Xu Yuan, Xindong Wu

{"title":"无监督终身学习课程","authors":"Yi He, Sheng Chen, Baijun Wu, Xu Yuan, Xindong Wu","doi":"10.1145/3442381.3449839","DOIUrl":null,"url":null,"abstract":"Lifelong machine learning (LML) has driven the development of extensive web applications, enabling the learning systems deployed on web servers to deal with a sequence of tasks in an incremental fashion. Such systems can retain knowledge from learned tasks in a knowledge base and seamlessly apply it to improve the future learning. Unfortunately, most existing LML methods require labels in every task, whereas providing persistent human labeling for all future tasks is costly, onerous, error-prone, and hence impractical. Motivated by this situation, we propose a new paradigm named unsupervised lifelong learning with curricula (ULLC), where only one task needs to be labeled for initialization and the system then performs lifelong learning for subsequent tasks in an unsupervised fashion. A main challenge of realizing this paradigm lies in the occurrence of negative knowledge transfer, where partial old knowledge becomes detrimental for learning a given task yet cannot be filtered out by the learner without the help of labels. To overcome this challenge, we draw insights from the learning behaviors of humans. Specifically, when faced with a difficult task that cannot be well tackled by our current knowledge, we usually postpone it and work on some easier tasks first, which allows us to grow our knowledge. Thereafter, once we go back to the postponed task, we are more likely to tackle it well as we are more knowledgeable now. The key idea of ULLC is similar – at any time, a pool of candidate tasks are organized in a curriculum by their distances to the knowledge base. The learner then starts from the closer tasks, accumulates knowledge from learning them, and moves to learn the faraway tasks with a gradually augmented knowledge base. The viability and effectiveness of our proposal are substantiated through extensive empirical studies on both synthetic and real datasets.","PeriodicalId":106672,"journal":{"name":"Proceedings of the Web Conference 2021","volume":"67 1","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2021-04-19","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"5","resultStr":"{\"title\":\"Unsupervised Lifelong Learning with Curricula\",\"authors\":\"Yi He, Sheng Chen, Baijun Wu, Xu Yuan, Xindong Wu\",\"doi\":\"10.1145/3442381.3449839\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Lifelong machine learning (LML) has driven the development of extensive web applications, enabling the learning systems deployed on web servers to deal with a sequence of tasks in an incremental fashion. Such systems can retain knowledge from learned tasks in a knowledge base and seamlessly apply it to improve the future learning. Unfortunately, most existing LML methods require labels in every task, whereas providing persistent human labeling for all future tasks is costly, onerous, error-prone, and hence impractical. Motivated by this situation, we propose a new paradigm named unsupervised lifelong learning with curricula (ULLC), where only one task needs to be labeled for initialization and the system then performs lifelong learning for subsequent tasks in an unsupervised fashion. A main challenge of realizing this paradigm lies in the occurrence of negative knowledge transfer, where partial old knowledge becomes detrimental for learning a given task yet cannot be filtered out by the learner without the help of labels. To overcome this challenge, we draw insights from the learning behaviors of humans. Specifically, when faced with a difficult task that cannot be well tackled by our current knowledge, we usually postpone it and work on some easier tasks first, which allows us to grow our knowledge. Thereafter, once we go back to the postponed task, we are more likely to tackle it well as we are more knowledgeable now. The key idea of ULLC is similar – at any time, a pool of candidate tasks are organized in a curriculum by their distances to the knowledge base. The learner then starts from the closer tasks, accumulates knowledge from learning them, and moves to learn the faraway tasks with a gradually augmented knowledge base. The viability and effectiveness of our proposal are substantiated through extensive empirical studies on both synthetic and real datasets.\",\"PeriodicalId\":106672,\"journal\":{\"name\":\"Proceedings of the Web Conference 2021\",\"volume\":\"67 1\",\"pages\":\"0\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2021-04-19\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"5\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"Proceedings of the Web Conference 2021\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/10.1145/3442381.3449839\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"Proceedings of the Web Conference 2021","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1145/3442381.3449839","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

引用次数: 5

摘要

终身机器学习(LML)推动了广泛的web应用程序的发展，使部署在web服务器上的学习系统能够以增量方式处理一系列任务。这样的系统可以将学习任务中的知识保留在知识库中，并无缝地应用于未来的学习。不幸的是，大多数现有的LML方法都要求在每个任务中都有标签，而为所有未来的任务提供持久的人工标签是昂贵的、繁重的、容易出错的，因此不切实际。在这种情况下，我们提出了一种新的范式，称为带课程的无监督终身学习(ULLC)，其中只需要标记一个任务进行初始化，然后系统以无监督的方式对后续任务执行终身学习。实现这一范式的一个主要挑战在于负知识迁移的发生，即部分旧知识对学习给定任务是有害的，但在没有标签的帮助下，学习者无法过滤掉。为了克服这一挑战，我们从人类的学习行为中汲取见解。具体来说，当面对一个我们现有的知识不能很好地解决的困难任务时，我们通常会推迟它，先做一些更容易的任务，这样可以让我们的知识增长。此后，一旦我们回到推迟的任务，我们更有可能解决它，因为我们现在更有知识。ULLC的关键思想是类似的——在任何时候，候选任务池根据它们与知识库的距离在课程中组织起来。然后，学习者从较近的任务开始，在学习中积累知识，并随着知识库的逐渐增加而学习较远的任务。通过对合成和真实数据集的广泛实证研究，我们的建议的可行性和有效性得到了证实。

本文章由计算机程序翻译，如有差异，请以英文原文为准。

查看原文本刊更多论文

Unsupervised Lifelong Learning with Curricula

Lifelong machine learning (LML) has driven the development of extensive web applications, enabling the learning systems deployed on web servers to deal with a sequence of tasks in an incremental fashion. Such systems can retain knowledge from learned tasks in a knowledge base and seamlessly apply it to improve the future learning. Unfortunately, most existing LML methods require labels in every task, whereas providing persistent human labeling for all future tasks is costly, onerous, error-prone, and hence impractical. Motivated by this situation, we propose a new paradigm named unsupervised lifelong learning with curricula (ULLC), where only one task needs to be labeled for initialization and the system then performs lifelong learning for subsequent tasks in an unsupervised fashion. A main challenge of realizing this paradigm lies in the occurrence of negative knowledge transfer, where partial old knowledge becomes detrimental for learning a given task yet cannot be filtered out by the learner without the help of labels. To overcome this challenge, we draw insights from the learning behaviors of humans. Specifically, when faced with a difficult task that cannot be well tackled by our current knowledge, we usually postpone it and work on some easier tasks first, which allows us to grow our knowledge. Thereafter, once we go back to the postponed task, we are more likely to tackle it well as we are more knowledgeable now. The key idea of ULLC is similar – at any time, a pool of candidate tasks are organized in a curriculum by their distances to the knowledge base. The learner then starts from the closer tasks, accumulates knowledge from learning them, and moves to learn the faraway tasks with a gradually augmented knowledge base. The viability and effectiveness of our proposal are substantiated through extensive empirical studies on both synthetic and real datasets.

求助全文

通过发布文献求助，成功后即可免费获取论文全文。去求助

来源期刊

Proceedings of the Web Conference 2021

自引率

0.00%

发文量