{"title":"Dynamics of Meta-learning Representation in the Teacher-student Scenario","authors":"Hui Wang, Cho Tung Yip, Bo Li","doi":"arxiv-2408.12545","DOIUrl":null,"url":null,"abstract":"Gradient-based meta-learning algorithms have gained popularity for their\nability to train models on new tasks using limited data. Empirical observations\nindicate that such algorithms are able to learn a shared representation across\ntasks, which is regarded as a key factor in their success. However, the\nin-depth theoretical understanding of the learning dynamics and the origin of\nthe shared representation remains underdeveloped. In this work, we investigate\nthe meta-learning dynamics of the non-linear two-layer neural networks trained\non streaming tasks in the teach-student scenario. Through the lens of\nstatistical physics analysis, we characterize the macroscopic behavior of the\nmeta-training processes, the formation of the shared representation, and the\ngeneralization ability of the model on new tasks. The analysis also points to\nthe importance of the choice of certain hyper-parameters of the learning\nalgorithms.","PeriodicalId":501066,"journal":{"name":"arXiv - PHYS - Disordered Systems and Neural Networks","volume":"8 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-08-22","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - PHYS - Disordered Systems and Neural Networks","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2408.12545","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Gradient-based meta-learning algorithms have gained popularity for their ability to train models on new tasks using limited data. Empirical observations indicate that such algorithms can learn a shared representation across tasks, which is regarded as a key factor in their success. However, an in-depth theoretical understanding of the learning dynamics and of the origin of the shared representation remains underdeveloped. In this work, we investigate the meta-learning dynamics of non-linear two-layer neural networks trained on streaming tasks in the teacher-student scenario. Through the lens of statistical physics, we characterize the macroscopic behavior of the meta-training process, the formation of the shared representation, and the generalization ability of the model on new tasks. The analysis also highlights the importance of the choice of certain hyper-parameters of the learning algorithms.
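
As a concrete illustration of the setup the abstract describes (not the paper's exact algorithm or parameterization), the sketch below runs a first-order MAML-style meta-learner: a two-layer student network is adapted on a stream of tasks, each generated by a two-layer teacher that shares a fixed first layer (one common way to model a shared representation) and differs only in its task-specific second layer. The dimensions, learning rates, activation, and single-step inner loop are assumptions made for this example.

```python
# Illustrative sketch, not the paper's method: first-order MAML-style
# meta-training of a two-layer tanh student on streaming teacher tasks.
import numpy as np

rng = np.random.default_rng(0)

D, K = 50, 3              # input dimension and hidden width (assumed)
alpha, beta = 0.1, 0.05   # inner (adaptation) and outer (meta) learning rates

# Teachers share a fixed first layer (the "shared representation") and
# differ only in their task-specific second layer.
W_star = rng.standard_normal((K, D)) / np.sqrt(D)

def sample_task():
    """One task: a two-layer teacher with shared W_star and a fresh head v_t."""
    v_t = rng.standard_normal(K)
    def sample(n):
        X = rng.standard_normal((n, D))
        return X, np.tanh(X @ W_star.T) @ v_t
    return sample

def forward(X, W, v):
    return np.tanh(X @ W.T) @ v

def grads(X, y, W, v):
    """Gradients of the mean squared error of a two-layer tanh network."""
    h = np.tanh(X @ W.T)                    # hidden activations, shape (n, K)
    err = h @ v - y                         # residuals, shape (n,)
    g_v = h.T @ err / len(y)
    g_W = ((err[:, None] * (1 - h ** 2)) * v).T @ X / len(y)
    return g_W, g_v

# Meta-training on a stream of tasks: adapt a copy of the student on each
# task's support set, then move the shared initialization using the
# post-adaptation gradient on a query set (first-order approximation).
W = rng.standard_normal((K, D)) / np.sqrt(D)    # student first layer (shared)
v = rng.standard_normal(K)                      # student second layer

for step in range(2001):
    task = sample_task()
    X_s, y_s = task(20)     # support set for inner adaptation
    X_q, y_q = task(20)     # query set for the outer (meta) update

    # Inner loop: a single gradient step from the shared initialization.
    gW, gv = grads(X_s, y_s, W, v)
    W_a, v_a = W - alpha * gW, v - alpha * gv

    # Outer loop: first-order update of the shared initialization.
    gW_q, gv_q = grads(X_q, y_q, W_a, v_a)
    W, v = W - beta * gW_q, v - beta * gv_q

    if step % 500 == 0:
        q_loss = np.mean((forward(X_q, W_a, v_a) - y_q) ** 2)
        print(f"step {step}: post-adaptation query loss {q_loss:.3f}")
```

In this toy setting, the extent to which the student's first layer comes to align with W_star during meta-training is a simple proxy for the formation of a shared representation; the paper analyzes such meta-training dynamics through statistical physics rather than by simulation alone.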