Learning large softmax mixtures with warm start EM

Xin Bing, Florentina Bunea, Jonathan Niles-Weed, Marten Wegkamp
{"title":"利用热启动电磁学习大型软最大混合物","authors":"Xin Bing, Florentina Bunea, Jonathan Niles-Weed, Marten Wegkamp","doi":"arxiv-2409.09903","DOIUrl":null,"url":null,"abstract":"Mixed multinomial logits are discrete mixtures introduced several decades ago\nto model the probability of choosing an attribute from $p$ possible candidates,\nin heterogeneous populations. The model has recently attracted attention in the\nAI literature, under the name softmax mixtures, where it is routinely used in\nthe final layer of a neural network to map a large number $p$ of vectors in\n$\\mathbb{R}^L$ to a probability vector. Despite its wide applicability and\nempirical success, statistically optimal estimators of the mixture parameters,\nobtained via algorithms whose running time scales polynomially in $L$, are not\nknown. This paper provides a solution to this problem for contemporary\napplications, such as large language models, in which the mixture has a large\nnumber $p$ of support points, and the size $N$ of the sample observed from the\nmixture is also large. Our proposed estimator combines two classical\nestimators, obtained respectively via a method of moments (MoM) and the\nexpectation-minimization (EM) algorithm. Although both estimator types have\nbeen studied, from a theoretical perspective, for Gaussian mixtures, no similar\nresults exist for softmax mixtures for either procedure. We develop a new MoM\nparameter estimator based on latent moment estimation that is tailored to our\nmodel, and provide the first theoretical analysis for a MoM-based procedure in\nsoftmax mixtures. Although consistent, MoM for softmax mixtures can exhibit\npoor numerical performance, as observed other mixture models. Nevertheless, as\nMoM is provably in a neighborhood of the target, it can be used as warm start\nfor any iterative algorithm. We study in detail the EM algorithm, and provide\nits first theoretical analysis for softmax mixtures. Our final proposal for\nparameter estimation is the EM algorithm with a MoM warm start.","PeriodicalId":501379,"journal":{"name":"arXiv - STAT - Statistics Theory","volume":"9 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-09-16","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":"{\"title\":\"Learning large softmax mixtures with warm start EM\",\"authors\":\"Xin Bing, Florentina Bunea, Jonathan Niles-Weed, Marten Wegkamp\",\"doi\":\"arxiv-2409.09903\",\"DOIUrl\":null,\"url\":null,\"abstract\":\"Mixed multinomial logits are discrete mixtures introduced several decades ago\\nto model the probability of choosing an attribute from $p$ possible candidates,\\nin heterogeneous populations. The model has recently attracted attention in the\\nAI literature, under the name softmax mixtures, where it is routinely used in\\nthe final layer of a neural network to map a large number $p$ of vectors in\\n$\\\\mathbb{R}^L$ to a probability vector. Despite its wide applicability and\\nempirical success, statistically optimal estimators of the mixture parameters,\\nobtained via algorithms whose running time scales polynomially in $L$, are not\\nknown. This paper provides a solution to this problem for contemporary\\napplications, such as large language models, in which the mixture has a large\\nnumber $p$ of support points, and the size $N$ of the sample observed from the\\nmixture is also large. 
Our proposed estimator combines two classical\\nestimators, obtained respectively via a method of moments (MoM) and the\\nexpectation-minimization (EM) algorithm. Although both estimator types have\\nbeen studied, from a theoretical perspective, for Gaussian mixtures, no similar\\nresults exist for softmax mixtures for either procedure. We develop a new MoM\\nparameter estimator based on latent moment estimation that is tailored to our\\nmodel, and provide the first theoretical analysis for a MoM-based procedure in\\nsoftmax mixtures. Although consistent, MoM for softmax mixtures can exhibit\\npoor numerical performance, as observed other mixture models. Nevertheless, as\\nMoM is provably in a neighborhood of the target, it can be used as warm start\\nfor any iterative algorithm. We study in detail the EM algorithm, and provide\\nits first theoretical analysis for softmax mixtures. Our final proposal for\\nparameter estimation is the EM algorithm with a MoM warm start.\",\"PeriodicalId\":501379,\"journal\":{\"name\":\"arXiv - STAT - Statistics Theory\",\"volume\":\"9 1\",\"pages\":\"\"},\"PeriodicalIF\":0.0000,\"publicationDate\":\"2024-09-16\",\"publicationTypes\":\"Journal Article\",\"fieldsOfStudy\":null,\"isOpenAccess\":false,\"openAccessPdf\":\"\",\"citationCount\":\"0\",\"resultStr\":null,\"platform\":\"Semanticscholar\",\"paperid\":null,\"PeriodicalName\":\"arXiv - STAT - Statistics Theory\",\"FirstCategoryId\":\"1085\",\"ListUrlMain\":\"https://doi.org/arxiv-2409.09903\",\"RegionNum\":0,\"RegionCategory\":null,\"ArticlePicture\":[],\"TitleCN\":null,\"AbstractTextCN\":null,\"PMCID\":null,\"EPubDate\":\"\",\"PubModel\":\"\",\"JCR\":\"\",\"JCRName\":\"\",\"Score\":null,\"Total\":0}","platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - STAT - Statistics Theory","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2409.09903","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}

Abstract

Mixed multinomial logits are discrete mixtures introduced several decades ago to model the probability of choosing an attribute from $p$ possible candidates in heterogeneous populations. The model has recently attracted attention in the AI literature under the name softmax mixtures, where it is routinely used in the final layer of a neural network to map a large number $p$ of vectors in $\mathbb{R}^L$ to a probability vector. Despite its wide applicability and empirical success, statistically optimal estimators of the mixture parameters, obtained via algorithms whose running time scales polynomially in $L$, are not known. This paper provides a solution to this problem for contemporary applications, such as large language models, in which the mixture has a large number $p$ of support points and the size $N$ of the sample observed from the mixture is also large.
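For concreteness, one standard way to write this model is the classical mixed-logit parametrization (a sketch; the paper's exact conventions may differ). With known feature vectors $x_1, \dots, x_p \in \mathbb{R}^L$, mixture weights $\alpha_1, \dots, \alpha_K \geq 0$ summing to one, and component parameters $\beta_1, \dots, \beta_K \in \mathbb{R}^L$, the probability of observing attribute $j$ is

$$\mathbb{P}(Y = j) \;=\; \sum_{k=1}^{K} \alpha_k \, \frac{\exp(x_j^\top \beta_k)}{\sum_{j'=1}^{p} \exp(x_{j'}^\top \beta_k)}, \qquad j = 1, \dots, p,$$

i.e., a $K$-component mixture of softmax (multinomial logit) distributions over the $p$ candidates.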
Our proposed estimator combines two classical estimators, obtained respectively via a method of moments (MoM) and the expectation-maximization (EM) algorithm. Although both estimator types have been studied, from a theoretical perspective, for Gaussian mixtures, no similar results exist for softmax mixtures for either procedure. We develop a new MoM parameter estimator based on latent moment estimation that is tailored to our model, and provide the first theoretical analysis of a MoM-based procedure for softmax mixtures. Although consistent, MoM for softmax mixtures can exhibit poor numerical performance, as observed in other mixture models. Nevertheless, since MoM provably lands in a neighborhood of the target, it can be used as a warm start for any iterative algorithm. We study the EM algorithm in detail and provide its first theoretical analysis for softmax mixtures. Our final proposal for parameter estimation is the EM algorithm with a MoM warm start.
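The EM iteration itself is easy to state even though its analysis is the hard part. Below is a minimal numpy sketch of EM for the parametrization above, assuming the warm start `(alpha0, B0)` is supplied externally, e.g., by a MoM estimate. Since the M-step for the $\beta_k$ has no closed form, the sketch takes a single gradient ascent step per iteration (a common "gradient EM" surrogate); this update rule is an assumption for illustration, not necessarily the paper's prescribed variant.

```python
import numpy as np

def softmax(logits):
    # Numerically stable softmax along the last axis.
    z = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def em_softmax_mixture(y, X, alpha0, B0, n_iter=200, lr=0.5):
    """Gradient-EM for a K-component softmax mixture.

    y      : (N,) observed choices in {0, ..., p-1}
    X      : (p, L) known feature vectors of the p candidates
    alpha0 : (K,) initial mixture weights      -- the warm start
    B0     : (K, L) initial component parameters -- the warm start
    """
    alpha = np.asarray(alpha0, dtype=float)
    B = B0.astype(float)
    N, p = len(y), X.shape[0]
    K = alpha.shape[0]
    for _ in range(n_iter):
        # E-step: responsibilities r[i, k] proportional to alpha_k * P_k(y_i),
        # where P_k is the softmax choice distribution of component k.
        P = softmax(B @ X.T)                  # (K, p) choice probabilities
        r = alpha * P[:, y].T                 # (N, K), unnormalized
        r /= r.sum(axis=1, keepdims=True)
        # M-step for the weights has a closed form.
        alpha = r.mean(axis=0)
        # M-step for the betas does not: ascend the responsibility-weighted
        # log-likelihood of each component by one gradient step.
        for k in range(K):
            pk = softmax(X @ B[k])                                  # (p,)
            counts = np.bincount(y, weights=r[:, k], minlength=p)   # (p,)
            grad = X.T @ (counts - r[:, k].sum() * pk)              # (L,)
            B[k] += lr * grad / N
    return alpha, B
```

Started from an arbitrary point, this iteration can stall at poor local optima, which is exactly why the initialization matters: the paper's guarantee is that the MoM estimate is provably close enough to the target that EM initialized there converges, motivating the combined MoM-then-EM proposal.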