Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models

Raeid Saqur, Anastasis Kratsios, Florian Krach, Yannick Limmer, Jacob-Junqi Tian, John Willes, Blanka Horvath, Frank Rudzicz
{"title":"Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models","authors":"Raeid Saqur, Anastasis Kratsios, Florian Krach, Yannick Limmer, Jacob-Junqi Tian, John Willes, Blanka Horvath, Frank Rudzicz","doi":"arxiv-2406.02969","DOIUrl":null,"url":null,"abstract":"We propose MoE-F -- a formalised mechanism for combining $N$ pre-trained\nexpert Large Language Models (LLMs) in online time-series prediction tasks by\nadaptively forecasting the best weighting of LLM predictions at every time\nstep. Our mechanism leverages the conditional information in each expert's\nrunning performance to forecast the best combination of LLMs for predicting the\ntime series in its next step. Diverging from static (learned) Mixture of\nExperts (MoE) methods, MoE-F employs time-adaptive stochastic filtering\ntechniques to combine experts. By framing the expert selection problem as a\nfinite state-space, continuous-time Hidden Markov model (HMM), we can leverage\nthe Wohman-Shiryaev filter. Our approach first constructs $N$ parallel filters\ncorresponding to each of the $N$ individual LLMs. Each filter proposes its best\ncombination of LLMs, given the information that they have access to.\nSubsequently, the $N$ filter outputs are aggregated to optimize a lower bound\nfor the loss of the aggregated LLMs, which can be optimized in closed-form,\nthus generating our ensemble predictor. Our contributions here are: (I) the\nMoE-F algorithm -- deployable as a plug-and-play filtering harness, (II)\ntheoretical optimality guarantees of the proposed filtering-based gating\nalgorithm, and (III) empirical evaluation and ablative results using state of\nthe art foundational and MoE LLMs on a real-world Financial Market Movement\ntask where MoE-F attains a remarkable 17% absolute and 48.5% relative F1\nmeasure improvement over the next best performing individual LLM expert.","PeriodicalId":501084,"journal":{"name":"arXiv - QuantFin - Mathematical Finance","volume":"59 1","pages":""},"PeriodicalIF":0.0000,"publicationDate":"2024-06-05","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"arXiv - QuantFin - Mathematical Finance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/arxiv-2406.02969","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
引用次数: 0

Abstract

We propose MoE-F -- a formalised mechanism for combining $N$ pre-trained expert Large Language Models (LLMs) in online time-series prediction tasks by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the time series at its next step. Diverging from static (learned) Mixture of Experts (MoE) methods, MoE-F employs time-adaptive stochastic filtering techniques to combine experts. By framing the expert selection problem as a finite state-space, continuous-time Hidden Markov model (HMM), we can leverage the Wonham-Shiryaev filter. Our approach first constructs $N$ parallel filters corresponding to each of the $N$ individual LLMs. Each filter proposes its best combination of LLMs, given the information it has access to. Subsequently, the $N$ filter outputs are aggregated to optimize a lower bound for the loss of the aggregated LLMs, which can be optimized in closed form, thus generating our ensemble predictor. Our contributions here are: (I) the MoE-F algorithm -- deployable as a plug-and-play filtering harness, (II) theoretical optimality guarantees for the proposed filtering-based gating algorithm, and (III) empirical evaluation and ablation results using state-of-the-art foundation and MoE LLMs on a real-world Financial Market Movement task, where MoE-F attains a remarkable 17% absolute and 48.5% relative F1-measure improvement over the next-best-performing individual LLM expert.
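As background for the filtering step mentioned above (this is standard filtering theory, not material reproduced from the paper), the Wonham-Shiryaev filter gives the exact posterior over the hidden state of a finite state-space, continuous-time HMM observed through Brownian noise; the unit-variance observation model below is an assumption made for concreteness.

```latex
% Hidden chain X_t on states {e_1,...,e_N} with generator Q,
% observed via dY_t = h(X_t)\,dt + dW_t (unit-variance noise assumed).
% The posterior probabilities \pi_t^i = P(X_t = e_i \mid \mathcal{F}_t^Y) satisfy
\begin{equation*}
  d\pi_t^i
  = \sum_{j=1}^{N} Q_{ji}\,\pi_t^j\,dt
  + \pi_t^i\bigl(h_i - \bar{h}_t\bigr)\bigl(dY_t - \bar{h}_t\,dt\bigr),
  \qquad
  \bar{h}_t = \sum_{j=1}^{N} h_j\,\pi_t^j .
\end{equation*}
```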
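The abstract describes, but does not spell out, how the $N$ per-expert filters and their aggregation are wired together online. The sketch below only illustrates the general shape of such an online, performance-driven gate over $N$ expert predictors: the multiplicative-weights update and the helper names (`update_weights`, `online_gated_forecast`, `loss_fn`) are illustrative assumptions, not the MoE-F recursion itself.

```python
import numpy as np

def update_weights(weights, losses, eta=1.0):
    """Illustrative multiplicative-weights update driven by per-expert losses.

    This is a placeholder for the paper's stochastic-filtering update: it is
    NOT the MoE-F recursion, only a stand-in with the same interface --
    running performance in, a probability vector over experts out.
    """
    w = weights * np.exp(-eta * losses)
    return w / w.sum()

def online_gated_forecast(experts, stream, loss_fn):
    """Run an online ensemble over `experts` (callables mapping x_t -> prediction).

    At each step: query every expert, form the weighted ensemble prediction,
    observe the realized outcome, then update the gate from each expert's loss.
    """
    n = len(experts)
    weights = np.full(n, 1.0 / n)                      # uniform prior over experts
    ensemble_predictions = []
    for x_t, y_t in stream:                            # (features, realized outcome)
        preds = np.array([expert(x_t) for expert in experts])
        ensemble_predictions.append(weights @ preds)   # gate = convex combination
        losses = np.array([loss_fn(p, y_t) for p in preds])
        weights = update_weights(weights, losses)      # adapt the gate online
    return ensemble_predictions
```

In MoE-F proper, the placeholder update would be replaced by the $N$ parallel Wonham-Shiryaev filters, whose outputs are then aggregated by optimizing the closed-form loss lower bound described in the abstract.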