Filtered not Mixed: Stochastic Filtering-Based Online Gating for Mixture of Large Language Models
Raeid Saqur, Anastasis Kratsios, Florian Krach, Yannick Limmer, Jacob-Junqi Tian, John Willes, Blanka Horvath, Frank Rudzicz
arXiv:2406.02969 (arXiv - QuantFin - Mathematical Finance), 2024-06-05
Abstract
We propose MoE-F -- a formalised mechanism for combining $N$ pre-trained expert Large Language Models (LLMs) in online time-series prediction tasks by adaptively forecasting the best weighting of LLM predictions at every time step. Our mechanism leverages the conditional information in each expert's running performance to forecast the best combination of LLMs for predicting the next step of the time series. Diverging from static (learned) Mixture-of-Experts (MoE) methods, MoE-F combines experts with time-adaptive stochastic filtering techniques. By framing the expert selection problem as a finite state-space, continuous-time Hidden Markov Model (HMM), we can leverage the Wonham-Shiryaev filter. Our approach first constructs $N$ parallel filters, one for each of the $N$ individual LLMs. Each filter proposes its best combination of LLMs, given the information it has access to. The $N$ filter outputs are then aggregated by optimizing a lower bound on the loss of the aggregated LLMs; this bound admits a closed-form optimizer, which yields our ensemble predictor. Our contributions are: (I) the MoE-F algorithm -- deployable as a plug-and-play filtering harness; (II) theoretical optimality guarantees for the proposed filtering-based gating algorithm; and (III) empirical evaluation and ablation results using state-of-the-art foundation and MoE LLMs on a real-world Financial Market Movement task, where MoE-F attains a remarkable 17% absolute and 48.5% relative F1-measure improvement over the next-best-performing individual LLM expert.
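To make the gating mechanism concrete, below is a minimal, discretized sketch of the idea in Python. It is an illustration under stated assumptions, not the paper's implementation: the function names (`wonham_step`, `moe_f_step`), the exponential loss-based likelihood, and the average-and-normalize aggregation are placeholders standing in for the paper's observation model and its closed-form lower-bound optimizer.

```python
import numpy as np

def wonham_step(pi, Q, likelihoods, dt=1.0):
    """One discretized Wonham-style filter update (illustrative sketch only).

    pi          : current posterior over which expert is 'best'      (shape [N])
    Q           : generator matrix of the hidden chain, rows sum to 0 (shape [N, N])
    likelihoods : likelihood of the latest observation under each
                  'expert k is best' hypothesis                      (shape [N])
    Assumes dt is small enough that the predicted vector stays a valid distribution.
    """
    # Predict: propagate the posterior through the chain dynamics.
    pred = pi + dt * (pi @ Q)
    # Correct: reweight by the observation likelihoods and renormalize.
    post = pred * likelihoods
    return post / post.sum()

def moe_f_step(pis, Q, expert_losses, preds, beta=1.0, dt=1.0):
    """One online gating step over N experts (simplified sketch of the MoE-F idea).

    pis           : list of N per-expert filter posteriors, each of shape [N]
    expert_losses : latest running loss of each expert, shape [N]
    preds         : latest prediction of each expert, shape [N, ...]
    """
    # Stand-in observation model: experts with smaller recent losses are
    # more likely to be the currently 'best' hidden state.
    likelihoods = np.exp(-beta * expert_losses)
    # N parallel filters, each updated with the information it observes.
    pis = [wonham_step(pi, Q, likelihoods, dt) for pi in pis]
    # Aggregate the N filter outputs into one weight vector. (The paper
    # derives a closed-form optimizer of a loss lower bound; a simple
    # average-and-normalize is used here as a placeholder.)
    w = np.mean(pis, axis=0)
    w = w / w.sum()
    # Ensemble prediction: convex combination of the expert predictions.
    y_hat = np.tensordot(w, preds, axes=1)
    return pis, w, y_hat

# Usage with three hypothetical experts:
N = 3
Q = np.full((N, N), 0.1) - np.eye(N) * 0.1 * N            # rows sum to zero
pis = [np.full(N, 1.0 / N) for _ in range(N)]              # uniform priors
pis, w, y_hat = moe_f_step(pis, Q,
                           expert_losses=np.array([0.2, 0.5, 0.9]),
                           preds=np.array([1.0, -1.0, 1.0]))
```

The intended behaviour matches the abstract's description at a high level: each expert's posterior weight rises when its recent losses are small relative to the others, so the gating adapts online at every time step without retraining any of the underlying LLMs.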