Optimal text-based time-series indices
Authors: David Ardia, Keven Bluteau
Journal: International Journal of Forecasting, Volume 42, Issue 1, Pages 44-60
Publication date: 2026-01-01 (Epub 2025-08-09)
DOI: 10.1016/j.ijforecast.2025.07.003
URL: https://www.sciencedirect.com/science/article/pii/S0169207025000627
Impact factor: 7.1 (JCR Q1, Economics)
Citations: 0
Abstract
We propose an approach to construct text-based time-series indices in an optimal way—typically, indices that maximize the contemporaneous relation or the predictive performance with respect to a target variable, such as inflation. Our methodology relies on binary selection matrices that, applied to the vocabulary of tokens, select the relevant texts in the corpus. Various widely known text-based indices, such as the Economic Policy Uncertainty (EPU) index, can be formulated in terms of selection matrices. We design a genetic algorithm with domain-specific knowledge featuring tailor-made crossover and mutation operations to perform the complex optimization. We illustrate our methodology with a corpus of news articles from the Wall Street Journal by optimizing text-based indices that forecast inflation at various horizons.
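To make the idea of a binary selection matrix concrete, here is a minimal sketch in Python. It is not the authors' implementation. It assumes one plausible reading of the abstract: each column of the selection matrix marks a group of tokens, and a text is selected when it contains at least one token from every non-empty group, which mirrors how EPU-style indices combine keyword categories. The function names, the toy vocabulary, and the "share of selected texts per period" index are all illustrative assumptions.

```python
import numpy as np


def select_documents(doc_term, selection):
    """Return a boolean mask of selected documents.

    doc_term:  (n_docs, n_tokens) binary matrix, 1 if the token occurs in the document.
    selection: (n_tokens, n_groups) binary matrix, 1 if the token belongs to the group.
    Assumed rule (not taken from the paper): a document is selected when it
    matches at least one token in every non-empty group.
    """
    doc_term = np.asarray(doc_term)
    selection = np.asarray(selection)
    hits = doc_term @ selection              # matched tokens per document and group
    active = selection.sum(axis=0) > 0       # ignore groups with no tokens
    if not active.any():
        return np.zeros(doc_term.shape[0], dtype=bool)
    return (hits[:, active] > 0).all(axis=1)


def text_index(doc_term, selection, periods):
    """Share of selected documents in each period (a hypothetical index definition)."""
    periods = np.asarray(periods)
    selected = select_documents(doc_term, selection)
    return {int(p): float(selected[periods == p].mean()) for p in np.unique(periods)}


# Toy vocabulary: economy, policy, uncertain, inflation, sports
doc_term = np.array([
    [1, 1, 1, 0, 0],   # economy + policy + uncertain
    [1, 0, 0, 1, 0],   # economy + inflation
    [0, 0, 0, 0, 1],   # sports
    [1, 1, 1, 1, 0],   # economy + policy + uncertain + inflation
])
periods = np.array([0, 0, 1, 1])

# EPU-style selection: one group each for economy, policy, and uncertainty terms
epu_like = np.array([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [0, 0, 0],
    [0, 0, 0],
])

print(select_documents(doc_term, epu_like))
print(text_index(doc_term, epu_like, periods))
```

Under this reading, the optimization described in the abstract would search over the binary entries of the selection matrix so that the resulting index best tracks or forecasts the target variable. The genetic algorithm itself is not sketched here because the abstract does not specify its operators.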
About the journal:
The International Journal of Forecasting is a leading journal in its field that publishes high-quality refereed papers. It aims to bridge the gap between theory and practice, making forecasting useful and relevant for decision makers and policymakers. The journal places strong emphasis on empirical studies, evaluation activities, implementation research, and improving the practice of forecasting. It welcomes various points of view and encourages debate to find solutions to field-related problems. The journal is the official publication of the International Institute of Forecasters (IIF) and is indexed in Sociological Abstracts, Journal of Economic Literature, Statistical Theory and Method Abstracts, INSPEC, Current Contents, UMI Data Courier, RePEc, Academic Journal Guide, CIS, IAOR, and Social Sciences Citation Index.