Authors: Ruixun Zhang, Chaoyi Zhao, Guanglian Lin
Journal: The European Journal of Finance (volume: 29 5)
DOI: 10.1080/1351847x.2023.2275567 (https://doi.org/10.1080/1351847x.2023.2275567)
Publication date: 2023-11-01
Publication type: Journal Article
Source: Semanticscholar
Citations: 0
Interpretable image-based deep learning for price trend prediction in ETF markets
Abstract

Image-based deep learning models excel at extracting spatial information from images but their potential in financial applications has not been fully explored. Here we propose the channel and spatial attention convolutional neural network (CS-ACNN) for price trend prediction. It utilizes the attention mechanisms to focus on specific areas of input images that are the most relevant for prices. Using exchange-traded funds (ETF) data from three different markets, we show that CS-ACNN – using images constructed from financial time series – achieves on-par and, in some cases, superior performances compared to models that use time series data only. This holds true for both model classification metrics and investment profitability, and the out-of-sample Sharpe ratios range from 1.57 to 3.03 after accounting for transaction costs. The model learns visual patterns that are consistent with traditional technical analysis, providing an economic rationale for learned patterns and allowing investors to interpret the model.

Keywords: Price trend prediction; convolutional neural network (CNN); attention; image; interpretability

JEL Classifications: C45; G11; G12; G15

Acknowledgments

We thank Xiuli Shao for very helpful comments and discussion.

Disclosure statement

No potential conflict of interest was reported by the author(s).

Notes

1. Specific neural network architectures in this literature include the fully-connected neural networks (Gu, Kelly, and Xiu 2020), autoencoders (Gu, Kelly, and Xiu 2021), and sequence models (Cong et al. 2021a, 2021b).

2. Jiang, Kelly, and Xiu (2022), a primary example in this literature, focus on learning price patterns from candlestick charts for future price trends, while our framework is able to extract information from both candlestick charts and, more broadly, any images constructed from financial time series.

3. We use Python's mpl_finance module, and adopt the convention in China to represent positive trends with red and negative trends with green.

4. In particular, they are defined by whether the closing price is higher than the opening price of the day.

5. See, for example, Borgefors (1986) and Fang et al. (2021).

6. Se(p′) goes to ±∞ when p is very close to 0 or 1. In practice, we clip Se(p) to be between 0 and 1.

7. To feed the data into the convolutional neural network, these images are resized and cropped to 112×64 pixels.

8. This is referred to as the Gramian Summation Angular Field (GASF) by Wang and Oates (2015). If we define an inner product as $\langle x, y \rangle = xy - \sqrt{1-x^2} \cdot \sqrt{1-y^2}$, the image G in Equation (11),

$$
G = \begin{bmatrix}
\cos(\phi_1+\phi_1) & \cdots & \cos(\phi_1+\phi_T) \\
\cos(\phi_2+\phi_1) & \cdots & \cos(\phi_2+\phi_T) \\
\vdots & \ddots & \vdots \\
\cos(\phi_T+\phi_1) & \cdots & \cos(\phi_T+\phi_T)
\end{bmatrix}
= \tilde{X} \cdot \tilde{X}' - \sqrt{I - \tilde{X}^2} \cdot \sqrt{I - \tilde{X}^2}', \tag{11}
$$

constitutes a quasi-Gramian matrix under this inner product.

9. The number of filters in VggNet (the number of output channels after convolution) starts from 64 and increases exponentially after each max-pooling operation. The convolution mode of VggNet is ‘same’, meaning that the dimension of the output image after convolution is the same as the input, and its downsampling is realized by the max-pooling operation.

10. The number of convolution kernels in the original VggNet grows from 64 to 512. We choose smaller numbers to mitigate overfitting.

11. A small kernel is also consistent with the fact that our images have a relatively small resolution, and a small filter is able to capture local details better.

12. Here we use parentheses around H×W to highlight that query, key, and value are two-dimensional matrices, where the first dimension is of length H×W and the second dimension is of length C.

13. We configure the LSTM to be: Hidden layer (32 neurons) + Hidden layer (64 neurons) + Dropout(0.25) + Fully connected layer.

14. We configure the 1D-CNN to be: Conv1D(32) + MaxPool1D + Conv1D(48) + MaxPool1D + Dropout(0.25) + Conv1D(64) + GlobalAveragePool1D + Dropout(0.25) + Fully connected layer.

15. The buy-and-hold strategy is equivalent to classifying all samples into ‘up’. Table 1 shows that there are 822 ‘up’ days and 641 ‘down’ days for SPY in the test set, implying an accuracy of 822/(822+641) = 0.562.

16. All these experiments are conducted on a laptop equipped with an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz.

17. A single chart may contain more than one technical pattern.

18. The only exception is TBOT for 2833.HK.

Additional information

Funding

Research support from the National Key R&D Program of China (2022YFA1007900), the National Natural Science Foundation of China (12271013), and the Fundamental Research Funds for the Central Universities (Peking University) is gratefully acknowledged.

Notes on contributors

Ruixun Zhang is an assistant professor at Peking University.
Chaoyi Zhao is a student at Peking University.
Guanglian Lin is a student at Nankai University.
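
The GASF construction in note 8 can be illustrated with a short sketch in plain Python. It is not the authors' code: the min-max rescaling to [-1, 1] is an assumption (the notes do not say how the series is normalized before taking the arccosine), and the function names `rescale`, `gasf`, and `quasi_inner` are hypothetical. The sketch builds G entry by entry as cos(ϕ_i + ϕ_j) and lets the reader check that each entry equals the quasi inner product of the rescaled values.

```python
import math


def rescale(series):
    """Min-max rescale into [-1, 1] so arccos is defined (assumed normalization)."""
    lo, hi = min(series), max(series)
    if hi == lo:
        raise ValueError("series must contain at least two distinct values")
    # Clamp to guard against floating-point values slightly outside [-1, 1].
    return [max(-1.0, min(1.0, (2 * x - hi - lo) / (hi - lo))) for x in series]


def gasf(series):
    """Return the T x T matrix G with G[i][j] = cos(phi_i + phi_j)."""
    phi = [math.acos(v) for v in rescale(series)]
    return [[math.cos(a + b) for b in phi] for a in phi]


def quasi_inner(x, y):
    """The inner product of note 8: <x, y> = x*y - sqrt(1 - x^2) * sqrt(1 - y^2)."""
    return x * y - math.sqrt(1 - x * x) * math.sqrt(1 - y * y)


if __name__ == "__main__":
    prices = [10.0, 10.5, 10.2, 11.0]  # made-up closing prices, for illustration only
    for row in gasf(prices):
        print(" ".join(f"{v:+.3f}" for v in row))
```

Because ϕ = arccos(x) lies in [0, π], sin ϕ = sqrt(1 − x²) is non-negative, so cos(ϕ_i + ϕ_j) = x_i x_j − sqrt(1 − x_i²)·sqrt(1 − x_j²), which is the identity behind the matrix form in Equation (11).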
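
The buy-and-hold baseline in note 15 is simple arithmetic, sketched below for concreteness. The function name `always_up_accuracy` is hypothetical; the day counts are the SPY test-set figures quoted in the note.

```python
def always_up_accuracy(n_up, n_down):
    """Accuracy of a classifier that labels every sample 'up'."""
    return n_up / (n_up + n_down)


if __name__ == "__main__":
    # SPY test set per note 15: 822 'up' days and 641 'down' days.
    print(f"{always_up_accuracy(822, 641):.3f}")  # prints 0.562
```

Any model evaluated on the same test set has to beat this 0.562 figure to show that it adds information beyond the unconditional share of 'up' days.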