{"title":"Interpretable image-based deep learning for price trend prediction in ETF markets","authors":"Ruixun Zhang, Chaoyi Zhao, Guanglian Lin","doi":"10.1080/1351847x.2023.2275567","DOIUrl":null,"url":null,"abstract":"AbstractImage-based deep learning models excel at extracting spatial information from images but their potential in financial applications has not been fully explored. Here we propose the channel and spatial attention convolutional neural network (CS-ACNN) for price trend prediction. It utilizes the attention mechanisms to focus on specific areas of input images that are the most relevant for prices. Using exchange-traded funds (ETF) data from three different markets, we show that CS-ACNN – using images constructed from financial time series – achieves on-par and, in some cases, superior performances compared to models that use time series data only. This holds true for both model classification metrics and investment profitability, and the out-of-sample Sharpe ratios range from 1.57 to 3.03 after accounting for transaction costs. The model learns visual patterns that are consistent with traditional technical analysis, providing an economic rationale for learned patterns and allowing investors to interpret the model.Keywords: Price trend predictionconvolutional neural network (CNN)attentionimageinterpretabilityJEL Classifications: C45G11G12G15 AcknowledgmentsWe thank Xiuli Shao for very helpful comments and discussion.Disclosure statementNo potential conflict of interest was reported by the author(s).Notes1 Specific neural network architectures in this literature include the fully-connected neural networks (Gu, Kelly, and Xiu Citation2020), autoencoders (Gu, Kelly, and Xiu Citation2021), and sequence models (Cong et al. Citation2021a, Citation2021b).2 Jiang, Kelly, and Xiu (Citation2022), a primary example in this literature, focus on learning price patterns from candlestick charts for future price trends, while our framework is able to extract information from both candlestick charts and, more broadly, any images constructed from financial time series.3 We use Python's mpl_finance module, and adopt the convention in China to represent positive trends with red and negative trends with green.4 In particular, they are defined by whether the closing price is higher than the opening price of the day.5 See, for example, Borgefors (Citation1986) and Fang et al. (Citation2021).6 Se(p′) goes to ±∞ when p is very close to 0 or 1. In practice, we clip Se(p) to be between 0 and 1.7 To feed the data into the convolutional neural network, these images are resized and cropped to 112×64 pixels.8 This is referred to as the Gramian Summation Angular Field (GASF) by Wang and Oates (Citation2015). If we define an inner product as ⟨x,y⟩=xy−1−x2⋅1−y2, the image G in Equation (Equation11(11) G=[cos(ϕ1+ϕ1)⋯cos(ϕ1+ϕT)cos(ϕ2+ϕ1)⋯cos(ϕ2+ϕT)⋮⋱⋮cos(ϕT+ϕ1)⋯cos(ϕT+ϕT)]=X~⋅X~′−I−X~2⋅I−X~2′,(11) ) constitute a quasi-Gramian matrix under this inner product.9 The number of filters in VggNet (the number of output channels after convolution) starts from 64 and increases exponentially after each max-pooling operation. The convolution mode of VggNet is ‘same’, meaning that the dimension of the output image after convolution is the same as the input, and its downsampling is realized by the max-pooling operation.10 The number of convolution kernels in the original VggNet grows from 64 to 512. 
We choose smaller numbers to mitigate overfitting.11 A small kernel is also consistent with the fact that our images have a relatively small resolution, and a small filter is able to capture local details better.12 Here we use parenthesis on H×W to highlight that query, key, and value are two-dimensional matrices, where the first dimension is of length H×W and the second dimension is of length C.13 We configure the LSTM to be: Hidden layer (32 neurons) + Hidden layer (64 neurons) + Dropout(0.25) + Fully connected layer.14 We configure the 1D-CNN to be: Conv1D(32) + MaxPool1D + Conv1D(48) + MaxPool1D + Dropout(0.25) + Conv1D(64) + GlobalAveragePool1D + Dropout(0.25) + Fully connected layer.15 The buy-and-hold strategy is equivalent to classifying all samples into ‘up’. Table 1 shows that there are 822 ‘up’ days and 641 ‘down’ days for SPY in the test set, implying an accuracy of 822/(822+641)=0.562.16 All these experiments are conducted on a laptop equipped with an Intel(R) Core(TM) i7-9750H CPU @ 2.60GHz.17 A single chart may contain more than one technical pattern.18 The only exception is TBOT for 2833.HK.Additional informationFundingResearch support from the National Key R&D Program of China (2022YFA1007900), the National Natural Science Foundation of China (12271013), and the Fundamental Research Funds for the Central Universities (Peking University) is gratefully acknowledged.Notes on contributorsRuixun ZhangRuixun Zhang is an assistant professor at Peking University.Chaoyi ZhaoChaoyi Zhao is a student at Peking University.Guanglian LinGuanglian Lin is a student at Nankai University.","PeriodicalId":22468,"journal":{"name":"The European Journal of Finance","volume":"29 5","pages":"0"},"PeriodicalIF":0.0000,"publicationDate":"2023-11-01","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":"0","resultStr":null,"platform":"Semanticscholar","paperid":null,"PeriodicalName":"The European Journal of Finance","FirstCategoryId":"1085","ListUrlMain":"https://doi.org/10.1080/1351847x.2023.2275567","RegionNum":0,"RegionCategory":null,"ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":null,"EPubDate":"","PubModel":"","JCR":"","JCRName":"","Score":null,"Total":0}
Abstract
Image-based deep learning models excel at extracting spatial information from images, but their potential in financial applications has not been fully explored. Here we propose the channel and spatial attention convolutional neural network (CS-ACNN) for price trend prediction. It utilizes attention mechanisms to focus on the areas of input images that are most relevant to prices. Using exchange-traded fund (ETF) data from three different markets, we show that CS-ACNN, using images constructed from financial time series, achieves performance on par with, and in some cases superior to, models that use time series data only. This holds for both model classification metrics and investment profitability, with out-of-sample Sharpe ratios ranging from 1.57 to 3.03 after accounting for transaction costs. The model learns visual patterns that are consistent with traditional technical analysis, providing an economic rationale for the learned patterns and allowing investors to interpret the model.

Keywords: Price trend prediction; convolutional neural network (CNN); attention; image; interpretability

JEL Classifications: C45; G11; G12; G15

Acknowledgments
We thank Xiuli Shao for very helpful comments and discussion.

Disclosure statement
No potential conflict of interest was reported by the author(s).

Notes
1. Specific neural network architectures in this literature include fully-connected neural networks (Gu, Kelly, and Xiu 2020), autoencoders (Gu, Kelly, and Xiu 2021), and sequence models (Cong et al. 2021a, 2021b).
2. Jiang, Kelly, and Xiu (2022), a primary example in this literature, focus on learning price patterns from candlestick charts to predict future price trends, while our framework can extract information from both candlestick charts and, more broadly, any images constructed from financial time series.
3. We use Python's mpl_finance module and adopt the Chinese convention of representing positive trends with red and negative trends with green.
4. In particular, they are defined by whether the closing price is higher than the opening price of the day.
5. See, for example, Borgefors (1986) and Fang et al. (2021).
6. Se(p) goes to ±∞ when p is very close to 0 or 1. In practice, we clip Se(p) to lie between 0 and 1.
7. To feed the data into the convolutional neural network, these images are resized and cropped to 112×64 pixels.
8. This is referred to as the Gramian Summation Angular Field (GASF) by Wang and Oates (2015). If we define an inner product as $\langle x, y \rangle = xy - \sqrt{1-x^2}\cdot\sqrt{1-y^2}$, the image $G$ in Equation (11),
$$G = \begin{bmatrix} \cos(\phi_1+\phi_1) & \cdots & \cos(\phi_1+\phi_T) \\ \cos(\phi_2+\phi_1) & \cdots & \cos(\phi_2+\phi_T) \\ \vdots & \ddots & \vdots \\ \cos(\phi_T+\phi_1) & \cdots & \cos(\phi_T+\phi_T) \end{bmatrix} = \tilde{X}\cdot\tilde{X}' - \sqrt{I-\tilde{X}^2}\cdot\sqrt{I-\tilde{X}^2}', \tag{11}$$
constitutes a quasi-Gramian matrix under this inner product. (A code sketch of this construction appears immediately below.)
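The GASF in note 8 maps a rescaled series to angles and takes pairwise cosine sums. The following is a minimal NumPy sketch, not the paper's implementation; the min-max rescaling to [−1, 1], the function name `gasf`, and the example price window are illustrative assumptions.

```python
import numpy as np

def gasf(series):
    """Gramian Angular Summation Field (Wang and Oates 2015).

    Rescales a 1-D series to [-1, 1], maps each value to an angle
    phi_i = arccos(x_i), and returns the T x T image
    G[i, j] = cos(phi_i + phi_j), matching Equation (11).
    """
    x = np.asarray(series, dtype=float)
    # Min-max rescale to [-1, 1]; assumes the window is not constant.
    x = 2.0 * (x - x.min()) / (x.max() - x.min()) - 1.0
    phi = np.arccos(np.clip(x, -1.0, 1.0))  # clip guards against rounding error
    # cos(phi_i + phi_j) = x_i * x_j - sqrt(1 - x_i^2) * sqrt(1 - x_j^2)
    return np.cos(np.add.outer(phi, phi))

# Illustrative price window (hypothetical values, not from the paper).
prices = [100.0, 101.5, 99.8, 102.3, 103.1, 101.9]
G = gasf(prices)
print(G.shape)  # (6, 6)
```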
9. The number of filters in VggNet (the number of output channels after convolution) starts at 64 and doubles after each max-pooling operation. The convolution mode of VggNet is 'same', meaning that the output image after convolution has the same dimensions as the input; downsampling is realized by the max-pooling operations.
10. The number of convolution kernels in the original VggNet grows from 64 to 512. We choose smaller numbers to mitigate overfitting.
11. A small kernel is also consistent with the fact that our images have a relatively low resolution, and a small filter captures local details better.
12. Here we use parentheses around H×W to highlight that the query, key, and value are two-dimensional matrices, where the first dimension has length H×W and the second dimension has length C.
13. We configure the LSTM as: hidden layer (32 neurons) + hidden layer (64 neurons) + Dropout(0.25) + fully connected layer.
14. We configure the 1D-CNN as: Conv1D(32) + MaxPool1D + Conv1D(48) + MaxPool1D + Dropout(0.25) + Conv1D(64) + GlobalAveragePool1D + Dropout(0.25) + fully connected layer. (A code sketch of both benchmark configurations appears at the end of this record.)
15. The buy-and-hold strategy is equivalent to classifying all samples as 'up'. Table 1 shows that there are 822 'up' days and 641 'down' days for SPY in the test set, implying an accuracy of 822/(822+641) ≈ 0.562.
16. All experiments are conducted on a laptop equipped with an Intel(R) Core(TM) i7-9750H CPU @ 2.60 GHz.
17. A single chart may contain more than one technical pattern.
18. The only exception is TBOT for 2833.HK.

Funding
Research support from the National Key R&D Program of China (2022YFA1007900), the National Natural Science Foundation of China (12271013), and the Fundamental Research Funds for the Central Universities (Peking University) is gratefully acknowledged.

Notes on contributors
Ruixun Zhang is an assistant professor at Peking University.
Chaoyi Zhao is a student at Peking University.
Guanglian Lin is a student at Nankai University.
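Notes 13 and 14 specify the benchmark models only by layer type and width. Below is a minimal Keras sketch of the two configurations; the window length T, feature count F, kernel size, ReLU activations, and two-class softmax output are assumptions not stated in the notes.

```python
from tensorflow.keras import layers, models

T, F = 32, 5  # hypothetical window length and number of input features

# LSTM benchmark (note 13): 32-unit and 64-unit recurrent layers,
# dropout of 0.25, then a fully connected output layer.
lstm = models.Sequential([
    layers.LSTM(32, return_sequences=True, input_shape=(T, F)),
    layers.LSTM(64),
    layers.Dropout(0.25),
    layers.Dense(2, activation="softmax"),  # up/down classes (assumed)
])

# 1D-CNN benchmark (note 14): Conv1D blocks with 32, 48, and 64 filters,
# max pooling, dropout of 0.25, and global average pooling before the output.
cnn = models.Sequential([
    layers.Conv1D(32, kernel_size=3, padding="same", activation="relu",
                  input_shape=(T, F)),
    layers.MaxPooling1D(),
    layers.Conv1D(48, kernel_size=3, padding="same", activation="relu"),
    layers.MaxPooling1D(),
    layers.Dropout(0.25),
    layers.Conv1D(64, kernel_size=3, padding="same", activation="relu"),
    layers.GlobalAveragePooling1D(),
    layers.Dropout(0.25),
    layers.Dense(2, activation="softmax"),
])
```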