Out-of-Sample Forecasting of the CSI 300 Index: AR versus Machine Learning Models

Authors

  • Qi Qin

DOI:

https://doi.org/10.62051/7pgzdg13

Keywords:

Return predictability, Machine learning, Out-of-sample forecasting, CSI 300 Index.

Abstract

This study examines the out-of-sample predictability of daily returns of the CSI 300 Index by comparing a linear autoregressive (AR) benchmark with several machine learning models within a unified rolling forecasting framework. Using one-step-ahead forecasts, we evaluate predictive performance with loss-based measures, directional accuracy, and tests of statistical significance. The results show that machine learning models generally outperform the AR benchmark in mean squared prediction error, with LSTM delivering the strongest gains and XGBoost achieving superior directional accuracy. However, predictive improvements are not uniformly distributed over time. Relative cumulative error analysis and volatility-based subsample tests indicate that the machine learning advantage is concentrated in high-volatility periods, while the linear benchmark remains competitive in tranquil markets. These findings provide evidence that return predictability and model performance may vary across market conditions.
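The rolling one-step-ahead evaluation described above can be sketched as follows. This is a minimal illustration on synthetic data, not the paper's implementation: the 250-day window, the AR(1) benchmark fit by OLS, and the trivial rolling-mean "machine learning" placeholder are all assumptions made here for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic daily returns stand in for CSI 300 data (illustration only).
r = rng.normal(0.0, 0.01, 600)

window = 250  # hypothetical rolling estimation window
preds_ar, preds_ml, actual = [], [], []
for t in range(window, len(r) - 1):
    y = r[t - window + 1 : t + 1]  # targets within the window
    x = r[t - window : t]          # lagged returns
    # AR(1) benchmark: OLS fit of r_{s+1} on r_s over the rolling window.
    beta, alpha = np.polyfit(x, y, 1)
    preds_ar.append(alpha + beta * r[t])
    # Placeholder for a fitted ML model's forecast (rolling-window mean).
    preds_ml.append(y.mean())
    actual.append(r[t + 1])

actual = np.array(actual)
# Mean squared prediction error for each model.
mspe_ar = np.mean((actual - np.array(preds_ar)) ** 2)
mspe_ml = np.mean((actual - np.array(preds_ml)) ** 2)
# Directional accuracy: share of forecasts with the correct sign.
da_ar = np.mean(np.sign(preds_ar) == np.sign(actual))
```

In the study's setup, the placeholder forecast would be replaced by a fitted model (SVR, random forest, XGBoost, LSTM, etc.) re-estimated on each rolling window, and the MSPE comparison would be accompanied by a test of equal predictive accuracy such as Clark and West's [30].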


References

[1] Fama E F, French K R. Dividend yields and expected stock returns [J]. Journal of Financial Economics, 1988, 22 (1): 3-25.

[2] Welch I, Goyal A. A comprehensive look at the empirical performance of equity premium prediction [J]. The Review of Financial Studies, 2008, 21 (4): 1455-1508.

[3] Campbell J Y, Thompson S B. Predicting excess stock returns out of sample: Can anything beat the historical average? [J]. The Review of Financial Studies, 2008, 21 (4): 1509-1531.

[4] Meese R A, Rogoff K. Empirical exchange rate models of the seventies: Do they fit out of sample? [J]. Journal of International Economics, 1983, 14 (1-2): 3-24.

[5] Gu S, Kelly B, Xiu D. Empirical asset pricing via machine learning [J]. The Review of Financial Studies, 2020, 33 (5): 2223-2273.

[6] Chen L, Pelger M, Zhu J. Deep learning in asset pricing [J]. Management Science, 2024, 70 (2): 714-750.

[7] Bianchi D, Büchner M, Tamoni A. Bond risk premiums with machine learning [J]. The Review of Financial Studies, 2021, 34 (2): 1046-1089.

[8] Rapach D E, Strauss J K, Zhou G. International stock return predictability: what is the role of the United States? [J]. The Journal of Finance, 2013, 68 (4): 1633-1662.

[9] Dangl T, Halling M. Predictive regressions with time-varying coefficients [J]. Journal of Financial Economics, 2012, 106 (1): 157-181.

[10] Pettenuzzo D, Timmermann A, Valkanov R. Forecasting stock returns under economic constraints [J]. Journal of Financial Economics, 2014, 114 (3): 517-553.

[11] Kozak S, Nagel S, Santosh S. Shrinking the cross-section [J]. Journal of Financial Economics, 2020, 135 (2): 271-292.

[12] Cao L J, Tay F E H. Support vector machine with adaptive parameters in financial time series forecasting [J]. IEEE Transactions on Neural Networks, 2003, 14 (6): 1506-1518.

[13] Huang W, Nakamori Y, Wang S Y. Forecasting stock market movement direction with support vector machine [J]. Computers & Operations Research, 2005, 32 (10): 2513-2522.

[14] Fischer T, Krauss C. Deep learning with long short-term memory networks for financial market predictions [J]. European Journal of Operational Research, 2018, 270 (2): 654-669.

[15] Bao W, Yue J, Rao Y. A deep learning framework for financial time series using stacked autoencoders and long-short term memory [J]. PLoS ONE, 2017, 12 (7): e0180944.

[16] Krauss C, Do X A, Huck N. Deep neural networks, gradient-boosted trees, random forests: Statistical arbitrage on the S&P 500 [J]. European Journal of Operational Research, 2017, 259 (2): 689-702.

[17] Zhang Z, Zohren S, Roberts S. Deep learning for portfolio optimization [J]. arXiv preprint arXiv:2005.13665, 2020.

[18] McLean R D, Pontiff J. Does academic research destroy stock return predictability? [J]. The Journal of Finance, 2016, 71 (1): 5-32.

[19] Kim H H, Swanson N R. Mining big data using parsimonious factor, machine learning, variable selection and shrinkage methods [J]. International Journal of Forecasting, 2018, 34 (2): 339-354.

[20] Lettau M, Pelger M. Factors that fit the time series and cross-section of stock returns [J]. The Review of Financial Studies, 2020, 33 (5): 2274-2325.

[21] Feng G, Giglio S, Xiu D. Taming the factor zoo: A test of new factors [J]. The Journal of Finance, 2020, 75 (3): 1327-1370.

[22] Rossi B. Forecasting in the presence of instabilities: How we know whether models predict well and how to improve them [J]. Journal of Economic Literature, 2021, 59 (4): 1135-1190.


[24] Drucker H, Burges C J, Kaufman L, et al. Support vector regression machines [J]. Advances in Neural Information Processing Systems, 1996, 9.

[25] Breiman L. Random forests [J]. Machine Learning, 2001, 45 (1): 5-32.

[26] Chen T, Guestrin C. XGBoost: A scalable tree boosting system [C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016: 785-794.

[27] Tibshirani R. Regression shrinkage and selection via the lasso [J]. Journal of the Royal Statistical Society Series B: Statistical Methodology, 1996, 58 (1): 267-288.

[28] Hochreiter S, Schmidhuber J. Long short-term memory [J]. Neural Computation, 1997, 9 (8): 1735-1780.

[29] Cho K, van Merriënboer B, Gulcehre C, et al. Learning phrase representations using RNN encoder-decoder for statistical machine translation [J]. arXiv preprint arXiv:1406.1078, 2014.

[30] Clark T E, West K D. Approximately normal tests for equal predictive accuracy in nested models [J]. Journal of Econometrics, 2007, 138 (1): 291-311.


Published

19-03-2026

How to Cite

Qin, Q. (2026). Out-of-Sample Forecasting of the CSI 300 Index: AR versus Machine Learning Models. Transactions on Economics, Business and Management Research, 17, 155-166. https://doi.org/10.62051/7pgzdg13