Lai, M.; Wulff, S. S.; Cao, Y.; Robinson, T. J.; Rajapaksha, R.
2024-12-16
2024-12-16
2024-11

Lai, M., Wulff, S. S., Cao, Y., Robinson, T. J., & Rajapaksha, R. (2024). Temporal cross‐validation in forecasting: A case study of COVID‐19 incidence using wastewater data. Quality and Reliability Engineering International. https://doi.org/10.1002/qre.3686

http://repository.kln.ac.lk/handle/123456789/28968

Two predominant methodologies in forecasting temporal processes include traditional time series models and machine learning methods. This paper investigates the impact of time series cross-validation (TSCV) on both approaches in the context of a case study predicting the incidence of COVID-19 based on wastewater data. The TSCV framework outlined in the paper begins by engineering interpretable features hypothesized as potential predictors of COVID-19 incidence. Feature selection and hyperparameter tuning are then utilized with TSCV to identify the best features and hyperparameters for optimal model performance given a specific forecast horizon. While evidence supporting the utility of TSCV for auto-regressive integrated moving average model with exogenous variables (TS-ARIMAX) forecasts is lacking in this study, such an approach proves advantageous for gradient boosting machine forecasts (TS-GBM). In Wyoming, for instance, TS-GBM had a 34.9% improvement compared to naïve predictions, whereas GBM without TSCV only had a 15.6% improvement. However, TSCV also enhances interpretability for both TS-ARIMAX and TS-GBM models as this approach selects specific features, such as lagged values of COVID-19 cases, based on forecast performance and forecast length. Future research should work to explore the influence of stationarity and model averaging on the performance of TSCV in forecasting applications.

Temporal cross-validation in forecasting: A case study of COVID-19 incidence using wastewater data
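
The time series cross-validation idea summarized in the abstract can be illustrated with a minimal sketch. The record does not say how the paper builds its folds or which error metric it uses, so everything here is an assumption made for illustration: the expanding-window split, the mean absolute error, the drift forecast standing in for a fitted model, the toy series, and the helper names `expanding_window_splits`, `mae`, and `percent_improvement`. It is not the authors' implementation. It only shows that each fold trains on the past, scores on the next `horizon` points, and is compared against a naïve last-value forecast.

```python
from typing import Iterator, List, Sequence, Tuple


def expanding_window_splits(
    n_obs: int, initial_train: int, horizon: int, step: int = 1
) -> Iterator[Tuple[range, range]]:
    """Yield (train_idx, test_idx) pairs; training indices always precede test indices."""
    end = initial_train
    while end + horizon <= n_obs:
        yield range(0, end), range(end, end + horizon)
        end += step


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean absolute error between two equally long sequences."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def percent_improvement(model_error: float, naive_error: float) -> float:
    """Relative error reduction versus the naive forecast, in percent."""
    return 100.0 * (naive_error - model_error) / naive_error


# Hypothetical toy series (assumption): not data from the study.
series = [10.0, 12.0, 13.0, 15.0, 18.0, 20.0, 23.0, 25.0]
horizon = 2

naive_errors: List[float] = []
drift_errors: List[float] = []
for train_idx, test_idx in expanding_window_splits(
    len(series), initial_train=4, horizon=horizon
):
    train = [series[i] for i in train_idx]
    test = [series[i] for i in test_idx]

    # Naive forecast: repeat the last observed value over the horizon.
    naive = [train[-1]] * horizon

    # Stand-in "model" (assumption): last value plus the average past change.
    slope = (train[-1] - train[0]) / (len(train) - 1)
    drift = [train[-1] + slope * (h + 1) for h in range(horizon)]

    naive_errors.append(mae(test, naive))
    drift_errors.append(mae(test, drift))

# Average the per-fold errors, then express the gain relative to naive.
avg_naive = sum(naive_errors) / len(naive_errors)
avg_drift = sum(drift_errors) / len(drift_errors)
improvement = percent_improvement(avg_drift, avg_naive)
print(f"folds={len(naive_errors)} improvement over naive={improvement:.1f}%")
```

In a fuller pipeline, the same loop would presumably wrap feature selection and hyperparameter tuning, choosing the configuration with the lowest average out-of-sample error for the given forecast horizon.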