Timeseries forecasting plays an important role in many applications that require knowledge of the future behaviour of a quantity of interest.
Traditionally, this task is approached with methods such as exponential smoothing, ARIMA and, more recently, recurrent neural networks such as LSTM architectures, or transformers. These approaches intrinsically rely on the auto-correlation or partial auto-correlation between subsequent events to forecast future values: the past values of the timeseries are used to model its future behaviour. Implicitly, this assumes that the auto-correlation and partial auto-correlation are genuine rather than spurious. In the latter case, the methods exploit the (partial) auto-correlation in the prediction even though it is not grounded in the causal process that generates the timeseries. This can happen if some external event or intervention affects the value of the timeseries at multiple times. In terms of causal analysis, this is equivalent to introducing a confounder into the timeseries, where the variable of interest at different times takes over the role of multiple variables in standard causal analysis. The confounder opens a backdoor path between different times, which in turn leads to a spurious auto-correlation. If a forecasting model is built on such spurious correlations, its generalizability and forecasting power are reduced, and future predictions may consequently be wrong.
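The mechanism described above can be illustrated with a minimal simulation (an illustrative sketch, not the data-generating process studied in this work): a series with no genuine dependence on its own past nevertheless shows strong auto-correlation, because a persistent external driver (the confounder) affects its value at many consecutive times. Conditioning on the confounder removes the effect, confirming that the auto-correlation is spurious.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hidden confounder: a slowly varying external driver, e.g. an
# intervention whose effect persists over blocks of 50 time steps.
confounder = np.repeat(rng.normal(size=n // 50), 50)

# The series has NO genuine dependence on its own past:
# each value is just confounder + independent noise.
x = confounder + rng.normal(scale=0.5, size=n)

def autocorr(series, lag):
    """Sample auto-correlation of a series at a given lag."""
    return np.corrcoef(series[:-lag], series[lag:])[0, 1]

# The raw series shows strong lag-1 auto-correlation because adjacent
# values usually share the same confounder value...
print(autocorr(x, 1))

# ...but after adjusting for the confounder (here simply subtracting
# it), the auto-correlation essentially vanishes: it was spurious.
print(autocorr(x - confounder, 1))
```

A forecaster trained only on the raw series would attribute this auto-correlation to the series' own dynamics, which is exactly the failure mode discussed above.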
Using a supervised learning approach, we show how machine learning can be used to avoid temporal confounding in timeseries forecasting, thereby limiting or eliminating the influence of spurious auto-correlations and partial auto-correlations.
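To see why adjusting for the confounder improves generalization, consider the following sketch (a simplified stand-in for the supervised approach, assuming for illustration that the confounder is observed): a naive AR(1) model fitted on confounded data learns the spurious auto-correlation of the training regime and degrades when the confounder's dynamics change at deployment, while a model that regresses on the confounder directly remains accurate.

```python
import numpy as np

rng = np.random.default_rng(1)

def make_series(n, block):
    """Series whose only structure comes from an external driver that
    persists over blocks of `block` time steps."""
    c = np.repeat(rng.normal(size=n // block), block)
    return c + rng.normal(scale=0.5, size=n), c

# Training regime: driver persists over long blocks (strong spurious
# auto-correlation). Deployment regime: driver changes much faster.
x_tr, c_tr = make_series(5000, 50)
x_te, c_te = make_series(5000, 5)

# Naive AR(1): least-squares lag-1 coefficient on the raw series.
phi = np.linalg.lstsq(x_tr[:-1, None], x_tr[1:], rcond=None)[0][0]
mse_naive = np.mean((x_te[1:] - phi * x_te[:-1]) ** 2)

# Confounder-adjusted model: regress the series on the driver itself,
# so no spurious lag structure is learned.
beta = np.linalg.lstsq(c_tr[:, None], x_tr, rcond=None)[0][0]
mse_adj = np.mean((x_te - beta * c_te) ** 2)

print(mse_naive, mse_adj)  # the adjusted model generalizes better
```

The naive model's error grows because the lag-1 coefficient it learned encodes the training regime's confounder persistence, not a causal dependence of the series on its own past.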