The goal of any forecasting model is to produce an accurate prediction of the future, but *how* that accuracy is measured is important. We will review common accuracy metrics, such as the ME, MAE and MAPE, and see their limitations when comparing data at different scales. This will lead into the MASE and the Root Mean Squared Scaled Error (RMSSE), both of which provide a solution to this problem. We'll then write a function in Python to calculate the RMSSE, which will be used in future tutorials.

Let's use an example of predicting the sale of screws at a hardware store. Perhaps we are given 5 days of sales, shown in the table below.

| | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
|---|---|---|---|---|---|
| Screws sold ($y$) | 4 | 2 | 1 | 3 | 2 |

The total quantity sold over the first 5 days is 12, so we averaged 2.4 units per day. If we were tasked with forecasting the next 5 days of sales, we might round this down and take 2 units as the prediction for each day. We'll now pretend we received the sales figures for days 6 to 10.

| | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 |
|---|---|---|---|---|---|
| Screws sold ($y$) | 2 | 0 | 4 | 1 | 1 |
| Predicted sales ($\hat{y}$) | 2 | 2 | 2 | 2 | 2 |

To calculate the mean error, we first find the difference between the observed sales and the prediction for each day. We'll call this the error, $e = y - \hat{y}$.

| | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 |
|---|---|---|---|---|---|
| Observed sales ($y$) | 2 | 0 | 4 | 1 | 1 |
| Predicted sales ($\hat{y}$) | 2 | 2 | 2 | 2 | 2 |
| Error ($e$) | 0 | -2 | 2 | -1 | -1 |

The mean error is then calculated as the total sum of errors, divided by the number of days, given by

$$\begin{aligned} ME &= \frac{1}{n}\sum_{i=1}^n e_i \\ &= \frac{1}{5}\left(0 - 2 + 2 - 1 - 1\right) \\ &= -0.4 \end{aligned}$$

This error is sometimes referred to as the **bias** of the model. If we were to forecast across multiple series and found them all to have a negative ME, we could conclude our model has a "negative bias". However, if we tune our model to have a low or zero mean error, it might not mean our predictions are getting better. It could be that the predictions are worse, but the positive and negative errors are cancelling out. So, in conjunction with the ME, we can also look at the mean absolute error.
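Before moving on, the ME calculation above can be checked in a few lines of numpy (the variable names here are my own, chosen for readability):

```
import numpy as np

observed = np.array([2, 0, 4, 1, 1])   # days 6-10
predicted = np.array([2, 2, 2, 2, 2])

errors = observed - predicted          # e = y - y_hat
me = errors.mean()
print(me)  # -0.4
```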

Similar to the mean error, the MAE is the mean of all errors except this time we use their absolute values.

$$\begin{aligned} MAE &= \frac{1}{n}\sum_{i=1}^n |e_i| \\ &= \frac{1}{5}\left(|0| + |-2| + |2| + |-1| + |-1|\right) \\ &= 1.2 \end{aligned}$$

Here we end up with MAE = 1.2. It's a bit of an arbitrary number: is 1.2 good? Is it bad? To illustrate this point, let's pretend that each unit was in fact a box of 100 screws. Our values would then be,
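The same check works for the MAE, reusing the errors from the table above:

```
import numpy as np

errors = np.array([0, -2, 2, -1, -1])  # e = y - y_hat for days 6-10
mae = np.abs(errors).mean()
print(mae)  # 1.2
```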

| | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 |
|---|---|---|---|---|---|
| Observed sales ($y$) | 200 | 0 | 400 | 100 | 100 |
| Predicted sales ($\hat{y}$) | 200 | 200 | 200 | 200 | 200 |
| Error ($e = y - \hat{y}$) | 0 | -200 | 200 | -100 | -100 |

And if we were to recalculate the MAE,

$$\begin{aligned} MAE &= \frac{1}{5}\left(|0| + |-200| + |200| + |-100| + |-100|\right) \\ &= 120 \end{aligned}$$

Uh oh, we have two different values of the MAE for the exact same sale of screws. The issue here is scale. Using the MAE as the measure of accuracy can cause us grief when we are trying to compare series at different scales. One solution is to instead look at the error in relative terms.

We can normalise each error by dividing it by the observed sales. This is written as

$$MAPE = \frac{1}{n}\sum_{i=1}^n \left|\frac{e_i}{y_i}\right|$$

For our example of the sale of screws, we would input

$$MAPE = \frac{1}{5}\left(\left|\frac{0}{2}\right| + \left|\frac{-2}{0}\right| + \dots + \left|\frac{-1}{1}\right|\right)$$

Unfortunately, we come across a problem - a zero-division on the second term. We can't calculate the MAPE for this example, so we will have to try a different metric.
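We can see the problem directly in numpy; a quick sketch using the day 6-10 values from above:

```
import numpy as np

observed = np.array([2, 0, 4, 1, 1])   # days 6-10
predicted = np.array([2, 2, 2, 2, 2])
errors = observed - predicted

# suppress the divide-by-zero warning so we can see the result
with np.errstate(divide='ignore'):
    mape = np.mean(np.abs(errors / observed))

print(mape)  # inf - the |(-2)/0| term on day 7 blows up
```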

We can scale our error without running into the zero-division problem by calculating the MASE. This method uses the "in-sample MAE of a one-step naive forecast". Don't worry, it will make sense soon. We have already calculated the MAE in the section above (recall that MAE = 1.2). Now we need to scale it. For that, we look at the sales before our forecasting period, i.e. the first 5 days of sales.

| | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
|---|---|---|---|---|---|
| Observed sales ($y$) | 4 | 2 | 1 | 3 | 2 |

The first step is to find the absolute difference between each consecutive day of sales, e.g. Day 1 = 4 and Day 2 = 2, so the difference between these sales is 2. We can denote this with $|y_i - y_{i-1}|$.

| | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
|---|---|---|---|---|---|
| Observed sales ($y$) | 4 | 2 | 1 | 3 | 2 |
| $\lvert y_i - y_{i-1} \rvert$ | | 2 | 1 | 2 | 1 |

Now we take the mean of the bottom row, which is $\frac{2+1+2+1}{4} = 1.5$. Finally, we divide our MAE by this value, so we end up with $\frac{1.2}{1.5} = 0.8$. This is our scaled MAE, or MASE.
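The scaling step is a one-liner with `np.diff`; a quick sketch using the training values above:

```
import numpy as np

train = np.array([4, 2, 1, 3, 2])      # days 1-5

# one-step naive errors on the training set: |y_i - y_{i-1}|
scale = np.mean(np.abs(np.diff(train)))
print(scale)  # 1.5

mase = round(1.2 / scale, 3)           # forecast MAE of 1.2, from earlier
print(mase)   # 0.8
```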

For a quick test, let's scale this up to boxes of 100 screws and see what result we get.

| | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Observed sales ($y$) | 400 | 200 | 100 | 300 | 200 | 200 | 0 | 400 | 100 | 100 |
| Predicted sales ($\hat{y}$) | | | | | | 200 | 200 | 200 | 200 | 200 |

If you repeat the procedure from before, you will end up with MASE = $\frac{120}{150} = 0.8$. Excellent, our error is the same regardless of scale!

We can write out the above method with this formula:

$$MASE = \frac{\frac{1}{h}\sum_{t=n+1}^{n+h} |y_t - \hat{y}_t|}{\frac{1}{n-1}\sum_{t=2}^{n}|y_t-y_{t-1}|}$$

where $n$ is the length of the training sample and $h$ is the length of the forecasting period.
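The formula translates directly into a small helper function (a sketch; `mase` is my own name, not a library call):

```
import numpy as np

def mase(train, test, forecast):
    # forecast MAE divided by the in-sample MAE of a one-step naive forecast
    forecast_mae = np.mean(np.abs(test - forecast))
    naive_mae = np.mean(np.abs(np.diff(train)))
    return forecast_mae / naive_mae

train = np.array([4, 2, 1, 3, 2])
test = np.array([2, 0, 4, 1, 1])
forecast = np.array([2, 2, 2, 2, 2])

# the result is the same at both scales
print(round(mase(train, test, forecast), 3))                    # 0.8
print(round(mase(train * 100, test * 100, forecast * 100), 3))  # 0.8
```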

The RMSSE is quite similar to the MASE, except we square the errors and take the square root of the final quotient.

$$RMSSE = \sqrt{\frac{\frac{1}{h}\sum_{t=n+1}^{n+h} (y_t - \hat{y}_t)^2}{\frac{1}{n-1}\sum_{t=2}^{n}(y_t-y_{t-1})^2}}$$

You could also think of it as finding the MSE of the forecast and then scaling it by the MSE of a one-step naive forecast.

$$RMSSE = \sqrt{\frac{MSE}{\frac{1}{n-1}\sum_{t=2}^{n}(y_t-y_{t-1})^2}}$$

For completeness, I'll run through the example again and find the RMSSE.

| | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 |
|---|---|---|---|---|---|---|---|---|---|---|
| Observed sales ($y$) | 4 | 2 | 1 | 3 | 2 | 2 | 0 | 4 | 1 | 1 |
| Predicted sales ($\hat{y}$) | | | | | | 2 | 2 | 2 | 2 | 2 |

**Step 1 - Find the MSE of the forecast**

| | Day 6 | Day 7 | Day 8 | Day 9 | Day 10 |
|---|---|---|---|---|---|
| Observed sales ($y$) | 2 | 0 | 4 | 1 | 1 |
| Predicted sales ($\hat{y}$) | 2 | 2 | 2 | 2 | 2 |
| Squared error ($(y - \hat{y})^2$) | 0 | 4 | 4 | 1 | 1 |

The mean of the squared errors is $\frac{0+4+4+1+1}{5} = 2$.

**Step 2 - Find the MSE of a one-step naive forecast on the training set**

| | Day 1 | Day 2 | Day 3 | Day 4 | Day 5 |
|---|---|---|---|---|---|
| Observed sales ($y$) | 4 | 2 | 1 | 3 | 2 |
| $(y_i - y_{i-1})^2$ | | 4 | 1 | 4 | 1 |

Our scaling factor is,

$$\frac{4+1+4+1}{4} = 2.5$$

**Step 3 - Calculate the RMSSE**

Now we can calculate the RMSSE: $\sqrt{\frac{2}{2.5}} \approx 0.894$

Let's put the three steps in the example above into code. We'll use numpy to create train, test and forecast arrays on the two different scales.

```
import numpy as np

# each array has two rows: single screws, and boxes of 100 screws
train = np.array([
    [4, 2, 1, 3, 2],
    [400, 200, 100, 300, 200]
])
test = np.array([
    [2, 0, 4, 1, 1],
    [200, 0, 400, 100, 100]
])
forecast = np.array([
    [2, 2, 2, 2, 2],
    [200, 200, 200, 200, 200]
])
```

**Step 1 - Find the MSE of the forecast**

We need to find the difference between the forecasted values and the actual values, square it, and then take the mean.

```
forecast_mse = np.mean((test-forecast)**2, axis=1)
```

**Step 2 - Calculate the MSE of a one-step naive forecast on the training set**

Here we need to square the difference between the elements in the array and then find the mean.

We will only do this calculation from the **first non-zero element**. This is because we may run into situations later where we are given a time series whose beginning is filled with zeros. To pre-empt this problem, we will trim the zeros at the start. In this particular case, however, it has no effect.

```
# 'f' trims zeros from the front of each series only
train_mse = [np.mean(np.diff(np.trim_zeros(row, 'f'))**2) for row in train]
```

**Step 3 - Calculate the RMSSE**

```
rmsse = np.sqrt(forecast_mse/train_mse)
print(rmsse)
```

Excellent! Both values match and we have replicated the result from the calculation before. Let's finish up by putting it in a function for later use.

```
def rmsse(train, test, forecast):
    forecast_mse = np.mean((test - forecast)**2, axis=1)
    train_mse = [np.mean(np.diff(np.trim_zeros(row, 'f'))**2) for row in train]
    return np.sqrt(forecast_mse / train_mse)

print(f'RMSSE: {rmsse(train, test, forecast)}')
```

We encountered problems using the Mean Absolute Error when trying to compare series that were at different scales. Attempting to correct this using a percentage error (MAPE) was unsuccessful due to the zero-division terms. We could solve both of these issues by calculating either the Mean Absolute Scaled Error (MASE) or Root Mean Squared Scaled Error (RMSSE).

The next tutorial will look at how we can **weight** the RMSSE to make it more meaningful from a business perspective. A weighted approach allows us to emphasise the importance of certain series while discounting others. See you there!