CMAF FFT: Generating stable forecasts: what it is, why it matters, and how to achieve it
By Lancaster CMAF
Summary
Topics Covered
- Forecast Instability Evades Accuracy Checks
- Stability Equals Higher Profits
- Combine for Stability Without Accuracy Loss
- Optimize Models for Dual Accuracy-Stability
Full Transcript
Hi everyone, welcome to Friday Forecasting Talks. Today we have Jente Van Belle, who will talk about generating stable forecasts and lots of details related to that. Before we do that, though, I wanted to say just a couple of words about the centre that organizes these events. This slide shows all the members that we have at the moment, with Professor Robert Fildes being the founder of the centre, although he is no longer involved as much as he used to be. You can see lots of other bright faces here, including me and Kandrika, who organizes these events. We provide a variety of services, including bespoke short courses, and some of you might have heard that we held an online course on business forecasting principles; we want to restart that and hold another one in May. We have opportunities for summer projects, we have expertise in software development, and so on. You can see lots of things here, and our expertise spans from elements of marketing analytics to inventory management, but I would say that the main focus is demand forecasting and supply chain forecasting and topics related to that. So if you're interested in anything in those directions and want to work with us, please get in touch. How to get in touch? Scan this QR code and it will lead you to a page with a variety of options. We are still, sort of, on Twitter, but we are not active there anymore; I should delete that, actually. The centre is present on LinkedIn and we sometimes post things there. You can always send us an email, visit our glorious website, and we also have a YouTube channel where we upload videos from these events, along with an initiative of recording educational videos on business forecasting principles. So these are the options, and we can now move to the main event of today: the presentation by Jente. Jente, can you please start sharing your screen?
All right. Thank you, Ivan. Welcome, everyone, to this talk. Today we will talk about generating stable forecasts: what it is, why it matters, and how to achieve it. Currently I am an FWO junior postdoctoral fellow at the Faculty of Economics and Business at KU Leuven, and if you have questions after this presentation, you can always reach me via email; my email address is here on the slide.
I think we can get started. Before we dive into the content, first a quick overview of today's talk. First we will look at what forecast instability, or forecast stability, is: we will look at a definition and a little toy example to get familiar with the concept. Next we will look at why it matters: why should we care about forecast stability? And finally, in the third part of this presentation, we will look at approaches for achieving more stable forecasts: model selection, forecast combinations, and finally model optimization, that is, how to directly optimize your forecasting models so that they produce inherently more stable forecasts.
So, first of all, what is forecast instability? You can see the definition on this slide: rolling-origin forecast instability is the variability in forecasts for a specific period caused by updating the forecast for this period when new observations become available. Forecast instability, or forecast stability, is therefore a characteristic of forecasts in a context in which we generate multi-step-ahead forecasts on a rolling basis. This is also visualized on this slide, where we see one- to six-step-ahead forecasts for a time series from the M3 competition, produced on a rolling basis, so the forecasting origin is updated each time a new observation becomes available. If you focus on the red vertical line, it is immediately clear that the forecast distribution for this target period changes, sometimes a lot, depending on the forecasting origin that is used.
Why is this important? It is important to realize that this behaviour remains under the radar: it goes unnoticed if we only evaluate our forecasts from a forecast quality, or forecast accuracy, perspective.
To illustrate this, let us first look at a toy example. The setting is visualized on the slide. At the top we have the table where the process of generating a stable versus an unstable forecaster is detailed, but let us not go too much into the details; the visualizations will help to understand the toy problem. The idea is that we start from nature: we want to predict nature at period t, and for this nature we use a normal distribution whose mean is itself sampled from another normal distribution. There is no difference here between our stable forecaster and our unstable forecaster, as you can also see at the bottom of the slide: the target distribution is a normal distribution, and it is of course the same for both forecasters. At time t-3 we generate a first forecast for our target period t. What we do here is basically take the ground truth distribution, because it is a simulation experiment, and add some distributional bias to it, so we mix it with another normal distribution. We do the same for the stable and the unstable forecaster, so there is still no difference between them up to this point. However, at time t-2 we update our forecast for target period t, and there we introduce a difference between the stable and the unstable forecaster. The distributional bias that we add is similar in size for both, but the sign differs: for the unstable forecaster the new forecast shifts to the opposite side of the ground truth distribution, whereas for the stable forecaster it remains on the same side. If we then look at our final forecast, made at t-1, again for target period t, we see that the forecasts are again the same: the stable and the unstable forecaster generate the same forecast. But it is important to realize that for the unstable forecaster this means the forecast once more shifts to the opposite side of the ground truth distribution.
Why do we simulate this behaviour in this manner? Because forecast updates typically result in better forecast quality, or forecast accuracy, since they are based on a shorter forecast horizon. That is the behaviour we try to mimic here. But this updating typically also introduces instability, and that is the case for both forecasters; the only difference is the degree to which they introduce it. The stable forecaster introduces not that much instability, whereas the unstable forecaster introduces a lot of instability.
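To make the toy example concrete, here is a minimal point-forecast caricature of it; the talk works with full predictive distributions scored with CRPS and Wasserstein-1, and the bias size, distribution parameters and number of periods below are illustrative assumptions rather than the values on the slide.

```python
import numpy as np

rng = np.random.default_rng(7)
N, BIAS = 10_000, 1.5   # number of simulated targets and bias size (illustrative values)

# Ground-truth mean for each target period, itself drawn from a normal distribution.
mu = rng.normal(loc=0.0, scale=1.0, size=N)

# Forecast means issued at t-3, t-2 and t-1 (point-forecast caricature of the slide).
f_t3 = mu + BIAS                  # same first forecast for both forecasters
f_t1 = mu + BIAS                  # same final forecast for both forecasters
f_t2_stable = mu + BIAS / 2       # update stays on the same side of the truth
f_t2_unstable = mu - BIAS / 2     # update jumps to the opposite side of the truth

def total_revision(f3, f2, f1):
    """Average size of the revisions between adjacent forecasting origins."""
    return np.mean(np.abs(f2 - f3)) + np.mean(np.abs(f1 - f2))

print("stable forecaster  :", total_revision(f_t3, f_t2_stable, f_t1))
print("unstable forecaster:", total_revision(f_t3, f_t2_unstable, f_t1))
# At every origin both forecasters are equally far from the truth, so their
# accuracy is identical by construction; only the size of the revisions differs.
```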
If we now evaluate these forecasts, so we simulate this process for 10,000 periods, we first evaluate them in terms of forecast quality using the CRPS, the continuous ranked probability score, which is basically the MAE variant for probabilistic forecasts and can be obtained by integrating over the quantile scores for all quantile levels. What do we see in terms of forecast quality? There is no difference between the unstable and the stable forecaster; of course, this is by design. More importantly, we also evaluate the forecasts from an instability perspective. We do this by computing the Wasserstein-1 distance, a metric for the dissimilarity between two distributions; the formula is shown on the slide. What we do is basically take the absolute difference between the quantile forecasts at every quantile level, integrate over all these differences, and then we have the difference between our new forecast and an older forecast for the same target period. We can do this for adjacent forecasting origins, by which I mean comparing the forecasts generated for period t at t-2 and t-3, or at t-1 and t-2, and for non-adjacent origins, where we compare t-1 against t-3. What we see, again by construction, is that there is a clear difference in forecast stability between the unstable and the stable forecaster, with the stable forecaster resulting in lower Wasserstein-1 values; here lower is better, which is also the case for the CRPS.
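For reference, both metrics can be approximated from a grid of quantile forecasts. The sketch below is a minimal numpy version, assuming an evenly spaced quantile grid and the "CRPS as the integral of the quantile scores" convention mentioned in the talk; the toy Gaussian forecasts are made up for illustration.

```python
import numpy as np

# Evenly spaced quantile levels; averaging over them approximates an integral over tau on (0, 1).
TAUS = np.linspace(0.01, 0.99, 99)

def pinball(y, q, tau):
    """Pinball (quantile) loss of quantile forecast q at level tau for outcome y."""
    return np.where(y >= q, tau * (y - q), (1.0 - tau) * (q - y))

def crps_from_quantiles(y, quantiles, taus=TAUS):
    """CRPS approximated by integrating the quantile scores over tau
    (here: 2 x the average pinball loss on the even tau grid)."""
    return 2.0 * np.mean(pinball(y, quantiles, taus))

def wasserstein1(q_a, q_b, taus=TAUS):
    """Wasserstein-1 distance between two forecasts for the same target,
    approximated as the average absolute gap between their quantile functions."""
    return np.mean(np.abs(q_a - q_b))

# Toy illustration: two forecasts for the same target period, issued from
# adjacent origins (all numbers here are made up for illustration).
rng = np.random.default_rng(42)
q_old = np.quantile(rng.normal(loc=10.0, scale=2.0, size=100_000), TAUS)
q_new = np.quantile(rng.normal(loc=11.5, scale=2.0, size=100_000), TAUS)

print("CRPS of the new forecast for outcome y = 10:", crps_from_quantiles(10.0, q_new))
print("Instability (W1) between the two forecasts :", wasserstein1(q_new, q_old))
```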
All right. So we just saw that it is possible to have two different forecasters that generate forecasts of the same forecast quality but that differ in their forecast stability, or forecast instability. The question then is: why is this important? Why does forecast stability matter?
It matters because forecasting is usually a means to an end. We usually generate forecasts which are then used as inputs to a decision-making problem; we generate forecasts to inform decision making. In that sense we can assume that unstable forecasts will lead to larger adjustments to the plans and decisions that are based on them. For example, assume that you are responsible for emergency evacuation planning and that you generate flood forecasts. If your flood forecasts are very unstable, your plans and decisions regarding emergency measures will keep changing throughout the planning horizon, and this is of course not what you want; it is not very desirable. The same is true for inflation forecasting: if your inflation forecasts change wildly through time, that will harm your credibility, you will lose credibility, and forecast users may lose trust in the forecasting system.
A third example is demand forecasting, where unstable forecasts will typically lead to more, or larger, revisions to supply plans, and this will lead to higher supply chain costs. Finally, in a broader sense, unstable forecasts may erode trust in the forecasting system, which then potentially leads to unwarranted judgmental adjustments. There is a quote by Nordhaus, from a paper published as early as 1987, where he phrases it as follows: "Moreover, some forecasters might smooth their forecasts as a service to customers. One forecaster told me that he smoothed his forecast because a more accurate but jumpy forecast would drive his customers crazy." Of course, we know that unwarranted judgmental adjustments typically lead to a decrease in forecast accuracy, so this is something we want to minimize.
To make this more specific, let us take our toy example to the next level and use the stable and the unstable forecasts in a decision-making problem. To this end I will use a newsvendor problem with three ordering decisions. Newsvendor problems are very well studied. The task is to set the inventory level for each period t so as to maximize expected profit. That basically means that at each period t we face uncertain demand for a perishable product. At the start of each period we receive all the orders that we have made, and all unsold units are worthless at the end of the period. Product shortages result in lost revenue, so that is also something you want to avoid. There are nice theoretical results for the newsvendor problem, and this is also the case for the variant considered here, a newsvendor with three ordering decisions. Basically, for each period t we make an initial order at t-3, which can only be positive and comes at a certain unit cost c, but we can make two revisions, at t-2 and t-1: we can either order more or cancel part of the ordered units, and this comes with additional costs. If you order more units you pay a premium; if you cancel part of your order you pay a cancellation fee, although you of course do not have to pay the unit cost of the cancelled units.
What is important here? We will not go into the mathematical details, but there are optimal order quantities that you can compute; it is theoretically possible, and of course you can use these optimal order quantities at each period to make your ordering decisions for period t. But as a decision maker you do not have to; you can also opt for other strategies, for example the anticipation strategy or the procrastination strategy. With the anticipation strategy we order the median forecast at t-3 and t-2 and only rely on the optimal order quantity at t-1, whereas with the procrastination strategy we only order at t-1, at a higher cost. And remember, our stable and our unstable forecaster generate the same forecasts for target period t at t-1, so the procrastination strategy serves as a check: there should not be a difference between the stable and the unstable forecaster.
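As a rough illustration of how forecasts enter such ordering decisions, here is a sketch of the classic single-period newsvendor critical fractile. The three-decision variant used in the talk has its own optimal revision quantities, which this sketch does not reproduce, and all prices and costs below are hypothetical.

```python
import numpy as np

def critical_fractile(price, cost, salvage=0.0):
    """Classic single-period newsvendor service level: order up to the
    cu / (cu + co) quantile of the demand forecast."""
    cu = price - cost        # underage cost: margin lost per unit short
    co = cost - salvage      # overage cost: loss per unsold (worthless) unit
    return cu / (cu + co)

def order_quantity(quantiles, taus, service_level):
    """Read the order quantity off the forecast's quantile function."""
    return np.interp(service_level, taus, quantiles)

# Hypothetical cost setting and demand forecast for one target period t.
taus = np.linspace(0.01, 0.99, 99)
demand_forecast = np.quantile(np.random.default_rng(1).normal(100.0, 15.0, 50_000), taus)

sl = critical_fractile(price=10.0, cost=4.0)
print("target service level:", round(sl, 3))
print("order quantity for period t:", round(order_quantity(demand_forecast, taus, sl), 1))

# The three strategies from the talk differ mainly in *when* such a quantity is
# committed: the optimal strategy re-optimizes at t-3, t-2 and t-1 (paying the
# revision costs), anticipation orders the median at t-3 and t-2 and only
# re-optimizes at t-1, and procrastination orders everything at t-1 at a higher cost.
```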
If we then look at the results for these three strategies, it is important to realize that the results depend on the forecasts used as inputs, because the optimal order quantities correspond to specific quantiles of your forecast distribution. We simulated this process for 10,000 periods and computed the profit for a high profit margin and a low profit margin. What is interesting is that the stable forecasts, for both profit margins and across all strategies except the procrastination strategy, always result in higher profits. The increase in profits ranges from 0.85 to 3.47, which is quite substantial. And if you look at the number of periods in which the profit is higher with the stable forecasts than with the unstable forecasts, it is about 80% of the periods. So I think this is a strong motivation that clarifies why considering forecast instability is an important issue.
All right, that brings us to the third part of this presentation: how can we stabilize forecasts? How can we achieve this? There are basically three different strategies. We can take forecast instability into account in model selection; we can use it as an input to forming forecast combinations; and finally, we can optimize models directly for both forecast accuracy and forecast stability. We will look at approaches for stabilizing point forecasts, for stabilizing Gaussian probabilistic forecasts, and for stabilizing distribution-free probabilistic forecasts.
But first things first: let us start with model selection. This is based on a working paper by Kandrika and Nikos Kourentzes in which they define forecast congruence, a quantity to align forecasts and inventory decisions; they focus on the impact of forecast congruence on inventory decisions. Forecast congruence is very similar to forecast instability as I introduced it before. The only difference is that they compute it by looking at the variance across the forecasts for a specific target period t generated from different forecast origins. So if we take the visualization from the first slide again and look at the vertical red line, they take all the forecasts made for the same target period t, compute the variance across them, and that is the forecast congruence; of course, lower is better.
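A minimal sketch of how such a congruence measure could be computed, assuming it is simply the variance of all forecasts issued for the same target period, averaged over target periods; the exact definition and any scaling used in the working paper may differ.

```python
import numpy as np

def forecast_congruence(forecasts):
    """Congruence proxy as described in the talk: for every target period,
    take the variance of all forecasts issued for that period from different
    origins, then average over target periods (lower = more congruent).

    forecasts[o, t] is the forecast for target period t issued at origin o,
    NaN where no forecast was issued."""
    per_target_variance = np.nanvar(forecasts, axis=0)
    return float(np.nanmean(per_target_variance))

# Toy example: 4 origins, 6 target periods, horizon 3 (hypothetical numbers).
f = np.full((4, 6), np.nan)
f[0, 0:3] = [100.0, 102.0, 101.0]   # forecasts issued at origin 0
f[1, 1:4] = [104.0, 100.0,  99.0]   # forecasts issued at origin 1
f[2, 2:5] = [101.0, 103.0, 102.0]   # forecasts issued at origin 2
f[3, 3:6] = [ 98.0, 104.0, 100.0]   # forecasts issued at origin 3
print("congruence (lower is better):", forecast_congruence(f))
```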
Using inventory simulations for simulated ARMA demand, but also a real data set from an FMCG manufacturer, they show that accounting for congruence in addition to accuracy in model selection, and it is only used in model selection here, can lead to favourable inventory performance. Why is this important? Because then we do not need to rely on expensive inventory simulations: combining forecast accuracy and forecast instability to inform model selection can serve as a proxy for them. This is just one example, for the simulated seasonal ARMA demand shown here.
There they generate forecasts with different ETS variants but also with the MAPA algorithm, and with the DGP, which, because it is a simulation experiment, is a theoretical forecast that we cannot make in reality but serves as a nice check here. What we see is that while MAPA only ranks fourth in terms of RMSSE, so in terms of accuracy, it is the best method in terms of congruence, so in terms of stability. If we then look at the inventory performance, quantified here by the scaled stock on hand and the scaled lost sales, we see that MAPA outperforms the other methods: it achieves the same target service level with less stock on hand, and the difference in lost sales is negligible. So this is a very interesting result. What is also interesting is that if we look at the congruence of the DGP, which is a theoretical quantity we cannot know in practice, MAPA is actually over-congruent: it is more stable than the underlying DGP. But this does not harm its performance in terms of inventory decision making. So that was model selection.
Another strategy that we can use is forecast combination. This is based on another working paper, by BH and co-authors, in which they propose a post-processing approach to stabilize newly generated forecasts by combining them with the most recent previous forecast for the same targets using a weighted average. So if we want to generate a forecast for period t, we use an older forecast for that same target period to stabilize the new forecast. They have two variants, partial interpolation and full interpolation, where y tilde is the stabilized forecast. Partial interpolation is a weighted average of the newly generated forecast and the old original forecast, whereas with full interpolation we combine the newly generated forecast with the old stabilized forecast, so that all forecasts made previously for the same target are taken into account. There is a single hyperparameter, w_s, that controls the trade-off between stability and accuracy, and this is of course a parameter that you need to tune, or select based on some procedure. What is interesting is that this approach is model-agnostic: it does not matter how the base forecasts, the y hats, are generated; they can originate from a local forecasting method, a global forecasting method, or even a fully judgmental forecasting pipeline.
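A minimal sketch of the two interpolation rules, assuming w_s is the weight placed on the older forecast (the exact convention in the working paper may differ); the base forecasts in the example are made-up numbers.

```python
W_S = 0.2   # weight on the older forecast; 0 means no stabilization

def partial_interpolation(y_new, y_prev_original, w_s=W_S):
    """Stabilized forecast as a weighted average of the newly generated
    forecast and the previous *original* forecast for the same target."""
    return (1.0 - w_s) * y_new + w_s * y_prev_original

def full_interpolation(y_new, y_prev_stabilized, w_s=W_S):
    """Stabilized forecast as a weighted average of the newly generated
    forecast and the previous *stabilized* forecast, so that all older
    forecasts for the same target enter with geometrically decaying weights."""
    return (1.0 - w_s) * y_new + w_s * y_prev_stabilized

# Rolling example for a single target period, re-forecast from three origins
# (the base forecasts are made-up numbers).
base_forecasts = [105.0, 98.0, 103.0]   # issued at t-3, t-2, t-1
stabilized = base_forecasts[0]
for y_new in base_forecasts[1:]:
    stabilized = full_interpolation(y_new, stabilized)
print("last base forecast:", base_forecasts[-1], "| fully interpolated:", round(stabilized, 2))
```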
Here we look at one experiment that they report in the paper, on the M4 monthly data set with base forecasts generated by the global model N-BEATS. The figure shows a Pareto plot. On the x-axis we have accuracy in terms of the symmetric mean absolute percentage error; on the y-axis we have instability in terms of the symmetric mean absolute percentage change, which uses the same formula as the symmetric MAPE but with the actual replaced by an older forecast. For both metrics, lower is better. What is interesting to see, although the text is quite small, is that in the upper left corner we have the base N-BEATS forecasts. If we draw a vertical line parallel to the y-axis through that point, methods that lie to the left of that line are more stable while being at least as accurate as the base forecasts. You can see that partial interpolation and full interpolation with w_s equal to 0.2, so with the hyperparameter equal to 20%, result in an improvement in stability without harming accuracy. This is one way to select a value for the hyperparameter w_s, but in principle you can select any method on the Pareto front, which is obtained here via a rolling-origin evaluation on a validation set, and it is up to the forecast user to decide how to trade stability for accuracy. But it is also possible to improve stability without a considerable loss in forecast accuracy.
And then finally, we can also use approaches where we directly optimize our model to generate more stable forecasts. The key element is that we cast the problem as a bi-objective optimization task by using a composite loss function. The idea is very simple: the loss function now has two terms, one term that looks at forecast errors, to optimize accuracy or quality, and another term that looks at forecast instability, where we compare the forecasts, the y hats, to an older forecast. So we quantify dissimilarities between forecasts for the same target that originate from different forecasting origins. Again there is one hyperparameter, here lambda. And we can ask two questions. First, are there lambda values which lead to forecasts that are as accurate but more stable than those for lambda equal to zero, which basically means that we only optimize for accuracy or forecast quality? And second, how should we operationalize this? That is where the difficulty lies: the idea of the composite loss function is very straightforward, but operationalizing it is the difficult part of this approach.
Let us first look at how to operationalize this for point forecasts. This is from a paper published in the International Journal of Forecasting of which I am one of the authors, where we propose N-BEATS-S, the additional S standing for a stabilized version of N-BEATS; we modify the N-BEATS model so that it generates inherently more stable forecasts. We will not look at the N-BEATS model in detail, but it is important to mention that it is a global deep learning model: a model with a set of global parameters that are estimated jointly across time series. The model is used for univariate time series point forecasting, and its key idea is the doubly residual stacking topology, where each block produces a residual backcast and a partial forecast. The residual backcast basically serves to filter the input signal as you go deeper into the network, so that later blocks only focus on the part of the time series that is not yet explained by earlier blocks. How does the model work? We have a time series and we sample an input-output window from it, with a look-back window that is used to generate forecasts for the observations in the forecast period; we feed this to the model, and the model output is basically the one- to H-step-ahead forecasts for the forecast periods.
How can we stabilize this model? By using the composite loss function. For the error component we use the RMSSE, which is fairly standard, and for the instability loss component we use a variant of the RMSSE where we replace the actual by an older forecast for the same target period. In that way we can quantify both forecast error and forecast instability. An important point, however, is that to compute this instability variant of the RMSSE we need forecasts for period t generated at two different forecasting origins. So to train this model we use, for each input-output sample, also a lagged sample: we also consider the input-output sample with the forecasting origin shifted one period backwards in time. If you do this, you can compute the RMSSE for both input-output samples, and you can also compute the instability variant for the overlapping periods in the forecast windows of these two related samples.
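A rough numpy sketch of this composite loss, assuming the instability term is an RMSSE-style discrepancy between overlapping forecasts from adjacent origins and that the two terms are combined as accuracy plus lambda times instability; the exact weighting and scaling used in N-BEATS-S may differ.

```python
import numpy as np

def rmsse(targets, forecasts, insample):
    """Root mean squared scaled error: MSE of the forecasts, scaled by the MSE
    of the naive one-step forecast on the look-back (in-sample) window."""
    scale = np.mean(np.diff(insample) ** 2)
    return np.sqrt(np.mean((targets - forecasts) ** 2) / scale)

def composite_loss(actuals, fc_now, fc_lagged, insample, lam):
    """Sketch of the bi-objective loss: an accuracy term (RMSSE against the
    actuals) plus lambda times an instability term in which the actual is
    replaced by the forecast for the same target made one origin earlier.
    The current origin forecasts targets t+1..t+H, the lagged origin
    forecasts t..t+H-1, so they overlap on targets t+1..t+H-1."""
    accuracy = rmsse(actuals, fc_now, insample)
    instability = rmsse(fc_lagged[1:], fc_now[:-1], insample)
    return accuracy + lam * instability

# Hypothetical example with horizon H = 6.
rng = np.random.default_rng(0)
series = 100.0 + rng.normal(0.0, 5.0, 60)
lookback, H = series[:48], 6
actuals = series[48:48 + H]
fc_now = actuals + rng.normal(0.0, 3.0, H)                 # forecasts from origin 48
fc_lagged = series[47:47 + H] + rng.normal(0.0, 3.0, H)    # forecasts from origin 47
print("composite loss (lambda = 0.3):", round(composite_loss(actuals, fc_now, fc_lagged, lookback, 0.3), 3))
```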
Does this work? If we look at this figure, this is how we tune the lambda hyperparameter on a validation set. Here we can see that if we take forecast instability into account only slightly, we can of course improve stability, but our accuracy also improves at first; it only gets worse if we assign too much weight to forecast instability in the composite loss function. So there are certain lambda values which result in improvements in forecast stability without harming forecast accuracy. And these are the results for an experiment on the M4 monthly data set. On the left we have the MCB results in terms of accuracy rankings across time series, and on the right the MCB results in terms of stability. What we see is that N-BEATS-S, the stabilized version of N-BEATS, produces significantly more accurate but also significantly more stable forecasts, which is exactly what we wanted to achieve.
If we take this one step further, and this is a working paper which is available on arXiv, the hypothesis is that we can dynamically tune lambda. Instead of selecting a single value, as we did on the previous slide, we assign a different value to lambda in each training iteration, in each iteration of our stochastic gradient descent procedure. If we dynamically tune lambda, the idea is that we can prioritize forecast accuracy during the early stages of training, keeping in mind the goal of improving forecast stability without significantly sacrificing accuracy. The hypothesis is that this can lead to improved performance compared to using a tuned static value for lambda. I will not go into the details, but there are a lot of dynamic loss weighting algorithms available, algorithms that can be used to dynamically tune the lambda value, and we see here, again in an experiment on the M4 monthly data set, that we can use them to further improve our forecasts in terms of stability over the method with a static tuned lambda value, without significantly harming accuracy.
On the left, results are shown for one dynamic loss weighting method. What is interesting to see is that on the x-axis we visualize the training iteration, from zero to approximately 20,000 weight updates, and on the y-axis we have lambda. The orange shaded area visualizes the weight assigned to stability, which is an output of the dynamic loss weighting algorithm, and it is interesting to see that this corresponds to our hypothesis: initially it assigns less weight to stability. So first we need to achieve a decent forecast accuracy in order for forecast stability to become important, or to become helpful, in training our model.
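To illustrate the general idea, here is one very simple member of the dynamic loss-weighting family, a linear ramp on lambda; it is not the algorithm evaluated in the working paper, just a sketch of how a per-iteration lambda would plug into the composite loss.

```python
def dynamic_lambda(step, total_steps, lam_max=1.0):
    """One simple member of the dynamic loss-weighting family (not the specific
    algorithm from the working paper): ramp the stability weight up linearly
    during the first half of training, so early iterations prioritize accuracy
    and later iterations care increasingly about stability."""
    return lam_max * min(1.0, step / (0.5 * total_steps))

# Inside a training loop the composite loss would then be built as
#   loss = accuracy_loss + dynamic_lambda(step, total_steps) * instability_loss
for step in (0, 5_000, 10_000, 20_000):
    print(step, round(dynamic_lambda(step, total_steps=20_000), 2))
```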
We can also do this for Gaussian forecasts. How can we operationalize it there? We can first turn the N-BEATS model into a probabilistic forecasting model by replacing the point predictions by a median prediction and a prediction for the variance for each period in the forecast window, and we can again use the composite loss function. Now we operationalize it as follows: for the error component we use likelihood-based optimization, similar to the DeepAR way of optimizing the network, and for the instability loss component we can rely on closed-form solutions for the Kullback-Leibler divergence and the Wasserstein-2 distance between two Gaussian distributions. In a Gaussian setting such closed-form solutions are available to quantify the difference between two distributions, and they can readily be used in the loss function to optimize the model for both accuracy and stability.
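These closed forms are standard for univariate Gaussians and can be dropped into the instability term; a small sketch, with made-up forecast parameters:

```python
import numpy as np

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) for two univariate Gaussians."""
    return (np.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def wasserstein2_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form Wasserstein-2 distance between two univariate Gaussians."""
    return np.sqrt((mu_p - mu_q) ** 2 + (sigma_p - sigma_q) ** 2)

# New vs previous Gaussian forecast for the same target (made-up parameters).
print("KL divergence :", round(kl_gaussian(11.5, 2.5, 10.0, 2.0), 4))
print("Wasserstein-2 :", round(wasserstein2_gaussian(11.5, 2.5, 10.0, 2.0), 4))
# Either quantity (or only the squared difference of the means) can serve as
# the instability term in the composite loss.
```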
Again, here we have an experiment on the M4 monthly data set. The conclusion is that we can use the KL divergence, but we can also use the Wasserstein-2 distance, or we can even stabilize only the mean of the distributions, and in all cases it is possible to achieve improvements in forecast stability without considerably harming forecast quality.
And then finally, we can take this one step further and consider distribution-free probabilistic forecasts. This is still work in progress, so there is no working paper or published paper available yet. The idea is to operationalize the same approach for distribution-free probabilistic forecasts, where we model our forecasts as conditional quantile functions approximated by linear isotonic regression splines, so piecewise linear functions, and these piecewise linear functions are parameterized by training a neural network. Again, we can rely on the CRPS as the error, or quality, component in the loss function, and to quantify forecast instability we can rely on the Wasserstein-1 distance, or another metric, and follow the same procedure to combine forecast quality and forecast instability in the optimization of the model. Why is this an interesting approach? Because, remember from earlier, the Wasserstein-1 distance is calculated by looking at the differences between specific quantiles of the two distributions and then aggregating all those differences. But you can also use quantile-weighted versions of the Wasserstein-1 distance, which allow you to put more weight on the centre of the distribution or on its tails. So this allows us to focus the stabilization procedure on specific parts of the distribution, and that is a nice feature of this distribution-free approach based on conditional quantile functions.
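A minimal sketch of such a quantile-weighted Wasserstein-1 instability term; the weight profiles below are borrowed from the weighted-CRPS literature and are assumptions, not necessarily the ones used in this work in progress.

```python
import numpy as np

TAUS = np.linspace(0.01, 0.99, 99)

def weighted_w1(q_new, q_old, weight_fn, taus=TAUS):
    """Quantile-weighted Wasserstein-1 style instability: a weighted average of
    the absolute gaps between two quantile functions over the tau grid."""
    w = weight_fn(taus)
    return float(np.sum(w * np.abs(q_new - q_old)) / np.sum(w))

# Hypothetical weight profiles (borrowed from the weighted-CRPS literature;
# the ones used in the work in progress may differ).
uniform = lambda t: np.ones_like(t)        # plain Wasserstein-1
center  = lambda t: t * (1.0 - t)          # emphasize the middle of the distribution
tails   = lambda t: (2.0 * t - 1.0) ** 2   # emphasize both tails

rng = np.random.default_rng(3)
q_old = np.quantile(rng.normal(10.0, 2.0, 100_000), TAUS)
q_new = np.quantile(rng.normal(10.0, 3.0, 100_000), TAUS)   # spread changed, center did not
for name, w in (("uniform", uniform), ("center", center), ("tails", tails)):
    print(name, round(weighted_w1(q_new, q_old, w), 3))
```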
All right, some key takeaways. First, forecast stability can be considered as an additional evaluation criterion to reduce the gap between forecast evaluations and utility-based evaluations of forecasts. Second, forecast instability may go unnoticed if forecasts are evaluated solely from a forecast quality perspective. Third, why is all this important? The idea is that more stable forecasts will lead to fewer and smaller adjustments to the plans drafted based on your forecasts, and that they will also enhance trust in the forecasting system. And finally, we showed that we can stabilize forecasts without causing a considerable loss in forecast quality, either by using instability in a model selection procedure, by using forecast combinations, or by directly optimizing your models for both forecast quality and forecast stability. So that was it. Thank you.
Thanks. Now we are ready to take any questions that people might have, and just to start this off I will ask something that interests me. When you apply this composite loss function that you discussed in the last part of the presentation, what do you think happens to the parameters of the models? Have you done any investigations of that?

Sorry, do you mean whether it acts as a sort of shrinkage estimator? Is that what you are asking?

Yes, that is what I thought it might act like, but that is just my hunch, I don't know, so I am interested in your opinion.

That is a good question. We have not looked into it, and it is also not very easy to study, because it is a deep learning model, so there are millions of parameters. But yes, it is an interesting topic to explore further, I would say.

Okay, so what the dynamics are, or the effects on the model parameters. Yes, that would be interesting to explore indeed.
You have mentioned several different measures of stability, but if I understand correctly, the ones that you discuss mainly focus on forecasts produced from adjacent origins, say yesterday and the day before yesterday. Did I misunderstand? Can you take more of them into account somehow?

Yes, that is possible, and that is exactly the difference between the definition of forecast stability that I adopt and the definition of forecast congruence: there they take into account all the forecasts generated for a target period that originate from different forecasting origins. Why do we use the setting in which we only compare adjacent forecasting origins? If we go back to the toy example, then we see, although further investigation is needed, that it is basically the instability between adjacent forecasting origins that matters: if you make the assumption that each time you update your forecast you also use it in decision making, then it is that instability that impacts your decision making, or your decision.

Okay, so that is the reasoning behind the adjacent forecasting origins. I also wonder whether you actually need to make things more complicated and add more horizons and so on. Maybe having the adjacent origins is more than enough, because it already captures the dynamics somehow. It would be interesting to look at that as well.
Yes, that is true. I have experimented with it: to also include older forecasts you basically need to add more lagged input-output samples, so that you have multiple older forecasts for each target. Preliminary experiments show that it does not harm your model, and you can still achieve better forecast stability without losing forecast accuracy, but it does not add too much value, because by focusing on the adjacent forecasting origins you indirectly also penalize large differences between forecasts from origins that are further apart.

Yes. And you also use the lambda parameter to tune that, so in a way, if you substitute yesterday's forecast with one from a week ago, that would just mean having a different lambda parameter, I guess. So lambda already takes care of that to some extent, if I understand correctly.

Yes, that is possible. What is interesting is that if you use the approach that relies on forecast combinations, in the full interpolation setting you basically take into account all older forecasts, and it has the nice property that the weights assigned to older forecasts decay, so more weight goes to the more recent updates.
Good. We have a question from the audience: does the focus on accuracy increase the upper bound of probabilistic forecasting? I am not sure that I fully understand the question; do you, Jente?

Muhammad, do you want to ask the question in person? You can unmute yourself now, and even turn on the camera, and maybe clarify a bit more, because I do not understand what you mean by the upper bound of probabilistic forecasting.

Yes, Muhammad, you can unmute yourself and ask the question if you can. Okay, you cannot, that is okay. Maybe you can clarify what you mean by the upper bound of probabilistic forecasting. In the meantime, what do you think, Jente? Does the focus on increasing accuracy impact the performance in terms of probabilistic forecasting in this setting?

I will rephrase it. I think it might be related to the Pareto front. You need to select a hyperparameter, and that basically controls the trade-off between accuracy and stability. The focus on accuracy rests on the assumption that the fact that we update our forecasts implicitly implies that the benefit of updating is larger than the costs it induces. That is why a natural choice is to look at values of your hyperparameter where you see an improvement in stability without a loss in forecast quality. But of course you can also argue for a larger value of lambda, which will lead to more stable forecasts at the cost of a slight decrease in forecast quality. For example, one possible reason to opt for a larger value for lambda is that if your model outputs very unstable forecasts and the forecast users start to fiddle around with the forecasts in order to make them more stable, then you will eventually end up with lower accuracy as well. So in that sense you can maybe justify a larger value for lambda, which will lead to more stable forecasts. I am not sure whether that fully answers the question.

Yes, we did not fully understand the question, so we answered what we understood, I guess.
Right. Any other questions from the audience? Let's see. I must also say that I am a bit puzzled by your idea of using a neural network to train on the cumulative distribution function. There are lots of different methods to generate quantiles, to generate probabilistic forecasts, distributions and so on. So why specifically this one?

Good question. That is because the conditional quantile function approach allows you to approximate the CRPS and the Wasserstein-1 distance by looking at specific quantiles. The Wasserstein-1 distance, which is used to take instability into account in the optimization, can be used directly in this setting, and you also get the flexibility of the quantile-weighted Wasserstein-1 variants, which allow you to put more weight on the centre of the distribution or on the tails. That might be of interest for specific applications, for example in inventory management, where you may mainly want to focus on stabilizing the tails of your distribution. So that was the reason why we opted for conditional quantile functions to model our probabilistic forecasts.

Okay, so you might be more interested in tail performance, for example, and that gives you enough flexibility for that.
Okay, Muhammad, if you have a question, please unmute yourself if you can hear me.

Hello. Thank you for the presentation, and let me just ask my question. The question is about the forecast horizon. For example, if we want to forecast the next 12 months and we know that the importance of, say, the first months is higher than that of the months further ahead, have you done anything with regard to the importance of the point forecasts across the horizon? If you want a more stable forecast for the first months rather than for all 12 months, are they treated the same, or is there some sort of exponential-smoothing-like weighting when training the model on the previous actual data?

I see. Not at the moment. At the moment all forecast horizons are treated as equally important, so it is not that you focus on stabilizing the one-step-ahead forecasts or the shorter forecast horizons. But I think it should be fairly easy to extend the methods to focus only on part of your forecast trace, so to speak.

Right. Does that answer your question?

Somehow, yes. Thank you very much.

Right. Thank you.
Right, we are running out of time, so thanks a lot, Jente, for your presentation. I really liked the ideas, and especially the estimation part grabbed my attention, I would say. So thanks a lot for presenting, and thanks everyone for joining us today. We will be back with more events in the future, and see you then.

Bye-bye. Thank you. Have a good day. Bye.