CMAF FFT: Generating stable forecasts: what it is, why it matters, and how to achieve it
By Lancaster CMAF
Summary
Topics Covered
- Forecast Instability Evades Accuracy Checks
- Stability Equals Higher Profits
- Combine for Stability Without Accuracy Loss
- Optimize Models for Dual Accuracy-Stability
Full Transcript
Hi everyone, welcome to Friday Forecasting Talks. Today we have Jente Van Belle, who will talk about generating stable forecasts and lots of details related to that. Before we do that, though, I wanted to say just a couple of words about the centre that organizes these events. This slide shows all the members that we have at the moment, with Professor Robert Fildes being the founder of the centre, although he is no longer involved as much as he used to be. You can see lots of other bright faces here, including me and Kandrika, who organizes these events. We provide a variety of services, including bespoke short courses, and some of you might have heard that we held an online course on business forecasting principles; we want to restart that and hold another one in May. We have opportunities for summer projects, we have expertise in software development, and so on. You can see lots of things here, and our expertise spans from elements of marketing analytics to inventory management, but I would say that the main focus is demand forecasting and supply chain forecasting and topics related to that. So if you're interested in anything in those directions and want to work with us, please get in touch. How to get in touch? Scan this QR code and it will lead you to a page with a variety of options. We are still, sort of, on Twitter, but we are not active there anymore; I should delete that, actually. The centre is present on LinkedIn and we sometimes post things there. You can always send us an email, visit our glorious website, and we also have a YouTube channel where we upload videos from these events, along with an initiative of recording educational videos on business forecasting principles. So these are the options, and we can now move to the main event of today: the presentation by Jente. Jente, can you please start sharing your screen?
All right. Thank you, Ivan. Welcome, everyone, to this talk. Today we will talk about generating stable forecasts: what it is, why it matters, and how to achieve it. Currently I am an FWO junior postdoctoral fellow at the Faculty of Economics and Business at KU Leuven, and if you have questions after this presentation, you can always reach me via email; my email address is here on the slide.
I think we can get started. Before we dive into the content, first a quick overview of today's talk. First we will look at what forecast instability, or forecast stability, is: we will look at a definition and a little toy example to get familiar with the concept. Next we will look at why it matters: why should we care about forecast stability? And finally, in the third part of this presentation, we will look at approaches for achieving more stable forecasts: model selection, forecast combinations, and finally model optimization, that is, how to directly optimize your forecasting models so that they produce inherently more stable forecasts.
So, first of all, what is forecast instability? You can see the definition on this slide: rolling-origin forecast instability is the variability in forecasts for a specific period caused by updating the forecast for this period when new observations become available. Forecast instability, or forecast stability, is therefore a characteristic of forecasts in a context in which we generate multi-step-ahead forecasts on a rolling basis. This is also visualized on this slide, where we see one- to six-step-ahead forecasts for a time series from the M3 competition, produced on a rolling basis, so the forecasting origin is updated each time a new observation becomes available. If you focus on the red vertical line, it is immediately clear that the forecast distribution for this target period changes, sometimes a lot, depending on the forecasting origin that is used.
Why is this important? It is important to realize that this behaviour remains under the radar: it goes unnoticed if we only evaluate our forecasts from a forecast quality, or forecast accuracy, perspective.
To illustrate this, let us first look at a toy example. The setting is visualized on the slide. At the top we have the table where the process of generating a stable versus an unstable forecaster is detailed, but let us not go too much into the details; the visualizations will help to understand the toy problem. The idea is that we start from nature: we want to predict nature at period t, and for this nature we use a normal distribution whose mean is itself sampled from another normal distribution. There is no difference here between our stable forecaster and our unstable forecaster, as you can also see at the bottom of the slide: the target distribution is a normal distribution, and it is of course the same for both forecasters. At time t-3 we generate a first forecast for our target period t. What we do here is basically take the ground truth distribution, because it is a simulation experiment, and add some distributional bias to it, so we mix it with another normal distribution. We do the same for the stable and the unstable forecaster, so there is still no difference between them up to this point. However, at time t-2 we update our forecast for target period t, and there we introduce a difference between the stable and the unstable forecaster. The distributional bias that we add is similar in size for both, but the sign differs: for the unstable forecaster the new forecast shifts to the opposite side of the ground truth distribution, whereas for the stable forecaster it remains on the same side. If we then look at our final forecast, made at t-1, again for target period t, we see that the forecasts are again the same: the stable and the unstable forecaster generate the same forecast. But it is important to realize that for the unstable forecaster this means the forecast once more shifts to the opposite side of the ground truth distribution.
Why do we simulate this behaviour in this manner? Because forecast updates typically result in better forecast quality, or forecast accuracy, since they are based on a shorter forecast horizon. That is the behaviour we try to mimic here. But this updating typically also introduces instability, and that is the case for both forecasters; the only difference is the degree to which they introduce it. The stable forecaster introduces not that much instability, whereas the unstable forecaster introduces a lot of instability.
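To make the toy example concrete, here is a minimal point-forecast caricature of it; the talk works with full predictive distributions scored with CRPS and Wasserstein-1, and the bias size, distribution parameters and number of periods below are illustrative assumptions rather than the values on the slide.

```python
import numpy as np

rng = np.random.default_rng(7)
N, BIAS = 10_000, 1.5   # number of simulated targets and bias size (illustrative values)

# Ground-truth mean for each target period, itself drawn from a normal distribution.
mu = rng.normal(loc=0.0, scale=1.0, size=N)

# Forecast means issued at t-3, t-2 and t-1 (point-forecast caricature of the slide).
f_t3 = mu + BIAS                  # same first forecast for both forecasters
f_t1 = mu + BIAS                  # same final forecast for both forecasters
f_t2_stable = mu + BIAS / 2       # update stays on the same side of the truth
f_t2_unstable = mu - BIAS / 2     # update jumps to the opposite side of the truth

def total_revision(f3, f2, f1):
    """Average size of the revisions between adjacent forecasting origins."""
    return np.mean(np.abs(f2 - f3)) + np.mean(np.abs(f1 - f2))

print("stable forecaster  :", total_revision(f_t3, f_t2_stable, f_t1))
print("unstable forecaster:", total_revision(f_t3, f_t2_unstable, f_t1))
# At every origin both forecasters are equally far from the truth, so their
# accuracy is identical by construction; only the size of the revisions differs.
```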
If we now evaluate these forecasts, so we simulate this process for 10,000 periods, we first evaluate them in terms of forecast quality using the CRPS, the continuous ranked probability score, which is basically the MAE variant for probabilistic forecasts and can be obtained by integrating over the quantile scores for all quantile levels. What do we see in terms of forecast quality? There is no difference between the unstable and the stable forecaster; of course, this is by design. More importantly, we also evaluate the forecasts from an instability perspective. We do this by computing the Wasserstein-1 distance, a metric for the dissimilarity between two distributions; the formula is shown on the slide. What we do is basically take the absolute difference between the quantile forecasts at every quantile level, integrate over all these differences, and then we have the difference between our new forecast and an older forecast for the same target period. We can do this for adjacent forecasting origins, by which I mean comparing the forecasts generated for period t at t-2 and t-3, or at t-1 and t-2, and for non-adjacent origins, where we compare t-1 against t-3. What we see, again by construction, is that there is a clear difference in forecast stability between the unstable and the stable forecaster, with the stable forecaster resulting in lower Wasserstein-1 values; here lower is better, which is also the case for the CRPS.
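For reference, both metrics can be approximated from a grid of quantile forecasts. The sketch below is a minimal numpy version, assuming an evenly spaced quantile grid and the "CRPS as the integral of the quantile scores" convention mentioned in the talk; the toy Gaussian forecasts are made up for illustration.

```python
import numpy as np

# Evenly spaced quantile levels; averaging over them approximates an integral over tau on (0, 1).
TAUS = np.linspace(0.01, 0.99, 99)

def pinball(y, q, tau):
    """Pinball (quantile) loss of quantile forecast q at level tau for outcome y."""
    return np.where(y >= q, tau * (y - q), (1.0 - tau) * (q - y))

def crps_from_quantiles(y, quantiles, taus=TAUS):
    """CRPS approximated by integrating the quantile scores over tau
    (here: 2 x the average pinball loss on the even tau grid)."""
    return 2.0 * np.mean(pinball(y, quantiles, taus))

def wasserstein1(q_a, q_b, taus=TAUS):
    """Wasserstein-1 distance between two forecasts for the same target,
    approximated as the average absolute gap between their quantile functions."""
    return np.mean(np.abs(q_a - q_b))

# Toy illustration: two forecasts for the same target period, issued from
# adjacent origins (all numbers here are made up for illustration).
rng = np.random.default_rng(42)
q_old = np.quantile(rng.normal(loc=10.0, scale=2.0, size=100_000), TAUS)
q_new = np.quantile(rng.normal(loc=11.5, scale=2.0, size=100_000), TAUS)

print("CRPS of the new forecast for outcome y = 10:", crps_from_quantiles(10.0, q_new))
print("Instability (W1) between the two forecasts :", wasserstein1(q_new, q_old))
```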
All right. So we just saw that it is possible to have two different forecasters that generate forecasts of the same forecast quality but that differ in their forecast stability, or forecast instability. The question then is: why is this important? Why does forecast stability matter?
It matters because forecasting is usually a means to an end. We usually generate forecasts which are then used as inputs to a decision-making problem; we generate forecasts to inform decision making. In that sense we can assume that unstable forecasts will lead to larger adjustments to the plans and decisions that are based on them. For example, assume that you are responsible for emergency evacuation planning and that you generate flood forecasts. If your flood forecasts are very unstable, your plans and decisions regarding emergency measures will keep changing throughout the planning horizon, and this is of course not what you want; it is not very desirable. The same is true for inflation forecasting: if your inflation forecasts change wildly through time, that will harm your credibility, you will lose credibility, and forecast users may lose trust in the forecasting system.
A third example is demand forecasting, where unstable forecasts will typically lead to more, or larger, revisions to supply plans, and this will lead to higher supply chain costs. Finally, in a broader sense, unstable forecasts may erode trust in the forecasting system, which then potentially leads to unwarranted judgmental adjustments. There is a quote by Nordhaus, from a paper published as early as 1987, where he phrases it as follows: "Moreover, some forecasters might smooth their forecasts as a service to customers. One forecaster told me that he smoothed his forecast because a more accurate but jumpy forecast would drive his customers crazy." Of course, we know that unwarranted judgmental adjustments typically lead to a decrease in forecast accuracy, so this is something we want to minimize.
To make this more specific, let us take our toy example to the next level and use the stable and the unstable forecasts in a decision-making problem. To this end I will use a newsvendor problem with three ordering decisions. Newsvendor problems are very well studied. The task is to set the inventory level for each period t so as to maximize expected profit. That basically means that at each period t we face uncertain demand for a perishable product. At the start of each period we receive all the orders that we have made, and all unsold units are worthless at the end of the period. Product shortages result in lost revenue, so that is also something you want to avoid. There are nice theoretical results for the newsvendor problem, and this is also the case for the variant considered here, a newsvendor with three ordering decisions. Basically, for each period t we make an initial order at t-3, which can only be positive and comes at a certain unit cost c, but we can make two revisions, at t-2 and t-1: we can either order more or cancel part of the ordered units, and this comes with additional costs. If you order more units you pay a premium; if you cancel part of your order you pay a cancellation fee, although you of course do not have to pay the unit cost of the cancelled units.
What is important here? We will not go into the mathematical details, but there are optimal order quantities that you can compute; it is theoretically possible, and of course you can use these optimal order quantities at each period to make your ordering decisions for period t. But as a decision maker you do not have to; you can also opt for other strategies, for example the anticipation strategy or the procrastination strategy. With the anticipation strategy we order the median forecast at t-3 and t-2 and only rely on the optimal order quantity at t-1, whereas with the procrastination strategy we only order at t-1, at a higher cost. And remember, our stable and our unstable forecaster generate the same forecasts for target period t at t-1, so the procrastination strategy serves as a check: there should not be a difference between the stable and the unstable forecaster.
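As a rough illustration of how forecasts enter such ordering decisions, here is a sketch of the classic single-period newsvendor critical fractile. The three-decision variant used in the talk has its own optimal revision quantities, which this sketch does not reproduce, and all prices and costs below are hypothetical.

```python
import numpy as np

def critical_fractile(price, cost, salvage=0.0):
    """Classic single-period newsvendor service level: order up to the
    cu / (cu + co) quantile of the demand forecast."""
    cu = price - cost        # underage cost: margin lost per unit short
    co = cost - salvage      # overage cost: loss per unsold (worthless) unit
    return cu / (cu + co)

def order_quantity(quantiles, taus, service_level):
    """Read the order quantity off the forecast's quantile function."""
    return np.interp(service_level, taus, quantiles)

# Hypothetical cost setting and demand forecast for one target period t.
taus = np.linspace(0.01, 0.99, 99)
demand_forecast = np.quantile(np.random.default_rng(1).normal(100.0, 15.0, 50_000), taus)

sl = critical_fractile(price=10.0, cost=4.0)
print("target service level:", round(sl, 3))
print("order quantity for period t:", round(order_quantity(demand_forecast, taus, sl), 1))

# The three strategies from the talk differ mainly in *when* such a quantity is
# committed: the optimal strategy re-optimizes at t-3, t-2 and t-1 (paying the
# revision costs), anticipation orders the median at t-3 and t-2 and only
# re-optimizes at t-1, and procrastination orders everything at t-1 at a higher cost.
```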
If we then look at the results for these three strategies, it is important to realize that the results depend on the forecasts used as inputs, because the optimal order quantities correspond to specific quantiles of your forecast distribution. We simulated this process for 10,000 periods and computed the profit for a high profit margin and a low profit margin. What is interesting is that the stable forecasts, for both profit margins and across all strategies except the procrastination strategy, always result in higher profits. The increase in profits ranges from 0.85 to 3.47, which is quite substantial. And if you look at the number of periods in which the profit is higher with the stable forecasts than with the unstable forecasts, it is about 80% of the periods. So I think this is a strong motivation that clarifies why considering forecast instability is an important issue.
All right, that brings us to the third part of this presentation: how can we stabilize forecasts? How can we achieve this? There are basically three different strategies. We can take forecast instability into account in model selection; we can use it as an input to forming forecast combinations; and finally, we can optimize models directly for both forecast accuracy and forecast stability. We will look at approaches for stabilizing point forecasts, for stabilizing Gaussian probabilistic forecasts, and for stabilizing distribution-free probabilistic forecasts.
But first things first: let us start with model selection. This is based on a working paper by Kandrika and Nikos Kourentzes in which they define forecast congruence, a quantity to align forecasts and inventory decisions; they focus on the impact of forecast congruence on inventory decisions. Forecast congruence is very similar to forecast instability as I introduced it before. The only difference is that they compute it by looking at the variance across the forecasts for a specific target period t generated from different forecast origins. So if we take the visualization from the first slide again and look at the vertical red line, they take all the forecasts made for the same target period t, compute the variance across them, and that is the forecast congruence; of course, lower is better.
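A minimal sketch of how such a congruence measure could be computed, assuming it is simply the variance of all forecasts issued for the same target period, averaged over target periods; the exact definition and any scaling used in the working paper may differ.

```python
import numpy as np

def forecast_congruence(forecasts):
    """Congruence proxy as described in the talk: for every target period,
    take the variance of all forecasts issued for that period from different
    origins, then average over target periods (lower = more congruent).

    forecasts[o, t] is the forecast for target period t issued at origin o,
    NaN where no forecast was issued."""
    per_target_variance = np.nanvar(forecasts, axis=0)
    return float(np.nanmean(per_target_variance))

# Toy example: 4 origins, 6 target periods, horizon 3 (hypothetical numbers).
f = np.full((4, 6), np.nan)
f[0, 0:3] = [100.0, 102.0, 101.0]   # forecasts issued at origin 0
f[1, 1:4] = [104.0, 100.0,  99.0]   # forecasts issued at origin 1
f[2, 2:5] = [101.0, 103.0, 102.0]   # forecasts issued at origin 2
f[3, 3:6] = [ 98.0, 104.0, 100.0]   # forecasts issued at origin 3
print("congruence (lower is better):", forecast_congruence(f))
```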
Using inventory simulations for simulated ARMA demand, but also a real data set from an FMCG manufacturer, they show that accounting for congruence in addition to accuracy in model selection, and it is only used in model selection here, can lead to favourable inventory performance. Why is this important? Because then we do not need to rely on expensive inventory simulations: combining forecast accuracy and forecast instability to inform model selection can serve as a proxy for them. This is just one example, for the simulated seasonal ARMA demand shown here.
There they generate forecasts with different ETS variants but also with the MAPA algorithm, and with the DGP, which, because it is a simulation experiment, is a theoretical forecast that we cannot make in reality but serves as a nice check here. What we see is that while MAPA only ranks fourth in terms of RMSSE, so in terms of accuracy, it is the best method in terms of congruence, so in terms of stability. If we then look at the inventory performance, quantified here by the scaled stock on hand and the scaled lost sales, we see that MAPA outperforms the other methods: it achieves the same target service level with less stock on hand, and the difference in lost sales is negligible. So this is a very interesting result. What is also interesting is that if we look at the congruence of the DGP, which is a theoretical quantity we cannot know in practice, MAPA is actually over-congruent: it is more stable than the underlying DGP. But this does not harm its performance in terms of inventory decision making. So that was model selection.
Another strategy that we can use is forecast combination. This is based on another working paper, by BH and co-authors, in which they propose a post-processing approach to stabilize newly generated forecasts by combining them with the most recent previous forecast for the same targets using a weighted average. So if we want to generate a forecast for period t, we use an older forecast for that same target period to stabilize the new forecast. They have two variants, partial interpolation and full interpolation, where y tilde is the stabilized forecast. Partial interpolation is a weighted average of the newly generated forecast and the old original forecast, whereas with full interpolation we combine the newly generated forecast with the old stabilized forecast, so that all forecasts made previously for the same target are taken into account. There is a single hyperparameter, w_s, that controls the trade-off between stability and accuracy, and this is of course a parameter that you need to tune, or select based on some procedure. What is interesting is that this approach is model-agnostic: it does not matter how the base forecasts, the y hats, are generated; they can originate from a local forecasting method, a global forecasting method, or even a fully judgmental forecasting pipeline.
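A minimal sketch of the two interpolation rules, assuming w_s is the weight placed on the older forecast (the exact convention in the working paper may differ); the base forecasts in the example are made-up numbers.

```python
W_S = 0.2   # weight on the older forecast; 0 means no stabilization

def partial_interpolation(y_new, y_prev_original, w_s=W_S):
    """Stabilized forecast as a weighted average of the newly generated
    forecast and the previous *original* forecast for the same target."""
    return (1.0 - w_s) * y_new + w_s * y_prev_original

def full_interpolation(y_new, y_prev_stabilized, w_s=W_S):
    """Stabilized forecast as a weighted average of the newly generated
    forecast and the previous *stabilized* forecast, so that all older
    forecasts for the same target enter with geometrically decaying weights."""
    return (1.0 - w_s) * y_new + w_s * y_prev_stabilized

# Rolling example for a single target period, re-forecast from three origins
# (the base forecasts are made-up numbers).
base_forecasts = [105.0, 98.0, 103.0]   # issued at t-3, t-2, t-1
stabilized = base_forecasts[0]
for y_new in base_forecasts[1:]:
    stabilized = full_interpolation(y_new, stabilized)
print("last base forecast:", base_forecasts[-1], "| fully interpolated:", round(stabilized, 2))
```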
Here we look at one experiment that they report in the paper, on the M4 monthly data set with base forecasts generated by the global model N-BEATS. The figure shows a Pareto plot. On the x-axis we have accuracy in terms of the symmetric mean absolute percentage error; on the y-axis we have instability in terms of the symmetric mean absolute percentage change, which uses the same formula as the symmetric MAPE but with the actual replaced by an older forecast. For both metrics, lower is better. What is interesting to see, although the text is quite small, is that in the upper left corner we have the base N-BEATS forecasts. If we draw a vertical line parallel to the y-axis through that point, methods that lie to the left of that line are more stable while being at least as accurate as the base forecasts. You can see that partial interpolation and full interpolation with w_s equal to 0.2, so with the hyperparameter equal to 20%, result in an improvement in stability without harming accuracy. This is one way to select a value for the hyperparameter w_s, but in principle you can select any method on the Pareto front, which is obtained here via a rolling-origin evaluation on a validation set, and it is up to the forecast user to decide how to trade stability for accuracy. But it is also possible to improve stability without a considerable loss in forecast accuracy.
And then finally, we can also use approaches where we directly optimize our model to generate more stable forecasts. The key element is that we cast the problem as a bi-objective optimization task by using a composite loss function. The idea is very simple: the loss function now has two terms, one term that looks at forecast errors, to optimize accuracy or quality, and another term that looks at forecast instability, where we compare the forecasts, the y hats, to an older forecast. So we quantify dissimilarities between forecasts for the same target that originate from different forecasting origins. Again there is one hyperparameter, here lambda. And we can ask two questions. First, are there lambda values which lead to forecasts that are as accurate but more stable than those for lambda equal to zero, which basically means that we only optimize for accuracy or forecast quality? And second, how should we operationalize this? That is where the difficulty lies: the idea of the composite loss function is very straightforward, but operationalizing it is the difficult part of this approach.
Let us first look at how to operationalize this for point forecasts. This is from a paper published in the International Journal of Forecasting of which I am one of the authors, where we propose N-BEATS-S, the additional S standing for a stabilized version of N-BEATS; we modify the N-BEATS model so that it generates inherently more stable forecasts. We will not look at the N-BEATS model in detail, but it is important to mention that it is a global deep learning model: a model with a set of global parameters that are estimated jointly across time series. The model is used for univariate time series point forecasting, and its key idea is the doubly residual stacking topology, where each block produces a residual backcast and a partial forecast. The residual backcast basically serves to filter the input signal as you go deeper into the network, so that later blocks only focus on the part of the time series that is not yet explained by earlier blocks. How does the model work? We have a time series and we sample an input-output window from it, with a look-back window that is used to generate forecasts for the observations in the forecast period; we feed this to the model, and the model output is basically the one- to H-step-ahead forecasts for the forecast periods.
How can we stabilize this model? By using the composite loss function. For the error component we use the RMSSE, which is fairly standard, and for the instability loss component we use a variant of the RMSSE where we replace the actual by an older forecast for the same target period. In that way we can quantify both forecast error and forecast instability. An important point, however, is that to compute this instability variant of the RMSSE we need forecasts for period t generated at two different forecasting origins. So to train this model we use, for each input-output sample, also a lagged sample: we also consider the input-output sample with the forecasting origin shifted one period backwards in time. If you do this, you can compute the RMSSE for both input-output samples, and you can also compute the instability variant for the overlapping periods in the forecast windows of these two related samples.
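A rough numpy sketch of this composite loss, assuming the instability term is an RMSSE-style discrepancy between overlapping forecasts from adjacent origins and that the two terms are combined as accuracy plus lambda times instability; the exact weighting and scaling used in N-BEATS-S may differ.

```python
import numpy as np

def rmsse(targets, forecasts, insample):
    """Root mean squared scaled error: MSE of the forecasts, scaled by the MSE
    of the naive one-step forecast on the look-back (in-sample) window."""
    scale = np.mean(np.diff(insample) ** 2)
    return np.sqrt(np.mean((targets - forecasts) ** 2) / scale)

def composite_loss(actuals, fc_now, fc_lagged, insample, lam):
    """Sketch of the bi-objective loss: an accuracy term (RMSSE against the
    actuals) plus lambda times an instability term in which the actual is
    replaced by the forecast for the same target made one origin earlier.
    The current origin forecasts targets t+1..t+H, the lagged origin
    forecasts t..t+H-1, so they overlap on targets t+1..t+H-1."""
    accuracy = rmsse(actuals, fc_now, insample)
    instability = rmsse(fc_lagged[1:], fc_now[:-1], insample)
    return accuracy + lam * instability

# Hypothetical example with horizon H = 6.
rng = np.random.default_rng(0)
series = 100.0 + rng.normal(0.0, 5.0, 60)
lookback, H = series[:48], 6
actuals = series[48:48 + H]
fc_now = actuals + rng.normal(0.0, 3.0, H)                 # forecasts from origin 48
fc_lagged = series[47:47 + H] + rng.normal(0.0, 3.0, H)    # forecasts from origin 47
print("composite loss (lambda = 0.3):", round(composite_loss(actuals, fc_now, fc_lagged, lookback, 0.3), 3))
```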
Does this work? If we look at this figure, this is how we tune the lambda hyperparameter on a validation set. Here we can see that if we take forecast instability into account only slightly, we can of course improve stability, but our accuracy also improves at first; it only gets worse if we assign too much weight to forecast instability in the composite loss function. So there are certain lambda values which result in improvements in forecast stability without harming forecast accuracy. And these are the results for an experiment on the M4 monthly data set. On the left we have the MCB results in terms of accuracy rankings across time series, and on the right the MCB results in terms of stability. What we see is that N-BEATS-S, the stabilized version of N-BEATS, produces significantly more accurate but also significantly more stable forecasts, which is exactly what we wanted to achieve.
If we take this one step further, and this is a working paper which is available on arXiv, the hypothesis is that we can dynamically tune lambda. Instead of selecting a single value, as we did on the previous slide, we assign a different value to lambda in each training iteration, in each iteration of our stochastic gradient descent procedure. If we dynamically tune lambda, the idea is that we can prioritize forecast accuracy during the early stages of training, keeping in mind the goal of improving forecast stability without significantly sacrificing accuracy. The hypothesis is that this can lead to improved performance compared to using a tuned static value for lambda. I will not go into the details, but there are a lot of dynamic loss weighting algorithms available, algorithms that can be used to dynamically tune the lambda value, and we see here, again in an experiment on the M4 monthly data set, that we can use them to further improve our forecasts in terms of stability over the method with a static tuned lambda value, without significantly harming accuracy.
On the left, results are shown for one dynamic loss weighting method. What is interesting to see is that on the x-axis we visualize the training iteration, from zero to approximately 20,000 weight updates, and on the y-axis we have lambda. The orange shaded area visualizes the weight assigned to stability, which is an output of the dynamic loss weighting algorithm, and it is interesting to see that this corresponds to our hypothesis: initially it assigns less weight to stability. So first we need to achieve a decent forecast accuracy in order for forecast stability to become important, or to become helpful, in training our model.
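To illustrate the general idea, here is one very simple member of the dynamic loss-weighting family, a linear ramp on lambda; it is not the algorithm evaluated in the working paper, just a sketch of how a per-iteration lambda would plug into the composite loss.

```python
def dynamic_lambda(step, total_steps, lam_max=1.0):
    """One simple member of the dynamic loss-weighting family (not the specific
    algorithm from the working paper): ramp the stability weight up linearly
    during the first half of training, so early iterations prioritize accuracy
    and later iterations care increasingly about stability."""
    return lam_max * min(1.0, step / (0.5 * total_steps))

# Inside a training loop the composite loss would then be built as
#   loss = accuracy_loss + dynamic_lambda(step, total_steps) * instability_loss
for step in (0, 5_000, 10_000, 20_000):
    print(step, round(dynamic_lambda(step, total_steps=20_000), 2))
```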
We can also do this for Gaussian forecasts. How can we operationalize it there? We can first turn the N-BEATS model into a probabilistic forecasting model by replacing the point predictions by a median prediction and a prediction for the variance for each period in the forecast window, and we can again use the composite loss function. Now we operationalize it as follows: for the error component we use likelihood-based optimization, similar to the DeepAR way of optimizing the network, and for the instability loss component we can rely on closed-form solutions for the Kullback-Leibler divergence and the Wasserstein-2 distance between two Gaussian distributions. In a Gaussian setting such closed-form solutions are available to quantify the difference between two distributions, and they can readily be used in the loss function to optimize the model for both accuracy and stability.
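These closed forms are standard for univariate Gaussians and can be dropped into the instability term; a small sketch, with made-up forecast parameters:

```python
import numpy as np

def kl_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form KL(P || Q) for two univariate Gaussians."""
    return (np.log(sigma_q / sigma_p)
            + (sigma_p ** 2 + (mu_p - mu_q) ** 2) / (2.0 * sigma_q ** 2)
            - 0.5)

def wasserstein2_gaussian(mu_p, sigma_p, mu_q, sigma_q):
    """Closed-form Wasserstein-2 distance between two univariate Gaussians."""
    return np.sqrt((mu_p - mu_q) ** 2 + (sigma_p - sigma_q) ** 2)

# New vs previous Gaussian forecast for the same target (made-up parameters).
print("KL divergence :", round(kl_gaussian(11.5, 2.5, 10.0, 2.0), 4))
print("Wasserstein-2 :", round(wasserstein2_gaussian(11.5, 2.5, 10.0, 2.0), 4))
# Either quantity (or only the squared difference of the means) can serve as
# the instability term in the composite loss.
```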
Again, here we have an experiment on the M4 monthly data set. The conclusion is that we can use the KL divergence, but we can also use the Wasserstein-2 distance, or we can even stabilize only the mean of the distributions, and in all cases it is possible to achieve improvements in forecast stability without considerably harming forecast quality.
And then finally, we can take this one step further and consider distribution-free probabilistic forecasts. This is still work in progress, so there is no working paper or published paper available yet. The idea is to operationalize the same approach for distribution-free probabilistic forecasts, where we model our forecasts as conditional quantile functions approximated by linear isotonic regression splines, so piecewise linear functions, and these piecewise linear functions are parameterized by training a neural network. Again, we can rely on the CRPS as the error, or quality, component in the loss function, and to quantify forecast instability we can rely on the Wasserstein-1 distance, or another metric, and follow the same procedure to combine forecast quality and forecast instability in the optimization of the model. Why is this an interesting approach? Because, remember from earlier, the Wasserstein-1 distance is calculated by looking at the differences between specific quantiles of the two distributions and then aggregating all those differences. But you can also use quantile-weighted versions of the Wasserstein-1 distance, which allow you to put more weight on the centre of the distribution or on its tails. So this allows us to focus the stabilization procedure on specific parts of the distribution, and that is a nice feature of this distribution-free approach based on conditional quantile functions.
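A minimal sketch of such a quantile-weighted Wasserstein-1 instability term; the weight profiles below are borrowed from the weighted-CRPS literature and are assumptions, not necessarily the ones used in this work in progress.

```python
import numpy as np

TAUS = np.linspace(0.01, 0.99, 99)

def weighted_w1(q_new, q_old, weight_fn, taus=TAUS):
    """Quantile-weighted Wasserstein-1 style instability: a weighted average of
    the absolute gaps between two quantile functions over the tau grid."""
    w = weight_fn(taus)
    return float(np.sum(w * np.abs(q_new - q_old)) / np.sum(w))

# Hypothetical weight profiles (borrowed from the weighted-CRPS literature;
# the ones used in the work in progress may differ).
uniform = lambda t: np.ones_like(t)        # plain Wasserstein-1
center  = lambda t: t * (1.0 - t)          # emphasize the middle of the distribution
tails   = lambda t: (2.0 * t - 1.0) ** 2   # emphasize both tails

rng = np.random.default_rng(3)
q_old = np.quantile(rng.normal(10.0, 2.0, 100_000), TAUS)
q_new = np.quantile(rng.normal(10.0, 3.0, 100_000), TAUS)   # spread changed, center did not
for name, w in (("uniform", uniform), ("center", center), ("tails", tails)):
    print(name, round(weighted_w1(q_new, q_old, w), 3))
```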
All right, some key takeaways. First, forecast stability can be considered as an additional evaluation criterion to reduce the gap between forecast evaluations and utility-based evaluations of forecasts. Second, forecast instability may go unnoticed if forecasts are evaluated solely from a forecast quality perspective. Third, why is all this important? The idea is that more stable forecasts will lead to fewer and smaller adjustments to the plans drafted based on your forecasts, and that they will also enhance trust in the forecasting system. And finally, we showed that we can stabilize forecasts without causing a considerable loss in forecast quality, either by using instability in a model selection procedure, by using forecast combinations, or by directly optimizing your models for both forecast quality and forecast stability. So that was it. Thank you.
Thanks. Now we are ready to take any questions that people might have, and just to start this off I will ask something that interests me. When you apply this composite loss function that you discussed in the last part of the presentation, what do you think happens to the parameters of the models? Have you done any investigations of that?

Sorry, do you mean whether it acts as a sort of shrinkage estimator? Is that what you are asking?

Yes, that is what I thought it might act like, but that is just my hunch, I don't know, so I am interested in your opinion.

That is a good question. We have not looked into it, and it is also not very easy to study, because it is a deep learning model, so there are millions of parameters. But yes, it is an interesting topic to explore further, I would say.

Okay, so what the dynamics are, or the effects on the model parameters. Yes, that would be interesting to explore indeed.
You have mentioned several different measures of stability, but if I understand correctly, the ones that you discuss mainly focus on forecasts produced from adjacent origins, say yesterday and the day before yesterday. Did I misunderstand? Can you take more of them into account somehow?

Yes, that is possible, and that is exactly the difference between the definition of forecast stability that I adopt and the definition of forecast congruence: there they take into account all the forecasts generated for a target period that originate from different forecasting origins. Why do we use the setting in which we only compare adjacent forecasting origins? If we go back to the toy example, then we see, although further investigation is needed, that it is basically the instability between adjacent forecasting origins that matters: if you make the assumption that each time you update your forecast you also use it in decision making, then it is that instability that impacts your decision making, or your decision.

Okay, so that is the reasoning behind the adjacent forecasting origins. I also wonder whether you actually need to make things more complicated and add more horizons and so on. Maybe having the adjacent origins is more than enough, because it already captures the dynamics somehow. It would be interesting to look at that as well.
Yes, that is true. I have experimented with it: to also include older forecasts you basically need to add more lagged input-output samples, so that you have multiple older forecasts for each target. Preliminary experiments show that it does not harm your model, and you can still achieve better forecast stability without losing forecast accuracy, but it does not add too much value, because by focusing on the adjacent forecasting origins you indirectly also penalize large differences between forecasts from origins that are further apart.

Yes. And you also use the lambda parameter to tune that, so in a way, if you substitute yesterday's forecast with one from a week ago, that would just mean having a different lambda parameter, I guess. So lambda already takes care of that to some extent, if I understand correctly.

Yes, that is possible. What is interesting is that if you use the approach that relies on forecast combinations, in the full interpolation setting you basically take into account all older forecasts, and it has the nice property that the weights assigned to older forecasts decay, so more weight goes to the more recent updates.
Good. We have a question from the audience: does the focus on accuracy increase the upper bound of probabilistic forecasting? I am not sure that I fully understand the question; do you, Jente?

Muhammad, do you want to ask the question in person? You can unmute yourself now, and even turn on the camera, and maybe clarify a bit more, because I do not understand what you mean by the upper bound of probabilistic forecasting.

Yes, Muhammad, you can unmute yourself and ask the question if you can. Okay, you cannot, that is okay. Maybe you can clarify what you mean by the upper bound of probabilistic forecasting. In the meantime, what do you think, Jente? Does the focus on increasing accuracy impact the performance in terms of probabilistic forecasting in this setting?

I will rephrase it. I think it might be related to the Pareto front. You need to select a hyperparameter, and that basically controls the trade-off between accuracy and stability. The focus on accuracy rests on the assumption that the fact that we update our forecasts implicitly implies that the benefit of updating is larger than the costs it induces. That is why a natural choice is to look at values of your hyperparameter where you see an improvement in stability without a loss in forecast quality. But of course you can also argue for a larger value of lambda, which will lead to more stable forecasts at the cost of a slight decrease in forecast quality. For example, one possible reason to opt for a larger value for lambda is that if your model outputs very unstable forecasts and the forecast users start to fiddle around with the forecasts in order to make them more stable, then you will eventually end up with lower accuracy as well. So in that sense you can maybe justify a larger value for lambda, which will lead to more stable forecasts. I am not sure whether that fully answers the question.

Yes, we did not fully understand the question, so we answered what we understood, I guess.
Right. Any other questions from the audience? Let's see. I must also say that I am a bit puzzled by your idea of using a neural network to train on the cumulative distribution function. There are lots of different methods to generate quantiles, to generate probabilistic forecasts, distributions and so on. So why specifically this one?

Good question. That is because the conditional quantile function approach allows you to approximate the CRPS and the Wasserstein-1 distance by looking at specific quantiles. The Wasserstein-1 distance, which is used to take instability into account in the optimization, can be used directly in this setting, and you also get the flexibility of the quantile-weighted Wasserstein-1 variants, which allow you to put more weight on the centre of the distribution or on the tails. That might be of interest for specific applications, for example in inventory management, where you may mainly want to focus on stabilizing the tails of your distribution. So that was the reason why we opted for conditional quantile functions to model our probabilistic forecasts.

Okay, so you might be more interested in tail performance, for example, and that gives you enough flexibility for that.
Okay, Muhammad, if you have a question, please unmute yourself if you can hear me.

Hello. Thank you for the presentation, and let me just ask my question. The question is about the forecast horizon. For example, if we want to forecast the next 12 months and we know that the importance of, say, the first months is higher than that of the months further ahead, have you done anything with regard to the importance of the point forecasts across the horizon? If you want a more stable forecast for the first months rather than for all 12 months, are they treated the same, or is there some sort of exponential-smoothing-like weighting when training the model on the previous actual data?

I see. Not at the moment. At the moment all forecast horizons are treated as equally important, so it is not that you focus on stabilizing the one-step-ahead forecasts or the shorter forecast horizons. But I think it should be fairly easy to extend the methods to focus only on part of your forecast trace, so to speak.

Right. Does that answer your question?

Somehow, yes. Thank you very much.

Right. Thank you.
Right, we are running out of time, so thanks a lot, Jente, for your presentation. I really liked the ideas, and especially the estimation part grabbed my attention, I would say. So thanks a lot for presenting, and thanks everyone for joining us today. We will be back with more events in the future, and see you then.

Bye-bye. Thank you. Have a good day. Bye.