Models, Mathematics and Data Science: How to Make Sure We're Answering the Right Questions
By Hertie School Data Science Lab
Summary
Topics Covered
- Models Span Interpolatory to Extrapolatory Spectrum
- Loss Functions Embed Value Judgments
- Cat Models Can't Reconstruct Dog Reality
- Prove No Plausible Model Shows Opposite
- Diversity Expands Plausible Model Boundaries
Full Transcript
Good afternoon everyone, and thank you so much for joining us today for our CIVICA Data Science Seminar Series on this International Women's Day. I just want to mention a couple of house rules. We ask you to stay muted during the seminar, and the Zoom call is being recorded; you all should have received a notification about that. If you do interact in the Q&A section, when we open it, by unmuting yourself or turning your camera on, you are automatically agreeing to your voice and your image being recorded. So that's just a quick note.
I'm extremely happy to introduce Dr Erica Thompson, who is part of the Data Science Seminar Series host team, and we're very happy to have her at our seminar today. She is a Senior Policy Fellow in Ethics of Modeling and Simulation at the LSE Data Science Institute. Erica's research is centered around the use of mathematical and computational models to inform real-world decision making, and today Erica is going to present her recently published book, Escape from Model Land. We're really happy to have her today. Erica, the floor is yours, and you can share your screen now.
Fantastic, thank you, good to be here. I'll just share my screen, so hopefully now you can see that.
Great. So I'm going to talk to you today about my new book, which is called Escape from Model Land. It's about how we get out of model land, which is where we are when we're inside the computer, or inside our model, or inside the system of equations that we might have written to describe the real world: how we get out of model land and make statements that are genuinely relevant to the real world, and that can help us to make better decisions than we would have done without that information. I'm going to try to talk with more of a data science flavor today, about things that are hopefully relevant for this kind of audience, but with a focus on the big picture: really thinking about how we make sure that we are answering the right kind of questions when we take our data and try to use it to construct a model and influence the real world.
OK, so I thought I'd start quite basic, by discussing how we get confidence in our models. Think about the phone in your pocket. You pick it up because you want to know what the weather's going to be like tomorrow (I think it might actually snow in London tomorrow; I don't know what it's like where you are). You look at the forecast, and you would like to make a decision based on it: maybe how to get to work, or whether to take an umbrella, or whether to hold an event. But how do you know whether that forecast is any good? How do you know whether it's going to be reliable? Well, you probably have some degree of experience: you have consulted this weather forecast on many occasions and found that it is typically reasonably good. Maybe you're a bit skeptical about it; maybe you take a few different forecasts and triangulate them a bit. But you have experience, and you bring that to your interpretation of how good you think this forecast is going to be. The forecast provider probably also has some kind of statistics, and you can put that together over time. If we were doing a study of it, you could collect that data more systematically, and you could ask, for example: on what percentage of occasions, when rain was forecast with a certain probability, did rain actually happen in practice? And so you can build up a statistical picture of when the forecast is likely to be right and when it's likely to be less confident or less accurate.
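A minimal sketch of that kind of reliability check, in Python, using made-up synthetic numbers rather than real forecast records:

```python
import numpy as np

# For each band of issued rain probability, how often did rain actually
# occur? The arrays below are synthetic stand-ins for forecast archives.
rng = np.random.default_rng(0)
forecast_prob = rng.uniform(0, 1, size=2000)          # issued probabilities
rain = rng.uniform(0, 1, size=2000) < forecast_prob   # outcomes from a well-calibrated world

bins = np.linspace(0, 1, 11)
for lo, hi in zip(bins[:-1], bins[1:]):
    in_bin = (forecast_prob >= lo) & (forecast_prob < hi)
    if in_bin.any():
        print(f"forecast {lo:.1f}-{hi:.1f}: "
              f"rain observed {rain[in_bin].mean():.2f} of the time (n={in_bin.sum()})")
```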
You also make informed judgments about the quality of the model. You can say: it's not just my prior experience; it's also because I trust the UK Met Office, for example, or because I think this model is based on fundamental physical laws and physical processes in which I have confidence and which I expect to continue to be valid in the future, and therefore I have some expectation that this model is doing something sensible and not something crazy. So there are different ways of generating confidence in a forecast, and in the model that produced that forecast.
But there are different kinds of forecasts and different kinds of models, and I want to distinguish here between, not a binary, but a spectrum of different kinds of forecasts. On one end of that spectrum, on the left of this picture, are forecasts that are more like the weather: they're based on the laws of physics. These are forecasts where we expect that tomorrow is not fundamentally different from today, and where we have a lot of data that we can use to evaluate the performance of the model. We can do that continuously; we can keep checking; we get more out-of-sample data every day out of our weather forecasts, and every day that gets used and fed back into improving the quality of the model. What that means is that our uncertainty is generically quantifiable, because our uncertainty is measurement error: it's uncertainty based on the limitations of the data that we've collected, not uncertainty that goes deeper than that. So the left-hand side of this diagram covers situations I would call essentially interpolatory, where the models we're making are basically interpolatory and very much data driven.

At the other end of the spectrum are models which I would call essentially extrapolatory. Those are ones where, for example, the underlying conditions might be changing in ways that are not fully understood. The climate is an example. That's something I have worked on quite a lot in the past, thinking about models of weather and models of climate, and one of the problems with forecasting the climate is that, even though we obviously expect the laws of physics to remain the same in the future, we don't necessarily know that the calibration of our models, done against observations of twentieth-century climate, will still be the appropriate parameterizations and model representations in a climate which is actually quite radically different, as we expect by the end of this century. That means our confidence is not fully data driven; it is based more on these other aspects of expecting the model quality to be good. We expect, for example, that the laws of physics will remain the same; we maybe take multiple models, we try to make different models, and we have to make expert judgments about the degree to which we expect the models to be good in future, the degree to which we expect past performance to be indicative of future success. With the data-driven models on the left-hand side, we are making an expert judgment that past performance is indicative of future success with high confidence. On the right-hand side, where we are more extrapolatory, we say past performance is hopefully useful (obviously we wouldn't trust a model that was extremely bad in the past), but past performance is not a good guarantee of future success, and we have to incorporate expert judgment. So I'm hoping that makes sense, and that you can see the distinction I'm making between these two types of models, and also that there is a spectrum, and that in most cases we'll be somewhere in the middle. The highly data-driven approaches may be at the far left-hand side, and the highly extrapolatory ones, maybe social systems, at the far right-hand side, but in general we're somewhere in the middle and we have to do a bit of both, a bit of everything.
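A minimal Python sketch of the distinction: a flexible, purely data-driven fit does well where it interpolates and can fail badly where it extrapolates. The "true" process and the training range here are invented for illustration.

```python
import numpy as np

# Fit a flexible polynomial to noisy samples of sin(x) on [0, 3], then
# query it inside and outside the training range.
rng = np.random.default_rng(1)
x_train = rng.uniform(0, 3, size=40)
y_train = np.sin(x_train) + 0.05 * rng.normal(size=40)

coeffs = np.polyfit(x_train, y_train, deg=5)   # a degree-5 polynomial

for x in [1.5, 2.9, 5.0, 8.0]:                 # first two interpolate, last two extrapolate
    print(f"x={x:>4}: model {np.polyval(coeffs, x):+10.2f}, truth {np.sin(x):+.2f}")
# Inside [0, 3] the errors are small; at x=5 and x=8 the polynomial can be
# far off, with no warning from any in-sample statistic.
```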
Just to give a few examples, because that might help in situating this. In the interpolatory category, on the left-hand side, I would imagine the sorts of things that quite a lot of you might be working on. Take the large language models, depending on what you're using them for: if you're using them to do something that goes beyond the training set, they are not going to be able to construct imaginative responses the way a human would, but they are going to be extremely good at summarizing data for which we have strong data-driven representations of the relationships between different words and the construction of different sentences. Weather forecasting is in this category, along with the basic rules of physics, and, if you think about industrial and commercial uses: mechanical testing, optimization of manufacturing, transport timetabling, online marketing. These are the sorts of things where we can expect to have lots of data, and where we expect that yesterday's data are relevant for tomorrow's forecast. This is where AI and machine learning approaches can be extremely valuable, because they can help us to find and use patterns in the data that we wouldn't necessarily get if we came in as humans and imposed our own expectations of the structure; we can find the less obvious structures in these data.

At the other end, in the extrapolatory category (and this is probably more the focus of my work, though I'm interested in both ends of the spectrum), there are what I think of as large-scale prediction questions. These are things like climate change impacts and energy policy. The pandemic is an interesting example, because on the one-to-two-week time scale we have something which is effectively more interpolatory and data driven, because the outcomes over the next two to three weeks are baked into infections that have already happened, so in principle it's possible to measure that, to understand it, and to estimate it. But if we're looking at the pandemic on a six-month time scale, and we want to know how it will evolve and how the numbers of infections will change over six months, that is something we in principle cannot possibly get out of past data, because it is also a function of public attitudes, of policies and government decisions, and of social behaviors, which are inherently not really predictable; or potentially predictable in some ways, but always subject to the radical uncertainty that something might happen that just changes the way people think about things. Then there are things like financial markets; bank and insurance regulation, which is something I've been looking at; security and cyber security; or the use of new technologies. A huge range, and you could think of many more examples in both of these categories; perhaps we can come back to that in the discussion if anyone has ideas. But a point to stress here is that if we're over at this end of the spectrum, we cannot expect purely data-driven approaches to give us an answer. Artificial intelligence essentially always needs some kind of help with extrapolatory questions, because it requires these expert judgments and these assumptions to be built in. Now, you can do that; it doesn't mean you can't attack these problems with AI and machine learning approaches, but it means we have to be careful about being transparent about how we embed the value judgments that are going to be necessary, and how we approach the question of making these expert judgments.

OK, so let's think a bit more about evaluation, and about how we would like it to work.
Hopefully many of you are in this lucky category: if you're at the left-hand end of the spectrum, in the interpolatory, data-driven regime, then you can proceed with model evaluation in the textbook way. You gather lots of data; you keep some back for testing, or plan to collect more for testing; you train the model using the data, but also using your understanding of the system. You find out that the model is good but not perfect, because it never is, because it can't be, because all models are wrong, as a famous statistician once said. Then you use your held-out data to evaluate the performance of the model on what you're interested in. Hopefully you can use that to improve the model, but even if you can't, at least you have it to understand when you expect to have confidence and when you expect not to. None of this implies that the model is perfect, but you don't need a perfect model; you only need to know when it's reliable and when it's not. Then hopefully, if you're on a short time scale, you'll be gathering new out-of-sample data, so you can keep doing this process, and you'll be able to generate ranges which are actionable in terms of informing decision support. And hopefully you will be confident about those uncertainty ranges: confident that if you generate a 90% confidence interval for the target of your prediction, then on 90% of occasions the actual outturn will be within your 90% confidence interval, and on the other 10% of occasions it will of course be outside that interval, but that was what you expected, so that's OK. If you have that, then you have something which is genuinely useful, because you understand it and you're confident in it.
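A minimal sketch of that coverage check in Python; the prediction and outcome arrays are synthetic stand-ins for real out-of-sample records.

```python
import numpy as np

# Do stated 90% intervals actually contain 90% of outcomes out of sample?
rng = np.random.default_rng(2)
n = 1000
prediction = rng.normal(20, 5, size=n)
outcome = prediction + rng.normal(0, 2, size=n)   # honest errors with sd = 2

claimed_sd = 2.0                                  # the model's stated error sd
z90 = 1.645                                       # two-sided 90% normal quantile
covered = ((outcome >= prediction - z90 * claimed_sd) &
           (outcome <= prediction + z90 * claimed_sd))
print(f"empirical coverage of nominal 90% intervals: {covered.mean():.2f}")
# If this came out well below 0.90, the intervals would be overconfident
# and not yet actionable for decision support.
```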
But this isn't always the case. Maybe the other thing to say here is that even when you're in this happy situation, where we have lots of data we can use and the model is evaluatable, we are also introducing expert judgments. I'll illustrate very briefly how that happens, and again this is something I'd be interested to hear feedback on from the audience at the end, as to whether it is something you consider. Here's a little sketch of a model: you have inputs and you have some kind of output. I've drawn this in two dimensions, but of course there could be hundreds or thousands of dimensions; it's just harder to draw that on a computer screen. The model, the red line here, is what links the input with the output. The model is just a function; it could be multi-valued, it could be something extremely complicated, but I've drawn a straight line. Then you have observations, and because your model is never perfect, the observations never line up perfectly with the model. So your question is: how do I use my observations to refit a better model? This is not a difficult question; it's something you've probably already done, or at least worked through examples of. You might imagine a root mean square error or a standard linear regression here, but of course it could be in many dimensions and done over different variables.

So let's take an example: how are we going to generate that loss function? Let's call the target of our prediction the air temperature in Brighton, which, for international audiences, is a seaside resort in the south of the UK. Say you are an ice cream seller. You have an ice cream van and you want to drive it down to the seaside to sell ice creams in Brighton, and you have a model, and you want to know what the temperature will be in order to decide whether to take your ice cream van out or not. Then, if you're fitting or selecting a model, your loss function, or objective function, the thing that you are minimizing in order to fit it, doesn't actually care at all about the difference between minus two degrees and minus six degrees. You have no interest, because it's the middle of winter and you're not going to be out selling ice cream anyway. You're interested in what's happening at the top end of the distribution: maybe in the difference between 22 degrees and 26 degrees, because maybe that has quite a big impact on your ice cream sales, since it changes whether people feel hot enough to buy an ice cream. So in principle you might choose a loss function which is weighted more towards the top end of this domain and prioritizes those observations and those outputs over the ones at the bottom; maybe it throws away the information at the bottom in service of getting a better fit at the top.

Now suppose instead that you are a civil engineer in Brighton, overseeing a project where you're pouring the concrete foundations for a new development. In that case you don't really care very much about the difference between 22 degrees and 26 degrees, but you care a lot about whether it's going to be below zero, because at freezing temperatures your concrete doesn't set properly; it crumbles, you might have to take it up and start again, and you've lost a lot of time and a lot of money. So the civil engineer probably isn't interested in fitting the top end of the distribution, and probably isn't that interested in the bottom end either, but they are really interested in whether it's going to be above or below zero, and in the confidence of being above zero, because that's ultimately the decision they're trying to inform. So that loss function will look very different to the ice cream seller's.

And just to give a third example: suppose you're Public Health England and you run a hospital trust in Brighton, and you're interested in predicting the number of hospital beds that will be occupied. Then maybe you're interested in admissions due to heat stress on the very hot days, and in admissions due to falls on ice on the very cold days, but you don't actually see a huge signal in the middle of the distribution, because that's just a normal day with nothing really interesting about the weather happening. In that case the loss function that you use to select or evaluate a model would prioritize the two ends of the distribution, and it wouldn't much matter what's going on in the middle.
OK, so all of that hopefully makes sense, but it's a complicated way of saying that loss functions are utility functions, and they imply value judgments. What that means is that, given your position and the decision you're trying to inform, you have made a value judgment about what outcomes are good and what outcomes are bad, whether that's selling more ice cream or not having to re-pour the concrete. Those value judgments can, do, and should enter into your choice of mathematical fit when you're constructing your model. You might say: oh well, I'm just going to do a standard linear regression, I'm not going to make any value judgments. But that is also a value judgment: it puts a clearly defined weight on the importance of different observations, which is different from an alternative choice that you might make.
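A minimal Python sketch of this point: the same observations, three different loss weightings, three different "best" straight-line models. All numbers are invented, not real Brighton data, and the weights simply stand in for the value judgments described above.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-5, 30, size=200)                     # some predictor
temp = x + 0.03 * x**2 + rng.normal(0, 2, size=200)   # mildly nonlinear "reality"

def fit_weighted(w):
    """Weighted least squares for temp ~ a*x + b."""
    X = np.column_stack([x, np.ones_like(x)])
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * temp))

weights = {
    "uniform (also a choice)": np.ones_like(temp),
    "ice cream seller":        np.where(temp > 18, 5.0, 0.1),     # cares about warm days
    "civil engineer":          np.where(abs(temp) < 3, 5.0, 0.1),  # cares about freezing point
}
for name, w in weights.items():
    a, b = fit_weighted(w)
    print(f"{name:>24}: temp ≈ {a:.3f}*x {b:+.3f}")
# The three users get visibly different lines from identical data, because
# the loss function encodes whose decision the forecast serves.
```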
So even in this sort of nice data-driven situation, you are still making value judgments. Now, what happens if you don't actually have that much data, or you're in the situation I described, where the underlying conditions are changing, or you're interested in extreme events for which there is a much shorter data set available? Then you've simply got a bigger challenge to do this evaluation and to do the calibration. So let me say briefly that the appropriate choices of evaluation tools depend on how you think about model error. People have very different ideas in their heads of what they mean by the discrepancy between a model, in the computer or on paper or wherever, and reality, which is the thing out there that we're interested in. So, just to illustrate that:
Suppose we have a little sketch of reality, and a little sketch of our model: these two spiky blobs. If you are a statistician, if you think statistically, then your idea of model error, reality minus the model, is probably statistical. You probably say: I've got this complicated thing, and I've tried to model it as well as I can, and in doing so I have modeled all of the in-principle predictable components; therefore what's left, we hope, is random residuals, random noise. So we expect the distribution of our error term to be random noise according to some distribution with some parameters. That's a coherent and defensible position, but I think it's important to see this from different angles.
Suppose you are a dynamical systems theorist. Then the way you think about your model is as a dynamical system, probably quite a complex one, depending exactly what we're doing, and reality itself is also a complex dynamical system. So when we subtract one complex dynamical system from another complex dynamical system, what do we get? We get a third complex dynamical system, and probably one that's even more difficult to look at and understand intuitively, because it's comprised, again, of residuals. In my little sketch here, that might look like this kind of weird spiky thing: I'm subtracting one weird spiky thing from another weird spiky thing and I get an even weirder-looking spiky thing. That's how you would think about model error if you're a dynamical systems thinker, and you can see that it might lead you to analyze your model error in quite a different way.
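A minimal Python sketch of the dynamical-systems view: subtract one nonlinear system from another, only slightly different, one, and the residual is a third erratic system, not tidy noise. The logistic maps and their parameters are arbitrary stand-ins for "reality" and "model".

```python
def logistic_traj(r, x0=0.3, n=20):
    """Iterate the logistic map x -> r*x*(1-x) for n steps."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

reality = logistic_traj(r=3.90)
model = logistic_traj(r=3.91)    # a model that is only slightly "wrong"

residual = [a - b for a, b in zip(reality, model)]
print(["{:+.3f}".format(e) for e in residual])
# The residual starts tiny and grows into an erratic signal of its own,
# nothing like i.i.d. noise with fixed parameters.
```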
Now, for a third example, let's say you are not a statistician or a dynamical systems theorist; let's say you are an anthropologist. You come to this and say: OK, there's reality out there, and here's my model, and the model that I choose is essentially arrived at by throwing away the things that I don't believe to be important about this situation, and that is essentially a personal judgment. So the anthropologist would say that the residual, when you subtract model from reality, will be something inherently social and political: something that reflects the blind spots, the biases, and the incorrect assumptions of the modeler, as well as the constraints of their modeling environment, for example an inability to express certain kinds of relationships in the right mathematical form to be able to model them. So that's a third, really very different way of looking at model error that could lead you to interpret your model quite differently.
Right, so suppose we have a model which is not perfect, so it's going to have an error term. Here's another illustration of what that means and why it's important. For a bit of light relief, here are our seven models, and as you can see they are cute and fluffy: I've got seven cats here. The question is, if these are the seven models that I have made of some unknown object behind a curtain, how can I use this ensemble of models to make inferences about the properties of the unknown object? I could look at these and ask what my seven models have in common; that would be a good start. I could use them to construct confidence intervals; I could say there's an uncertainty range here for the color, and the length of fur, and the types of vocalization, you name it. That would be interesting, but it would only be telling us about our set of models, not about reality. So in order to find out something about reality, what do we do? We go out and make an observation, and here it is at the bottom of the screen. It looks like a tail, and it's within the range of the colors and the fur lengths of the seven cats. So what are we going to do with this observation? Well, let's go back to our models. Maybe we could do a best fit: a model selection, where we take that observation and pick the one single model that is most reflective of it. That then depends on which dimension you check it against, because we have multiple axes here (we have fur length and we have color), and we might get a different answer if we picked a different dimension to choose the best-fit model. Or we might say: we've got these multiple dimensions, and we're going to keep all of the models that are still consistent with our observation. So then maybe we choose to keep four out of our seven, because they are all consistent, and we rule out the other three for having the wrong color, or for the fur being too short, or something. So you can probably see where I'm going here.
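A minimal Python sketch of those two options, with invented numbers: seven "models", each a point in (fur length, color) space, and one observation with tolerances.

```python
models = {                      # (fur_length_cm, color on a 0-1 scale)
    "cat1": (2.0, 0.2), "cat2": (4.5, 0.8), "cat3": (3.0, 0.6),
    "cat4": (6.0, 0.7), "cat5": (1.5, 0.1), "cat6": (5.0, 0.9),
    "cat7": (3.5, 0.5),
}
obs, tol = (5.5, 0.75), (1.5, 0.2)   # the observed tail, with tolerances

# Option 1: pick the single best-fit model. Which axis you score on matters.
best = min(models, key=lambda m: abs(models[m][0] - obs[0]))
print("best fit on fur length alone:", best)

# Option 2: keep every model consistent with the observation on all axes.
kept = [m for m, (fur, col) in models.items()
        if abs(fur - obs[0]) <= tol[0] and abs(col - obs[1]) <= tol[1]]
print("models consistent with the observation:", kept)
# Neither option can tell you the object is a dog: both only ever return cats.
```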
But the point is that, although there are many different statistical methods we can now use, having generated that observation from reality and gone back to our set of models, we are not able to infer with full confidence what we think reality looks like. Now, actually, I suspect a number of you are a step ahead and have inferred with full confidence what reality looks like, so I will show you: here it is, it's a golden retriever. Now, those models, the seven cats, are genuinely informative about reality, about the dog. There is a whole host of aspects of those cats that are informative and useful and help us to make better decisions when faced with the dog: if you want to decide what to feed it, or how to interact with it, and all that sort of thing, it's pretty good. But it's not perfect. A vet that was trained on those seven cats would probably be able to have a pretty good stab at treating the dog for most of the things that might be wrong with it. But what we can't say is that there is any way, a priori, given that set of seven cats and the observation of the tail of the dog, to reconstruct from the cats a picture of what the dog looks like without making significant additional expert judgments. And the variance within that set of models is not necessarily reflective of the uncertainty that we a priori ought to have about the characteristics of the dog, because, although we are getting good information, the dog is definitely outside the range of the set of cats on a number of different dimensions.
OK, so that's a very non-mathematical criticism of a lot of different mathematical and statistical techniques that are used to make inferences from sets of models. And just to say: finding the cat that looks most like a dog is not what we're aiming to do. This is essentially what I've already said, but in more technical language: we have options for how we deal with multiple models, and they are all possible, and they're all things that people do, but we need always to bear in mind that the models are not perfect, and that we should not necessarily expect the set of models that we have to be exchangeable with the real system. Now, if we're in the fully data-driven regime, the left-hand side of the spectrum I showed to begin with, then you're fine, because you can take enough observations of the dog to be able to refit your models and come up with an ensemble of seven dogs rather than an ensemble of seven cats. So this, again, is a distinction between the data-driven regime and the more expert-judgment-driven regime, the interpolatory versus the extrapolatory. On the left-hand side of that spectrum we can in principle construct a set of models that are close enough to being dogs; on the right-hand side we're stuck with a set of models that are cats, not dogs, and that we have no a priori way to distinguish.
So, moving on: if we are in this sort of cats-and-dog situation, how can we make inferences from ensembles of models? We don't just want to hold our hands up and say we can't do anything. I think what we need to do here is reframe the paradigm of how we understand it and how we go forward. At the moment, many fields would take the approach of saying: OK, we've got a set of models, they all show X, that something will happen, and that gives us more confidence. But if you accept my point, then it's clear that that is not strong evidence that X is the case; it's only evidence that we have a constrained set of models. What we need to know instead is that there is no plausible model that could show the opposite. So instead of trying to find that all models show X, we want to demonstrate that no plausible model could show not-X. That would be much stronger evidence that X is in fact the case.
OK, but what you will immediately notice, I expect, is that the word "plausible" is doing a huge amount of work in this sentence; the word "plausible" is what everything hinges on. What do we mean by plausible? How are we going to define a plausible model versus, say, an unrealistic model? Well, that's really difficult, really challenging. We have to think about what we mean by expertise, who we trust, how we are going to construct this model, and what kinds of models are acceptable. Thinking about physical models: do they have to be physics based, do they have to conform to the laws of physics? Yes, they probably do. Thinking about more abstract models, or models of social systems: what level of expertise should qualify the modeler to say that something is plausible versus not plausible? If you're modeling, say, virus spread in communities: is it the epidemiologist who has the expertise? Is it the virologist? Is it the social worker, because they're the one who actually knows who talks to whom and when? Or is it the landlord of the local pub? How do these different forms of expertise about a situation end up in the model, and who gets to decide, socially, whether that counts as being plausible or not? That introduces a whole load of interesting questions about the social context of doing science, and of doing this sort of science. But what's important to say here, maybe, is that this is not just a function of me introducing a new idea; this is just reframing the way that we think about the model. It was always important who was making the model, how credible they were, what letters they had after their name, and which university they went to. But when we reframe it in this way, we can see more explicitly and more transparently why that's the case, why we need to think about it, and why we need to write about it. So: who makes the model, and how do we know? I wanted to show you a few
more generic examples from the larger scale, the right-hand side of the spectrum: the extrapolatory models that are primarily the subject of my book. Hopefully these will also be interesting. So here's an expert. You probably recognize him, especially if you're in the UK: that's Neil Ferguson from Imperial College London, and he produced the model on the right-hand side, which was very influential in the early stages of decision making around the COVID-19 pandemic. This is around March-April 2020, just as things were starting to take off and people were starting to say: we need a lockdown, we need to do something. His model was based on an influenza model that he had put together previously, and was then calibrated according to the data available at the time, which was not very much.
So the expert creates a model, and in this sketch the expert has to make a number of expert judgments when they make it: they have to decide what to include, how to represent it, how to tune it, how to evaluate it. Their expertise and their understanding of the situation go into the model. But what I want to argue, and what I say in the book, is that there's also the reverse step: the model, in a sense, creates the expert. There's a feedback here, because you don't just create the model, stop, and throw it over the fence to the decision maker. Actually, you go away and you play with it, and you think about it, and you test your hypotheses, and you change things; you tweak it a bit to get a different output; maybe you try switching one bit of it off and another bit on; and you try to make predictions. All of those influence how you, as the expert, understand the situation. That's the point of making the model, after all: to be able to play with it and experiment with it. So there's this feedback loop, which means that the model and the expert are kind of acting as a system, and the model encapsulates the expert judgments of an expert, or perhaps a group of experts. And that's how it should be treated statistically as well; that's how we should understand the model when we come to put models together and try to make inferences about future conditions.

And it is not just prediction, because if what this model showed had actually turned out, it would have been absolutely catastrophic. Putting this model together allowed a discussion to happen, which prompted policy intervention, which changed the outcome. So the model is part of the system. But who decides what judgments are represented? Who decides what is politically feasible, when the politician comes and says, "I'd really like to know what the impact will be of closing schools," but maybe doesn't ask about other kinds of interventions? You can certainly argue about how those value judgments are
embedded within models, representations, and visualizations, and then influence the politics. Very briefly, here is a second example of essentially the same point. On the left-hand side we have six alternative models for a baseline energy policy scenario, where we continue to use fossil fuels, and on the right-hand side a scenario where we meet the Paris targets by phasing down fossil fuels and ramping up renewables. You can see that these six different models, which are essentially doing the same thing, have quite different outputs, because they make different assumptions. So the same question arises again: who is making the decision about what is plausible in future, whether that's the increase in the price of fossil fuels due to policy, or the decreasing price of solar energy due to improvements in technology? All of that goes in there; somebody is making that judgment. Somebody is also deciding what kinds of energy policy are plausible and what are not, and that influences the output you get, and then that output goes back to the decision makers to inform their policies in future. So these models are incredibly powerful: they are changing decisions that influence millions, if not billions, of people, and they could really do with more criticism and more transparency about the degree to which these assumptions are built in. I could say a lot more about that one, but I won't; if anybody wants to ask about it, do.
So what are the mathematical challenges, then, for decision-relevant modeling? The first challenge is escaping from model land, and that means understanding that the world within the model is not the same as the world outside the model; and not just understanding it on that basis, but also propagating that through to the mathematical techniques that we use to analyze these models, because too many of our analytical techniques are based on assumptions which imply that reality is one of our candidate set of models. If you are confident that you're right at the end of the spectrum, in the fully data-driven regime, then perhaps that's OK; but if you're not sure, or if you think you might be somewhere in the middle, then it is something that really bears consideration. In that case we have to think really carefully about the relationship between model and expert judgment, and we have to acknowledge that there is significant social and political content to our models. So how do we deal with that? Well, it's International Women's Day,
so I'm going to make my key point about diversity. One of the things we need to do is improve the diversity of our models, by pushing the boundaries of models as far as we can: exploring what is plausible, and what "plausible" means to different people with different understandings of the situation and different experience of what's relevant. If we can do that (and it's not easy; this is a challenging thing to do and to incorporate into the way that science works), if we can push our models out by incorporating more social diversity, then we will get a better understanding of the uncertainty ranges that we ought to have, and hopefully also encourage a better relationship between science and society. There are strong groups at the moment who are very anti-science: vaccine skeptics, climate skeptics, and people who a priori distrust anything that is said by large-scale organizations and governments and scientific institutions. In order to bring those people in and help them to understand the benefits of modeling, they have to be genuinely consulted and genuinely listened to, in terms of bringing their understanding and their experience into the model; otherwise it won't be credible to them. So can we do that? I think that's a really interesting challenge. But we also don't want to throw the
baby out with the bathwater. I've shown you a number of critiques of large and complex models and forward predictions, and of course that might be attractive to some in this anti-science brigade, and I think that is a risk: if there is overselling of what science can do in these extrapolatory regimes, and too much of a tendency to obscure the social and political content of models, then there will be a lack of confidence in science, and this is partly what's already happening, I think. So we need to work out a way to be more transparent about the use of these models, and more democratic in the way that they're generated, and then we will have useful scientific information without undermining that confidence.

So, just a couple of points to finish. The bad news is that,
unless we are fully data driven, at the far left-hand side, where we expect the future to be perfectly represented by the statistics of the past, we have radical uncertainty about possible future events, and we simply can't expect to be able to fully model it and fully constrain it. There will always be model risk, and there will always be a gap between model land and the real world, which may have cascading consequences when it's discovered. The global financial crisis in 2008 was essentially a collapse of that form, where the discrepancy between the model and the real became too large; and reality always wins. So we have to do something about this; it has massive consequences if we fail to. But the good news is that models are not just there to represent and predict; they're also here to change the world. We make models because we want to intervene on the system of interest, and we want to be able to change it, and to change it for the better ("better", obviously, being a value judgment, so we need to talk about what we mean by that). We can look at our models, we can scrutinize the value judgments that go into them, and we can change them if we don't agree with them. So I think there's a strong positive message here: models are difficult beasts to get to grips with, but if we do it well, we can use models, and we can develop new statistical methods, that can support us in making positive and inclusive decisions about a whole range of future policies and actions that affect everybody. So that's, I think, a big opportunity as well as a challenge, and I'll stop there. Thank you.

Thank you so much, Erica, that was super
interesting, and I am happy to open the Q&A floor. If you want to leave questions in the chat, that's fine, but if you want to ask out loud, that's great as well. Maybe I will use the opportunity, since I am hosting. So I had this question. Basically, we just need to embrace the fact that if we're using different models, we're getting different results, because the approaches of these different models can be different, we can be coming from different assumptions, and we can have different measures of which models are good or bad, plausible or not plausible. And that's understandable from the statistical perspective as well. But then, doesn't it mean that we really never know the truth? Or are we making the assumption that the plausible models, the ones that we test and that are a good fit, are the ones that bring us closer to reality, to the truth? But we still never know the truth, because it's, in a sense, some approximation that we're basing on some assumptions. Is this enough, in terms of science? It's a very philosophical question, I don't know, but I'm just curious about your thoughts on that.
Yes. I mean, I think in principle many aspects of the future are not predictable, but we happen to agree on some things. I have pretty high confidence that the sun will rise tomorrow, and so do you, and so does everybody else. But that's a judgment: it could be that it won't; it could be that the laws of physics have been the same up to now, and then tomorrow suddenly the speed of light changes and everything goes haywire. But we have very high confidence that that's not the case. We have this hierarchy of things that we, socially, are really confident in, and so socially we can kind of agree on this set of foundational assumptions without which we can't make any predictions about the future, and with which we can. And we have a few centuries of evidence that we make really good predictions and that they're really useful; we sent people to the Moon 50 years ago, which is an incredible test of a whole range of physical assumptions and ideas and models. So then the question is how far down you go. There are some things that everybody agrees on, and there are some things that just nobody could possibly know, and all of our science is somewhere in the middle, where some people might agree and other people might not. So if we're interested in, say, vaccine efficacy, you could make two separate models: one based on somebody with a strong prior expectation that this will work, and with trust in the system that generates the evidence underlying it; and another from somebody coming in with a distrust of the system that generated the evidence, and a distrust of the model and the modelers. They would end up making different judgments about the quality of the output of that model. But I suppose what I'm saying is that that's not unreasonable, and that the only way to get past it is to bring those people into the fold
somehow, by talking more about the assumptions. Suppose I claim that I'm a mind reader, and I can predict the card that you have on your desk, and it's the nine of diamonds; and then you draw again and it's the ace of spades; and I get it right five times in a row. Is that extremely strong evidence that I'm a mind reader, or is it extremely strong evidence that I'm cheating? If you have an a priori belief in the model, then in a Bayesian sense, every time you see a correct prediction it will confirm your belief that it's a good model; but if you're suspicious already, then every time you see it, it will just confirm your belief that I'm cheating. So what you need is not further runs of the same model. You need to be able to come to my office and check that I'm not cheating, or do it in a different room where you have control of the experimental setup; then you'd be convinced.
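A minimal Python sketch of that Bayesian point, with illustrative numbers: the same five correct card predictions update two observers with different priors to very different conclusions.

```python
p_hit_genuine = 1.0    # a real mind reader always gets it right
p_hit_cheating = 1.0   # ...but so does a cheat
p_hit_chance = 1 / 52  # honest guessing

def posterior_genuine(prior_genuine, prior_cheat, n_hits=5):
    """P(genuine | n correct guesses) over genuine / cheating / lucky guessing."""
    prior_chance = 1 - prior_genuine - prior_cheat
    joint = [prior_genuine * p_hit_genuine ** n_hits,
             prior_cheat * p_hit_cheating ** n_hits,
             prior_chance * p_hit_chance ** n_hits]
    return joint[0] / sum(joint)

print(posterior_genuine(prior_genuine=0.50, prior_cheat=0.01))  # believer: ~0.98
print(posterior_genuine(prior_genuine=0.01, prior_cheat=0.50))  # skeptic:  ~0.02
# The data cannot separate "genuine" from "cheating"; only controlling the
# experimental setup, not further runs, changes the skeptic's mind.
```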
And I think the same goes for some of these other, more political, questions. I can see a couple of questions in the chat too.
Yes, there are two questions, and one person asked whether they can ask a question. Yes, of course, go ahead and ask. But maybe, Erica, you can first take the question from the chat that came in before,
just to go in order. The question: "Model selection plays a key role; how do model ensemble methods fit in here, and are there any recommendations on this?" I mean, I guess the recommendation is just to think about what you're doing, and not to simply take a method out of a statistics textbook and say, "I'm going to apply this." Think about what's being implied by the statistical method; I think that is the key. There are many different methods, and they will have different underlying assumptions. And if you just get it in an R package or something that does it all automatically, then it can be pretty difficult to analyze and to find out what's actually going on underneath. So it probably isn't appropriate for everybody to do this, but due-diligence-wise, if we're going to be making significant decisions based on the output of these kinds of models, we need to understand what they are doing and why, and what the assumptions are that are embedded in them.

Lee, do you want to unmute?
I don't know if they're there. Maybe we can go to the next one, shall we? And they actually wrote their question in the chat, so you can address it. OK, well, should we go to Marcos next: "How do you see the movement from Big Tech and other settings towards data-centric scenarios?"
Yeah, I mean, I'm worried about it. I think that actually we are moving further towards model land,
and further into model land, on the assumption that we're at the far left-hand side of this spectrum and can take purely data-driven approaches, throwing away and ignoring all of these questions about value judgments and about alternative possibilities. Yes, I think there's a huge risk in this of people assuming that their model is perfect, either by explicitly stating that, or through what they do with the model that implies that judgment. I suppose one of the aims of my book is to try to generate more discussion about model land, about the difficulties when you're there, and about how we get out of model land, because we always have to get out of model land if we want to do something in the real world. If you just take your model and apply it, you are getting out of model land by making an implicit expert judgment that your model is perfect, but in most cases that is obviously wrong. So we can
start criticizing on that basis.
Yeah, and we have, I guess, two more questions. Do you want me to just read them out? So Shinwei asks: "We know that we want to compare models by sharpness subject to calibration; what about using scores such as CRPS?"
So that's the continuous ranked probability score. Yes: there's the CRPS, there's the Brier score, there's the log probability score, and then you could go into the information criteria; there are many different ways of scoring. But I suppose this goes back to the point that any choice of score implies a value judgment about the relative importance of different kinds of outcomes, and that will be relative to whatever you hope to use your model for. So what is the basis of comparison for your models? If you've got six different models, you could pick the one that does best on any given score, but you'll get the cat that looks most like a dog; you don't get a dog. So what is it that you're aiming to do, and how good do you think your models are? You have to be extremely confident in your set of models to want to choose the cat that looks most like a dog. If you're slightly less confident, then it's OK to choose maybe a range of cats that will hopefully produce a confidence interval for your dog. Then there are other methods that perhaps put in a discrepancy term and say: we will take a range of cats and essentially generate a fuzzy picture of the dog, because we'll add noise to everything. But regardless of what score you use to do that, you still have the fundamental problem of the potential mismatch between model and reality. If you are completely data driven, then you're sort of OK, and it probably won't make much difference which skill score you choose, whether that's CRPS or Brier or a log likelihood; you'll probably get more or less the same answer. But most of us are not there, and if you're not there, it makes quite a big difference, because, for example, if you use a log likelihood, then you get an infinite penalty for something happening that the model said was impossible. So that introduces strong dependencies on your choice of scoring function. There's a huge amount more I could say about that, but I'll stop there.
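A minimal Python sketch (toy numbers) of that last point about scoring rules: the log score hands out an infinite penalty when an event the model called impossible actually happens, while the Brier score's penalty is bounded.

```python
import numpy as np

forecast_prob = 0.0   # the model said "cannot happen"
outcome = 1           # it happened

brier = (forecast_prob - outcome) ** 2
log_score = -np.log(forecast_prob) if forecast_prob > 0 else np.inf
print(f"Brier score: {brier:.2f} (bounded), log score: {log_score} (unbounded)")
# Which penalty is appropriate is itself a value judgment about how bad it
# is to be caught out by something your model ruled out.
```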
All right, if there are no more questions (there are thank-you notes in the chat), we are actually on time, so I'm going to close the session. Thank you so much, Erica, for this amazing talk, and congratulations on the book again. Thank you so much, everyone, for joining us today; follow our page for the CIVICA Data Science Seminar Series, as we're going to have another seminar in two weeks. Thank you, everyone, and we're going to stop the recording now.
All right, thank you very much. Good to be here.