Models, Mathematics and Data Science: How to Make Sure We're Answering the Right Questions
By Hertie School Data Science Lab
Summary
Topics Covered
- Models Span Interpolatory to Extrapolatory Spectrum
- Loss Functions Embed Value Judgments
- Cat Models Can't Reconstruct Dog Reality
- Prove No Plausible Model Shows Opposite
- Diversity Expands Plausible Model Boundaries
Full Transcript
Good afternoon everyone, and thank you so much for joining us today for our CIVICA Data Science Seminar Series on this International Women's Day. I just want to mention a couple of house rules. We ask you to stay muted during the seminar, and the Zoom call is being recorded; you all should have received a notification about that. If you do interact in the Q&A section, when we open it, by unmuting yourself or turning your camera on, you are automatically agreeing to your voice and your image being recorded. So that's just a quick note.
I'm extremely happy to introduce Dr Erica Thompson, who is part of the Data Science Seminar Series host team, and we're very happy to have her at our seminar today. She is a Senior Policy Fellow in Ethics of Modeling and Simulation at the LSE Data Science Institute. Erica's research is centered around the use of mathematical and computational models to inform real-world decision making, and today Erica is going to present her recently published book, Escape from Model Land. We're really happy to have her today. Erica, the floor is yours, and you can share your screen now.
Fantastic, thank you, good to be here. I'll just share my screen, so hopefully now you can see that.
Great. So I'm going to talk to you today about my new book, which is called Escape from Model Land. It's about how we get out of model land, which is where we are when we're inside the computer, or inside our model, or inside the system of equations that we might have written to describe the real world: how we get out of model land and make statements that are genuinely relevant to the real world, and that can help us to make better decisions than we would have done without that information. I'm going to try to talk with more of a data science flavor today, about things that are hopefully relevant for this kind of audience, but with a focus on the big picture: really thinking about how we make sure that we are answering the right kind of questions when we take our data and try to use it to construct a model and influence the real world.
OK, so I thought I'd start quite basic, by discussing how we get confidence in our models. Think about the phone in your pocket. You pick it up because you want to know what the weather's going to be like tomorrow (I think it might actually snow in London tomorrow; I don't know what it's like where you are). You look at the forecast, and you would like to make a decision based on it: maybe how to get to work, or whether to take an umbrella, or whether to hold an event. But how do you know whether that forecast is any good? How do you know whether it's going to be reliable? Well, you probably have some degree of experience: you have consulted this weather forecast on many occasions and found that it is typically reasonably good. Maybe you're a bit skeptical about it; maybe you take a few different forecasts and triangulate them a bit. But you have experience, and you bring that to your interpretation of how good you think this forecast is going to be. The forecast provider probably also has some kind of statistics, and you can put that together over time. If we were doing a study of it, you could collect that data more systematically, and you could ask, for example: on what percentage of occasions, when rain was forecast with a certain probability, did rain actually happen in practice? And so you can build up a statistical picture of when the forecast is likely to be right and when it's likely to be less confident or less accurate.
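A minimal sketch of that kind of reliability check, in Python, using made-up synthetic numbers rather than real forecast records:

```python
import numpy as np

# For each band of issued rain probability, how often did rain actually
# occur? The arrays below are synthetic stand-ins for forecast archives.
rng = np.random.default_rng(0)
forecast_prob = rng.uniform(0, 1, size=2000)          # issued probabilities
rain = rng.uniform(0, 1, size=2000) < forecast_prob   # outcomes from a well-calibrated world

bins = np.linspace(0, 1, 11)
for lo, hi in zip(bins[:-1], bins[1:]):
    in_bin = (forecast_prob >= lo) & (forecast_prob < hi)
    if in_bin.any():
        print(f"forecast {lo:.1f}-{hi:.1f}: "
              f"rain observed {rain[in_bin].mean():.2f} of the time (n={in_bin.sum()})")
```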
You also make informed judgments about the quality of the model. You can say: it's not just my prior experience; it's also because I trust the UK Met Office, for example, or because I think this model is based on fundamental physical laws and physical processes in which I have confidence and which I expect to continue to be valid in the future, and therefore I have some expectation that this model is doing something sensible and not something crazy. So there are different ways of generating confidence in a forecast, and in the model that produced that forecast.
But there are different kinds of forecasts and different kinds of models, and I want to distinguish here between, not a binary, but a spectrum of different kinds of forecasts. On one end of that spectrum, on the left of this picture, are forecasts that are more like the weather: they're based on the laws of physics. These are forecasts where we expect that tomorrow is not fundamentally different from today, and where we have a lot of data that we can use to evaluate the performance of the model. We can do that continuously; we can keep checking; we get more out-of-sample data every day out of our weather forecasts, and every day that gets used and fed back into improving the quality of the model. What that means is that our uncertainty is generically quantifiable, because our uncertainty is measurement error: it's uncertainty based on the limitations of the data that we've collected, not uncertainty that goes deeper than that. So the left-hand side of this diagram covers situations I would call essentially interpolatory, where the models we're making are basically interpolatory and very much data driven.

At the other end of the spectrum are models which I would call essentially extrapolatory. Those are ones where, for example, the underlying conditions might be changing in ways that are not fully understood. The climate is an example. That's something I have worked on quite a lot in the past, thinking about models of weather and models of climate, and one of the problems with forecasting the climate is that, even though we obviously expect the laws of physics to remain the same in the future, we don't necessarily know that the calibration of our models, done against observations of twentieth-century climate, will still be the appropriate parameterizations and model representations in a climate which is actually quite radically different, as we expect by the end of this century. That means our confidence is not fully data driven; it is based more on these other aspects of expecting the model quality to be good. We expect, for example, that the laws of physics will remain the same; we maybe take multiple models, we try to make different models, and we have to make expert judgments about the degree to which we expect the models to be good in future, the degree to which we expect past performance to be indicative of future success. With the data-driven models on the left-hand side, we are making an expert judgment that past performance is indicative of future success with high confidence. On the right-hand side, where we are more extrapolatory, we say past performance is hopefully useful (obviously we wouldn't trust a model that was extremely bad in the past), but past performance is not a good guarantee of future success, and we have to incorporate expert judgment. So I'm hoping that makes sense, and that you can see the distinction I'm making between these two types of models, and also that there is a spectrum, and that in most cases we'll be somewhere in the middle. The highly data-driven approaches may be at the far left-hand side, and the highly extrapolatory ones, maybe social systems, at the far right-hand side, but in general we're somewhere in the middle and we have to do a bit of both, a bit of everything.
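A minimal Python sketch of the distinction: a flexible, purely data-driven fit does well where it interpolates and can fail badly where it extrapolates. The "true" process and the training range here are invented for illustration.

```python
import numpy as np

# Fit a flexible polynomial to noisy samples of sin(x) on [0, 3], then
# query it inside and outside the training range.
rng = np.random.default_rng(1)
x_train = rng.uniform(0, 3, size=40)
y_train = np.sin(x_train) + 0.05 * rng.normal(size=40)

coeffs = np.polyfit(x_train, y_train, deg=5)   # a degree-5 polynomial

for x in [1.5, 2.9, 5.0, 8.0]:                 # first two interpolate, last two extrapolate
    print(f"x={x:>4}: model {np.polyval(coeffs, x):+10.2f}, truth {np.sin(x):+.2f}")
# Inside [0, 3] the errors are small; at x=5 and x=8 the polynomial can be
# far off, with no warning from any in-sample statistic.
```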
Just to give a few examples, because that might help in situating this. In the interpolatory category, on the left-hand side, I would imagine the sorts of things that quite a lot of you might be working on. Take the large language models, depending on what you're using them for: if you're using them to do something that goes beyond the training set, they are not going to be able to construct imaginative responses the way a human would, but they are going to be extremely good at summarizing data for which we have strong data-driven representations of the relationships between different words and the construction of different sentences. Weather forecasting is in this category, along with the basic rules of physics, and, if you think about industrial and commercial uses: mechanical testing, optimization of manufacturing, transport timetabling, online marketing. These are the sorts of things where we can expect to have lots of data, and where we expect that yesterday's data are relevant for tomorrow's forecast. This is where AI and machine learning approaches can be extremely valuable, because they can help us to find and use patterns in the data that we wouldn't necessarily get if we came in as humans and imposed our own expectations of the structure; we can find the less obvious structures in these data.

At the other end, in the extrapolatory category (and this is probably more the focus of my work, though I'm interested in both ends of the spectrum), there are what I think of as large-scale prediction questions. These are things like climate change impacts and energy policy. The pandemic is an interesting example, because on the one-to-two-week time scale we have something which is effectively more interpolatory and data driven, because the outcomes over the next two to three weeks are baked into infections that have already happened, so in principle it's possible to measure that, to understand it, and to estimate it. But if we're looking at the pandemic on a six-month time scale, and we want to know how it will evolve and how the numbers of infections will change over six months, that is something we in principle cannot possibly get out of past data, because it is also a function of public attitudes, of policies and government decisions, and of social behaviors, which are inherently not really predictable; or potentially predictable in some ways, but always subject to the radical uncertainty that something might happen that just changes the way people think about things. Then there are things like financial markets; bank and insurance regulation, which is something I've been looking at; security and cyber security; or the use of new technologies. A huge range, and you could think of many more examples in both of these categories; perhaps we can come back to that in the discussion if anyone has ideas. But a point to stress here is that if we're over at this end of the spectrum, we cannot expect purely data-driven approaches to give us an answer. Artificial intelligence essentially always needs some kind of help with extrapolatory questions, because it requires these expert judgments and these assumptions to be built in. Now, you can do that; it doesn't mean you can't attack these problems with AI and machine learning approaches, but it means we have to be careful about being transparent about how we embed the value judgments that are going to be necessary, and how we approach the question of making these expert judgments.

OK, so let's think a bit more about evaluation, and about how we would like it to work.
Hopefully many of you are in this lucky category: if you're at the left-hand end of the spectrum, in the interpolatory, data-driven regime, then you can proceed with model evaluation in the textbook way. You gather lots of data; you keep some back for testing, or plan to collect more for testing; you train the model using the data, but also using your understanding of the system. You find out that the model is good but not perfect, because it never is, because it can't be, because all models are wrong, as a famous statistician once said. Then you use your held-out data to evaluate the performance of the model on what you're interested in. Hopefully you can use that to improve the model, but even if you can't, at least you have it to understand when you expect to have confidence and when you expect not to. None of this implies that the model is perfect, but you don't need a perfect model; you only need to know when it's reliable and when it's not. Then hopefully, if you're on a short time scale, you'll be gathering new out-of-sample data, so you can keep doing this process, and you'll be able to generate ranges which are actionable in terms of informing decision support. And hopefully you will be confident about those uncertainty ranges: confident that if you generate a 90% confidence interval for the target of your prediction, then on 90% of occasions the actual outturn will be within your 90% confidence interval, and on the other 10% of occasions it will of course be outside that interval, but that was what you expected, so that's OK. If you have that, then you have something which is genuinely useful, because you understand it and you're confident in it.
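A minimal sketch of that coverage check in Python; the prediction and outcome arrays are synthetic stand-ins for real out-of-sample records.

```python
import numpy as np

# Do stated 90% intervals actually contain 90% of outcomes out of sample?
rng = np.random.default_rng(2)
n = 1000
prediction = rng.normal(20, 5, size=n)
outcome = prediction + rng.normal(0, 2, size=n)   # honest errors with sd = 2

claimed_sd = 2.0                                  # the model's stated error sd
z90 = 1.645                                       # two-sided 90% normal quantile
covered = ((outcome >= prediction - z90 * claimed_sd) &
           (outcome <= prediction + z90 * claimed_sd))
print(f"empirical coverage of nominal 90% intervals: {covered.mean():.2f}")
# If this came out well below 0.90, the intervals would be overconfident
# and not yet actionable for decision support.
```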
But this isn't always the case. Maybe the other thing to say here is that even when you're in this happy situation, where we have lots of data we can use and the model is evaluatable, we are also introducing expert judgments. I'll illustrate very briefly how that happens, and again this is something I'd be interested to hear feedback on from the audience at the end, as to whether it is something you consider. Here's a little sketch of a model: you have inputs and you have some kind of output. I've drawn this in two dimensions, but of course there could be hundreds or thousands of dimensions; it's just harder to draw that on a computer screen. The model, the red line here, is what links the input with the output. The model is just a function; it could be multi-valued, it could be something extremely complicated, but I've drawn a straight line. Then you have observations, and because your model is never perfect, the observations never line up perfectly with the model. So your question is: how do I use my observations to refit a better model? This is not a difficult question; it's something you've probably already done, or at least worked through examples of. You might imagine a root mean square error or a standard linear regression here, but of course it could be in many dimensions and done over different variables.

So let's take an example: how are we going to generate that loss function? Let's call the target of our prediction the air temperature in Brighton, which, for international audiences, is a seaside resort in the south of the UK. Say you are an ice cream seller. You have an ice cream van and you want to drive it down to the seaside to sell ice creams in Brighton, and you have a model, and you want to know what the temperature will be in order to decide whether to take your ice cream van out or not. Then, if you're fitting or selecting a model, your loss function, or objective function, the thing that you are minimizing in order to fit it, doesn't actually care at all about the difference between minus two degrees and minus six degrees. You have no interest, because it's the middle of winter and you're not going to be out selling ice cream anyway. You're interested in what's happening at the top end of the distribution: maybe in the difference between 22 degrees and 26 degrees, because maybe that has quite a big impact on your ice cream sales, since it changes whether people feel hot enough to buy an ice cream. So in principle you might choose a loss function which is weighted more towards the top end of this domain and prioritizes those observations and those outputs over the ones at the bottom; maybe it throws away the information at the bottom in service of getting a better fit at the top.

Now suppose instead that you are a civil engineer in Brighton, overseeing a project where you're pouring the concrete foundations for a new development. In that case you don't really care very much about the difference between 22 degrees and 26 degrees, but you care a lot about whether it's going to be below zero, because at freezing temperatures your concrete doesn't set properly; it crumbles, you might have to take it up and start again, and you've lost a lot of time and a lot of money. So the civil engineer probably isn't interested in fitting the top end of the distribution, and probably isn't that interested in the bottom end either, but they are really interested in whether it's going to be above or below zero, and in the confidence of being above zero, because that's ultimately the decision they're trying to inform. So that loss function will look very different to the ice cream seller's.

And just to give a third example: suppose you're Public Health England and you run a hospital trust in Brighton, and you're interested in predicting the number of hospital beds that will be occupied. Then maybe you're interested in admissions due to heat stress on the very hot days, and in admissions due to falls on ice on the very cold days, but you don't actually see a huge signal in the middle of the distribution, because that's just a normal day with nothing really interesting about the weather happening. In that case the loss function that you use to select or evaluate a model would prioritize the two ends of the distribution, and it wouldn't much matter what's going on in the middle.
OK, so all of that hopefully makes sense, but it's a complicated way of saying that loss functions are utility functions, and they imply value judgments. What that means is that, given your position and the decision you're trying to inform, you have made a value judgment about what outcomes are good and what outcomes are bad, whether that's selling more ice cream or not having to re-pour the concrete. Those value judgments can, do, and should enter into your choice of mathematical fit when you're constructing your model. You might say: oh well, I'm just going to do a standard linear regression, I'm not going to make any value judgments. But that is also a value judgment: it puts a clearly defined weight on the importance of different observations, which is different from an alternative choice that you might make.
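A minimal Python sketch of this point: the same observations, three different loss weightings, three different "best" straight-line models. All numbers are invented, not real Brighton data, and the weights simply stand in for the value judgments described above.

```python
import numpy as np

rng = np.random.default_rng(3)
x = rng.uniform(-5, 30, size=200)                     # some predictor
temp = x + 0.03 * x**2 + rng.normal(0, 2, size=200)   # mildly nonlinear "reality"

def fit_weighted(w):
    """Weighted least squares for temp ~ a*x + b."""
    X = np.column_stack([x, np.ones_like(x)])
    return np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * temp))

weights = {
    "uniform (also a choice)": np.ones_like(temp),
    "ice cream seller":        np.where(temp > 18, 5.0, 0.1),     # cares about warm days
    "civil engineer":          np.where(abs(temp) < 3, 5.0, 0.1),  # cares about freezing point
}
for name, w in weights.items():
    a, b = fit_weighted(w)
    print(f"{name:>24}: temp ≈ {a:.3f}*x {b:+.3f}")
# The three users get visibly different lines from identical data, because
# the loss function encodes whose decision the forecast serves.
```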
So even in this sort of nice data-driven situation, you are still making value judgments. Now, what happens if you don't actually have that much data, or you're in the situation I described, where the underlying conditions are changing, or you're interested in extreme events for which there is a much shorter data set available? Then you've simply got a bigger challenge to do this evaluation and to do the calibration. So let me say briefly that the appropriate choices of evaluation tools depend on how you think about model error. People have very different ideas in their heads of what they mean by the discrepancy between a model, in the computer or on paper or wherever, and reality, which is the thing out there that we're interested in. So, just to illustrate that:
Suppose we have a little sketch of reality, and a little sketch of our model: these two spiky blobs. If you are a statistician, if you think statistically, then your idea of model error, reality minus the model, is probably statistical. You probably say: I've got this complicated thing, and I've tried to model it as well as I can, and in doing so I have modeled all of the in-principle predictable components; therefore what's left, we hope, is random residuals, random noise. So we expect the distribution of our error term to be random noise according to some distribution with some parameters. That's a coherent and defensible position, but I think it's important to see this from different angles.
Suppose you are a dynamical systems theorist. Then the way you think about your model is as a dynamical system, probably quite a complex one, depending exactly what we're doing, and reality itself is also a complex dynamical system. So when we subtract one complex dynamical system from another complex dynamical system, what do we get? We get a third complex dynamical system, and probably one that's even more difficult to look at and understand intuitively, because it's comprised, again, of residuals. In my little sketch here, that might look like this kind of weird spiky thing: I'm subtracting one weird spiky thing from another weird spiky thing and I get an even weirder-looking spiky thing. That's how you would think about model error if you're a dynamical systems thinker, and you can see that it might lead you to analyze your model error in quite a different way.
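A minimal Python sketch of the dynamical-systems view: subtract one nonlinear system from another, only slightly different, one, and the residual is a third erratic system, not tidy noise. The logistic maps and their parameters are arbitrary stand-ins for "reality" and "model".

```python
def logistic_traj(r, x0=0.3, n=20):
    """Iterate the logistic map x -> r*x*(1-x) for n steps."""
    xs = [x0]
    for _ in range(n - 1):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

reality = logistic_traj(r=3.90)
model = logistic_traj(r=3.91)    # a model that is only slightly "wrong"

residual = [a - b for a, b in zip(reality, model)]
print(["{:+.3f}".format(e) for e in residual])
# The residual starts tiny and grows into an erratic signal of its own,
# nothing like i.i.d. noise with fixed parameters.
```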
Now, for a third example, let's say you are not a statistician or a dynamical systems theorist; let's say you are an anthropologist. You come to this and say: OK, there's reality out there, and here's my model, and the model that I choose is essentially arrived at by throwing away the things that I don't believe to be important about this situation, and that is essentially a personal judgment. So the anthropologist would say that the residual, when you subtract model from reality, will be something inherently social and political: something that reflects the blind spots, the biases, and the incorrect assumptions of the modeler, as well as the constraints of their modeling environment, for example an inability to express certain kinds of relationships in the right mathematical form to be able to model them. So that's a third, really very different way of looking at model error that could lead you to interpret your model quite differently.
Right, so suppose we have a model which is not perfect, so it's going to have an error term. Here's another illustration of what that means and why it's important. For a bit of light relief, here are our seven models, and as you can see they are cute and fluffy: I've got seven cats here. The question is, if these are the seven models that I have made of some unknown object behind a curtain, how can I use this ensemble of models to make inferences about the properties of the unknown object? I could look at these and ask what my seven models have in common; that would be a good start. I could use them to construct confidence intervals; I could say there's an uncertainty range here for the color, and the length of fur, and the types of vocalization, you name it. That would be interesting, but it would only be telling us about our set of models, not about reality. So in order to find out something about reality, what do we do? We go out and make an observation, and here it is at the bottom of the screen. It looks like a tail, and it's within the range of the colors and the fur lengths of the seven cats. So what are we going to do with this observation? Well, let's go back to our models. Maybe we could do a best fit: a model selection, where we take that observation and pick the one single model that is most reflective of it. That then depends on which dimension you check it against, because we have multiple axes here (we have fur length and we have color), and we might get a different answer if we picked a different dimension to choose the best-fit model. Or we might say: we've got these multiple dimensions, and we're going to keep all of the models that are still consistent with our observation. So then maybe we choose to keep four out of our seven, because they are all consistent, and we rule out the other three for having the wrong color, or for the fur being too short, or something. So you can probably see where I'm going here.
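A minimal Python sketch of those two options, with invented numbers: seven "models", each a point in (fur length, color) space, and one observation with tolerances.

```python
models = {                      # (fur_length_cm, color on a 0-1 scale)
    "cat1": (2.0, 0.2), "cat2": (4.5, 0.8), "cat3": (3.0, 0.6),
    "cat4": (6.0, 0.7), "cat5": (1.5, 0.1), "cat6": (5.0, 0.9),
    "cat7": (3.5, 0.5),
}
obs, tol = (5.5, 0.75), (1.5, 0.2)   # the observed tail, with tolerances

# Option 1: pick the single best-fit model. Which axis you score on matters.
best = min(models, key=lambda m: abs(models[m][0] - obs[0]))
print("best fit on fur length alone:", best)

# Option 2: keep every model consistent with the observation on all axes.
kept = [m for m, (fur, col) in models.items()
        if abs(fur - obs[0]) <= tol[0] and abs(col - obs[1]) <= tol[1]]
print("models consistent with the observation:", kept)
# Neither option can tell you the object is a dog: both only ever return cats.
```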
But the point is that, although there are many different statistical methods we can now use, having generated that observation from reality and gone back to our set of models, we are not able to infer with full confidence what we think reality looks like. Now, actually, I suspect a number of you are a step ahead and have inferred with full confidence what reality looks like, so I will show you: here it is, it's a golden retriever. Now, those models, the seven cats, are genuinely informative about reality, about the dog. There is a whole host of aspects of those cats that are informative and useful and help us to make better decisions when faced with the dog: if you want to decide what to feed it, or how to interact with it, and all that sort of thing, it's pretty good. But it's not perfect. A vet that was trained on those seven cats would probably be able to have a pretty good stab at treating the dog for most of the things that might be wrong with it. But what we can't say is that there is any way, a priori, given that set of seven cats and the observation of the tail of the dog, to reconstruct from the cats a picture of what the dog looks like without making significant additional expert judgments. And the variance within that set of models is not necessarily reflective of the uncertainty that we a priori ought to have about the characteristics of the dog, because, although we are getting good information, the dog is definitely outside the range of the set of cats on a number of different dimensions.
OK, so that's a very non-mathematical criticism of a lot of different mathematical and statistical techniques that are used to make inferences from sets of models. And just to say: finding the cat that looks most like a dog is not what we're aiming to do. This is essentially what I've already said, but in more technical language: we have options for how we deal with multiple models, and they are all possible, and they're all things that people do, but we need always to bear in mind that the models are not perfect, and that we should not necessarily expect the set of models that we have to be exchangeable with the real system. Now, if we're in the fully data-driven regime, the left-hand side of the spectrum I showed to begin with, then you're fine, because you can take enough observations of the dog to be able to refit your models and come up with an ensemble of seven dogs rather than an ensemble of seven cats. So this, again, is a distinction between the data-driven regime and the more expert-judgment-driven regime, the interpolatory versus the extrapolatory. On the left-hand side of that spectrum we can in principle construct a set of models that are close enough to being dogs; on the right-hand side we're stuck with a set of models that are cats, not dogs, and that we have no a priori way to distinguish.
So, moving on: if we are in this sort of cats-and-dog situation, how can we make inferences from ensembles of models? We don't just want to hold our hands up and say we can't do anything. I think what we need to do here is reframe the paradigm of how we understand it and how we go forward. At the moment, many fields would take the approach of saying: OK, we've got a set of models, they all show X, that something will happen, and that gives us more confidence. But if you accept my point, then it's clear that that is not strong evidence that X is the case; it's only evidence that we have a constrained set of models. What we need to know instead is that there is no plausible model that could show the opposite. So instead of trying to find that all models show X, we want to demonstrate that no plausible model could show not-X. That would be much stronger evidence that X is in fact the case.
OK, but what you will immediately notice, I expect, is that the word "plausible" is doing a huge amount of work in this sentence; the word "plausible" is what everything hinges on. What do we mean by plausible? How are we going to define a plausible model versus, say, an unrealistic model? Well, that's really difficult, really challenging. We have to think about what we mean by expertise, who we trust, how we are going to construct this model, and what kinds of models are acceptable. Thinking about physical models: do they have to be physics based, do they have to conform to the laws of physics? Yes, they probably do. Thinking about more abstract models, or models of social systems: what level of expertise should qualify the modeler to say that something is plausible versus not plausible? If you're modeling, say, virus spread in communities: is it the epidemiologist who has the expertise? Is it the virologist? Is it the social worker, because they're the one who actually knows who talks to whom and when? Or is it the landlord of the local pub? How do these different forms of expertise about a situation end up in the model, and who gets to decide, socially, whether that counts as being plausible or not? That introduces a whole load of interesting questions about the social context of doing science, and of doing this sort of science. But what's important to say here, maybe, is that this is not just a function of me introducing a new idea; this is just reframing the way that we think about the model. It was always important who was making the model, how credible they were, what letters they had after their name, and which university they went to. But when we reframe it in this way, we can see more explicitly and more transparently why that's the case, why we need to think about it, and why we need to write about it. So: who makes the model, and how do we know? I wanted to show you a few
more generic examples from the larger scale, the right-hand side of the spectrum: the extrapolatory models that are primarily the subject of my book. Hopefully these will also be interesting. So here's an expert. You probably recognize him, especially if you're in the UK: that's Neil Ferguson from Imperial College London, and he produced the model on the right-hand side, which was very influential in the early stages of decision making around the COVID-19 pandemic. This is around March-April 2020, just as things were starting to take off and people were starting to say: we need a lockdown, we need to do something. His model was based on an influenza model that he had put together previously, and was then calibrated according to the data available at the time, which was not very much.
So the expert creates a model, and in this sketch the expert has to make a number of expert judgments when they make it: they have to decide what to include, how to represent it, how to tune it, how to evaluate it. Their expertise and their understanding of the situation go into the model. But what I want to argue, and what I say in the book, is that there's also the reverse step: the model, in a sense, creates the expert. There's a feedback here, because you don't just create the model, stop, and throw it over the fence to the decision maker. Actually, you go away and you play with it, and you think about it, and you test your hypotheses, and you change things; you tweak it a bit to get a different output; maybe you try switching one bit of it off and another bit on; and you try to make predictions. All of those influence how you, as the expert, understand the situation. That's the point of making the model, after all: to be able to play with it and experiment with it. So there's this feedback loop, which means that the model and the expert are kind of acting as a system, and the model encapsulates the expert judgments of an expert, or perhaps a group of experts. And that's how it should be treated statistically as well; that's how we should understand the model when we come to put models together and try to make inferences about future conditions.

And it is not just prediction, because if what this model showed had actually turned out, it would have been absolutely catastrophic. Putting this model together allowed a discussion to happen, which prompted policy intervention, which changed the outcome. So the model is part of the system. But who decides what judgments are represented? Who decides what is politically feasible, when the politician comes and says, "I'd really like to know what the impact will be of closing schools," but maybe doesn't ask about other kinds of interventions? You can certainly argue about how those value judgments are
embedded within models, representations, and visualizations, and then influence the politics. Very briefly, here is a second example of essentially the same point. On the left-hand side we have six alternative models for a baseline energy policy scenario, where we continue to use fossil fuels, and on the right-hand side a scenario where we meet the Paris targets by phasing down fossil fuels and ramping up renewables. You can see that these six different models, which are essentially doing the same thing, have quite different outputs, because they make different assumptions. So the same question arises again: who is making the decision about what is plausible in future, whether that's the increase in the price of fossil fuels due to policy, or the decreasing price of solar energy due to improvements in technology? All of that goes in there; somebody is making that judgment. Somebody is also deciding what kinds of energy policy are plausible and what are not, and that influences the output you get, and then that output goes back to the decision makers to inform their policies in future. So these models are incredibly powerful: they are changing decisions that influence millions, if not billions, of people, and they could really do with more criticism and more transparency about the degree to which these assumptions are built in. I could say a lot more about that one, but I won't; if anybody wants to ask about it, do.
So what are the mathematical challenges, then, for decision-relevant modeling? The first challenge is escaping from model land, and that means understanding that the world within the model is not the same as the world outside the model; and not just understanding it on that basis, but also propagating that through to the mathematical techniques that we use to analyze these models, because too many of our analytical techniques are based on assumptions which imply that reality is one of our candidate set of models. If you are confident that you're right at the end of the spectrum, in the fully data-driven regime, then perhaps that's OK; but if you're not sure, or if you think you might be somewhere in the middle, then it is something that really bears consideration. In that case we have to think really carefully about the relationship between model and expert judgment, and we have to acknowledge that there is significant social and political content to our models. So how do we deal with that? Well, it's International Women's Day,
so I'm going to make my key point about diversity. One of the things we need to do is improve the diversity of our models, by pushing the boundaries of models as far as we can: exploring what is plausible, and what "plausible" means to different people with different understandings of the situation and different experience of what's relevant. If we can do that (and it's not easy; this is a challenging thing to do and to incorporate into the way that science works), if we can push our models out by incorporating more social diversity, then we will get a better understanding of the uncertainty ranges that we ought to have, and hopefully also encourage a better relationship between science and society. There are strong groups at the moment who are very anti-science: vaccine skeptics, climate skeptics, and people who a priori distrust anything that is said by large-scale organizations and governments and scientific institutions. In order to bring those people in and help them to understand the benefits of modeling, they have to be genuinely consulted and genuinely listened to, in terms of bringing their understanding and their experience into the model; otherwise it won't be credible to them. So can we do that? I think that's a really interesting challenge. But we also don't want to throw the
baby out with the bathwater. I've shown you a number of critiques of large and complex models and forward predictions, and of course that might be attractive to some in this anti-science brigade, and I think that is a risk: if there is overselling of what science can do in these extrapolatory regimes, and too much of a tendency to obscure the social and political content of models, then there will be a lack of confidence in science, and this is partly what's already happening, I think. So we need to work out a way to be more transparent about the use of these models, and more democratic in the way that they're generated, and then we will have useful scientific information without undermining that confidence.

So, just a couple of points to finish. The bad news is that,
unless we are fully data driven, at the far left-hand side, where we expect the future to be perfectly represented by the statistics of the past, we have radical uncertainty about possible future events, and we simply can't expect to be able to fully model it and fully constrain it. There will always be model risk, and there will always be a gap between model land and the real world, which may have cascading consequences when it's discovered. The global financial crisis in 2008 was essentially a collapse of that form, where the discrepancy between the model and the real became too large; and reality always wins. So we have to do something about this; it has massive consequences if we fail to. But the good news is that models are not just there to represent and predict; they're also here to change the world. We make models because we want to intervene on the system of interest, and we want to be able to change it, and to change it for the better ("better", obviously, being a value judgment, so we need to talk about what we mean by that). We can look at our models, we can scrutinize the value judgments that go into them, and we can change them if we don't agree with them. So I think there's a strong positive message here: models are difficult beasts to get to grips with, but if we do it well, we can use models, and we can develop new statistical methods, that can support us in making positive and inclusive decisions about a whole range of future policies and actions that affect everybody. So that's, I think, a big opportunity as well as a challenge, and I'll stop there. Thank you.

Thank you so much, Erica, that was super
interesting, and I am happy to open the Q&A floor. If you want to leave questions in the chat, that's fine, but if you want to ask out loud, that's great as well. Maybe I will use the opportunity, since I am hosting. So I had this question. Basically, we just need to embrace the fact that if we're using different models, we're getting different results, because the approaches of these different models can be different, we can be coming from different assumptions, and we can have different measures of which models are good or bad, plausible or not plausible. And that's understandable from the statistical perspective as well. But then, doesn't it mean that we really never know the truth? Or are we making the assumption that the plausible models, the ones that we test and that are a good fit, are the ones that bring us closer to reality, to the truth? But we still never know the truth, because it's, in a sense, some approximation that we're basing on some assumptions. Is this enough, in terms of science? It's a very philosophical question, I don't know, but I'm just curious about your thoughts on that.
Yes. I mean, I think in principle many aspects of the future are not predictable, but we happen to agree on some things. I have pretty high confidence that the sun will rise tomorrow, and so do you, and so does everybody else. But that's a judgment: it could be that it won't; it could be that the laws of physics have been the same up to now, and then tomorrow suddenly the speed of light changes and everything goes haywire. But we have very high confidence that that's not the case. We have this hierarchy of things that we, socially, are really confident in, and so socially we can kind of agree on this set of foundational assumptions without which we can't make any predictions about the future, and with which we can. And we have a few centuries of evidence that we make really good predictions and that they're really useful; we sent people to the Moon 50 years ago, which is an incredible test of a whole range of physical assumptions and ideas and models. So then the question is how far down you go. There are some things that everybody agrees on, and there are some things that just nobody could possibly know, and all of our science is somewhere in the middle, where some people might agree and other people might not. So if we're interested in, say, vaccine efficacy, you could make two separate models: one based on somebody with a strong prior expectation that this will work, and with trust in the system that generates the evidence underlying it; and another from somebody coming in with a distrust of the system that generated the evidence, and a distrust of the model and the modelers. They would end up making different judgments about the quality of the output of that model. But I suppose what I'm saying is that that's not unreasonable, and that the only way to get past it is to bring those people into the fold
somehow, by talking more about the assumptions. Suppose I claim that I'm a mind reader, and I can predict the card that you have on your desk, and it's the nine of diamonds; and then you draw again and it's the ace of spades; and I get it right five times in a row. Is that extremely strong evidence that I'm a mind reader, or is it extremely strong evidence that I'm cheating? If you have an a priori belief in the model, then in a Bayesian sense, every time you see a correct prediction it will confirm your belief that it's a good model; but if you're suspicious already, then every time you see it, it will just confirm your belief that I'm cheating. So what you need is not further runs of the same model. You need to be able to come to my office and check that I'm not cheating, or do it in a different room where you have control of the experimental setup; then you'd be convinced.
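A minimal Python sketch of that Bayesian point, with illustrative numbers: the same five correct card predictions update two observers with different priors to very different conclusions.

```python
p_hit_genuine = 1.0    # a real mind reader always gets it right
p_hit_cheating = 1.0   # ...but so does a cheat
p_hit_chance = 1 / 52  # honest guessing

def posterior_genuine(prior_genuine, prior_cheat, n_hits=5):
    """P(genuine | n correct guesses) over genuine / cheating / lucky guessing."""
    prior_chance = 1 - prior_genuine - prior_cheat
    joint = [prior_genuine * p_hit_genuine ** n_hits,
             prior_cheat * p_hit_cheating ** n_hits,
             prior_chance * p_hit_chance ** n_hits]
    return joint[0] / sum(joint)

print(posterior_genuine(prior_genuine=0.50, prior_cheat=0.01))  # believer: ~0.98
print(posterior_genuine(prior_genuine=0.01, prior_cheat=0.50))  # skeptic:  ~0.02
# The data cannot separate "genuine" from "cheating"; only controlling the
# experimental setup, not further runs, changes the skeptic's mind.
```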
And I think the same goes for some of these other, more political, questions. I can see a couple of questions in the chat too.
Yes, there are two questions, and one person asked whether they can ask a question. Yes, of course, go ahead and ask. But maybe, Erica, you can first take the question from the chat that came in before,
just to go in order. The question: "Model selection plays a key role; how do model ensemble methods fit in here, and are there any recommendations on this?" I mean, I guess the recommendation is just to think about what you're doing, and not to simply take a method out of a statistics textbook and say, "I'm going to apply this." Think about what's being implied by the statistical method; I think that is the key. There are many different methods, and they will have different underlying assumptions. And if you just get it in an R package or something that does it all automatically, then it can be pretty difficult to analyze and to find out what's actually going on underneath. So it probably isn't appropriate for everybody to do this, but due-diligence-wise, if we're going to be making significant decisions based on the output of these kinds of models, we need to understand what they are doing and why, and what the assumptions are that are embedded in them.

Lee, do you want to unmute?
I don't know if they're there. Maybe we can go to the next one, shall we? And they actually wrote their question in the chat, so you can address it. OK, well, should we go to Marcos next: "How do you see the movement from Big Tech and other settings towards data-centric scenarios?"
Yeah, I mean, I'm worried about it. I think that actually we are moving further towards model land,
and further into model land, on the assumption that we're at the far left-hand side of this spectrum and can take purely data-driven approaches, throwing away and ignoring all of these questions about value judgments and about alternative possibilities. Yes, I think there's a huge risk in this of people assuming that their model is perfect, either by explicitly stating that, or through what they do with the model that implies that judgment. I suppose one of the aims of my book is to try to generate more discussion about model land, about the difficulties when you're there, and about how we get out of model land, because we always have to get out of model land if we want to do something in the real world. If you just take your model and apply it, you are getting out of model land by making an implicit expert judgment that your model is perfect, but in most cases that is obviously wrong. So we can
start criticizing on that basis.
Yeah, and we have, I guess, two more questions. Do you want me to just read them out? So Shinwei asks: "We know that we want to compare models by sharpness subject to calibration; what about using scores such as CRPS?"
So that's the continuous ranked probability score. Yes: there's the CRPS, there's the Brier score, there's the log probability score, and then you could go into the information criteria; there are many different ways of scoring. But I suppose this goes back to the point that any choice of score implies a value judgment about the relative importance of different kinds of outcomes, and that will be relative to whatever you hope to use your model for. So what is the basis of comparison for your models? If you've got six different models, you could pick the one that does best on any given score, but you'll get the cat that looks most like a dog; you don't get a dog. So what is it that you're aiming to do, and how good do you think your models are? You have to be extremely confident in your set of models to want to choose the cat that looks most like a dog. If you're slightly less confident, then it's OK to choose maybe a range of cats that will hopefully produce a confidence interval for your dog. Then there are other methods that perhaps put in a discrepancy term and say: we will take a range of cats and essentially generate a fuzzy picture of the dog, because we'll add noise to everything. But regardless of what score you use to do that, you still have the fundamental problem of the potential mismatch between model and reality. If you are completely data driven, then you're sort of OK, and it probably won't make much difference which skill score you choose, whether that's CRPS or Brier or a log likelihood; you'll probably get more or less the same answer. But most of us are not there, and if you're not there, it makes quite a big difference, because, for example, if you use a log likelihood, then you get an infinite penalty for something happening that the model said was impossible. So that introduces strong dependencies on your choice of scoring function. There's a huge amount more I could say about that, but I'll stop there.
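A minimal Python sketch (toy numbers) of that last point about scoring rules: the log score hands out an infinite penalty when an event the model called impossible actually happens, while the Brier score's penalty is bounded.

```python
import numpy as np

forecast_prob = 0.0   # the model said "cannot happen"
outcome = 1           # it happened

brier = (forecast_prob - outcome) ** 2
log_score = -np.log(forecast_prob) if forecast_prob > 0 else np.inf
print(f"Brier score: {brier:.2f} (bounded), log score: {log_score} (unbounded)")
# Which penalty is appropriate is itself a value judgment about how bad it
# is to be caught out by something your model ruled out.
```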
All right, if there are no more questions (there are thank-you notes in the chat), we are actually on time, so I'm going to close the session. Thank you so much, Erica, for this amazing talk, and congratulations on the book again. Thank you so much, everyone, for joining us today; follow our page for the CIVICA Data Science Seminar Series, as we're going to have another seminar in two weeks. Thank you, everyone, and we're going to stop the recording now.
All right, thank you very much. Good to be here.