The Dangerous Illusion of AI Coding? - Jeremy Howard
By Machine Learning Street Talk
Summary
Topics Covered
- Next Word Prediction Builds Hierarchies
- LLMs Cosplay Understanding
- AI Erodes Human Engineering Growth
- AI Coding Creates Slot Machine Illusion
- Interactive Notebooks Trump Line-Based Coding
Full Transcript
It literally disgusts me. Like I literally think it's inhumane. My mission remains the same as it has been for like 20 years, which is to stop people working like this. Jeremy Howard, a deep learning pioneer, a Kaggle grandmaster. He is a huge advocate for actually understanding what we are building through an interactive loop, a notebook, a REPL, the act of poking at a problem until it pushes back. He argues this is where the real insight happens.
And the funny thing is they're both right. LLMs cosplay understanding things. They pretend to understand things. No one's actually creating 50 times more high-quality software than they were before. So we've actually just done a study of this and there's a tiny uptick, tiny uptick in what people are actually shipping. The thing about AI-based coding is that it's like a slot machine in that you have an illusion of control. You know, you can get to craft your prompt and your list of MCPs and your skills and whatever. But in the end, you pull the lever, right? Here's a piece of code that no one understands. And am I going to bet my company's product on it? And the answer is I don't know. Because
I don't know what to do now because no one's been in this situation.
They're really bad at software engineering. And then I think that's possibly always going to be true. The idea that a human can do a lot more with a computer when the human can manipulate the objects inside that computer in real time and study them and move them around
and combine them together. Whoever you listen to, whether it be Feynman or whatever, you always hear from the great scientists how they build deeper intuition by building mental models, which they get over time by interacting with the things that they're learning about. A machine could kind of
build an effective hierarchy of abstractions about what the world is and how it works entirely through looking at the statistical correlations of a huge corpus of text using a deep learning model. That was my premise. This video is brought to you by NVIDIA GTC. It's running March the 16th until the 19th in San Jose and streaming free online. The key topics this year are agentic AI and reasoning, high-performance inference and training, open models, and physical AI and robotics. I'm so excited about the DGX Spark. I've been on the waiting list for over a year now. It's a personal supercomputer that is about the size of a Mac Mini. It's the perfect adornment to a MacBook Pro, by the way. And you can fine-tune a 70 billion parameter language model with one of these things. And I'm giving one away for free. All you have to do is sign up to the conference and attend one of the sessions using the link in the description. As for the sessions, I'm interested in attending Aman Sanger's talk. So he's the CTO of Cursor. And his session is Code with Context: build an agentic IDE that truly understands your code base. Now, obviously, Jensen's keynote is on March the 16th. He said he's going to unveil a new chip that will surprise the world. Their next-generation architecture, Vera Rubin, is already in full production. And there's speculation we might even get an early glimpse of their new Feynman architecture. So don't forget, folks, the link is in the description. If you're attending virtually, it's completely free. Don't miss it. Jeremy Howard, welcome to MLST. Welcome to
my home. Thanks for coming. Yeah, well, where are we now? We are in beautiful Moreton Bay in southeast Queensland. We are by the sea in my backyard. The weather didn't disappoint. It certainly didn't. It doesn't often, but if you were here yesterday, it would have been very different. Well, I don't know where to start.
So I've been a huge fan probably since about 2017, 18. Of course, you had the famous ULMFiT paper. And when I was at Microsoft, I remember doing a presentation about that because it was actually, I mean, now we take it for granted that we fine-tune language models on a corpus of text and then we kind of like continue to train them and specialize them. But apparently this was not received wisdom.
Oh, this was the first time it happened. Yeah, kind of the first or second.
So... Quoc Le and Andrew Dai had done something a few years ago, but they'd missed the key point, which is the thing you pre-train on has to be a general purpose corpus. So no one quite realized this key thing. And maybe I had a bit of fortune here that my background was in philosophy and cognitive science.
And so I'd spent some decades thinking about this. The technical architecture of ULMFiT, just sketch that out. I'm a huge fan of regularization. I'm a huge fan of taking a model that's incredibly flexible and then making it more constrained not by decreasing the size of the architecture, but by adding regularization. So even that at the time
was extremely controversial, but that was by no means a unique insight of ours. So what Stephen Merity had done is he'd taken the extreme flexibility of an LSTM, kind of the classic stateful recurrent neural net towards which things are kind of gradually heading back nowadays, and he added five different types of regularization. He added every type of regularization you can
imagine. And then that was my starting point, was to say, okay, I now have a massively flexible deep learning model that can be as powerful as I want it to be. And it can also be as constrained as I need it to be.
And then I needed a really big corpus of text. Funnily enough, this is also Stephen. He had been at Common Crawl and I think he helped or made the Wikipedia data set. And then I realized actually the Wikipedia data set made lots of assumptions. It had all these like, unk for unknown words, because it all assumed classic NLP approaches. So I redid the whole thing, created a new Wikipedia data set, and that was my general corpus. And then I used AWD-LSTM, trained it. So it
was actually overnight. So for eight hours on a gaming GPU, you know, um, because I was at the University of San Francisco, we didn't have heaps of resources. Um, probably like a 2080 Ti or something, I suspect. Um,
and then the next morning when I woke up, I then, it's the same three-stage architecture that we do today. You know, pre-training, mid-training, post-training. So then I figured, okay, now that I've trained something to predict the next word of Wikipedia, it must know a lot about the world. I then figured if I then fine-tune it on a
corpus-specific, so what we could now call a supervised fine-tuning, um, dataset, which in this case was a dataset of movie reviews. It would become especially good at predicting the next word of those, so it'd learn a lot about movies. Did that for like an hour, and then like a few minutes of fine-tuning the
downstream classifier, which was a classic academic dataset, it's kind of considered the hardest one, which was to take like 5,000-word movie reviews and to say like, was this a positive or negative sentiment, which today is considered easy. But at that time, you know, the only things that did it quite well were highly specialized models that people wrote
their whole PhDs on. And I beat all of their results, you know, five minutes later; fine-tuning that model was amazing. And the other interesting thing is this kind of methodology around how you do the fine-tuning. Yeah.
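The three-stage recipe just described (general pre-training, domain language-model fine-tuning, then a small classifier fine-tune on top) can be sketched roughly as follows. This is an illustrative toy in PyTorch, not the original ULMFiT code: the model, sizes, and random stand-in corpora are all made up.

```python
import torch
import torch.nn as nn

# Hypothetical toy setup: a vocab of 100 tokens and a tiny LSTM language model.
vocab_size, emb_dim, hidden = 100, 16, 32

class LM(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.LSTM(emb_dim, hidden, batch_first=True)
        self.head = nn.Linear(hidden, vocab_size)  # next-token prediction head
    def forward(self, x):
        out, _ = self.rnn(self.emb(x))
        return self.head(out)

def next_token_step(model, batch, opt):
    # Predict token t+1 from tokens up to t, one optimizer step.
    logits = model(batch[:, :-1])
    loss = nn.functional.cross_entropy(
        logits.reshape(-1, vocab_size), batch[:, 1:].reshape(-1))
    opt.zero_grad(); loss.backward(); opt.step()
    return loss.item()

model = LM()
general = torch.randint(0, vocab_size, (8, 20))  # stands in for Wikipedia
domain = torch.randint(0, vocab_size, (8, 20))   # stands in for movie reviews
opt = torch.optim.Adam(model.parameters(), lr=1e-2)

# Stage 1: pre-train the LM on the general corpus.
for _ in range(3):
    next_token_step(model, general, opt)

# Stage 2: fine-tune the same LM on the domain corpus.
for _ in range(3):
    next_token_step(model, domain, opt)

# Stage 3: swap the next-token head for a classifier head.
clf_head = nn.Linear(hidden, 2)  # positive / negative sentiment
feats, _ = model.rnn(model.emb(domain))
logits = clf_head(feats[:, -1])  # classify from the final hidden state
print(logits.shape)  # torch.Size([8, 2])
```

The point of the sketch is only the staging: the language model's body is reused unchanged at stage 3, and only a tiny new head needs to be trained, which is why the final classifier fine-tune was so cheap.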
So how we do the fine-tuning was something we had developed at fast.ai. So this is kind of year one of fast.ai. So this is still in our very early days. And one of the extremely controversial things we did was we felt that we should focus on fine-tuning existing models because we thought fine-tuning was important.
Some other folks were doing work contemporaneously with that. So Jason Yosinski did some really great research, I think it was during his PhD, on how to fine-tune models and how good they can be. And some other folks in the computer vision world, we were, you know, amongst the first. There's a bunch of us kind of really investing in fine-tuning. And so, yeah, we felt that using a single learning rate to fine-tune the whole thing all at once made no sense because the different layers have different behaviors. And this is one of the things Jason Yosinski's research also showed. We developed this idea of like, well, it's also way faster if you just train the last layer, right? Because it only has to backprop
the last layer. And then once that's pretty good, backprop the last two and then the last three. And then we use something called discriminative learning rates. So different layers we would give different learning rates to. And then another critical insight that no one realized for years, even though we had told everybody, was that you actually have to
fine-tune every batch norm. So all the normalization layers, you do actually have to fine-tune because that's moving the whole thing up and down and changing its scale. So, yeah,
when you do that, you can often just fine-tune the last layer or two. And
we found that actually with ULMFiT, although we did end up unfreezing all the layers, only the last two were really needed to get close to the state-of-the-art result.
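The gradual unfreezing and discriminative learning rates described above can be sketched like this. It is a minimal illustration, not fastai's implementation: the layer names, learning rates, and `freeze_to` helper are all made up for the example.

```python
import torch
import torch.nn as nn

# A toy 3-"layer" model: two body layers and a head.
body1 = nn.Linear(10, 10)
body2 = nn.Linear(10, 10)
head = nn.Linear(10, 2)
model = nn.Sequential(body1, nn.ReLU(), body2, nn.ReLU(), head)

# Discriminative learning rates: one optimizer param group per layer,
# with the earliest layers getting the smallest rate.
opt = torch.optim.SGD([
    {"params": body1.parameters(), "lr": 1e-4},
    {"params": body2.parameters(), "lr": 1e-3},
    {"params": head.parameters(),  "lr": 1e-2},
])

def freeze_to(layers, n_trainable):
    # Freeze everything except the last n_trainable layers.
    for i, layer in enumerate(layers):
        trainable = i >= len(layers) - n_trainable
        for p in layer.parameters():
            p.requires_grad_(trainable)

layers = [body1, body2, head]
freeze_to(layers, 1)  # first: train only the head
freeze_to(layers, 2)  # then: unfreeze the last two layers
freeze_to(layers, 3)  # finally: unfreeze everything

lrs = [g["lr"] for g in opt.param_groups]
print(lrs)  # [0.0001, 0.001, 0.01]
```

The batch-norm point made above would translate here to exempting normalization layers from `freeze_to`, so their statistics and scale parameters keep adapting even while the surrounding layers are frozen.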
So it took like seconds. Yeah, because the discriminative learning rate thing is interesting because I think the received wisdom at the time was when you fine-tune a model, if the learning rate is too high, you kind of blow out the representations. So I guess the wisdom was if you don't have a really low learning rate, you'll just destroy the representations. I mean, there was no received wisdom because nobody talked about it. No
one cared, you know. It's just this... nearly no one cared.
Transfer learning was just not something anybody thought about. And Rachel and I felt like it matters more than anything, you know, because only one person... has to train a really big model once, and then the rest of us can all fine-tune it.
So we thought we just should learn how to do that really well. So we
spent a lot of time just trying lots of things. But in the end, the intuition was pretty straightforward, and what intuitively seemed like it ought to work basically always did work. Which is another big difference between how people still today tend to do ML research. People think it's all about ablations and you can't make any assumptions or guesses. And it's not at all true. I
find nearly everything that I expect to work almost always works first time because I spent a lot of time building up those intuitions, that kind of understanding of how gradients behave. I think there's a dichotomy though between continual learning, which is when we want to keep training the thing but maintain generality, versus fine-tuning a thing to do something specific. So there's always been this idea that, yes, you can make a model specific, you can bend it to your will, but you lose generality and you kind of degrade the representation. So tell me about that. Yeah, there's some truth in that, although not as much as you might think. On the whole, the big problem is that people don't actually look at their activations and don't actually look at their gradients.
So something we do in our software, in our fastai software, is we have built into it this ability to see at a glance what your entire network looks like. And once you've done it a few times, it just takes a couple of hours to learn, you can immediately see, oh, I see, this is overtrained or undertrained, or in this layer something went wrong. It's not a mystery, you know. So
basically what happens is, for example, you end up with dead neurons that go to a point where they've got zero gradient, regardless of what you do with them. That
often happens if they, you know, head off towards infinity. You can always fix that.
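The kind of activation check being described, spotting units with zero gradient regardless of input, can be sketched with a forward hook. This is an illustrative sketch of one such diagnostic, not the fastai tooling itself: it records a ReLU layer's activations and flags units that never fire on a batch ("dead" neurons).

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(20, 50), nn.ReLU(), nn.Linear(50, 10))

# Record the ReLU layer's output with a forward hook.
acts = {}
def save_acts(name):
    def hook(module, inputs, output):
        acts[name] = output.detach()
    return hook

model[1].register_forward_hook(save_acts("relu"))

x = torch.randn(256, 20)
model(x)

a = acts["relu"]            # shape (256, 50): batch x units
dead = (a == 0).all(dim=0)  # a unit that is zero for every input in the batch
print(f"{int(dead.sum())} of {a.shape[1]} units look dead on this batch")
```

A unit that stays at zero across many batches gets zero gradient through the ReLU, which is the "dead neuron" failure mode mentioned above; once you can count them, you can catch the drift toward it early.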
So yeah, it's not as bad as people think by any means. Something that trains well for continual learning, when done properly, can also be trained well for a particular task, if you're careful. In a sense, you do want the neurons to die. And I'll explain what I mean by this. Like, we want
to bend the behavior of models to introduce implicit constraints because without constraints, there is no creativity, there is no reasoning and so on and so forth. So,
so in a sense, you actually want it to say, don't do that. You want
it to do something else. I don't think of it that way. Like to me, it's more like, I find thinking about humans extremely helpful when it comes to thinking about AI. I find they behave more similarly than differently, and
my intuition about each tends to work quite well. You know, with a human, when you learn something new, it's not about unlearning something else. And so something I always found is when I got models to try to learn to do two somewhat similar tasks, they almost always got better at both of them than one that only learned one of them. I was reminded a little bit of, you know, the DINO paper from LeCun. So this whole kind of... regime of self-supervised learning with, I mean, that was a vision model. But, you know, the idea was, okay, so we're doing pre-training and we want to maintain as much diversity and fidelity as possible so that when we do the downstream task, we've got more things that we can latch on to. Yeah, yeah. And, you know, semi-supervised and
self-supervised learning was such an unappreciated area. And, yeah, Yann LeCun was absolutely one of the guys who was also... working on it. I actually did a post because I was so annoyed at how few people cared about semi-supervised learning. I did
a whole post about it years ago. Yann LeCun looked at it for me as well and, you know, suggested a few other pieces of work that I had missed.
But I was kind of surprised at how incredibly useful it is to basically say, like, basically come up with a pre-text task, right? So we did this in vision before ULMFiT. So it was like, in medical imaging, you know, take a histology slide and, you know, mask out a few squares and predict what used to be there. So
some of my students at USF, I had doing stuff with that. It was basically entirely taking stuff that we and others had already done in vision. So like this idea of masking out squares, we didn't invent it. Masking out words was the obvious thing, you know, and this idea of, um, gradually unfreezing layers we had done before in computer vision, the whole idea of starting with a pre-trained model that was general purpose had been in computer vision. There was a really classic paper in computer vision, it would have been around 2015, an entirely empirical paper saying, look what happens when we take a pre-trained ImageNet model, predicting what sculptor created this sculpture, or predicting what architecture style this is. And like
in every task, it got the state-of-the-art result. And it really surprised me people didn't look at that and think like, I bet that ought to work in every other area as well, whether it be genome sequences or language or whatever. But people have a bit of a lack of imagination, I find. They tend to assume things only work in one particular field. That's really true. Yeah, I mean, I guess there's two
things there. I mean, first of all, we were kind of hinting at this notion of almost Goodhart's law or the shortcut rule, that you get exactly what you optimize for at the cost of everything else. But that doesn't seem to be the case because we can optimize for perplexity in the case of language models. And as you say, what seems to happen is, we're getting into the distributional hypothesis here a little
bit. So, you know, you know the word by the company it keeps. So when
we have an incredible amount of associative data, it might be masked autoprediction or any of these things like that, the model seems to build something that we might call an understanding. I've always thought of it as a hierarchy of abstractions.
If it's going to predict, if the document is, here was the opening that Bobby Fischer used, and it has chess notation to predict the next thing, it needs to know something about chess notation, or at least openings. If it's like,
you know, and this was vetoed by the 1956 US president, comma, you need to know, you don't just need to know who the president was, but the idea that there are presidents, and therefore the idea that there are leaders, and therefore the idea that there are groups of people who have hierarchies, and therefore that there are people, and therefore that there are objects. Like, you can't predict the next word of a
sentence well without knowing all of these things. So my hypothesis in creating ULMFiT was to say, to compress that as well as possible, to get that knowledge, it would have to create these abstractions, these hierarchical abstractions, somewhere deep inside its model.
Otherwise, how could it possibly do a good job of predicting the next word? You
know, and because deep learning models are universal learning machines, you know, and we had a universal way to train them, I figured if we get the data right and if the hardware's good enough, then in theory we ought to be able to build that next word predicting machine, which ought to implicitly build
a hierarchical structural understanding of the things that are being described by the text that it is learning to predict. I think that they can know in quite a, you know, they know in quite a superficial way. So there's a myriad of surface statistical relationships and they generalize extraordinarily well. It's miraculous. It is. But the
thing is, I want to contrast this with other comments you've made about creativity. So
I think knowledge is about constraints. And I think creativity is the evolution of knowledge respecting those constraints. Therefore, AI is not creative. And you've said the same thing. You've
said AI isn't creative. So on the one hand, how can you say that they know and not think that they can be creative? I mean, I don't think I've used that exact expression. You know, I know I've actually, I remember... chatting with Peter Norvig on camera, and both of us said, well, actually, they kind of are creative.
We've just got to be a bit careful about our choices of words, I guess.
So, you know, Piotr Wozniak, who's a guy I really, really respect, who kind of rediscovered spaced repetition learning, built the SuperMemo system, and is the modern-day guru of memory. The entire reason he's based his life around remembering things is because he believes that creativity comes from having a lot of stuff remembered, which is to say putting together stuff you've remembered in interesting ways is a great way to be creative.
LLMs are actually quite good at that, but there's a kind of creativity they're not at all good at, which is, you know, moving outside the distribution. So,
uh, which I think is where you're heading with your question. Um, but I just kind of, I'm framing it this way to say you have to be so nuanced about this stuff because if you say like they're not creative, it can give you the wrong idea because they can do very creative seeming things.
But if it's like, well, can they really extrapolate outside the training distribution?
The answer is no, they can't. But the training distribution is so big and the number of ways to interpolate between them is so vast, we don't really know yet what the limitations of that is. But I see it every day, you know, because my work is R&D. I'm constantly on the edge of and outside the training data.
I'm doing things that haven't been done before. And there's this weird thing, I don't know if you've ever seen it before, I see it, but I see it multiple times every day, where the LLM goes from being incredibly clever to like worse than stupid, like not understanding the most basic fundamental premises about how the world works. Yeah. And it's like, oh, whoops, it fell outside the training data distribution. It's gone dumb. And then like, there's no point having that discussion any further because, you know, you've lost it at that point. Yes. Yeah.
I mean, I love, you know, Margaret Boden, she had this kind of hierarchy of creativity. So there's like combinatorial, exploratory and transformative. And the models can certainly do combinatorial creativity. But for me, it's all about constraints. So I mean, this is what Boden said. And even Leonardo da Vinci, he said that creativity is all about constraints.
And you've spoken about, you know, we'll talk about this dialogue engineering. But what happens is when we talk with language models, it's a specification acquisition problem. So we go back and forth. And actually, when we think, you know, the process of intelligence is about building this imaginary Lego block in our mind and respecting various constraints. And when
you respect those constraints and you just continue to evolve, then those things are said to be creative. So language models, when you add constraints to them, so this could be via supervision, via critics, via verifiers, then they are creative. And we see Alpha Evolve, we've seen many examples of this. The illusion is, on their own, sans constraints, obviously they have this behavioural shaping stuff that we're talking about, they don't have hard
constraints, and that's why they can't go outside their distribution. I mean, I think they can't go outside their distribution because it's just something that that type of mathematical model can't do. I mean, it can do it, but it won't do it well. When
you look at the kind of 2D case of fitting a curve to data, once you go outside the area that the data covers, the curves... disappear off into space in wild directions, you know. And that's all we're doing, but we're doing it in multiple dimensions. I think Boden might be pretty shocked at how far compositional creativity can go when you can compose the entirety of the human knowledge corpus. And I think this is where people often get confused, because it's like... So for example, I was talking to Chris Lattner yesterday about
how Claude, uh, Anthropic, you know, had got Claude to write a C compiler. And they were like, oh, this is a clean-room C compiler, you can tell it's clean-room because it was created in Rust, you know. And, um, so Chris created the kind of, I guess it's probably the most widely used C++ compiler nowadays, Clang, on top of LLVM, which is the most widely used kind of foundation for compilers. They were like, oh, well, Chris didn't use Rust, and we didn't give it access to any compiler source code, so it's a clean-room implementation.
But that misunderstands how LLMs work, right? Which is,
all of Chris's work was in the training data, many, many times LLVM is used widely and lots and lots of things are built on it, including lots of C and C++ compilers. Converting it to Rust is an interpolation between parts of the training data, you know. It's a style transfer problem. So it's definitely
compositional creativity at most, if you can call it creative at all. And you actually see it when you look at the repo that it created, it's copied parts of the LLVM code, which today Chris says, like, oh, I made a mistake. I shouldn't have done it that way. Nobody
else does it that way. You know, oh, wow, look, they're the only other one that did it that way. That doesn't happen accidentally. That happens because you're not actually being creative. You're actually just finding the kind of nonlinear average point in your training data between, like, Rust things and building compiler things. All of that is true. I
mean, first of all, I think we shouldn't underestimate the size of how big this combinatorial creativity is. So all of that is true. So the code is on the internet, but also they had a whole bunch of tests which were scaffolded, which meant that every single time some code was committed, they could run the test and they basically had a critic and they could then do this autonomous feedback loop. So in
a sense, it's very similar to the recent research by OpenAI and Gemini where you're trying to solve a problem in math and you already have an evaluation function.
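The autonomous feedback loop being described, where the test suite acts as the critic, can be sketched in a few lines. This is an illustrative sketch only: the real systems under discussion use an LLM to propose code, whereas here the "generator" is just random search over a made-up toy problem.

```python
import random

def verifier(candidate):
    # The evaluation function (the "critic"): does candidate(x) double x?
    return all(candidate(x) == 2 * x for x in range(10))

def propose(rng):
    # Stand-in for the generator: guess a coefficient at random.
    k = rng.randint(0, 5)
    return lambda x, k=k: k * x

rng = random.Random(42)
attempts = 0
while True:
    attempts += 1
    cand = propose(rng)
    if verifier(cand):  # the critic accepts: stop searching
        break

print(f"found a passing candidate after {attempts} attempts")
assert cand(7) == 14
```

The structural point is the one made above: once you have a reliable verifier, even a weak proposer can solve the problem by iterating, because the verifier carries partial knowledge of the answer.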
The same on the ARC prize, right? You have an evaluation function. And what people discount is, even knowledge of what the evaluation function is, is partial knowledge of the problem. So you can then brute-force search. You can use the statistical pattern matching, use the verifier as a constraint, and you can actually solve it. And they don't even need to do that, right? Like you literally already know how to pass those tests because there's lots of software that already does it. So it just uses that.
and translates them to Rust. That's all it did, which is impressive. And I'm much less familiar with math than I am computer science, but from talking to mathematicians, they tell me that that's also what's happening with like Erdős problems and stuff. Some of them are newly solved, but they are not sparks of insight. You know, they're solving ones that you can solve by mashing up together very closely related things that humans have already figured out. So on the subject of Claude Code, now, I know you've spoken extensively about vibe coding. Actually, Rachel
had some interesting work out. I mean, she quoted the METR study, which showed that productivity actually went down when people were vibe coding. But I think... And they thought that it went up, which is the most interesting bit. And then also there was the Anthropic study. I mean, you know, maybe we should rewind a little bit. I
mean, Dario had this essay out the other day. I think it was called The Adolescence of Technology or something like that. And he was basically saying, look, you know, we have all of these amazing software engineers at Anthropic and they are just so productive. And he was extrapolating from that experience to the average software engineer, so there's going to be mass unemployment because soon we're going to be able to automate all of this with AI. I mean, it doesn't make any sense. Um, Elon Musk said something a bit similar a few days ago, saying like, oh, LLMs will just spit out the machine code directly, we won't need libraries, programming languages...
Yeah, um, yeah, look, the thing is, none of these guys have been software engineers recently. I'm not sure Dario's ever been a software engineer at all. Software engineering is an unusual discipline and a lot of people mistake it for being the same as typing code into an IDE. Coding is
another one of these style transfer problems. You take a specification of the problem to solve and you can use your compositional creativity to find the parts of the training data which interpolated between them solve that problem and interpolate that with syntax of the target language and you get code. There's a very famous essay by
Fred Brooks written many decades ago, No Silver Bullet, in which it almost sounded like he was talking about today. He was specifically saying something, he was responding to something very similar, which is in those days it was all like, oh, what about all these new fourth-generation languages and stuff like that, you know, we're not
going to need any coders anymore, any software engineers anymore, because software is now so easy to write, anybody can write it. And he said, well, he guessed that you could get at maximum a 30% improvement.
He specifically said a 30% improvement in the next decade, but I don't think he needed to limit it that much, because the vast majority of work in software engineering isn't typing in the code. Um, so in some sense, parts of what Dario said were right, just like, for quite a few people now,
most of their code is being typed by a language model.
Um, that's true for me. Uh, say like maybe 90%.
Um, but it hasn't made me that much more productive. Um, because that was never the slow bit. It's also helped me with kind of the research a lot and figuring out, you know, which files are going to be touched. But any time I've made any attempt at getting an LLM to like design a solution to something that hasn't been
designed lots of times before, it's horrible. Because what it gives me every time is... a design of something that looks on its surface a bit similar.
And often that's going to be an absolute disaster, because things that look on the surface a bit similar... like, I'm literally trying to create something new to get away from the similar thing. It's very misleading. First of all, I'm exasperated by what I see as the tech bro predilection to misunderstand cognitive science and philosophy and whatnot. Because we've spoken to so many really interesting people on MLST, like for example, Cesar Hidalgo, he wrote this book, Why Information Grows. And even Mazviita Chirimuuta, she's a philosopher of neuroscience, and she was talking all about, you know, basically that knowledge is protean. So, yeah, I think that knowledge is perspectival. I don't think that knowledge can be this abstract, perspective-free thing that can exist on Wikipedia. And I
also think that knowledge is embodied and it's alive. It's something that exists in us.
And the purpose of an organization is to preserve and evolve knowledge. So when you start delegating cognitive tasks to language models, you actually have this weird paradoxical effect that you erode the knowledge inside the organization. Well, that's true. And that's terrifying. There's often
these arguments online between people who are like, LLMs don't understand anything.
They're just pretending to understand. And then other people are like, don't be ridiculous. Look
what this LLM just did for me. Right. And the funny thing is they're both right. LLMs cosplay understanding things. They pretend to understand things. And this was the interesting thing about the early cognitive science work, with, like, Daniel Dennett. That's basically what the Chinese Room experiment is, right? You've got a guy in a room who can't speak Chinese at all, but he sure looks like he does, because you can feed him questions and he gives you back answers, but all he's actually doing is looking things up in a huge array of books or machines or whatever. The difference between pretending to be intelligent and actually being intelligent is entirely unimportant as long as you're
in the region in which the pretense is actually effective, you know?
So it's actually fine for a great many tasks that LLMs only pretend to be intelligent, because for all intents and purposes it just doesn't matter, until you get to the point where it can't pretend anymore. And then you realize, like, oh my God, this thing's so stupid. I'm a fan of Searle, by the way. So, you know, he said that understanding is causally reducible, but ontologically irreducible. And he was saying there was a phenomenal component to understanding, but you don't even need to go there. Like, the interesting thing about knowledge being protean is this idea that, you know, it's basically this Kantian idea. The world is a complex place.
None of us understand it. It's like the blind men and the elephant. We all
have different perspectives. It's a very complex thing. And so we all do this kind of modeling. But the interesting thing is that the language models sometimes seem to understand. And they understand because the supervisor places them in a frame. So inside that frame, when you have that perspective of the elephant, they're actually surprisingly coherent. But
we discount the supervisor placing the models in that frame. Yeah, yeah. So, Searle versus Dennett, or is it Searle and Dennett, was what everybody was talking about back when I was doing my undergrad in philosophy, you know. So I think Consciousness Explained came out about then, probably the Chinese Room a little bit before.
And it's interesting, because the discussions were the same discussions we're having now, but they've gone from being abstract discussions to being real discussions. It's helpful if people go back to the abstract discussions, because it's very distracting at the moment to look at something that's cosplaying intelligence so well; the abstract discussions help you get out of that and back to the fundamental question.
So anyway, I just wanted to mention that it's kind of this interesting situation we're now in where it's very easy to really get the wrong idea about what AI can do, particularly when you don't understand the difference between coding and software engineering. Which then takes me to your
point, or your question, about the implications of that for organizations. You know, a lot of organizations are basically betting their futures on a speculative premise, which is that AI is going to be able to do everything better than humans, or at least everything in coding better than humans. I worry about this a lot, both for the organizations and for the humans. For the humans, when you're not
actively using your design and engineering and coding muscles, you don't grow. You might even wither, but you at least don't grow. And speaking as the CEO of an R&D startup, you know, if my staff aren't growing, we're going to fail. You know, we can't let that happen. And getting better at the particular prompting skills and whatever details of the current generation of AI CLI frameworks isn't growing. You know, that's as helpful as learning about the details of some AWS API when you don't actually
understand how the internet works. You know, it's not reusable knowledge. It's
ephemeral knowledge. So, like, if you want to, you can actually use it as a learning superpower. But also it can do the opposite. You know, the natural thing it's going to do is remove your competence over time. I agree that that's the natural thing. So this is especially pertinent for you, because your career has been around basically educating people to get, you know, technology and AI literacy. So the default behavior is very similar to a self-driving car, but, you know, there's this tipping point where at some point you're not engaged anymore. You're
not paying attention and you get this delegation of competence and you get understanding debt.
That's the default thing. So this study from Anthropic a couple of weeks ago, it contradicted Dario completely because it even said that, yeah, there were a few people in the study that were asking conceptual questions that are actually kind of, you know, keeping on top of things. And they had a gradient of learning, but most people didn't.
And my hypothesis about that is, you know, the ideal situation for Gen AI coding is that, like us, we've been writing software for decades. We already have this abstract understanding. We're using it in domains that we know well. And we can specify, we can remove loads of ambiguity. We can track, and we can go back and forth, and we can stay in touch with the process. But what happens is that the default attractor is for people to just go into this autopilot mode, and they've got no idea what's happening, and it's actually making them dumber. I created the first deep
learning for medicine company, called Enlitic, back in, what was that, like 2014. And our
initial focus was on radiology. And a lot of people were worried that this would cause radiologists to become less effective at radiology. And I strongly felt the opposite, which is, and I did quite a bit of research into this, like what happens when there's fly-by-wire in aeroplanes or anti-lock brakes in cars
or whatever. If you can successfully automate parts of a task that really are automatable, you can allow the expert to focus on the things that they need to focus on. And we saw this happen. So in radiology, we found if we could automate identifying the possible pulmonary nodules in a lung CT scan, and we were actually good at it, which we were, then the radiologist can focus on looking at the nodules and trying to decide if they're malignant or what to do about it. So again, it's one of these subtle things. So if there are things which you can fully automate effectively, in a way that you can remove that cognitive burden from a human so that they can focus on things that they need to focus on, that can be good, you know. I don't know where we sit in software development, because, you know, I've been coding for
40-ish years, so I've written a lot of code, and I can glance at a screen of code and, you know, unless it's something quite weird or sophisticated, I can immediately tell you what it does and whether it works and whatever.
I can kind of see intuitively things that could be improved, you know, possible things to be careful of. I'm not sure I could have got to that point if I hadn't written a lot of code. So the people I'm finding who can really benefit from AI right now are either really junior people who can't
code at all, who can now write some apps that they have in their head. And as long as they work reasonably quickly with the current AI capabilities, then they're happy. And then really experienced people like me or like Chris Lattner, because it can basically do some of our typing for us, you know, and some of our research for us. People in the middle, which is most people, most of the time... it really worries me, because how do
you get from point A to point B without typing code? It might be possible, but we have no experience of that. Is it possible? How would you do it? Like, is it kind of like going back to school, where at primary school we don't let kids use calculators so that they develop their number muscle? Do we need to do that for, like, the first five years as a developer? You have to write all the code yourself. Yeah, I don't know. But if I were a developer with between, like, two and 20 years of experience, I would be asking that question of myself a lot.
Because otherwise you might be in the process of making yourself obsolete. Yeah.
Well, this is another thing about knowledge that this Cesar Hidalgo guy said. So he
said that knowledge is non-fungible, which means it can't be exchanged. So what he means by that is... the process of learning is in some important sense not reducible. So
you have to have the experience and the experience has to have friction. And when
we build models of the world, we actually learn. Like, you know, there's this phrase, reality pushes back. So we make lots of mistakes and we update our models, and we're just placing these coherence constraints in our model, and that's how we come to learn. So you use Claude Code, and there's so little friction in the process. That's
exactly what this study from Anthropic said. It said there was so little friction, they didn't learn anything. Right. Yeah, no, exactly. Desirable difficulty is the concept that kind of comes up in education. But even going back to the work of Ebbinghaus, who was the original spaced repetition guy in the 19th century, and then Piotr Wozniak more recently, we find the same. Like, we know that
memories don't get formed unless it is hard work to form them. So, you know, that's where you kind of get this somewhat surprising result that says revising too often is a bad idea, because it comes to mind too quickly. And so with spaced repetition learning, with stuff like Anki and SuperMemo, the algorithm tries to schedule the flashcards just before the moment you're about to forget. So then it's hard work. So I
studied Chinese for 10 years in order to try to learn about learning myself.
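The scheduling idea described here, reviewing each card just at the point you are about to forget it, can be sketched as a toy interval calculation in the spirit of SM-2, the algorithm family behind SuperMemo and, loosely, Anki. The constants below are illustrative, not any real app's parameters.

```python
# A toy spaced-repetition scheduler in the spirit of SM-2. Intervals grow
# by an "ease" factor, so each review lands roughly at the moment the card
# is about to be forgotten. Constants are illustrative, not Anki's actual
# parameters.

def next_interval(interval_days: float, ease: float, recalled: bool) -> tuple[float, float]:
    """Return (new_interval, new_ease) after one review."""
    if recalled:
        # Successful recall: push the next review further out.
        return interval_days * ease, min(ease + 0.05, 3.0)
    # Forgotten: reset the interval and make the card "harder".
    return 1.0, max(ease - 0.2, 1.3)

interval, ease = 1.0, 2.5
for review in range(5):
    interval, ease = next_interval(interval, ease, recalled=True)
    print(f"review {review + 1}: next in {interval:.0f} days")
```

The key property is the one Howard describes: successful reviews space themselves out exponentially, so every review stays effortful.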
And I really noticed this, that I used Anki. And because it was always scheduling my cards just before I was about to forget them, it was always incredibly hard work, you know, to do reviews, because almost all the cards were ones I was on the verge of forgetting. That was absolutely exhausting. But my god, it worked
well. Here I am. I haven't really done any study for 15-plus years and I still remember my Chinese. I know. I mean, also, coming back to your radiology example, like, one example people give is call centers. So
we have this notion that in an organization we have high intelligence roles and low intelligence roles. And for me, intelligence is just the adaptive acquisition and synthesis of knowledge.
So we assume that the low intelligence roles doing the call center stuff don't adapt, which means, you know, there are certain things that an organization does that do not change, so we could automate them and we don't need to update our knowledge. And I think that discounts, actually, maybe as with the radiology example, that having this holistic knowledge... Like, you know, in a call center there are so many weird edge cases that come in, so many weird things happen, and that filters up in the organization and we adapt over time. So when you start to automate things, you actually lose the competence to create the process which created the thing in the
first place, and you lose the evolvability of that knowledge in the organization. You're actually kind of cutting your legs off. Yeah, absolutely. And so, you know, all I know is, in my company, I tell our staff all the time, almost the only thing I care about is how much
your personal human capabilities are growing. You know, I don't actually care how many PRs you're doing, how many features you're doing. Like, there's that nice... you know, John Ousterhout, the Tcl guy, recently released some of his Stanford Friday takeaway lectures, and he has this nice one called
A Little Bit of Slope Makes Up for a Lot of Intercept, which is basically the idea that, you know, in your life, if you can focus on doing things that cause you to grow faster, it's way better than focusing on the things that you're already good at, you know, that have that high intercept. So the only thing I really care about, and I think is the only thing that matters for my company, is that my team... I'm focusing on their slope.
If you focus on just driving out results at the limit of whatever AI can do right now, you're only caring about the intercept, you know? So I think it's basically a path to obsolescence for both the company and the people who are in it. And so I'm really surprised how many executives of big companies are pushing this now, because it feels like they're wrong, which they probably are, and they have no way to tell if they are, because this is an area they're not at all familiar with. They never learned it in their MBAs. They're basically setting up their companies to be destroyed. Yeah. I'm really surprised that, you know, shareholders would let them do that. You know, set up such an incredibly speculative action.
Yeah, here we are. It feels like a lot of companies are going to fail as a result of the amassed tech debt that causes them to not be able to maintain or build their products anymore. There are loads of folks out there like François Chollet. He really gets it. He understands this. And so he's always said that it's about this kind of mimetic sharing of cognitive models about the domain and how we refine it together. On the sharing thing, this is another big scaling problem with Gen AI coding, right? So the ideal case, I've done this: I know a domain really well, and I can specify it with exquisite detail, and I tell Claude Code, go and do this thing, and the model's in my mind, it doesn't matter. And then you go into an organization, and now I need to share my knowledge with all of the other people, right? And I'm sure you have this in your company as well.
This knowledge acquisition bottleneck is a really serious problem in organizations. So when it's just me, I think I'm probably about 50 times more productive using Claude Code. It's absolutely
magic, and I can see why people are so excited about it. People don't seem to understand the bottleneck and how that doesn't really translate to many real-world organizations.
No one's actually creating 50 times more high-quality software than they were before.
So we've actually just done a study of this, and there's a tiny uptick, tiny uptick in what people are actually shipping. That's the facts. Obviously, I'm an enthusiast of AI and what it can do, but also... My wife,
Rachel, recently pointed out in an article, all of the pieces that make gambling addictive are present in...
Oh, yeah, dark flow. I was going to bring that up. Yeah, AI-based coding. It's
this really awkward situation where it's very... Almost everybody I know who got very enthusiastic about AI-powered coding in recent months has totally changed their mind about it when they finally went back and looked at, like: how much of the stuff that I built during those days of great enthusiasm am I using today? Are my customers using it today? Am I making money from it today? Almost all the money is being made by influencers, you know, or by the companies that
produce the tokens. The thing about AI-based coding is that it's like a slot machine, in that you have an illusion of control. You know, you can get to craft your prompt and your list of MCPs and your skills and whatever. But in the end, you pull the lever, right? You put in the prompt and something comes back, and it's like, cherry, cherry. It's like, oh, next time I'll change my prompt a bit. I'll add a bit more context. Pull the lever again, pull the lever again. It's the stochastic thing. You get the occasional win. It's like, oh, I won, I got a feature.
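The slot machine analogy can be made concrete with a tiny, admittedly tongue-in-cheek simulation. The 30% hit rate is made up for the sketch; the point is that each pull is an independent random draw, regardless of any prompt tweaks made between pulls, which is exactly the variable-ratio reward schedule that makes gambling compelling.

```python
# Toy model of the "slot machine" dynamic of prompt-and-pray coding:
# each pull succeeds with a fixed probability, no matter how the prompt
# was tweaked between pulls. The 30% hit rate is purely illustrative.
import random

def pull_lever(rng: random.Random, p_success: float = 0.3) -> bool:
    """Craft the prompt, pull the lever, see if a feature comes out."""
    return rng.random() < p_success

rng = random.Random(42)  # fixed seed so the run is reproducible
wins = sum(pull_lever(rng) for _ in range(100))
print(f"{wins} 'features' out of 100 pulls")
```

Occasional wins among many losses is the reinforcement pattern; the prompt crafting between pulls changes nothing in this model, which is the illusion of control.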
So it's got all these hallmarks of, like, losses disguised as wins, somewhat stochastic rewards, a feeling of control, all the stuff that gaming companies try to engineer into their gaming rooms. Now, none of that means that AI is not useful, but gosh, it's hard to tell. I know. And Rachel, just to be clear, she
also said that one of the hallmarks of gambling is that you kind of delude yourself that you have some awareness of what's going on, but actually you don't. But
let's do the bull case a little bit, though. So I do think in restricted cases it is very useful. And these are cases where we understand, and we can place constraints and specification. But even in those cases, you could argue on the one hand that we're not going to be unemployed anytime soon, because you just do more work. On the addiction thing, I've noticed that. So I've had 14-hour Claude Code marathon sessions, and I actually feel addicted to it. It's like a slot machine. You know,
it really is. I'm there too. Absolutely. Yeah, I know. And I've just never felt more drained writing code. I actually need to take a rest afterwards, like a few days' rest, because it completely kills me. It feels like crap, you know. Yeah, definitely.
I've had some successes, right? And so, in fact, I've spent the last couple of years building a whole product based around where we know the successes are going to be, which is when you're working on reasonably small pieces that you can fully understand, and that you can design, and you can build up your own layers of abstraction to create things that are bigger than the parts that you're building out of. I
had a very interesting situation recently where it was kind of an experiment, basically, which is: we rely very heavily on something called IPyKernel, which is the thing that powers Jupyter notebooks. And there'd been a major version release of IPyKernel, from six to seven, and it stopped working. And it stopped working in both
of the products that we were trying to use it with. One was called NBClassic, which is the original Jupyter notebook. And the other is our own product, called Solveit. They would just randomly crash.
And IPyKernel's over 5,000 lines of code. It's very complex code: multiple threads, events, locks, interfaces with IPython, you know, with ZMQ, all kinds of different pieces, debugpy. And I couldn't get my head around it, and I couldn't see why it was crashing. The tests are all passing. I wonder if AI can solve this. You
know, like, I'm always interested in the question of how big a chunk AI can handle on its own right now. The answer turned out to be yes, I think it can, just. It was like... so I spent a couple of weeks.
I didn't develop a lot of understanding about how IPyKernel really worked in the process.
But I did spend quite a bit of time kind of pulling out separate pieces. Like, so the answer was: in two hours, Codex, I think it was version 2 at that time, or maybe 3 had just come out, couldn't do it. Then if I got the $200-a-month GPT 5.3 Pro to fix the problems, it could. And so by going back and forth
between those two pieces of software, those two models, I could get things working over a couple of weeks. And like you say, it wasn't at all fun. It was very tiring, and it felt stressful, because I wasn't really in control. But the interesting thing is, I now am in a situation where I have the only implementation of a Python Jupyter kernel that actually works correctly, as far as I can tell, with these new version 7 protocol improvements. And now
I'm like, well, this is fascinating, because we don't have a kind of software engineering theory of what to do now. Like, here's a piece of code that no one understands. Yeah. Am I going to bet my company's product on it? And the answer is, I don't know, because, like, I don't know what to do now, because no one's been in this situation. And, like, does it have memory leaks? Will it still work in a year's time if there's some minor change to the protocol? Yeah. Is there some
weird edge case that's going to destroy everything? No one knows, because no one understands this code. It's a really curious situation. I mean, first of all, we should acknowledge the pernicious erosion of control. So at the very beginning, you have 10% AI-generated code, and then you can just see how it creeps up and up.
And then at some point, six months down the line, the PR comes in, and now, you know, 60% of the code is AI-generated. And do you see what happens? You slowly become disconnected. But the bull case for this is, you know, in AI there's this idea called functionalism, that, you know, we don't care what the intelligent thing is made out of; as long as it does all
of the right things, then, you know, we would say it's AI. And
it's the same thing with software. So the bull case is, I understand the domain.
I don't need to know how to write the Quicksort algorithm. I just need to understand it. So I just need to have all of these tests, and it needs to go into deployment, and these things need to happen. And at that point, you know what, I don't actually care. And to be clear, I quite like that framing. But what that actually does is it says, well, software engineering sure is important then. Because software engineering is all about finding what those pieces are and how they should behave, and then how you can put them together to create a bigger piece. And if
we do that well, then in 10 years' time, we could have software that is far more capable than anything we could even imagine today.
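That "I only care that it does the right things" stance can be sketched as a behavioral contract: a hypothetical, possibly AI-written sort function is accepted or rejected purely on whether it passes property checks, never by reading it. The names and the property set here are illustrative, not anyone's actual methodology.

```python
# A sketch of the "functionalist" view of software: the caller pins down
# behavior with properties and treats the implementation as a black box.
# Here the "untrusted, possibly AI-written" piece is a sort function.
import random

def untrusted_sort(xs: list[int]) -> list[int]:
    # Stand-in for a generated implementation we never read.
    return sorted(xs)

def meets_contract(sort_fn, trials: int = 100) -> bool:
    """Accept the black box iff its observable behavior matches the spec."""
    rng = random.Random(0)  # deterministic test data
    for _ in range(trials):
        xs = [rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
        out = sort_fn(xs)
        ordered = all(a <= b for a, b in zip(out, out[1:]))
        permutation = sorted(out) == sorted(xs)  # no items added or dropped
        if not (ordered and permutation):
            return False
    return True

print(meets_contract(untrusted_sort))
```

The hard part, as the conversation goes on to argue, is that deciding which pieces to specify this way, and which properties pin them down, is itself the software engineering.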
But you're only going to get that with really great software engineering. Yeah, you want to be careful. I think, in the end, like, IPyKernel, I'm finding, for example, is just too big a piece, right? Because in the end, the team that made the original IPyKernel were not able to create a set of tests
that correctly exercised it, and therefore real-world downstream projects, including the original NBClassic, you know, which is what IPyKernel was extracted from, didn't work anymore. So this is kind of where our focus is now on the development side at Answer.AI: finding the right-sized pieces and making sure they're the right pieces. Knowing how to recognize what those pieces are and how to design them and how to put them
together is actually something that normally requires some decades of experience before you're really good at it. Certainly it's true for me. I reckon
I got pretty good at it after maybe 20 years of experience. Yeah, it's a big question. It's like, how do you build these software engineering chops, which are now even more important than they've ever been before? They're the difference between somebody who's good at writing computer software and somebody who's not. That feels like a challenging question.
I know. And there's also this notion that there are so many different ways to abstract and represent something. You know, the world is a very complex place. And maybe
the way we've been abstracting and representing software is mostly a reflection of our own cognitive limitations, right? And even in the sciences and in physics, you tend to have a lot of quite reductive methods of modeling the world. And then you've got complexity science, which is just embracing the constructive, dissipative, you know, gnarly nature of things.
And I think a lot of software today, we don't understand, right? So for example, there are many globally distributed software applications that use the actor pattern. And this is just this, it's basically like a complex system, right? And the only way we can understand it is by doing simulations and tests because no one actually knows how all of these things fit together. So you could argue, I guess, as a bull case
that maybe we already are doing this at the top of software engineering and that is what we want to do eventually anyway. Yeah, I'd say probably not. You see
companies like Instagram and WhatsApp dominate their sectors whilst having 10 staff and beating companies like Google and Microsoft in the process. I would
argue this way of building software in very large companies is actually failing. And I think we're seeing a lot of these very large companies becoming increasingly desperate. And, you know, for example, the quality of Microsoft Windows and macOS has very obviously deteriorated greatly in the last five to ten years. You know, back when Dave Cutler was looking at every line of the NT kernel and making sure it
was beautiful, it was an elegant and marvelous piece of software, you know, and I don't think there's anybody in the world who's going to say that Windows 11 is an elegant and marvelous piece of software. So I actually think we do need to find these smaller components that we do fully understand and that we need to build them up. And here's the problem. AI is no good
at that. And so I'll say that empirically. They're really bad at software engineering.
And I think that's possibly always going to be true, because, you know, we're asking them to often move outside of their training data. You know, if we're trying to build something that literally hasn't been built before, and do it in a better way than has been done before, we're saying, like, don't just copy what was in the training data. So,
and again, this is a confusing point for a lot of people because they see AI being very good at coding, and then you think, like, oh, that's software engineering.
You know, it's like, oh, it must be good at software engineering. But they're different tasks. There's not a huge amount of overlap between them, and there's no current empirical data to suggest that LLMs are gaining any competency at software engineering. Every time you look at a piece of software engineering they've
done, like the browser, for example, which Cursor created or the C compiler, which Anthropic created, like I've read the source code of those things quite a bit. Um, Chris Lattner's much more familiar with the compiler example than me, but they're very, very obvious copies of things that already
exist. So that's the challenge, you know: if you want to build something that's not just a copy, then you can't outsource that to an LLM.
There's no theoretical reason to believe that you'll ever be able to, and there's no empirical data to suggest that you'll ever be able to. Yes, I think the punchline of this conversation is, and I'm sure you would agree with this, that we need to have the combination of AI and humans working together, right? Because the humans provide the understanding and all of the stuff we were saying about knowledge, but we can
still use AIs as a tool. But we need to design operating models or ways of working that make that, you know, we say we don't want to diminish our competence and understanding. So it's a very fine line. That's been our focus. And we
both focus on that for teaching and for our own internal development. The stuff I've been working on for 20 years has turned out to be the thing that makes this all work. Stephen Wolfram should get credit for this. He was the guy that created the notebook interface. Although also lots of ideas kind of go back to Smalltalk
and Lisp and APL. But basically the idea that a human can do a lot more with a computer when the human can manipulate the objects inside that computer in real time, study them and move them around and combine them together. That's what Smalltalk was all about with objects, and APL was
the same with arrays. Mathematica basically is a superpowered Lisp, which then also added on this very elegant notebook interface that allowed you to construct a living document out of all this. So I built this thing called nbdev a few years ago, which is a way of creating production software inside these notebook interfaces, inside these rich dynamic environments. And I found that made me dramatically more productive as a programmer. And, like, today, even though I've never been a
full-time programmer as my job, when you look at my kind of GitHub repo output, I think GitHub produced some statistics about it. And I was like, about the most productive programmer in Australia. You know, like it's working. And a lot of the stuff I build has lots and lots of people use it because it's such a rich,
powerful way to build things. And so it turns out, we've now discovered that if you put AI in the same environment with the human, again in a rich, interactive environment, AI is much better as well, which perhaps isn't shocking to hear, but the normal, like if you use Claude Code, which I know you do,
and it's a very good piece of software, but the environment we give Claude Code is very similar to the environment that people had 40 years ago. You know,
it's a line-based terminal interface. You know, it can use MCP or whatever. Most of the times it just nowadays uses Bash tools, which again, very powerful.
I love Bash tools. I use them, you know, CLI tools, all the time, but it's still just using text files as its interface to the world. It's really meager. So,
so we put the human and the AI inside a Python interpreter.
And now suddenly you've got the full power of a very elegant, expressive programming language.
that the human can use to talk to the AI, the AI can talk to the computer, the human can talk to the computer, the computer can talk to the AI. Like, you have this really rich thing, and then we let the human and the AI in real time build tools that each other can use.
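The shared, stateful environment described here can be sketched in a few lines of Python. Everything below is an illustration, not the actual product: `fake_model` stands in for a real LLM call, and the single `shared_ns` dictionary plays the role of the live interpreter state that both the human and the AI can read and extend.

```python
# A minimal sketch of a human and an AI sharing one persistent Python
# namespace, so state built by either side is visible to the other.
shared_ns = {}

def run(code: str) -> None:
    """Execute a code snippet in the shared, stateful namespace."""
    exec(code, shared_ns)

# The human defines a tool...
run("def area(w, h): return w * h")

# ...and a (simulated) AI turn can immediately use it, because it lives
# in the same namespace. A real model would generate this line of code.
def fake_model(prompt: str) -> str:
    return "result = area(3, 4)"

run(fake_model("compute the area of a 3x4 rectangle"))
print(shared_ns["result"])  # 12
```

The point of the sketch is that neither side starts from a blank slate: each turn operates on, and adds to, the same living objects.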
That's what it's about to me, right? It's about, like, creating an environment where humans can grow and engage and share things. It's like, for me, when I use Solveit, it's the opposite of that experience you described with Claude Code. After a couple of hours, I feel energized and
happy and fulfilled. I'll give you my take. I think that the thing that you're pointing to here is there's something magic about having an interactive, stateful environment that gives you feedback. Mm-hmm. And that is because our brains can do a certain unit of work. So we actually think through refining and testing
with reality. And that's why, I mean, during my PhD, I used Mathematica and MATLAB.
And I agree. So we've got this REPL environment. And here's the matrix. Let's do
an image plot. Do a change. This is what it looks like now. And it's
actually a wonderful way to just refine my mental model about something.
But Claude Code does a lot of this stuff. I think it's mostly a skill issue. I think the people that use Claude Code effectively do this. It's possible. It is possible. It's possible, yeah. So, you know, I've written a content management system called Rescript, and when I'm putting together a documentary video, it can pull transcripts, and then I can verify the claims. And, you know, part of AI literacy is just understanding the asymmetry of language models, right?
So when you give them a sort of discriminative task, they're actually quite good. So
if I tell it in a subagent to go and verify every individual claim, it's much more accurate than if I was in generation mode and I was generating a bunch of claims. And the stateful feedback thing again, you know: I can have some kind of, like, schematized XML dump, and I can have, like, an application here on the side which is visualizing it, and it's like a feedback loop. And for me, this
is an AI literacy thing. Like, the good people at AI are already doing this. Yeah, so I don't fully agree with you. I agree you can do it in Claude Code, and I agree it is an AI literacy thing as to whether you can, but also Claude Code was not designed to do this.
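The discriminative-vs-generative asymmetry described above can be sketched as: rather than trusting generated claims, each one is re-checked as a yes/no question against the source. This is a toy illustration; `check_claim` uses plain keyword matching as a stand-in for what would really be a subagent call to a language model.

```python
# Hedged sketch of claim verification: a discriminative pass over
# already-generated claims, checked against the source text.
transcript = "nbdev ships with out-of-the-box CI integration."

claims = [
    "nbdev ships with CI integration",
    "nbdev requires a paid license",
]

def check_claim(claim: str, source: str) -> bool:
    """Stand-in verifier: a real system would ask a subagent
    'does the source support this claim?' and parse the yes/no."""
    keywords = [w for w in claim.lower().split() if len(w) > 3]
    return all(w in source.lower() for w in keywords)

for claim in claims:
    verdict = "supported" if check_claim(claim, transcript) else "unsupported"
    print(f"{claim} -> {verdict}")
```

The structural point is that each verification is a narrow, checkable question, which is exactly the regime where the models are most reliable.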
It's not very good at it, and it doesn't make it the natural way of working with it. I don't want to say it's an AI literacy problem, because that's like saying, like, oh, it's a you problem. To me, if a tool is not making it the natural way for a human to become more knowledgeable, more happy, more connected, with a deeper
understanding and a deeper connection to what they're working on, that's a tool problem.
That should be how tools are designed to work. So many models and tools expressly are being evaluated on can I give it a complete piece of work and have it go away and do the whole thing, which feels like a huge mistake to me versus have you evaluated whether a human comes out the other end with
a deep understanding of a topic, you know, so that they can really easily build things in the future. I agree with all of that. But then there's the other interesting angle, which is that there was a famous talk by Joel Grus, and we'll talk about this. And he said that notebooks are terrible. They're really bad from a software engineering point of view. And at the time, and maybe still now to a
certain extent, I agree with him because, you know, I've done ML DevOps. I've
worked in large organizations, you know, like trying to figure out how do we bridge data science and software engineering. And Claude Code is already more towards the software engineering side. And what that means is it creates idempotent, stateless, repeatable artifacts, right? So as you say, from a pedagogical point of view, it's really good having this stateful feedback because I can understand what's going on. But then I need to translate that into something which is deployable. And can you tell us the story of, you responded to Joel Grus, didn't you? And it was a bit of a fiasco, wasn't it? But just tell us about that story. He did a really good
video called I Don't Like Notebooks. It was
hilarious. It was really well done. And yeah, it was totally wrong.
And all the things he said notebooks can't do, they can. And all the things he said you can't do with notebooks, I do with notebooks all the time. So
it was a very good, very amusing, incorrect talk. So then I did a kind of a parody of it called I Like Notebooks, in which I basically copied, with credit, most of his slides and showed how every one of them was totally incorrect. But I actually think your comment about it does come down to the heart of it, which is this difference between how software engineering is normally done versus how scientific research and similar things are normally done. And I think, and I agree, there is
a dichotomy there. And I think that dichotomy is a real shame because I think software development is being done wrong. It's being done in this way, which is, yeah, all about reproducibility and these like dead, these dead pieces, you know, it's all dead code, dead files. I will never be able to express
this one millionth as clearly as Bret Victor has in his work. So I would encourage people who haven't watched Bret Victor to watch him, but, you know, he shows again and again how a direct connection, you know, a direct visual connection with the thing you're doing is all that matters, you know,
and that's his mission is to make sure people have that connection. And that's basically my mission as well. So for me, traditional software engineering is as far from that as it is possible to get. I think it's gross. I
find it disgusting. And I find it sad that people are being forced to work like that. I think it's inhumane. And I just don't think it works very well.
I mean, empirically, it doesn't work very well. And it's much less good for AI as well as it's much less good for humans. It hasn't always been that way.
Like, you know, with Alan Kay and Smalltalk and... Iverson
and APL, you know, Lisp, Wolfram with Mathematica.
To me, these were the golden days when people were focused on the question of how do we get the human into the computer to work as closely with it as possible. You know, that's where the mouse came from, for example. To be able to, like, click and drag and
visualize entities in your computer as things you can move around. So I feel like we've lost that. I think it's really sad. Yeah, with Claude Code and stuff, the default way of working with them is to go super deep into it.
It's like, okay, there's a whole folder full of files. You never even look at them. Your entire interaction with it is through a prompt. Yeah.
It literally disgusts me. Like, I literally think it's inhumane and my mission remains the same as it has been for like 20 years, which is to stop people working like this. I know, but so casting my mind back, I used to work with data scientists. They were using Jupyter Notebooks. And what I found was typically, I mean, back then, if you check them into Git, it wouldn't
look very good. Most of these data scientists didn't know how to use Git. They
would run the cells out of order, which means it wouldn't be reproducible. There are
all sorts of things like that. The thing is, I agree with you that you can use them in this workflow. But it comes back to what I was saying before about, you know, we were talking about the call center and it being like a low intelligence job. You know, the data scientists, the reason why they are doing intelligent work is they are actually creating something that doesn't exist. They are figuring
out the contours of a problem. They're actually working in a domain that is poorly understood. But you could argue now the bull case is when the data scientists can succinctly describe the contours of the problem, maybe we could go to Claude Code and we could implement it properly. But how do we bridge between those two worlds? I
think that'd be a terrible, terrible idea. You don't want to remove people from their exploratory environment, you know.
Research and science is developed by people building insight, you know. Whoever you listen to, you know, whether it be Feynman or whatever, like, you always hear from the great scientists how they build deeper intuition by building mental models, which they get over time by...
interacting with the things that they're learning about. Now, like in Feynman's case, because it was theoretical physics, he couldn't actually pick up a spinning quark, but he did literally study spinning plates, you know? You've got to find ways to deeply interact with what you're working with. Like, so many times I've seen data
science teams, because you're right, data science teams aren't very familiar with Git and aren't very familiar with things that they do need to understand. And so often I've seen a software engineer will become their manager and their fix to this will be to tell them all to stop using Jupyter Notebooks. And now they have to use
all these reproducible blah, blah, you know, virtualenvs, blah, blah. They
destroy these teams over and over again. I've seen this keep happening. Because the solution is not to create more discipline and bureaucracy; it's to solve the actual problem. So, for example, we built a thing called an nb merge driver, which, so a lot of people don't realize this, but actually notebooks are extremely Git-friendly. It's just that Git doesn't ship with a merge driver for them. So Git only ships with a
merge driver for line-based text files, but it's fully pluggable. And so you can easily plug in one for JSON files instead. And so we wrote one. So now when you do a git diff with our merge driver, you see cell-level diffs. If you get a merge conflict, you get cell-level merge conflicts.
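The cell-level-diff idea rests on the fact that a notebook file is just JSON, so a driver can compare cell by cell instead of line by line. The following is a sketch of that concept only, not nbdev's or nbdime's actual implementation.

```python
# Hedged illustration: treat each notebook as parsed JSON and diff at
# the cell level rather than the raw-text-line level.
import json

old_nb = json.loads('{"cells": [{"source": "x = 1"}, {"source": "print(x)"}]}')
new_nb = json.loads('{"cells": [{"source": "x = 2"}, {"source": "print(x)"}]}')

def cell_diff(a, b):
    """Yield (index, old_source, new_source) for each changed cell."""
    for i, (ca, cb) in enumerate(zip(a["cells"], b["cells"])):
        if ca["source"] != cb["source"]:
            yield i, ca["source"], cb["source"]

for i, old, new in cell_diff(old_nb, new_nb):
    print(f"cell {i}: {old!r} -> {new!r}")
```

Git's merge machinery is pluggable in exactly this spirit: a `.gitattributes` entry such as `*.ipynb merge=<driver>` routes notebook files to a registered `merge.<driver>.driver` command instead of the default line-based merge.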
The notebook is always openable in Jupyter. NBdime did the same thing. So
it's two independent implementations of this. So yeah, there were problems to solve.
you know, but the solution to it was not throw away Bret Victor's ideas and make people further away from their exploratory tools, but to fix the exploratory tools. And I think all software developers should be using exploratory-based programming to deepen their understanding of what they're working with,
so that they end up with a really strong mental model, of the system that they're building and they're working with, and then they can come up with better solutions, more incrementally, better tested. I basically never have to use a debugger because I basically never have bugs. And it's not because I'm a particularly good programmer. It's because I build things up, small little steps, and each step works and I can see it
working and I can interact with it. So there's no room for bugs, you know?
You know, I'm so torn on this because I agree with you. And I'm also skeptical of people who say that organizations, they converge onto ways of doing things and they no longer need to evolve. They no longer need to adapt.
Innovation is adaptivity, right? And we should increase the surface area of adaptivity as much as we possibly can. So we need people that are constantly testing new ideas, finding these constraints. But by the same token, we need to use the cloud. We need to use CI/CD. We need to get this stuff into production. Yeah.
So do you do, but like there's absolutely no, like, so... nbdev ships with out-of-the-box CI integration, and the tests are literally there. Because the source is a notebook, the entire exploration of how does this API work? What does it look like when you call it?
The implementation of the functions, the examples of them, the documentation of them, the tests of them are in one place. So it's much easier to be a software engineer in this environment. Um, so yeah, like, do both, you know. So do you remember there was that statement that existential risk should be an urgent priority, and it was signed by folks like Hinton,
and you responded, um, basically with a rebuttal, and that was with, um, Arvind, you know, the Snake Oil guy. Tell me about that. Do you think we should be worried about AI existential risk? I mean, that was a certain time, wasn't it? And I feel like things have changed a bit, thank God. I feel like we, not just me and Arvind, but, um, broadly speaking, the
community of which we're a part kind of probably won that. Now we have other problems to worry about. But, you know, basically at that point, um, the prevailing narrative was AI is about to become autonomous, it could become autonomous at any moment, and could destroy the world. So it very much comes
from, you know, Eliezer Yudkowsky's work, which I think clearly has been shown to be wrong at many levels at this point.
They would refute that, obviously. Of course they would. Yeah. It's one of those things that they can always refute, just like any doomsday cult, unless you give it a date and the date passes. I've updated a little bit in the sense that I would now say that these models can be said to be intelligent in restricted domains. The ARC challenge showed that. So if you place constraints into the problem, you can go faster towards a known goal. Even agency, you can put a planner on that, and if you know where you're going, you can go there faster, but that doesn't help you. You can have all the intelligence and agency in the world, but if you don't have the knowledge and the constraints, then you're going in the wrong
direction faster. And I think they don't seem to appreciate that these models actually know the world. Like, none of that was even relevant to Arvind and my point, which was and is that it's misunderstanding where the actual danger is, which is that when you have a dramatically more powerful technology entering the world that can
make some people dramatically more powerful, people who are in love with power will seek to monopolize that technology.
And the more powerful it is, the stronger that urge from those power-hungry people will be. So to ignore people... so here's the problem. If you're like, I don't care about any of that, all I care about is autonomous AI taking off, you know, singularity, paperclip, nano goo, whatever. The obvious solution to that is,
oh, let's centralize power, which is what we kept seeing, particularly at that time. Let's
give either very rich technology companies or the government or both all of this power and make sure nobody else has it. In my
threat model, that's the worst possible thing you can do because you've centralized ability to control in one place and therefore these people who are desperate for power just have to take over that thing. Could we distinguish though what you mean by power? Because
we've just spent some of this conversation talking about how it's not actually as powerful as people think it is. But I'm not even, but mine is an even if thing, right? So like I just say, even if it turns out to be incredibly powerful, right? Like I don't even want to argue about whether it's going to be powerful because that's speculative. Even if it's going to be incredibly powerful, you still shouldn't centralize all of that power in the hands of one company
or the government. Because if you do, all of that power is going to be monopolized by power-hungry people and used to destroy civilization, basically. You'll end up with a case where all of that wealth and power will be centralized with the kinds of people who want it
centralized. So, like, society for hundreds of years has faced this again and again and again, you know. So when it's like, you know, writing used to be something that only the most exclusive people had access to, and the
same arguments were made. If you let everybody write, they're going to use it to write things that we don't want them to write, and it's going to be really bad, you know? Ditto with printing. Ditto with the vote.
And again and again, society has to fight against this natural predilection of the people that have the status quo power to be like, no, this is a threat. So...
when we're saying like, okay, what if AI turned out to be incredibly powerful?
Would it be better for society to be that to be kept in the hands of a few or spread out across society? My
argument was the latter. Now, there's also an argument which is like, eh, don't worry about it. It's not going to be that powerful anyway. I just didn't want to go there because it's not... an argument
that's easy to win because you can't really say what's going to happen. We're all
just guessing. But I can very clearly say, well, if it happens, would it be a really good idea to only let Elon Musk have it? Or would it be a good idea to only let Donald Trump have it? Dan Hendrycks spoke about this offense-defense asymmetry. So it's actually very important for us to have countervailing defenses. But let's just take that as a given for a minute. Because obviously when we look at something like Meta and Facebook, it's quite clear what the power imbalance is. They control all of our data. They know what we're doing. With something like OpenAI and Claude, so it's not as good as we thought it was, because actually humans still need to be involved. But for example, they have all of our data, right? And you might be working on some new innovative technology and you're using Claude and you're sending all of your information up there and they can now copy you.
I mean, what kind of risks are you talking about to be more concrete? Yeah,
no, I mean, so I was not talking about any of those things, right? So
at the time I was talking about this speculative question of what if AI gets incredibly powerful? I mean, like now, for example, they say that this is the new means of production. And that seems completely hyperbolic to me. But like in your best estimation now, if there are risks, what are they? If there are risks with the current state of technology, I mean, I think some of them are the ones we've discussed, which is people... enfeebling themselves
by basically losing their ability to become more competent over time.
That's the big risk I worry about the most. The privacy risk, it's there, but I'm not sure it's much more there than it was for Google and Microsoft before. You know, you used to work at Microsoft, you know how much data they have about the average Outlook, Office, etc. user.
Ditto for Google, you know, the average Google Workspace or Gmail user.
Those privacy issues are real, although I think there are bigger privacy issues around these companies which the government can outsource data collection to. So back in the day, it used to be companies like ChoicePoint and Acxiom. Nowadays, it's probably more companies like Palantir.
The US government is actually prohibited from building large databases about US citizens, for example. But companies are not prohibited from doing so, and the government's not prohibited from contracting things to those companies. So,
I mean, that's a huge worry, but I don't think it's one that AI is uniquely creating. It's like, you're in the UK. As you know, in the UK, surveillance has been universal, unfortunately, for quite a while now. It certainly
makes it easier to use that surveillance, but a sufficiently well-resourced organization could just throw a thousand bodies at the problem. So yeah, I'm not sure these are new privacy problems so much as maybe more common ones than they used to be. Yeah. Jeremy, I've just noticed the time. I
need to get to the airport. All right. This has been amazing. Thank you, sir.
Thank you for coming. Thank you so much. Yeah. Hope you had a nice trip.
Thank you so much.