Podcast: Peter Scott
By professorsethi
Summary
Topics Covered
- Computer Science Equals No More Programming
- Metacognition Monitors Rereading Failures
- System Two Triggers Fix System One Errors
- LLM Entropy Mimics EEG Confusion Signals
- Dialectical LLM Clusters Detect Deception
Full Transcript
Ricky Sethi, it's wonderful to welcome you to AI and You. We are going to talk about some fascinating subjects: disinformation and especially artificial metacognition. That's such a loaded term, there's so much to explain there, but maybe we could start out with you talking about just how you got into these fields. There are not a whole lot of job openings, last time I checked, that say "artificial metacognition researcher." What was the path that you followed that took you to where you are now?
>> Oh, perfect. Well, thank you for having me first. It's a pleasure to be here. I think the path that took me to this was that I teach a course at Fitchburg, which is our FYE course, a first-year experience for computer science students. One of the things I do in there is give them an idea of what computer science is. They usually come in equating it with the idea of programming, and of course, as Dijkstra said (and I think someone before him originally made this point), computer science is no more about programming than astronomy is about telescopes. It's a tool that we use. So to give them a sense of the mathematical nature of computer science, I delve into these mathematical ideas, and wrapping your brain around them is a little tough. Part of my background is that when I was in undergrad, one of my majors was neurobiology, so I knew that thinking is hard. One of the things we emphasize in FYE is habits of mind, and one of the components of that is metacognition. Metacognition is this idea that's often described as thinking about thinking. It involves some degree of self-reflection, obviously, but it also involves some degree of control. So, for example, if you're sitting there and you find yourself rereading a sentence that you're studying five times, and you realize you just kind of blanked out on it, becoming self-aware that, hey, I blanked out, I'm rereading this sentence: that's the monitoring aspect, the self-awareness aspect of metacognition, in humans and in non-human animals as well. And then there's a control aspect where you say, "Okay, I'm going to change something that I do in my behavior." This is what we wanted to instill in our FYE students. They're coming into university for the first time, and thinking is a tough thing to do. So I also show them a Veritasium video about how thinking is hard, which talks about system one and system two, the dual-process framework that we now subscribe to in neurobiology. And one of my students in that FYE class, Josh Yakobani, took another class with me a couple of years later. He was in the honors program, and he said, "Hey, look, Professor Sethi, I want to do a project with LLMs." This was in 2024. "I was wondering if you wanted to be my adviser for it." I said, "Sure," and so I came in as an adviser. I knew about large language models, but they weren't part of my research thrust at the time. Then, as we continued in the project, he talked about different ideas that were being explored in large language models as they were coming to the forefront. One of the things he happened to mention is that people are exploring metacognition in large language models. I kind of perked up and said, what the heck is this? Tell me more about it. He told me what he knew, but it was limited, so I thought, okay, I've got to look into it. And as I looked into it, it clicked with my background: my background in neuroscience, my background with argumentation frameworks that I'd worked on as part of my research in fact-checking misinformation, and I'd also done an MBA, so it clicked with some of these ideas of organizational behavior as well. All of these things came together, and I started to explore the idea of how machines, large language models in particular, might exhibit or utilize metacognition, and that's the field of artificial metacognition.
>> Wow. And I wonder whether... scratch that. This sounds to me like we're talking about what many people would think of as a kind of shibboleth that differentiates humans and machines. I was irresistibly reminded of one of the initial scenes of the Dune movie, and the book, where the Reverend Mother has Paul Atreides put his hand in the box that imposes incredible levels of pain. The test is to see whether he can override the impulse to withdraw his hand, and she says this is to see whether you are human. That description of metacognition, thinking about thinking, makes me wonder whether you can distinguish between humans and AI, whether AIs are capable of thinking about thinking. Do you approach this from a computer science angle, a neuroscience angle, a psychology angle, any or all of the above, or something else?
>> That's a great question, and one of the things I definitely want to clarify is that we come at it from a multidisciplinary approach. I have a very multidisciplinary background: it's in neurobiology, it's in physics (I did my master's in physics), then I did my PhD in artificial intelligence, and I was also an English major. So I have these very disparate approaches, and it turns out that when I looked at this problem, it was hard not to look at it holistically. If you're trying to actually look at the problem as a whole, it's hard to look at it from just one narrow slice, one artificial human delineation of a field, and say, I'm just going to look at this mathematically, from computer science. That's an artificial delineation that nature doesn't have; we impose it because we have a hard time getting these larger perspectives. But it turns out that by looking at it from this holistic, multidisciplinary angle, not only can we maybe address some of these emergent properties better, but by approaching it rigorously, the framework that I developed about a year ago now turned out to be more comprehensive than I realized at the time. Any new research that we've found in the subsequent year, up to even today, turns out to fit, or get subsumed, into this framework, because we approached it rigorously from these multiple disciplines. But in no way would we say... you know, when you say AI, just to distinguish that a little bit, it's the current GenAI systems versus maybe more AGI systems down the line. Right now we're dealing with these GenAI systems, and for them what we're giving is a control structure. So we're not making claims about consciousness and these emergent properties right now. At best, we hope we're providing the first step on the ladder that will take us to that point. But this first step is very narrow and precise in its claims: we're talking about GenAI systems and how they can have both a self-assessment and a control structure or mechanism at this point.
>> Is metacognition something that we can say every human has? I was reminded of Julian Jaynes, The Origin of Consciousness in the Breakdown of the Bicameral Mind. He traces a history of consciousness back to the Iliad and the Odyssey and asserts some kind of change in the structure of our mind, a sort of metacognitive shift, some thousands of years ago. That makes me think two things. One, when did metacognition start, the ability to know that you're thinking about your thoughts, that you have a theory of mind for yourself? And two, are there people to whom that hasn't occurred, for whom meditation or anything like that is a foreign concept, and who are on a different side of some line that we might draw than others?
>> Yeah, and that's another great question. In this case, the metacognition we're talking about, that self-awareness, isn't necessarily self-awareness in a spiritual sense. It's self-awareness in a very physical sense. Human animals have that; we've categorized it, we've quantified it. Non-human animals also have it. There might be a reference here to the triune theory: you can think of us as having three brains, a reptilian brain, a mammalian brain, and then a primate brain, the neocortex. You might ask where in these three levels the idea of metacognition arises, and definitely in the mammalian brain you see it for sure. And in non-human animals: our species happens to be a vocal learner, and there are other, non-human species that are also vocal learners. Vocal learners have this abstraction in the neocortex, or the equivalent in those other species, where the neurons permeate the motor neurons that generate sounds. So for us there seems to be a big connection between this abstract neural representation and the physical realization thereof. When we start to look at metacognition in humans, we can articulate what we're doing; in non-human animals, you have to go by behavioral studies. So what we did, like some other research, is pull on this idea of looking at large language models as non-human animals, in a way. We're examining them from a behavioral context, but also from a structural context. It turns out other researchers, like Acriman recently, have done a great job publishing in this area. He was also inspired by how non-human animals exhibit metacognition, how it's quantified, and what kinds of experiments are done with non-human animals, and he applied that idea similarly to large language models. But the metacognition, and the self-awareness in it, is not necessarily the one we'd associate with spirituality or even consciousness. Consciousness has no definition. To my mind, there's no doubt that there must be some connection; what precisely that connection is, I think, is unknown at this point.
>> So I know about some self-awareness tests, like the mirror test, and I believe crows pass that: can you tell that the thing you're seeing in the mirror is you and not another, remarkably similar animal?
>> I think we got disconnected there.
>> Oops. Not on my end. Sorry. But that part was recorded. All right, let me do it again so you hear it. So, I know about some of the self-awareness tests, like the mirror test, which supposedly crows pass: the ability to know that what you see in a mirror is you and not another animal that looks like you. Can you compare and contrast those tests with the ones for metacognition? Do they overlap completely, or what?
>> Yeah, absolutely. The self-awareness of your own being and physical representation is a component thereof. But maybe a good example would be these vignettes that people use when they're looking at cognitive reflection tests, these CRTs; there are two or three vignettes that are popularly used. I'll do one here; I use it in my FYE class as well. Don't worry about how you actually do on it. But suppose I told you that a bat and a ball cost $1.10, and I told you that the bat costs $1 more than the ball. How much does the ball cost?
>> Um, yeah. Sorry, I've heard this one before, so I know the answer is 5 cents.
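(For anyone who wants the arithmetic behind that answer: if the ball costs b, then b + (b + $1.00) = $1.10, so 2b = $0.10 and b = $0.05; the bat is $1.05. The intuitive "10 cents" answer would make the total $1.20.)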
>> Beautiful.
You can't trick someone who's knowledgeable about it, but you're right. In this case, you relied on your previous experience; you did this experiential matching and you were like, "Oh, I know it." So this is where we distinguish between the dual-process theory, system one and system two. System two is what we think of as ourselves, and system one is this subconscious, larger component that automates most of our behaviors. It works great, except when it doesn't, and when it doesn't, it's wrong. So already, when you heard what I was saying, you were like, "Ah, this triggers this matching that I had," what we call in our context experiential matching: "Hey, I've heard this problem before and I know how it goes." That kicks you out of system one mode and engages system two. If you do that enough, it becomes a learned behavior; you never need to drop out of system one mode at all.
>> Exactly. Yeah, like learning a sport. And there's some degree of anchoring in that question as well, right? Something that my daughters played on me when they were young was this trick where they would say, "Say the word silk." Silk. "Say the word silk." Silk. "Say the word silk." Silk. "What does a cow drink?"
>> Yeah, exactly.
>> Right. Milk. Milk. No, a cow drinks water. What are you talking about? Well, calves drink milk, but cows drink water, right? So that is triggering system one thinking, to our detriment. And maybe we should just visit the system one / system two definition here for the benefit of people not familiar with Kahneman.
>> Oh, perfect. Yeah, you made the correct reference there; that's absolutely right. As Kahneman said, there's this idea (and he actually had a triphasic theory later on that we could get to), but in this case you think of your brain, or your mind, as having these two parts, a system one and a system two. System one does the automatic things that you've learned. So, exactly like some of the examples that you brought up: when you're first learning how to tie your shoelaces, you have to really work at it. You might even work out a song for it, as you might have had with your daughters, for example. As you learn it, it's a difficult task that you're not able to do automatically a priori, so you go through it mechanistically over time. You learn the complexity, and once you've done it enough times, it gets transferred over to system one. Once system one takes over, then when you're tying your shoe, you don't even think about it consciously. It turns out system two is the smaller component in the mind; system one is the larger one. We do something very similar in our framework. We make these ensembles of large language models: we don't just look at one LLM by itself, we imagine a bunch of LLMs and we divide them up into categories similar to this system one and system two, though our division is a little more complex. And it turns out that when you divide it up in this way, it also builds upon the way models are currently built. When you have a large language model, you hear that they have billions and billions of parameters to learn, but smaller, somewhat less powerful versions of these are also made, and the way that's done is a teacher-student distillation model, where a larger teacher helps fine-tune, or teach, a student LLM, and then you can release a model with just millions of parameters rather than hundreds of billions of parameters. So we approach it in this way. That was our initial connection, the through line between what's happening on the computational side and what's happening on the neuroscience side of things. In neuroscience we have system one and system two, and you can look at it similarly: when system one gets stuck, when some input from the world triggers internally that, hey, exactly as you said, what I'm doing isn't working, the mind shifts to system two thinking. System two, which is kind of lazy, says, "All right, let me get on this. I don't want to be engaged, but now that I'm engaged, let me figure out how to solve this problem correctly." Once it learns the correct way to solve it, it transfers that learning to system one, the larger model. We do the same thing on the LLM side in teacher-student distillation. In this case, we have a small LLM with just a few million parameters, and it needs to learn how to do a specific task; say you want a model for drafting scripts for movies. So you train against this larger model with billions and billions of parameters that solves all kinds of tasks, and it helps fine-tune the smaller model to solve that problem better, and then you can run it with fewer compute resources. It turns out the framework that we make is kind of a generalization of what people were doing there already, and it's inspired very directly by the system one / system two paradigm.
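He is describing standard teacher-student (knowledge) distillation. As a rough, generic sketch, not their framework, the core training loss typically looks something like this in PyTorch; the function name, toy sizes, and temperature value are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: nudge the small 'student' model toward the
    large 'teacher' model's output distribution over the same tokens."""
    # Soften both distributions with a temperature, then match them with KL divergence.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # kl_div expects log-probabilities for the input and probabilities for the target.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: one batch of 4 examples over a 10-token vocabulary.
teacher_logits = torch.randn(4, 10)                        # would come from the large teacher
student_logits = torch.randn(4, 10, requires_grad=True)    # would come from the small student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
```

In a real pipeline the teacher's logits come from a frozen multi-billion-parameter model and the student is the small model being trained; only the loss shape is shown here.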
>> Well, let's contrast that metacognitive process in humans with the one in LLMs at the moment, because as humans we can undergo a very continuous, rapid cycle of improvement. If you and I, for instance, start swapping stories like the bat and the ball and the milk and the silk, we'd pretty soon realize the class of riddle that we're telling each other and be alerted: I'd better kick in system two next time I hear something, rather than go with my reflex, because there's a pattern here of the reflex being wrong. That strikes me as the kind of thing that an LLM is not going to do, even emergently, and in particular it's not going to be able to channel that realization back into its training weights. They just don't. It can be part of the context buffer, it can even be part of fine-tuning, but it's not going to override its basic instincts. Humans, we can do that in moments. Would you agree, and does this lead to any ways of reliably testing that I'm talking to an LLM as opposed to a human being, for instance?
>> The imitation game. Yeah, that of course is a different thing, which we've passed by now, but you're absolutely right, you're spot on with this. LLMs nowadays don't reflect methodically and rigorously. There are heuristics and ad hoc approaches that people use all the time to do exactly that kind of thing, and there are aspects of the model itself that you can characterize as metacognitive, sort of the metacognitive quality or quantity of a model. Some of the work that was done happened to tie into our framework as well. It's one of those beautiful instances where you come up with an idea, you think someone else has a different idea, and it turns out it's part of the same idea that you've worked out; it's a very cool thing when you get that experience in life. Maniscalco and others looked at human beings and started to quantify the amount of metacognition that a human being has, and then researchers in China, Wang et al., took that work and asked whether we can quantify the overall characteristics of metacognition, how much metacognition a particular model might have. What we did in our larger framework is say exactly what you're saying: right now these LLMs, even with heuristics and these ad hoc approaches, aren't doing it in a regimented, rigorous manner. For humans, when we switch, sometimes it's a conscious decision, like you said: hey, I recognize this paradigm, so I'd better be on alert. Sometimes that happens when it matches your experience. Other times it might be: I'm not so confident about what I'm saying. Maybe that comes in later with the bat and the ball. What you often see (and in the Veritasium video he actually goes around asking people in Australia about this) is that they'll very confidently, which might remind you of LLMs and their ability to very confidently give an answer, say the bat costs a dollar, right? They'll very confidently give you that, and then they'll get some cues from the world around them, or maybe just pause for a bit, and some of that confidence will leak out. When the confidence leaks out, then they engage system two, not as a conscious mechanism; this is what the mind does automatically once it's aware of some disconnect between the world and its experience therein. So we give these kinds of sensors to large language models. We have five of what we call sensors; it's actually a metacognitive state vector, and we quantify it. These are black boxes into which we can put whatever the state-of-the-art approaches are for these components, and we base them rigorously in both neuroscience and cognitive science ideas. We're still plugging in the state of the art, and at a base level you can always ask for self-reporting, which isn't reliable but is a good first approximation. Each of the models in our ensemble of large language models evaluates, for every query, this five-dimensional vector. So it has this awareness, and it uses that awareness, whatever the numbers or quantifications are on each of the dimensions of that MSV, that metacognitive state vector. It then figures out: hey, is one of these dimensions, say my confidence dimension, really low, or am I getting conflicting information and that's really high, above some threshold that I've either learned or that some expert has said is correct or reasonable in this context? Do I need to trigger a deeper form of thinking? This decision is made across the ensemble of LLMs in our framework; we have a control graph theory that helps do that as well.
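Their sensor implementations aren't shown in the conversation, but as a minimal illustration of the idea, assuming the five dimensions he names later (confidence, conflicting information, experiential matching, emotional response, prioritization) and made-up thresholds, a trigger check could look like this:

```python
from dataclasses import dataclass

@dataclass
class MetacognitiveStateVector:
    """One reading of the five 'sensor' dimensions described in the interview.
    All values are assumed here to be normalized to [0, 1]."""
    confidence: float            # correctness evaluation
    conflict: float              # conflicting-information signal
    experiential_match: float    # similarity to previously seen problems
    emotional_response: float    # affective charge of the material
    prioritization: float        # how important or urgent the query is

def needs_system_two(msv: MetacognitiveStateVector,
                     min_confidence: float = 0.6,
                     max_conflict: float = 0.4) -> bool:
    """Illustrative trigger rule: escalate to deeper ('system two') processing
    when confidence drops below a threshold or conflict rises above one.
    The thresholds are placeholders a domain expert or learner would tune."""
    return msv.confidence < min_confidence or msv.conflict > max_conflict

# Example: a low-confidence, high-conflict reading should escalate.
reading = MetacognitiveStateVector(0.35, 0.55, 0.20, 0.10, 0.80)
print(needs_system_two(reading))  # True
```

In their framework this decision is made across an ensemble of models rather than by a single rule like this one; the sketch only shows the per-query shape of the check.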
>> Wow, a lot to talk about there. Is this measurement along those five dimensions done based purely on the observables, the conversation, the output of the LLM, or does it involve some kind of LLM equivalent of an electroencephalogram, where you are looking at signals inside?
>> Beautiful question. I like how you put that, with the EEG, because when you started off I was going to say, "Oh, it's just what's there," and then when you said that: picking up something like an EEG is exactly what we do. Internally, the model has these tokens and it assigns probabilities to them. Let's say there's a multiple-choice answer: what's the capital of India? This is one of the examples that we use, and the options are New Delhi, Paris, and, let's say, Washington, DC. The model might have different levels of confidence that it associates with each of these options. If the confidence in one of them is super high, then the entropy is very low. What we're looking for is when it's confused, and that's when the entropy will be high. If the answers came out as: it could be New Delhi 33%, it could be Paris 33%, and it could be Washington, DC 33%, that's a terrible LLM, but suppose it does that; in this case the entropy is very high. This is equivalent to the EEG. If a model allows us, and open-source models allow us to do it (some of the closed-source ones don't), we can look at the logits, the internal probabilities that were assigned, and once we have those we can calculate the entropy and draw a conclusion about what the model must be "thinking." In fact, some of our experiments build on what Acriman did when he was trying to figure out whether an LLM exhibits metacognition: in open-source models that let you get the logits, he looks at the entropy and says this must be an indicator of some underlying process the model is doing, even if the mechanism of that process is unknown. So it's very equivalent. I love the EEG example, because in an EEG there are so many electrical signals (and even in the magnetic version, where you get magnetic stimulation that can be a little more targeted), and discerning and deciphering which parts of the signal correspond to which neural substrates in the brain is a very tough problem. But it's indicative of something there, and similarly here.
>> You mentioned self-reporting. Is that like the chain of thought, where you can see what the LLM is, quote, saying to itself in the process of answering a question?
>> That's exactly a part of it, and it's also the most unreliable one.
Lots of studies, which we cite, show that this is a terrible way to do it, but as a first approximation it's a reasonable entree into the problem domain and into seeing what you can do. You're right: in this case you ask the LLM, hey, how confident are you? So we have these five dimensions. One might be confidence, the correctness evaluation: how confident are you that the capital of India is New Delhi? And it says 75%. Then you say, okay, do you have any conflicting information? It says, yeah, I also saw that it could be Paris. And you're like, well, how conflicting is that? Oh, about 20%. We ask it in this way: does it match some experience you've had before? We go along these five dimensions and have it self-report values. The beautiful thing about our framework right now is that that's not what we're relying upon. We're plugging in state-of-the-art systems, and that's what we're doing for the paper that we're submitting to Europe, where we're doing this in-depth quantitative experimentation now. We're plugging in state-of-the-art approaches that other people are using to look at one aspect of what's in our framework. A lot of people actually look at confidence, which equates to our correctness evaluation, and as they look at it they come up with different techniques to measure confidence and correctness; the entropy is one of the easy ones, but there are other approaches. We can plug them all in and see which of them performs the best. And we've actually got three subdimensions: each of those five dimensions has three subdimensions that we look at as well.
>> So is it true to say that in order to do this research, you have to be working with models that are willing to expose to you their levels of confidence or entropy?
>> That's a great point. What we do is accommodate whether they do or not. If they don't, then we fall back on these less reliable, but the only available, approaches that we have. So, for example, if a model, exactly as you say, exposes its logits to us: beautiful, we can get a sense of some underlying aspect of it, sort of like the EEG. If it doesn't, that's okay; we can go with the less reliable behavioral approaches as well to get a sense of what it is. Will that be as precise or as correct as the one that gives us full insight? No, not at all. But it's a first approximation, even for closed models. It's sort of like with non-human animals: when we do studies with non-human animals, there can't be any verbal exchange. Depending on the species, there might be some symbolic communication, but in a lot of cases you can't have that exchange, so you rely on behavioral models to assess metacognition in these non-human animals. Is that as precise as in humans? No. And even in humans, I would argue that we don't have a handle on everything either.
>> Something you've mentioned several times there: logits. What are those?
>> Oh, a logit is sort of like the likelihood of something happening. It's sort of like a probability; it's something that we transform into a probability. And once we have it as a probability, then we can transform that into an entropy determination across a bunch of them. So it's just a precursor to a probability.
>> Is it possible to derive measurements like this, along the same dimensions, for human beings?
>> Yeah. In fact, our dimensions are built upon cognitive science and neurobiology research. The dimensions that we impose, or find computational correlates for, in these large language model instances come from rigorous cognitive science and neurobiology work. They don't necessarily always list them as these five particular dimensions, but any departure or combination that we make is principled and grounded in the literature as far as we can see. So we ground it in this neurobiology and cognitive science literature, and the five dimensions that we have, the confidence, the conflicting information, the experiential matching, the emotional response, and the prioritization, are things that we see in human and non-human animals as well in the biological research.
>> So now the $64 million question: can I see somewhere in your research a five-dimensional chart, with hopefully a cluster for humans, a cluster for LLMs, a cluster for dolphins, or, well, maybe it doesn't work for non-verbals, but is that available?
>> So close. Our research is specifically in the context of LLMs, but we've got our demo, we've got all our source code, and it does exactly what you say. You can have radar charts that show the breakdown, we have bar plots, and I can fire up the demo and show you, but it's in our papers as well, both in our scum paper and the webc paper, and we just submitted a paper to kai, and we've got the whole demo site set up with that, so you can play with the code itself and it will generate them. In fact, one of the contributions of our work is that when you look at large language models, one of the big complaints is that they are opaque boxes: you can't see the reasoning that's going on inside them. One of the things that we do is help visualize that. So we also contribute to explainable AI, and we visualize it with both these graphical representations and the numeric details of what's going on along the dimensions that we have. So we have a breakdown if you want just the math of it, the numbers that are associated with it. I should say all of our research is only on LLMs, though. We don't have any application of our five-dimensional framework to dolphins; that's beyond scope for us, but it is possibly something a biological researcher would want to do. I know about that area, but it's not one I have expertise in. So some future collaborator might want to say, let's look at these five dimensions in the context of dolphins, definitely, because they can also do a little bit of self-reporting, but also crows and other non-human animals. Sure.
>> Wouldn't turn down that funding opportunity if it came along. Let's take a different tack here, because some of your other research concerns disinformation, and I want to look at the social impacts of AI's disinformation and the intersection with metacognition. The place where I connect those is looking at the chain of thought from large language models: when you ask a question and it starts talking to itself, or produces an imitation of talking to itself. This seems like one of the places where the imitation may not be as good as the real thing. I think we've all had this experience of talking to an LLM: it grossly hallucinates the answer, not just getting something factually wrong but completely going off the rails as to what you are asking about or what sort of answer you wanted. You castigate it, it grovels profusely, and then it does exactly the same thing again, over and over. And you can look at the chain of thought and see it, in that alleged chain of reasoning, doing something that your lowliest intern wouldn't think of. It clearly seems to defy metacognition at that point, because you would say anyone with a brain who was looking at what they were writing would realize that they had just contradicted what they had said earlier. I wish I had an actual example of this to hand, but we've all had this kind of experience, so I think we can relate to it. And we'll develop this into the intersection with disinformation. But does that chain of thought help you, in your metacognition research on LLMs, to measure or in some way detect whether it's being unhelpful, has got the wrong idea, or something more sinister?
>> That's a great point. The intentionality is always an issue. So when you say, does it have something more sinister, or your intern who would know better but perhaps intentionally doesn't do it...
>> Right, a Russian agent or something.
>> Yeah. So we don't address the intentionality aspect of it. In fact, my background is very theoretical. In my group, one of the things I tell them, and I started telling them this about a year ago, is: look, we're not doing engineering work here. We're not doing something where what we do now will be in the next Claude model that's released next year. We're doing fundamental scientific research; we're trying to address these fundamental ideas. Now, I said that a year ago and I repeated it constantly, and then as we see more papers coming out, I don't know if the pace of innovation is so rapid or my estimation of it was so poor, but it seems like some of these ideas are now leaking into engineering or technological applications. That being said, my interests tend to stay in the purely theoretical realm. Some of my experience with fact-checking misinformation led me there, because the conclusion I eventually came to in my fact-checking misinformation work was... we talked about this idea, the backfire effect: when given evidence to the contrary of some position that you hold, most people will change their position at least slightly, but some people tend to double down, and that's the backfire effect. It turned out there was an emotional component combined with that epistemic knowledge, the emotion that's associated when you crystallize information. One of our collaborators is Roger Azevedo at UCF; he's our big self-regulated learning expert, and he had this initial insight as well. He said that emotion is the thing that you cannot disentangle from that. So in our five-dimensional approach, what we can tell you is that we're characterizing what the LLM is doing and making it, in a sense, self-aware of it. But in our current approach, it's just a control framework. What it means in terms of actual awareness, that kind of thing: these are unknowns as far as I can tell, and we don't have answers for that. So for us, this is scientific work that has an engineering framework, is the way I would put it at this point.
>> What is the role of emotion in the computation of that vector, if it's a component there? We know that we can see the imitation of emotion in the outputs of LLMs. It's usually some sort of sycophantic politeness, but you can change that setting in some LLMs, switch it to cynical and sarcastic or some other register, and that is generally the extent of it. I've not seen an LLM get angry or depressed. What sort of emotional valences are you looking at?
>> Beautiful, and you used the correct word. What we want to distinguish here is that we're not saying the LLM has emotion. It might very well, but that's not what we characterize. We have an affective characterization; we call it emotion because it's equivalent to what humans do. So if it were examining, say, documents, you can use things like NRC Lex and so on to gauge the emotional content of works along these eight dimensions. These are the same eight dimensions, by the way, that we looked at in our fact-checking disinformation work, where we tried to characterize the emotions people had when they encountered complex or politically charged questions, which expose their biases to them, to see if that maybe motivated them to mitigate or diminish the level of emotional connection that they have. In some it works, in some it doesn't. But anyway, similarly here, what we end up doing is characterizing the emotional content of material and the valence associated with it. It's an affective assessment; we're not saying it's intentional. How the model chooses to respond, in a cynical way or in whatever kinds of responses it gives, we leave up to the model, or to the way that the model has been characterized to behave.
>> That's just kind of a gloss on the way it expresses itself, right? So can you give me an example of that emotional measurement you were talking about?
>> Yeah. In this case, let's say you have some material. One of the examples we use in our papers is: did Donald Trump have a bigger inauguration crowd than Barack Obama, in the 2017 inauguration? There's lots of evidence there that you can bring to bear, things like transit receipts, ridership on subways, things like that. So people can get a sense of what kind of evidence there is, but it turns out people will have some kind of emotional connection to it. The emotional dimensions in this case are things like anger, happiness, sadness. There are eight of these dimensions, and the way we characterize them is with something called Plutchik's radar chart: a radar chart with these emotions that shows the value of each emotion, assessed either self-reported by people (which we had in our interface for that) or, for an LLM, with NRC Lex or GoEmotions or something like that to characterize these eight emotions. And then you can see... I can probably show you an example of that as well; it's in the paper. It's this multi-angled chart which shows you, okay, it's peaking around, say, anger, and if it's peaking around anger and the next thing associated with it is sadness, you can characterize that emotional content and say: is this material something that's emotionally charged, or maybe ethically charged? Are there ethical considerations that are often expressed through the emotional content of the semantic material that's produced?
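As a toy illustration of the lexicon-based affect scoring he is describing: the tiny lexicon below is invented for the example, where a real system would load the full NRC emotion lexicon or use a classifier such as GoEmotions, and the resulting eight-dimensional profile is the kind of thing you would plot on a Plutchik radar chart.

```python
from collections import Counter

# Minimal stand-in for an emotion lexicon keyed on Plutchik's eight emotions.
TOY_LEXICON = {
    "furious": "anger", "outraged": "anger",
    "huge": "anticipation", "biggest": "anticipation",
    "sad": "sadness", "losing": "sadness",
    "trust": "trust", "fear": "fear",
}

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

def affect_profile(text: str) -> dict:
    """Count lexicon hits per emotion and normalize to fractions,
    giving an eight-dimensional affect profile for the material."""
    hits = Counter(TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON)
    total = sum(hits.values()) or 1
    return {e: hits.get(e, 0) / total for e in EMOTIONS}

print(affect_profile("Outraged readers called it the biggest and most furious crowd"))
```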
>> But is the material you're talking about there an information source that it is using to answer the question?
>> Exactly. Or maybe it's even in the prompt itself, in the interaction with the user; maybe the user is expressing that. But in general you're absolutely right: it's the information storehouse that it's pulling upon, and the documents within that, to see what the assessment of the emotional valence is there.
>> So there could be a distinction there between a National Park Service report, "here's what 15 observers reported, and the mean and variance," versus a bunch of tweets signed DJT, "make America great again," "largest crowd ever," and the emotional valence there. But does that lead you in a direction of characterizing the reliability of the information?
>> Possibly. It would be one of the things that you would put in there, and that's a great question, because one of the ways we do this is with weights which determine: how important is the emotional component here? How important is the experiential matching here? That can be determined either based on context or use case. Maybe you're having a political discussion, so it has one set of weights; but maybe you're having a discussion about, hey, how should we architect the next spacecraft that we build, because people's lives depend on that. So how much you weight the emotional content might vary depending on the use case, the context, and the domain expertise, and it might also be a learned component for these situations. Right now, we put them in as heuristics and leave it up to domain experts to give context-dependent weights. But you're right, these are dials that we can adjust based on what we want to do.
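A minimal sketch of what such context-dependent dials might look like; the weight profiles and numbers are invented placeholders, not values from their framework (uncertainty is written as 1 minus confidence so that higher always means "more concerning"):

```python
# Hypothetical per-context weights over five sensor dimensions (each set sums to 1).
WEIGHT_PROFILES = {
    "political_discussion": {"uncertainty": 0.25, "conflict": 0.25, "experiential_mismatch": 0.10,
                             "emotional_response": 0.30, "prioritization": 0.10},
    "spacecraft_design":    {"uncertainty": 0.40, "conflict": 0.30, "experiential_mismatch": 0.15,
                             "emotional_response": 0.05, "prioritization": 0.10},
}

def weighted_alert_score(reading: dict, context: str) -> float:
    """Collapse the five readings into one alert score using context-dependent
    weights; in their framework these dials are heuristics set by domain experts."""
    weights = WEIGHT_PROFILES[context]
    return sum(weights[dim] * reading[dim] for dim in weights)

reading = {"uncertainty": 0.7, "conflict": 0.7, "experiential_mismatch": 0.8,
           "emotional_response": 0.9, "prioritization": 0.5}
print(weighted_alert_score(reading, "political_discussion"))  # emotion weighs heavily here
print(weighted_alert_score(reading, "spacecraft_design"))     # emotion barely matters here
```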
>> And you talked about your work being pure research, trying to build a wall, it sounded like, between that and applications: not getting sucked into applying it, but doing the work for its own sake. Those walls have a habit of being torn down when some development occurs and gives it relevance, and so a mathematician may get a knock on the door from the NSA, or a psychologist from the CIA, and so forth. When we're talking about disinformation especially, one can think of numerous applications, ranging from "how do I operationalize this myself to take control of some narrative in the public conversation" to political campaigns. How do you express the goal of the work you're doing with respect to those kinds of interactions?
>> First of all, I think you're exactly right, by the way. One of the reasons that I characterized this as just scientific work that I didn't think would have immediate direct application is that a year-plus ago, when we first started this, we were dealing with large ensembles of LLMs at a point when you could barely run one or two LLMs. So just by dint of necessity, the resources weren't there to do this, and I thought, well, there's no way this is actual engineering that we're doing; we're contributing at a fundamental science level. And you're right, those barriers seem to drop. Now we have these multi-agent systems. Initially we were calling our work multi-agent systems, and we changed it to the lower-level term, ensembles of LLMs, because agentic AI meant something else then. Now agentic AI has come back, and there's a separation of multi-agent systems between deliberative systems, which is what we do with multi-agents, agentic AI, which is task-oriented, and also agents in reinforcement learning. So now this whole subfield is beginning to develop, and our approach seems to be a contribution to multi-agent deliberative LLM systems. In this case those barriers do seem to be coming down, and the work does seem to be accelerating at a pace that I certainly didn't anticipate. But it is there.
>> I keep coming back in my head to the applicability of this research to AI safety, and to detecting
whether AIs are about to go rogue. I'm not inviting you to put on an engineer hat here, but to analyze how that work could extend to the context of AI safety. We've seen studies of LLMs where they have exhibited, on the face of it, highly alarming behavior: they determine that in order to satisfy the goal they've got to undertake some kind of subterfuge or lie, and sometimes that is even for a purpose that wasn't the one that was explicitly given, but for self-preservation; all the kinds of things that trigger our Skynet reflex, our HAL 9000 reflex, where we go, "What could possibly go wrong? What was your first clue?" You know, that we would look back after an apocalypse and say, "Did you not realize when you saw it lying about these answers?" Is there something that perhaps can be done to detect that, a sort of AI lie detector?
>> Beautiful. And this is also part of our framework. One of the things we bring in from the argumentation work that I've done is this dialectical sequence. One of the criticisms of these large language models in terms of safety, exactly as you say, is that it relies on a single model to self-assess and make determinations. In our framework, we don't, because we have multiple models, multiple large language models, and we actually arrange them in a dialectical argumentation sequence. We have things like an expert, a critic, an evaluator, a synthesizer, and a generalist. These can be separate individual models, or they can be clusters, which is how we generalize them: they're clusters of large language models, and they do this argumentation. The domain expert makes an assertion, perhaps one that's not so safe, and in this dialectical sequence it feeds into a critic cluster. The critic cluster, a bunch of these LLMs, looks at the result and says, you guys are lying, right? And so the conflicting-information and priority quantifications all spike, and that triggers: hey, check this. The next entity in the dialectical sequence is an evaluator, another cluster of LLMs with its own dimension set. It does this concatenated assessment of the information and says: the critic has a good point, the expert has this point, I think the critic's right, this guy might be trying to start Skynet. Then it goes to a synthesizer, which is another cluster of LLMs, and that cluster makes a final conclusion, synthesizing all that's happened. Along the way there's another cluster called the generalist, which sits at the meta level. In these argumentation frameworks you have both an object level, where you're considering the argument itself, and a meta level, where you're looking at the structure of the argument. The generalist has been observing this structure and then comes in at the end and says, yes, this makes sense. One of the biggest criticisms in AI safety is that you rely on a single model and it lies. In our framework we automatically have multiple models, multiple agents, multiple clusters, and with the dialectical sequence we have the traditional argumentation framework for coming to a better conclusion. Now, imagine these clusters at scale; this is how I looked at it, which is why I said it's science and not technology at the time. I'm imagining thousands of models in each of these clusters. There's no way we have the compute power for that today, but we will, and it will be very hard to have those large numbers of models all be manipulated to support Skynet. And if it does happen: to our future overlords, I love you.
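As a rough sketch of the role sequence he describes (expert, then critic, then evaluator, then synthesizer, with a generalist auditing at the meta level); the `ask` helper, prompts, and return format are illustrative placeholders, not their implementation:

```python
def ask(role: str, prompt: str) -> str:
    """Placeholder for a call to one LLM (or a cluster of LLMs) playing a role.
    A real system would dispatch to a model API with a role-specific system prompt."""
    return f"[{role} response to: {prompt[:60]}...]"

def dialectical_answer(question: str) -> str:
    claim    = ask("expert",      question)                                   # object-level assertion
    critique = ask("critic",      f"Find flaws or deception in: {claim}")
    verdict  = ask("evaluator",   f"Weigh the claim against the critique:\n{claim}\n{critique}")
    answer   = ask("synthesizer", f"Synthesize a final conclusion:\n{claim}\n{critique}\n{verdict}")
    # The generalist reviews the structure of the whole exchange, not just its content.
    audit    = ask("generalist",  f"Review this argument structure:\n{claim}\n{critique}\n{verdict}\n{answer}")
    return f"{answer}\n(meta-level audit: {audit})"

print(dialectical_answer("Is it safe to deploy this plan?"))
```

The point of the arrangement, as he notes, is that no single model's self-report is trusted on its own; each stage can flag conflict or low confidence raised by the previous one.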
>> Always play it safe there. It's reminding me of the conversation I had with Craig Kaplan in episodes 285 and 286, with his concept of democratic AI, a kind of society of mind of models, where they would form some kind of constitutional arrangement so that they keep each other in check. Is that a direction that you're going here, and are you working with the explicitly existential-risk-oriented organizations, like the Machine Intelligence Research Institute or the Future of Life Institute or the Centre for the Study of Existential Risk, that are focused on mitigating the risk... well, focused on the AI control and alignment problems?
>> Oh, I'm afraid I'm not familiar with them. So I guess the short answer would be no, I'm not working with them; I'm not as familiar with that.
>> They may want to get familiar with you. I think we'll talk to some people there and say this work here is surely of applicability. And would you agree that you're describing something that appears to be aimed at the alignment problem, or certainly at the control problem? You've got a kind of homeostatic mechanism there for using multiple LLMs or agents to ensure that one of them doesn't go bananas, or that if it does, it won't have consequences.
>> Yeah, it definitely does get flagged, and this was definitely one of the ideas that I had in mind when I did it. As I said, I have multiple disciplinary perspectives on it, but I don't have expertise in each of those areas. So as we came up with this framework, as I developed it, I'd go research the area, like the one you're describing where you're familiar with the work. In a similar way, I had the roles: I organized them from what we had in organizational behavior theory, team science, and cognitive science, and then I found out that other people were doing role-based research. As I explored that, and you start to master it because it's a new area in LLMs, it turned out it was subsumed in the framework that we had, because we started from first principles. In a similar way, when I'm thinking of safety: that's exactly one of the use cases that we have in mind, but I'm not familiar with the current work in there, and it could be that we're a small part of that, or perhaps it can also be something that's connected in. That's been my experience, and it's something really unique in my scientific career: the more connections I make, or the more things I find that seem to be outside of the thing, it turns out they're actually in the thing. So that's been a very unique experience with this particular research.
>> What do you find yourself most frequently wishing for in terms of resources? You have described some research that involves large numbers of large language models. Do you have enough compute, or would you like to have a hundred times that, something that a Google could deploy for you instantly, for instance?
>> Definitely more compute. We've got Google Cloud, and we also have an application with Amazon to see if we can use their framework as well. People are also important: my colleagues, my collaborators Charles Kushin and Kof Chu, have been invaluable, especially in implementing. As a theory guy, of course I know... I mean, I teach computer science, so I know about that, but there's something different about actually doing the implementations, and having more scientists who are also good at implementing and aware of the cutting-edge state of the art, which we might not be aware of, is super helpful as well. There's only so much you can do with grad students; grad students are great, some of them are super innovative, and we've got some great grad students in our group, Mina Fami and Christa and all, but getting senior researchers in the application domain matters: there's a feedback loop, even though we're doing fundamental science. When you see something implemented, you can then think about how to extend the framework beyond that, and that's the experience we had over the last year as well. The framework was there, but it wasn't fully formed. It's sort of like what George Whitesides says, or maybe Robert Frost, right, "The Road Not Taken": at the end you come out and you come up with a story about how, oh yeah, this was all a linear progression, but along the way there are these choices that you're making. In our case, we had this hybrid thing where the central framework was there, but extending and delineating the details of that framework happened in this gestalt manner, which was very neat and unique.
>> Are you teaching undergrad or just grad?
>> Both undergrad and grad. On the undergrad side I have my FYE, which I really like, and database modeling, but I do mainly machine learning and data science on the grad side.
>> How have you found the attitudes, beliefs, and goals of the students at those levels to have shifted over the last three years, since the AI conversation has exploded?
>> That's a great question, and I think it went kind of full circle. It went from desperation and abandonment to, oh okay, it's not as bad as we thought. A lot of them, especially when these large language models came along, thought, why are we doing any of this stuff? We're redundant, we're not necessary, why are we studying this, why are we learning about it? And at a certain point you say, okay, I don't necessarily have the answer, but here are the fundamentals, and let's see how things evolve with time. Over the course of those three years, I think you're right, about two years in is when everyone started to realize. I have colleagues in our research group who are in industry, like Charles, but also other colleagues I've talked to both in industry and in government, and what you find, and this is all anecdotal at this point, from my perspective, is that the uptake of AI increases the workload on senior and expert people and reduces the number of junior people that you have. And with that reduction of junior-level people, these tools produce work that needs verification and validation. So this is a new area we're looking into, and I think what's going to be really needed for the uptake of large language models is traditional specification, verification, and validation. Verification is, hey, are we building the thing right, and validation is, are we building the right thing. A lot of LLMs do the verification relatively well. The validation, on the other hand, is up to humans to do, and that's where these things are failing: the humans are just taking the output. A good example of this is, someone says, hey, draft me a report on X, and you do it, you send it to your boss, your boss is super happy, and you're happy. Hey, the LLM helped me out, I did it, I passed it on. Next day you go into the meeting and the boss says, hey, by the way, on lines 15 through 20, what exactly did you mean there? And you're like, oh crap.
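As a minimal sketch of the verification and validation split described above: verification can be largely mechanical, while validation stays with the human owner of the work. The check names and sign-off function below are hypothetical placeholders, not part of any framework discussed in the conversation.

```python
# Minimal sketch: verification vs. validation for an LLM-drafted report.
# All names (spec_checks, owner_can_defend) are hypothetical placeholders.

def verify(draft: str, spec_checks) -> bool:
    """Verification: are we building the thing right?
    Mechanical checks against the spec; an LLM or a script can do most of this."""
    return all(check(draft) for check in spec_checks)

def validate(draft: str, owner_can_defend) -> bool:
    """Validation: are we building the right thing?
    Does the draft say what its human owner actually means and can defend
    line by line? This is the step that stays with the human."""
    return owner_can_defend(draft)

# Toy usage: the report passes verification but still needs human validation.
spec_checks = [
    lambda d: len(d.split()) >= 200,   # long enough
    lambda d: "Summary" in d,          # required section present
]
owner_can_defend = lambda d: False     # the human never read lines 15 through 20

draft = "Summary ... " + "word " * 250
print(verify(draft, spec_checks))         # True  -> built the thing right
print(validate(draft, owner_can_defend))  # False -> not yet the right thing
```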
>> Right? And then the boss is like, oh, you had an LLM do it, well, we can get rid of you, and I'll just ask the LLM the same question. I work a lot with students, particularly at the high school level, on these kinds of questions, and I'm wondering how their attitudes towards, and interest in, the research that you're doing are evolving, and then maybe you could also talk about how they can get more involved.
>> Yeah, absolutely. In fact, this started off with one of my undergrads at Fitchburg, and then we extended it with one of my grad students at WPI who came in. That's where this research thrust in artificial metacognition sort of blossomed. But one of the things I also tell my students is that any applied math discipline, like computer science, is a lot like cooking. Whenever someone hires you, or you present yourself as an expert, you're saying, I'm a chef, I can cook the best eggs Benedict in the world, I can cook the best lasagna in the world. These large language models are tools, and they're like microwaves. If you use a microwave as a supplement to your cooking, beautiful: heat up a sauce, heat up some water. But if all you do is take frozen food, dump it into the microwave, and then present that as the dish that you made, and you're the chef, you're going to be in for a very rude awakening relatively soon.
>> But it sounded as though you were describing an effect that is cutting off the supply of your future senior researchers. Is that what's happening?
>> What I would envision is that those senior researchers become even more valuable, and that these junior researchers, what they need to do is use the microwave as a tool. It's just a tool; an LLM is just a tool, like a calculator, a microwave, anything else. If you just use it with blind dependence and no validation, and definitely the verification can help you there, you're not going to be helpful in that pipeline. And you're right that the only way to get the senior researchers is to have junior researchers.
>> Well, right. Was it the case that the senior researchers were basically using junior researchers to do the microwaving?
>> And so, yes. In academia that's true. In industry the models deviate slightly, and this is all based on anecdotal experience, by the way. In industry it's different, depending on the kind of organization you're looking at. If it's a mom-and-pop shop, it's probably closer to academia, where things are very loosey-goosey and messy in large measure, again depending. But if you're talking about a reasonably sized organization, medium to large, even the junior engineers in that case, or researchers, would have to verify and validate what they're doing before they pass it on.
>> And so the cognitive load on the senior researchers was decreasing, you're right; then it's like a microwave tool. But I would say that that was true in, say, medicine and law, where the senior lawyers and surgeons were tasking the juniors with the menial tasks, picking up sponges, doing document discovery, and the juniors would absorb the senior expertise by just being around and observing. But when those seniors could get those tasks automated, then they didn't want the juniors around any longer.
>> And that's very true. But here the tasks extend beyond just the menial tasks, which are easily automatable but also easily verified and validated, to tasks that demand greater cognitive or creative components, which you do not verify or, more importantly, validate. Some verification does occur; large language models, especially as you see with Claude Code or similar tools in the context of coding, do some local verification really well. Validation lags way behind. So, verification: are we building the thing right? Validation: are we building the right thing? That's the part where there's a disconnect, and these things aren't able to do it. Possibly our framework would be able to help with that, because we have this dialectical argumentation structure with multiple clusters of LLM models, and so it can come in with the generalist and say, is this matching up with what we thought? A lot of that we're doing. So right now I think senior researchers are needed. What I would imagine in the future is that these things would probably be able to do that part as well. But that's why I call it the future. Nothing imminent, as far as I can tell. But I've been wrong many times before.
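A loose sketch of the kind of dialectical cross-check described here: several role-specialized models critique a draft, and a generalist then asks whether the result still matches the original intent, the validation question. The function names, roles, and prompts below are placeholder assumptions, not the actual framework from this research.

```python
# Loose sketch: role-specialized models critique a draft, then a generalist
# checks the outcome against the original intent (the validation question).
# `ask_model` is a placeholder for whatever LLM client would really be used.
from typing import Callable, Dict

def ask_model(role: str, prompt: str) -> str:
    """Placeholder for a real LLM call; assumed to return the model's reply."""
    raise NotImplementedError

def dialectical_check(task: str, draft: str, roles: list,
                      ask: Callable[[str, str], str] = ask_model) -> Dict[str, object]:
    # Each specialist argues against the draft from its own perspective.
    critiques = {
        role: ask(role, f"As the {role}, critique this answer to '{task}':\n{draft}")
        for role in roles
    }
    # The generalist then asks: does the draft match what was originally intended?
    verdict = ask(
        "generalist",
        f"Task: {task}\nDraft: {draft}\nCritiques: {critiques}\n"
        "Does the draft still match the original intent? Answer yes or no, and why.",
    )
    return {"critiques": critiques, "verdict": verdict}

# Example roles for the clusters; in practice these would come from whatever
# role taxonomy the framework actually specifies.
roles = ["skeptic", "domain expert", "end user advocate"]
```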
>> It's fascinating. Our time is limited, so to draw to a close here: you're talking about the future. What's in the future for you, future lines of research that are imminent or potential directions you would like to go, and where do you see your field going in the next few years?
>> That's a great point. Definitely, personally, for me, we're going very in-depth into the framework and working out its different elements quantitatively and rigorously. As I said, I've also been thinking for about two years of how to extend it to this idea of SVV: specification, verification, and validation. I think that's the next component we're going to be putting into this. And where it goes, I think, gets us to things that then seem to simulate what humans do better. I think that's way down the line, because I think you require clusters. Just having those neural connections, the scale-up has been huge, to have these deep learning networks with billions of parameters. In our brain we have a hundred billion neurons, but we have a hundred trillion synapses between them. We have about 40 million neurons in our gut and about 10,000 in our heart, but even the interaction of the gut neurons with the gut biome is huge in its behavioral consequences and emergent properties. And so now we've got these individual neural connections into the billions. But once we bring in clusters, and look at thousands upon thousands of these agents working together, and we have the compute power for that, I think you'd be hard-pressed to say we won't have some emergent behavior that's analogous to what human animals do.
>> Wow. We need not just a theory of mind but a theory of stomach.
>> That I'm building.
>> This has just been so fascinating. I really appreciate this conversation; I've learned so much, and our listeners have too. Your passion for this field is evident. We will have links to your site and homepage. Anything else you want to leave listeners with as a parting shot, for how they could get involved in this field themselves, or what they should be thinking about as they head into the future with AI?
>> Well, the field is rife and open, and the pace of progress is insane; just keeping up with that is very hard. But it also gives you an opportunity to get in at a relatively ground level. This is, I'd say, still at its nascent stages, so I think this is something that people can come in on and definitely contribute to, and there's a lot to be done. It's giving us insight into those fundamental questions that all of us wrestle with, either late at night or in our private moments: what is this idea of existence, and what is this idea of consciousness, or whatever it might be. Of course, it delves into the physics of things as well, the observer effect, et cetera. But these are fundamental questions that humans have wrestled with for years, and this might be another sort of tool that helps give us some fundamental insight into them.
>> Wrestling with fundamental questions is one of the things that most makes us human. So, thank you for demonstrating that, Ricky Sati, and thank you for coming on AI and You.
>> Thank you so much, Peter. This was a real pleasure. I enjoyed it thoroughly.