Podcast: Peter Scott
By professorsethi
Summary
Topics Covered
- Computer Science Equals No More Programming
- Metacognition Monitors Rereading Failures
- System Two Triggers Fix System One Errors
- LLM Entropy Mimics EEG Confusion Signals
- Dialectical LLM Clusters Detect Deception
Full Transcript
Ricky Sethi, it's wonderful to welcome you to AI and You. We are going to talk about some fascinating subjects: disinformation and especially artificial metacognition. That's such a loaded term, there's so much to explain there, but maybe we could start out with you talking about just how you got into these fields. There are not a whole lot of job openings, last time I checked, that say "artificial metacognition researcher." What was the path that you followed that took you to where you are now?
>> Oh, perfect. Well, thank you for having me first. It's a pleasure to be here. I think the path that took me to this was that I teach a course at Fitchburg, which is our FYE course, a first-year experience for computer science students. One of the things I do in there is give them an idea of what computer science is. They usually come in equating it with the idea of programming, and of course, as Dijkstra said (and I think someone before him originally made this point), computer science is no more about programming than astronomy is about telescopes. It's a tool that we use. So to give them a sense of the mathematical nature of computer science, I delve into these mathematical ideas, and wrapping your brain around them is a little tough. Part of my background is that when I was in undergrad, one of my majors was neurobiology, so I knew that thinking is hard. One of the things we emphasize in FYE is habits of mind, and one of the components of that is metacognition. Metacognition is this idea that's often described as thinking about thinking. It involves some degree of self-reflection, obviously, but it also involves some degree of control. So, for example, if you're sitting there and you find yourself rereading a sentence that you're studying five times, and you realize you just kind of blanked out on it, becoming self-aware that, hey, I blanked out, I'm rereading this sentence: that's the monitoring aspect, the self-awareness aspect of metacognition, in humans and in non-human animals as well. And then there's a control aspect where you say, "Okay, I'm going to change something that I do in my behavior." This is what we wanted to instill in our FYE students. They're coming into university for the first time, and thinking is a tough thing to do. So I also show them a Veritasium video about how thinking is hard, which talks about system one and system two, the dual-process framework that we now subscribe to in neurobiology. And one of my students in that FYE class, Josh Yakobani, took another class with me a couple of years later. He was in the honors program, and he said, "Hey, look, Professor Sethi, I want to do a project with LLMs." This was in 2024. "I was wondering if you wanted to be my adviser for it." I said, "Sure," and so I came in as an adviser. I knew about large language models, but they weren't part of my research thrust at the time. Then, as we continued in the project, he talked about different ideas that were being explored in large language models as they were coming to the forefront. One of the things he happened to mention is that people are exploring metacognition in large language models. I kind of perked up and said, what the heck is this? Tell me more about it. He told me what he knew, but it was limited, so I thought, okay, I've got to look into it. And as I looked into it, it clicked with my background: my background in neuroscience, my background with argumentation frameworks that I'd worked on as part of my research in fact-checking misinformation, and I'd also done an MBA, so it clicked with some of these ideas of organizational behavior as well. All of these things came together, and I started to explore the idea of how machines, large language models in particular, might exhibit or utilize metacognition, and that's the field of artificial metacognition.
>> Wow. And I wonder whether... scratch that. This sounds to me like we're talking about what many people would think of as a kind of shibboleth that differentiates humans and machines. I was irresistibly reminded of one of the initial scenes of the Dune movie, and the book, where the Reverend Mother has Paul Atreides put his hand in the box that imposes incredible levels of pain. The test is to see whether he can override the impulse to withdraw his hand, and she says this is to see whether you are human. That description of metacognition, thinking about thinking, makes me wonder whether you can distinguish between humans and AI, whether AIs are capable of thinking about thinking. Do you approach this from a computer science angle, a neuroscience angle, a psychology angle, any or all of the above, or something else?
>> That's a great question, and one of the things I definitely want to clarify is that we come at it from a multidisciplinary approach. I have a very multidisciplinary background: it's in neurobiology, it's in physics (I did my master's in physics), then I did my PhD in artificial intelligence, and I was also an English major. So I have these very disparate approaches, and it turns out that when I looked at this problem, it was hard not to look at it holistically. If you're trying to actually look at the problem as a whole, it's hard to look at it from just one narrow slice, one artificial human delineation of a field, and say, I'm just going to look at this mathematically, from computer science. That's an artificial delineation that nature doesn't have; we impose it because we have a hard time getting these larger perspectives. But it turns out that by looking at it from this holistic, multidisciplinary angle, not only can we maybe address some of these emergent properties better, but by approaching it rigorously, the framework that I developed about a year ago now turned out to be more comprehensive than I realized at the time. Any new research that we've found in the subsequent year, up to even today, turns out to fit, or get subsumed, into this framework, because we approached it rigorously from these multiple disciplines. But in no way would we say... you know, when you say AI, just to distinguish that a little bit, it's the current GenAI systems versus maybe more AGI systems down the line. Right now we're dealing with these GenAI systems, and for them what we're giving is a control structure. So we're not making claims about consciousness and these emergent properties right now. At best, we hope we're providing the first step on the ladder that will take us to that point. But this first step is very narrow and precise in its claims: we're talking about GenAI systems and how they can have both a self-assessment and a control structure or mechanism at this point.
>> Is metacognition something that we can say every human has? I was reminded of Julian Jaynes, The Origin of Consciousness in the Breakdown of the Bicameral Mind. He traces a history of consciousness back to the Iliad and the Odyssey and asserts some kind of change in the structure of our mind, a sort of metacognitive shift, some thousands of years ago. That makes me think two things. One, when did metacognition start, the ability to know that you're thinking about your thoughts, that you have a theory of mind for yourself? And two, are there people to whom that hasn't occurred, for whom meditation or anything like that is a foreign concept, and who are on a different side of some line that we might draw than others?
>> Yeah, and that's another great question. In this case, the metacognition we're talking about, that self-awareness, isn't necessarily self-awareness in a spiritual sense. It's self-awareness in a very physical sense. Human animals have that; we've categorized it, we've quantified it. Non-human animals also have it. There might be a reference here to the triune theory: you can think of us as having three brains, a reptilian brain, a mammalian brain, and then a primate brain, the neocortex. You might ask where in these three levels the idea of metacognition arises, and definitely in the mammalian brain you see it for sure. And in non-human animals: our species happens to be a vocal learner, and there are other, non-human species that are also vocal learners. Vocal learners have this abstraction in the neocortex, or the equivalent in those other species, where the neurons permeate the motor neurons that generate sounds. So for us there seems to be a big connection between this abstract neural representation and the physical realization thereof. When we start to look at metacognition in humans, we can articulate what we're doing; in non-human animals, you have to go by behavioral studies. So what we did, like some other research, is pull on this idea of looking at large language models as non-human animals, in a way. We're examining them from a behavioral context, but also from a structural context. It turns out other researchers, like Acriman recently, have done a great job publishing in this area. He was also inspired by how non-human animals exhibit metacognition, how it's quantified, and what kinds of experiments are done with non-human animals, and he applied that idea similarly to large language models. But the metacognition, and the self-awareness in it, is not necessarily the one we'd associate with spirituality or even consciousness. Consciousness has no definition. To my mind, there's no doubt that there must be some connection; what precisely that connection is, I think, is unknown at this point.
>> So I know about some self-awareness tests, like the mirror test, and I believe crows pass that: can you tell that the thing you're seeing in the mirror is you and not another, remarkably similar animal?
>> I think we got disconnected there.
>> Oops. Not on my end. Sorry. But that part was recorded. All right, let me do it again so you hear it. So, I know about some of the self-awareness tests, like the mirror test, which supposedly crows pass: the ability to know that what you see in a mirror is you and not another animal that looks like you. Can you compare and contrast those tests with the ones for metacognition? Do they overlap completely, or what?
>> Yeah, absolutely. The self-awareness of your own being and physical representation is a component thereof. But maybe a good example would be these vignettes that people use when they're looking at cognitive reflection tests, these CRTs; there are two or three vignettes that are popularly used. I'll do one here; I use it in my FYE class as well. Don't worry about how you actually do on it. But suppose I told you that a bat and a ball cost $1.10, and I told you that the bat costs $1 more than the ball. How much does the ball cost?
>> Um, yeah. Sorry, I've heard this one before, so I know the answer is 5 cents.
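(For anyone who wants the arithmetic behind that answer: if the ball costs b, then b + (b + $1.00) = $1.10, so 2b = $0.10 and b = $0.05; the bat is $1.05. The intuitive "10 cents" answer would make the total $1.20.)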
>> Beautiful.
You can't trick someone who's knowledgeable about it, but you're right. In this case, you relied on your previous experience; you did this experiential matching and you were like, "Oh, I know it." So this is where we distinguish between the dual-process theory, system one and system two. System two is what we think of as ourselves, and system one is this subconscious, larger component that automates most of our behaviors. It works great, except when it doesn't, and when it doesn't, it's wrong. So already, when you heard what I was saying, you were like, "Ah, this triggers this matching that I had," what we call in our context experiential matching: "Hey, I've heard this problem before and I know how it goes." That kicks you out of system one mode and engages system two. If you do that enough, it becomes a learned behavior; you never need to drop out of system one mode at all.
>> Exactly. Yeah, like learning a sport. And there's some degree of anchoring in that question as well, right? Something that my daughters played on me when they were young was this trick where they would say, "Say the word silk." Silk. "Say the word silk." Silk. "Say the word silk." Silk. "What does a cow drink?"
>> Yeah, exactly.
>> Right. Milk. Milk. No, a cow drinks water. What are you talking about? Well, calves drink milk, but cows drink water, right? So that is triggering system one thinking, to our detriment. And maybe we should just visit the system one / system two definition here for the benefit of people not familiar with Kahneman.
>> Oh, perfect. Yeah, you made the correct reference there; that's absolutely right. As Kahneman said, there's this idea (and he actually had a triphasic theory later on that we could get to), but in this case you think of your brain, or your mind, as having these two parts, a system one and a system two. System one does the automatic things that you've learned. So, exactly like some of the examples that you brought up: when you're first learning how to tie your shoelaces, you have to really work at it. You might even work out a song for it, as you might have had with your daughters, for example. As you learn it, it's a difficult task that you're not able to do automatically a priori, so you go through it mechanistically over time. You learn the complexity, and once you've done it enough times, it gets transferred over to system one. Once system one takes over, then when you're tying your shoe, you don't even think about it consciously. It turns out system two is the smaller component in the mind; system one is the larger one. We do something very similar in our framework. We make these ensembles of large language models: we don't just look at one LLM by itself, we imagine a bunch of LLMs and we divide them up into categories similar to this system one and system two, though our division is a little more complex. And it turns out that when you divide it up in this way, it also builds upon the way models are currently built. When you have a large language model, you hear that they have billions and billions of parameters to learn, but smaller, somewhat less powerful versions of these are also made, and the way that's done is a teacher-student distillation model, where a larger teacher helps fine-tune, or teach, a student LLM, and then you can release a model with just millions of parameters rather than hundreds of billions of parameters. So we approach it in this way. That was our initial connection, the through line between what's happening on the computational side and what's happening on the neuroscience side of things. In neuroscience we have system one and system two, and you can look at it similarly: when system one gets stuck, when some input from the world triggers internally that, hey, exactly as you said, what I'm doing isn't working, the mind shifts to system two thinking. System two, which is kind of lazy, says, "All right, let me get on this. I don't want to be engaged, but now that I'm engaged, let me figure out how to solve this problem correctly." Once it learns the correct way to solve it, it transfers that learning to system one, the larger model. We do the same thing on the LLM side in teacher-student distillation. In this case, we have a small LLM with just a few million parameters, and it needs to learn how to do a specific task; say you want a model for drafting scripts for movies. So you train against this larger model with billions and billions of parameters that solves all kinds of tasks, and it helps fine-tune the smaller model to solve that problem better, and then you can run it with fewer compute resources. It turns out the framework that we make is kind of a generalization of what people were doing there already, and it's inspired very directly by the system one / system two paradigm.
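He is describing standard teacher-student (knowledge) distillation. As a rough, generic sketch, not their framework, the core training loss typically looks something like this in PyTorch; the function name, toy sizes, and temperature value are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """Soft-label distillation: nudge the small 'student' model toward the
    large 'teacher' model's output distribution over the same tokens."""
    # Soften both distributions with a temperature, then match them with KL divergence.
    soft_teacher = F.softmax(teacher_logits / temperature, dim=-1)
    log_student = F.log_softmax(student_logits / temperature, dim=-1)
    # kl_div expects log-probabilities for the input and probabilities for the target.
    return F.kl_div(log_student, soft_teacher, reduction="batchmean") * temperature ** 2

# Toy usage: one batch of 4 examples over a 10-token vocabulary.
teacher_logits = torch.randn(4, 10)                        # would come from the large teacher
student_logits = torch.randn(4, 10, requires_grad=True)    # would come from the small student
loss = distillation_loss(student_logits, teacher_logits)
loss.backward()  # gradients flow only into the student
```

In a real pipeline the teacher's logits come from a frozen multi-billion-parameter model and the student is the small model being trained; only the loss shape is shown here.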
>> Well, let's contrast that metacognitive process in humans with the one in LLMs at the moment, because as humans we can undergo a very continuous, rapid cycle of improvement. If you and I, for instance, start swapping stories like the bat and the ball and the milk and the silk, we'd pretty soon realize the class of riddle that we're telling each other and be alerted: I'd better kick in system two next time I hear something, rather than go with my reflex, because there's a pattern here of the reflex being wrong. That strikes me as the kind of thing that an LLM is not going to do, even emergently, and in particular it's not going to be able to channel that realization back into its training weights. They just don't. It can be part of the context buffer, it can even be part of fine-tuning, but it's not going to override its basic instincts. Humans, we can do that in moments. Would you agree, and does this lead to any ways of reliably testing that I'm talking to an LLM as opposed to a human being, for instance?
>> The imitation game. Yeah, that of course is a different thing, which we've passed by now, but you're absolutely right, you're spot on with this. LLMs nowadays don't reflect methodically and rigorously. There are heuristics and ad hoc approaches that people use all the time to do exactly that kind of thing, and there are aspects of the model itself that you can characterize as metacognitive, sort of the metacognitive quality or quantity of a model. Some of the work that was done happened to tie into our framework as well. It's one of those beautiful instances where you come up with an idea, you think someone else has a different idea, and it turns out it's part of the same idea that you've worked out; it's a very cool thing when you get that experience in life. Maniscalco and others looked at human beings and started to quantify the amount of metacognition that a human being has, and then researchers in China, Wang et al., took that work and asked whether we can quantify the overall characteristics of metacognition, how much metacognition a particular model might have. What we did in our larger framework is say exactly what you're saying: right now these LLMs, even with heuristics and these ad hoc approaches, aren't doing it in a regimented, rigorous manner. For humans, when we switch, sometimes it's a conscious decision, like you said: hey, I recognize this paradigm, so I'd better be on alert. Sometimes that happens when it matches your experience. Other times it might be: I'm not so confident about what I'm saying. Maybe that comes in later with the bat and the ball. What you often see (and in the Veritasium video he actually goes around asking people in Australia about this) is that they'll very confidently, which might remind you of LLMs and their ability to very confidently give an answer, say the bat costs a dollar, right? They'll very confidently give you that, and then they'll get some cues from the world around them, or maybe just pause for a bit, and some of that confidence will leak out. When the confidence leaks out, then they engage system two, not as a conscious mechanism; this is what the mind does automatically once it's aware of some disconnect between the world and its experience therein. So we give these kinds of sensors to large language models. We have five of what we call sensors; it's actually a metacognitive state vector, and we quantify it. These are black boxes into which we can put whatever the state-of-the-art approaches are for these components, and we base them rigorously in both neuroscience and cognitive science ideas. We're still plugging in the state of the art, and at a base level you can always ask for self-reporting, which isn't reliable but is a good first approximation. Each of the models in our ensemble of large language models evaluates, for every query, this five-dimensional vector. So it has this awareness, and it uses that awareness, whatever the numbers or quantifications are on each of the dimensions of that MSV, that metacognitive state vector. It then figures out: hey, is one of these dimensions, say my confidence dimension, really low, or am I getting conflicting information and that's really high, above some threshold that I've either learned or that some expert has said is correct or reasonable in this context? Do I need to trigger a deeper form of thinking? This decision is made across the ensemble of LLMs in our framework; we have a control graph theory that helps do that as well.
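Their sensor implementations aren't shown in the conversation, but as a minimal illustration of the idea, assuming the five dimensions he names later (confidence, conflicting information, experiential matching, emotional response, prioritization) and made-up thresholds, a trigger check could look like this:

```python
from dataclasses import dataclass

@dataclass
class MetacognitiveStateVector:
    """One reading of the five 'sensor' dimensions described in the interview.
    All values are assumed here to be normalized to [0, 1]."""
    confidence: float            # correctness evaluation
    conflict: float              # conflicting-information signal
    experiential_match: float    # similarity to previously seen problems
    emotional_response: float    # affective charge of the material
    prioritization: float        # how important or urgent the query is

def needs_system_two(msv: MetacognitiveStateVector,
                     min_confidence: float = 0.6,
                     max_conflict: float = 0.4) -> bool:
    """Illustrative trigger rule: escalate to deeper ('system two') processing
    when confidence drops below a threshold or conflict rises above one.
    The thresholds are placeholders a domain expert or learner would tune."""
    return msv.confidence < min_confidence or msv.conflict > max_conflict

# Example: a low-confidence, high-conflict reading should escalate.
reading = MetacognitiveStateVector(0.35, 0.55, 0.20, 0.10, 0.80)
print(needs_system_two(reading))  # True
```

In their framework this decision is made across an ensemble of models rather than by a single rule like this one; the sketch only shows the per-query shape of the check.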
>> Wow, a lot to talk about there. Is this measurement along those five dimensions done based purely on the observables, the conversation, the output of the LLM, or does it involve some kind of LLM equivalent of an electroencephalogram, where you are looking at signals inside?
>> Beautiful question. I like how you put that, with the EEG, because when you started off I was going to say, "Oh, it's just what's there," and then when you said that: picking up something like an EEG is exactly what we do. Internally, the model has these tokens and it assigns probabilities to them. Let's say there's a multiple-choice answer: what's the capital of India? This is one of the examples that we use, and the options are New Delhi, Paris, and, let's say, Washington, DC. The model might have different levels of confidence that it associates with each of these options. If the confidence in one of them is super high, then the entropy is very low. What we're looking for is when it's confused, and that's when the entropy will be high. If the answers came out as: it could be New Delhi 33%, it could be Paris 33%, and it could be Washington, DC 33%, that's a terrible LLM, but suppose it does that; in this case the entropy is very high. This is equivalent to the EEG. If a model allows us, and open-source models allow us to do it (some of the closed-source ones don't), we can look at the logits, the internal probabilities that were assigned, and once we have those we can calculate the entropy and draw a conclusion about what the model must be "thinking." In fact, some of our experiments build on what Acriman did when he was trying to figure out whether an LLM exhibits metacognition: in open-source models that let you get the logits, he looks at the entropy and says this must be an indicator of some underlying process the model is doing, even if the mechanism of that process is unknown. So it's very equivalent. I love the EEG example, because in an EEG there are so many electrical signals (and even in the magnetic version, where you get magnetic stimulation that can be a little more targeted), and discerning and deciphering which parts of the signal correspond to which neural substrates in the brain is a very tough problem. But it's indicative of something there, and similarly here.
>> You mentioned self-reporting. Is that like the chain of thought, where you can see what the LLM is, quote, saying to itself in the process of answering a question?
>> That's exactly a part of it, and it's also the most unreliable one.
Lots of studies, which we cite, show that this is a terrible way to do it, but as a first approximation it's a reasonable entree into the problem domain and into seeing what you can do. You're right: in this case you ask the LLM, hey, how confident are you? So we have these five dimensions. One might be confidence, the correctness evaluation: how confident are you that the capital of India is New Delhi? And it says 75%. Then you say, okay, do you have any conflicting information? It says, yeah, I also saw that it could be Paris. And you're like, well, how conflicting is that? Oh, about 20%. We ask it in this way: does it match some experience you've had before? We go along these five dimensions and have it self-report values. The beautiful thing about our framework right now is that that's not what we're relying upon. We're plugging in state-of-the-art systems, and that's what we're doing for the paper that we're submitting to Europe, where we're doing this in-depth quantitative experimentation now. We're plugging in state-of-the-art approaches that other people are using to look at one aspect of what's in our framework. A lot of people actually look at confidence, which equates to our correctness evaluation, and as they look at it they come up with different techniques to measure confidence and correctness; the entropy is one of the easy ones, but there are other approaches. We can plug them all in and see which of them performs the best. And we've actually got three subdimensions: each of those five dimensions has three subdimensions that we look at as well.
>> So is it true to say that in order to do this research, you have to be working with models that are willing to expose to you their levels of confidence or entropy?
>> That's a great point. What we do is accommodate whether they do or not. If they don't, then we fall back on these less reliable, but the only available, approaches that we have. So, for example, if a model, exactly as you say, exposes its logits to us: beautiful, we can get a sense of some underlying aspect of it, sort of like the EEG. If it doesn't, that's okay; we can go with the less reliable behavioral approaches as well to get a sense of what it is. Will that be as precise or as correct as the one that gives us full insight? No, not at all. But it's a first approximation, even for closed models. It's sort of like with non-human animals: when we do studies with non-human animals, there can't be any verbal exchange. Depending on the species, there might be some symbolic communication, but in a lot of cases you can't have that exchange, so you rely on behavioral models to assess metacognition in these non-human animals. Is that as precise as in humans? No. And even in humans, I would argue that we don't have a handle on everything either.
>> Something you've mentioned several times there: logits. What are those?
>> Oh, a logit is sort of like the likelihood of something happening. It's sort of like a probability; it's something that we transform into a probability. And once we have it as a probability, then we can transform that into an entropy determination across a bunch of them. So it's just a precursor to a probability.
>> Is it possible to derive measurements like this, along the same dimensions, for human beings?
>> Yeah. In fact, our dimensions are built upon cognitive science and neurobiology research. The dimensions that we impose, or find computational correlates for, in these large language model instances come from rigorous cognitive science and neurobiology work. They don't necessarily always list them as these five particular dimensions, but any departure or combination that we make is principled and grounded in the literature as far as we can see. So we ground it in this neurobiology and cognitive science literature, and the five dimensions that we have, the confidence, the conflicting information, the experiential matching, the emotional response, and the prioritization, are things that we see in human and non-human animals as well in the biological research.
>> So now the $64 million question: can I see somewhere in your research a five-dimensional chart, with hopefully a cluster for humans, a cluster for LLMs, a cluster for dolphins, or, well, maybe it doesn't work for non-verbals, but is that available?
>> So close. Our research is specifically in the context of LLMs, but we've got our demo, we've got all our source code, and it does exactly what you say. You can have radar charts that show the breakdown, we have bar plots, and I can fire up the demo and show you, but it's in our papers as well, both in our scum paper and the webc paper, and we just submitted a paper to kai, and we've got the whole demo site set up with that, so you can play with the code itself and it will generate them. In fact, one of the contributions of our work is that when you look at large language models, one of the big complaints is that they are opaque boxes: you can't see the reasoning that's going on inside them. One of the things that we do is help visualize that. So we also contribute to explainable AI, and we visualize it with both these graphical representations and the numeric details of what's going on along the dimensions that we have. So we have a breakdown if you want just the math of it, the numbers that are associated with it. I should say all of our research is only on LLMs, though. We don't have any application of our five-dimensional framework to dolphins; that's beyond scope for us, but it is possibly something a biological researcher would want to do. I know about that area, but it's not one I have expertise in. So some future collaborator might want to say, let's look at these five dimensions in the context of dolphins, definitely, because they can also do a little bit of self-reporting, but also crows and other non-human animals. Sure.
>> Wouldn't turn down that funding opportunity if it came along. Let's take a different tack here, because some of your other research concerns disinformation, and I want to look at the social impacts of AI's disinformation and the intersection with metacognition. The place where I connect those is looking at the chain of thought from large language models: when you ask a question and it starts talking to itself, or produces an imitation of talking to itself. This seems like one of the places where the imitation may not be as good as the real thing. I think we've all had this experience of talking to an LLM: it grossly hallucinates the answer, not just getting something factually wrong but completely going off the rails as to what you are asking about or what sort of answer you wanted. You castigate it, it grovels profusely, and then it does exactly the same thing again, over and over. And you can look at the chain of thought and see it, in that alleged chain of reasoning, doing something that your lowliest intern wouldn't think of. It clearly seems to defy metacognition at that point, because you would say anyone with a brain who was looking at what they were writing would realize that they had just contradicted what they had said earlier. I wish I had an actual example of this to hand, but we've all had this kind of experience, so I think we can relate to it. And we'll develop this into the intersection with disinformation. But does that chain of thought help you, in your metacognition research on LLMs, to measure or in some way detect whether it's being unhelpful, has got the wrong idea, or something more sinister?
>> That's a great point. The intentionality is always an issue. So when you say, does it have something more sinister, or your intern who would know better but perhaps intentionally doesn't do it...
>> Right, a Russian agent or something.
>> Yeah. So we don't address the intentionality aspect of it. In fact, my background is very theoretical. In my group, one of the things I tell them, and I started telling them this about a year ago, is: look, we're not doing engineering work here. We're not doing something where what we do now will be in the next Claude model that's released next year. We're doing fundamental scientific research; we're trying to address these fundamental ideas. Now, I said that a year ago and I repeated it constantly, and then as we see more papers coming out, I don't know if the pace of innovation is so rapid or my estimation of it was so poor, but it seems like some of these ideas are now leaking into engineering or technological applications. That being said, my interests tend to stay in the purely theoretical realm. Some of my experience with fact-checking misinformation led me there, because the conclusion I eventually came to in my fact-checking misinformation work was... we talked about this idea, the backfire effect: when given evidence to the contrary of some position that you hold, most people will change their position at least slightly, but some people tend to double down, and that's the backfire effect. It turned out there was an emotional component combined with that epistemic knowledge, the emotion that's associated when you crystallize information. One of our collaborators is Roger Azevedo at UCF; he's our big self-regulated learning expert, and he had this initial insight as well. He said that emotion is the thing that you cannot disentangle from that. So in our five-dimensional approach, what we can tell you is that we're characterizing what the LLM is doing and making it, in a sense, self-aware of it. But in our current approach, it's just a control framework. What it means in terms of actual awareness, that kind of thing: these are unknowns as far as I can tell, and we don't have answers for that. So for us, this is scientific work that has an engineering framework, is the way I would put it at this point.
>> What is the role of emotion in the computation of that vector, if it's a component there? We know that we can see the imitation of emotion in the outputs of LLMs. It's usually some sort of sycophantic politeness, but you can change that setting in some LLMs, switch it to cynical and sarcastic or some other register, and that is generally the extent of it. I've not seen an LLM get angry or depressed. What sort of emotional valences are you looking at?
>> Beautiful, and you used the correct word. What we want to distinguish here is that we're not saying the LLM has emotion. It might very well, but that's not what we characterize. We have an affective characterization; we call it emotion because it's equivalent to what humans do. So if it were examining, say, documents, you can use things like NRC Lex and so on to gauge the emotional content of works along these eight dimensions. These are the same eight dimensions, by the way, that we looked at in our fact-checking disinformation work, where we tried to characterize the emotions people had when they encountered complex or politically charged questions, which expose their biases to them, to see if that maybe motivated them to mitigate or diminish the level of emotional connection that they have. In some it works, in some it doesn't. But anyway, similarly here, what we end up doing is characterizing the emotional content of material and the valence associated with it. It's an affective assessment; we're not saying it's intentional. How the model chooses to respond, in a cynical way or in whatever kinds of responses it gives, we leave up to the model, or to the way that the model has been characterized to behave.
>> That's just kind of a gloss on the way it expresses itself, right? So can you give me an example of that emotional measurement you were talking about?
>> Yeah. In this case, let's say you have some material. One of the examples we use in our papers is: did Donald Trump have a bigger inauguration crowd than Barack Obama, in the 2017 inauguration? There's lots of evidence there that you can bring to bear, things like transit receipts, ridership on subways, things like that. So people can get a sense of what kind of evidence there is, but it turns out people will have some kind of emotional connection to it. The emotional dimensions in this case are things like anger, happiness, sadness. There are eight of these dimensions, and the way we characterize them is with something called Plutchik's radar chart: a radar chart with these emotions that shows the value of each emotion, assessed either self-reported by people (which we had in our interface for that) or, for an LLM, with NRC Lex or GoEmotions or something like that to characterize these eight emotions. And then you can see... I can probably show you an example of that as well; it's in the paper. It's this multi-angled chart which shows you, okay, it's peaking around, say, anger, and if it's peaking around anger and the next thing associated with it is sadness, you can characterize that emotional content and say: is this material something that's emotionally charged, or maybe ethically charged? Are there ethical considerations that are often expressed through the emotional content of the semantic material that's produced?
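As a toy illustration of the lexicon-based affect scoring he is describing: the tiny lexicon below is invented for the example, where a real system would load the full NRC emotion lexicon or use a classifier such as GoEmotions, and the resulting eight-dimensional profile is the kind of thing you would plot on a Plutchik radar chart.

```python
from collections import Counter

# Minimal stand-in for an emotion lexicon keyed on Plutchik's eight emotions.
TOY_LEXICON = {
    "furious": "anger", "outraged": "anger",
    "huge": "anticipation", "biggest": "anticipation",
    "sad": "sadness", "losing": "sadness",
    "trust": "trust", "fear": "fear",
}

EMOTIONS = ["anger", "anticipation", "disgust", "fear",
            "joy", "sadness", "surprise", "trust"]

def affect_profile(text: str) -> dict:
    """Count lexicon hits per emotion and normalize to fractions,
    giving an eight-dimensional affect profile for the material."""
    hits = Counter(TOY_LEXICON[w] for w in text.lower().split() if w in TOY_LEXICON)
    total = sum(hits.values()) or 1
    return {e: hits.get(e, 0) / total for e in EMOTIONS}

print(affect_profile("Outraged readers called it the biggest and most furious crowd"))
```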
>> But is the material you're talking about there an information source that it is using to answer the question?
>> Exactly. Or maybe it's even in the prompt itself, in the interaction with the user; maybe the user is expressing that. But in general you're absolutely right: it's the information storehouse that it's pulling upon, and the documents within that, to see what the assessment of the emotional valence is there.
>> So there could be a distinction there between a National Park Service report, "here's what 15 observers reported, and the mean and variance," versus a bunch of tweets signed DJT, "make America great again," "largest crowd ever," and the emotional valence there. But does that lead you in a direction of characterizing the reliability of the information?
>> Possibly. It would be one of the things that you would put in there, and that's a great question, because one of the ways we do this is with weights which determine: how important is the emotional component here? How important is the experiential matching here? That can be determined either based on context or use case. Maybe you're having a political discussion, so it has one set of weights; but maybe you're having a discussion about, hey, how should we architect the next spacecraft that we build, because people's lives depend on that. So how much you weight the emotional content might vary depending on the use case, the context, and the domain expertise, and it might also be a learned component for these situations. Right now, we put them in as heuristics and leave it up to domain experts to give context-dependent weights. But you're right, these are dials that we can adjust based on what we want to do.
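A minimal sketch of what such context-dependent dials might look like; the weight profiles and numbers are invented placeholders, not values from their framework (uncertainty is written as 1 minus confidence so that higher always means "more concerning"):

```python
# Hypothetical per-context weights over five sensor dimensions (each set sums to 1).
WEIGHT_PROFILES = {
    "political_discussion": {"uncertainty": 0.25, "conflict": 0.25, "experiential_mismatch": 0.10,
                             "emotional_response": 0.30, "prioritization": 0.10},
    "spacecraft_design":    {"uncertainty": 0.40, "conflict": 0.30, "experiential_mismatch": 0.15,
                             "emotional_response": 0.05, "prioritization": 0.10},
}

def weighted_alert_score(reading: dict, context: str) -> float:
    """Collapse the five readings into one alert score using context-dependent
    weights; in their framework these dials are heuristics set by domain experts."""
    weights = WEIGHT_PROFILES[context]
    return sum(weights[dim] * reading[dim] for dim in weights)

reading = {"uncertainty": 0.7, "conflict": 0.7, "experiential_mismatch": 0.8,
           "emotional_response": 0.9, "prioritization": 0.5}
print(weighted_alert_score(reading, "political_discussion"))  # emotion weighs heavily here
print(weighted_alert_score(reading, "spacecraft_design"))     # emotion barely matters here
```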
>> And you talked about your work being pure research, trying to build a wall, it sounded like, between that and applications: not getting sucked into applying it, but doing the work for its own sake. Those walls have a habit of being torn down when some development occurs and gives it relevance, and so a mathematician may get a knock on the door from the NSA, or a psychologist from the CIA, and so forth. When we're talking about disinformation especially, one can think of numerous applications, ranging from "how do I operationalize this myself to take control of some narrative in the public conversation" to political campaigns. How do you express the goal of the work you're doing with respect to those kinds of interactions?
>> First of all, I think you're exactly right, by the way. One of the reasons that I characterized this as just scientific work that I didn't think would have immediate direct application is that a year-plus ago, when we first started this, we were dealing with large ensembles of LLMs at a point when you could barely run one or two LLMs. So just by dint of necessity, the resources weren't there to do this, and I thought, well, there's no way this is actual engineering that we're doing; we're contributing at a fundamental science level. And you're right, those barriers seem to drop. Now we have these multi-agent systems. Initially we were calling our work multi-agent systems, and we changed it to the lower-level term, ensembles of LLMs, because agentic AI meant something else then. Now agentic AI has come back, and there's a separation of multi-agent systems between deliberative systems, which is what we do with multi-agents, agentic AI, which is task-oriented, and also agents in reinforcement learning. So now this whole subfield is beginning to develop, and our approach seems to be a contribution to multi-agent deliberative LLM systems. In this case those barriers do seem to be coming down, and the work does seem to be accelerating at a pace that I certainly didn't anticipate. But it is there.
>> I keep coming back in my head to the applicability of this research to AI safety, and to detecting
whether AIs are about to go rogue. I'm not inviting you to put on an engineer hat here, but to analyze how that work could extend to the context of AI safety. We've seen studies of LLMs where they have exhibited, on the face of it, highly alarming behavior: they determine that in order to satisfy the goal they've got to undertake some kind of subterfuge or lie, and sometimes that is even for a purpose that wasn't the one that was explicitly given, but for self-preservation; all the kinds of things that trigger our Skynet reflex, our HAL 9000 reflex, where we go, "What could possibly go wrong? What was your first clue?" You know, that we would look back after an apocalypse and say, "Did you not realize when you saw it lying about these answers?" Is there something that perhaps can be done to detect that, a sort of AI lie detector?
>> Beautiful. And this is also part of our framework. One of the things we bring in from the argumentation work that I've done is this dialectical sequence. One of the criticisms of these large language models in terms of safety, exactly as you say, is that it relies on a single model to self-assess and make determinations. In our framework, we don't, because we have multiple models, multiple large language models, and we actually arrange them in a dialectical argumentation sequence. We have things like an expert, a critic, an evaluator, a synthesizer, and a generalist. These can be separate individual models, or they can be clusters, which is how we generalize them: they're clusters of large language models, and they do this argumentation. The domain expert makes an assertion, perhaps one that's not so safe, and in this dialectical sequence it feeds into a critic cluster. The critic cluster, a bunch of these LLMs, looks at the result and says, you guys are lying, right? And so the conflicting-information and priority quantifications all spike, and that triggers: hey, check this. The next entity in the dialectical sequence is an evaluator, another cluster of LLMs with its own dimension set. It does this concatenated assessment of the information and says: the critic has a good point, the expert has this point, I think the critic's right, this guy might be trying to start Skynet. Then it goes to a synthesizer, which is another cluster of LLMs, and that cluster makes a final conclusion, synthesizing all that's happened. Along the way there's another cluster called the generalist, which sits at the meta level. In these argumentation frameworks you have both an object level, where you're considering the argument itself, and a meta level, where you're looking at the structure of the argument. The generalist has been observing this structure and then comes in at the end and says, yes, this makes sense. One of the biggest criticisms in AI safety is that you rely on a single model and it lies. In our framework we automatically have multiple models, multiple agents, multiple clusters, and with the dialectical sequence we have the traditional argumentation framework for coming to a better conclusion. Now, imagine these clusters at scale; this is how I looked at it, which is why I said it's science and not technology at the time. I'm imagining thousands of models in each of these clusters. There's no way we have the compute power for that today, but we will, and it will be very hard to have those large numbers of models all be manipulated to support Skynet. And if it does happen: to our future overlords, I love you.
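As a rough sketch of the role sequence he describes (expert, then critic, then evaluator, then synthesizer, with a generalist auditing at the meta level); the `ask` helper, prompts, and return format are illustrative placeholders, not their implementation:

```python
def ask(role: str, prompt: str) -> str:
    """Placeholder for a call to one LLM (or a cluster of LLMs) playing a role.
    A real system would dispatch to a model API with a role-specific system prompt."""
    return f"[{role} response to: {prompt[:60]}...]"

def dialectical_answer(question: str) -> str:
    claim    = ask("expert",      question)                                   # object-level assertion
    critique = ask("critic",      f"Find flaws or deception in: {claim}")
    verdict  = ask("evaluator",   f"Weigh the claim against the critique:\n{claim}\n{critique}")
    answer   = ask("synthesizer", f"Synthesize a final conclusion:\n{claim}\n{critique}\n{verdict}")
    # The generalist reviews the structure of the whole exchange, not just its content.
    audit    = ask("generalist",  f"Review this argument structure:\n{claim}\n{critique}\n{verdict}\n{answer}")
    return f"{answer}\n(meta-level audit: {audit})"

print(dialectical_answer("Is it safe to deploy this plan?"))
```

The point of the arrangement, as he notes, is that no single model's self-report is trusted on its own; each stage can flag conflict or low confidence raised by the previous one.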
>> Always play it safe there. It's reminding me of the conversation I had with Craig Kaplan in episodes 285 and 286, with his concept of democratic AI, a kind of society of mind of models, where they would form some kind of constitutional arrangement so that they keep each other in check. Is that a direction that you're going here, and are you working with the explicitly existential-risk-oriented organizations, like the Machine Intelligence Research Institute or the Future of Life Institute or the Centre for the Study of Existential Risk, that are focused on mitigating the risk... well, focused on the AI control and alignment problems?
>> Oh, I'm afraid I'm not familiar with them. So I guess the short answer would be no, I'm not working with them; I'm not as familiar with that.
>> They may want to get familiar with you. I think we'll talk to some people there and say this work here is surely of applicability. And would you agree that you're describing something that appears to be aimed at the alignment problem, or certainly at the control problem? You've got a kind of homeostatic mechanism there for using multiple LLMs or agents to ensure that one of them doesn't go bananas, or that if it does, it won't have consequences.
>> Yeah, it definitely does get flagged, and this was definitely one of the ideas that I had in mind when I did it. As I said, I have multiple disciplinary perspectives on it, but I don't have expertise in each of those areas. So as we came up with this framework, as I developed it, I'd go research the area, like the one you're describing where you're familiar with the work. In a similar way, I had the roles: I organized them from what we had in organizational behavior theory, team science, and cognitive science, and then I found out that other people were doing role-based research. As I explored that, and you start to master it because it's a new area in LLMs, it turned out it was subsumed in the framework that we had, because we started from first principles. In a similar way, when I'm thinking of safety: that's exactly one of the use cases that we have in mind, but I'm not familiar with the current work in there, and it could be that we're a small part of that, or perhaps it can also be something that's connected in. That's been my experience, and it's something really unique in my scientific career: the more connections I make, or the more things I find that seem to be outside of the thing, it turns out they're actually in the thing. So that's been a very unique experience with this particular research.
>> What do you find yourself most frequently wishing for in terms of resources? You have described some research that involves large numbers of large language models. Do you have enough compute, or would you like to have a hundred times that, something that a Google could deploy for you instantly, for instance?
>> Definitely more compute. We've got Google Cloud, and we also have an application with Amazon to see if we can use their framework as well. People are also important: my colleagues, my collaborators Charles Kushin and Kof Chu, have been invaluable, especially in implementing. As a theory guy, of course I know... I mean, I teach computer science, so I know about that, but there's something different about actually doing the implementations, and having more scientists who are also good at implementing and aware of the cutting-edge state of the art, which we might not be aware of, is super helpful as well. There's only so much you can do with grad students; grad students are great, some of them are super innovative, and we've got some great grad students in our group, Mina Fami and Christa and all, but getting senior researchers in the application domain matters: there's a feedback loop, even though we're doing fundamental science. When you see something implemented, you can then think about how to extend the framework beyond that, and that's the experience we had over the last year as well. The framework was there, but it wasn't fully formed. It's sort of like what George Whitesides says, or maybe Robert Frost, right, "The Road Not Taken": at the end you come out and you come up with a story about how, oh yeah, this was all a linear progression, but along the way there are these choices that you're making. In our case, we had this hybrid thing where the central framework was there, but extending and delineating the details of that framework happened in this gestalt manner, which was very neat and unique.
>> Are you teaching undergrad or just grad?
>> Both undergrad and grad. On the undergrad side I have my FYE, which I really like, and database modeling, but I do mainly machine learning and data science on the grad side.
>> How have you found the attitudes, beliefs, and goals of the students at those levels to have shifted over the last three years, since the AI conversation has exploded?
>> That's a great question, and I think it went kind of full circle. It went from desperation and abandonment to, oh okay, it's not as bad as we thought. A lot of them, especially when these large language models came along, thought, why are we doing any of this stuff? We're redundant, we're not necessary, why are we studying this, why are we learning about it? And at a certain point you say, okay, I don't necessarily have the answer, but here are the fundamentals, and let's see how things evolve with time. Over the course of those three years, I think you're right, about two years in is when everyone started to realize. I have colleagues in our research group who are in industry, like Charles, but also other colleagues I've talked to both in industry and in government, and what you find, and this is all anecdotal at this point, from my perspective, is that the uptake of AI increases the workload on senior and expert people and reduces the number of junior people that you have. And with that reduction of junior-level people, these tools produce work that needs verification and validation. So this is a new area we're looking into, and I think what's going to be really needed for the uptake of large language models is traditional specification, verification, and validation. Verification is, hey, are we building the thing right, and validation is, are we building the right thing. A lot of LLMs do the verification relatively well. The validation, on the other hand, is up to humans to do, and that's where these things are failing: the humans are just taking the output. A good example of this is, someone says, hey, draft me a report on X, and you do it, you send it to your boss, your boss is super happy, and you're happy. Hey, the LLM helped me out, I did it, I passed it on. Next day you go into the meeting and the boss says, hey, by the way, on lines 15 through 20, what exactly did you mean there? And you're like, oh crap.
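As a minimal sketch of the verification and validation split described above: verification can be largely mechanical, while validation stays with the human owner of the work. The check names and sign-off function below are hypothetical placeholders, not part of any framework discussed in the conversation.

```python
# Minimal sketch: verification vs. validation for an LLM-drafted report.
# All names (spec_checks, owner_can_defend) are hypothetical placeholders.

def verify(draft: str, spec_checks) -> bool:
    """Verification: are we building the thing right?
    Mechanical checks against the spec; an LLM or a script can do most of this."""
    return all(check(draft) for check in spec_checks)

def validate(draft: str, owner_can_defend) -> bool:
    """Validation: are we building the right thing?
    Does the draft say what its human owner actually means and can defend
    line by line? This is the step that stays with the human."""
    return owner_can_defend(draft)

# Toy usage: the report passes verification but still needs human validation.
spec_checks = [
    lambda d: len(d.split()) >= 200,   # long enough
    lambda d: "Summary" in d,          # required section present
]
owner_can_defend = lambda d: False     # the human never read lines 15 through 20

draft = "Summary ... " + "word " * 250
print(verify(draft, spec_checks))         # True  -> built the thing right
print(validate(draft, owner_can_defend))  # False -> not yet the right thing
```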
>> Right? And then the boss is like, oh, you had an LLM do it, well, we can get rid of you, and I'll just ask the LLM the same question. I work a lot with students, particularly at the high school level, on these kinds of questions, and I'm wondering how their attitudes towards, and interest in, the research that you're doing are evolving, and then maybe you could also talk about how they can get more involved.
>> Yeah, absolutely. In fact, this started off with one of my undergrads at Fitchburg, and then we extended it with one of my grad students at WPI who came in. That's where this research thrust in artificial metacognition sort of blossomed. But one of the things I also tell my students is that any applied math discipline, like computer science, is a lot like cooking. Whenever someone hires you, or you present yourself as an expert, you're saying, I'm a chef, I can cook the best eggs Benedict in the world, I can cook the best lasagna in the world. These large language models are tools, and they're like microwaves. If you use a microwave as a supplement to your cooking, beautiful: heat up a sauce, heat up some water. But if all you do is take frozen food, dump it into the microwave, and then present that as the dish that you made, and you're the chef, you're going to be in for a very rude awakening relatively soon.
>> But it sounded as though you were describing an effect that is cutting off the supply of your future senior researchers. Is that what's happening?
>> What I would envision is that those senior researchers become even more valuable, and that these junior researchers, what they need to do is use the microwave as a tool. It's just a tool; an LLM is just a tool, like a calculator, a microwave, anything else. If you just use it with blind dependence and no validation, and definitely the verification can help you there, you're not going to be helpful in that pipeline. And you're right that the only way to get the senior researchers is to have junior researchers.
>> Well, right. Was it the case that the senior researchers were basically using junior researchers to do the microwaving?
>> And so, yes. In academia that's true. In industry the models deviate slightly, and this is all based on anecdotal experience, by the way. In industry it's different, depending on the kind of organization you're looking at. If it's a mom-and-pop shop, it's probably closer to academia, where things are very loosey-goosey and messy in large measure, again depending. But if you're talking about a reasonably sized organization, medium to large, even the junior engineers in that case, or researchers, would have to verify and validate what they're doing before they pass it on.
>> And so the cognitive load on the senior researchers was decreasing, you're right; then it's like a microwave tool. But I would say that that was true in, say, medicine and law, where the senior lawyers and surgeons were tasking the juniors with the menial tasks, picking up sponges, doing document discovery, and the juniors would absorb the senior expertise by just being around and observing. But when those seniors could get those tasks automated, then they didn't want the juniors around any longer.
>> And that's very true. But here the tasks extend beyond just the menial tasks, which are easily automatable but also easily verified and validated, to tasks that demand greater cognitive or creative components, which you do not verify or, more importantly, validate. Some verification does occur; large language models, especially as you see with Claude Code or similar tools in the context of coding, do some local verification really well. Validation lags way behind. So, verification: are we building the thing right? Validation: are we building the right thing? That's the part where there's a disconnect, and these things aren't able to do it. Possibly our framework would be able to help with that, because we have this dialectical argumentation structure with multiple clusters of LLM models, and so it can come in with the generalist and say, is this matching up with what we thought? A lot of that we're doing. So right now I think senior researchers are needed. What I would imagine in the future is that these things would probably be able to do that part as well. But that's why I call it the future. Nothing imminent, as far as I can tell. But I've been wrong many times before.
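A loose sketch of the kind of dialectical cross-check described here: several role-specialized models critique a draft, and a generalist then asks whether the result still matches the original intent, the validation question. The function names, roles, and prompts below are placeholder assumptions, not the actual framework from this research.

```python
# Loose sketch: role-specialized models critique a draft, then a generalist
# checks the outcome against the original intent (the validation question).
# `ask_model` is a placeholder for whatever LLM client would really be used.
from typing import Callable, Dict

def ask_model(role: str, prompt: str) -> str:
    """Placeholder for a real LLM call; assumed to return the model's reply."""
    raise NotImplementedError

def dialectical_check(task: str, draft: str, roles: list,
                      ask: Callable[[str, str], str] = ask_model) -> Dict[str, object]:
    # Each specialist argues against the draft from its own perspective.
    critiques = {
        role: ask(role, f"As the {role}, critique this answer to '{task}':\n{draft}")
        for role in roles
    }
    # The generalist then asks: does the draft match what was originally intended?
    verdict = ask(
        "generalist",
        f"Task: {task}\nDraft: {draft}\nCritiques: {critiques}\n"
        "Does the draft still match the original intent? Answer yes or no, and why.",
    )
    return {"critiques": critiques, "verdict": verdict}

# Example roles for the clusters; in practice these would come from whatever
# role taxonomy the framework actually specifies.
roles = ["skeptic", "domain expert", "end user advocate"]
```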
>> It's fascinating. Our time is limited, so to draw to a close here: you're talking about the future. What's in the future for you, future lines of research that are imminent or potential directions you would like to go, and where do you see your field going in the next few years?
>> That's a great point. Definitely, personally, for me, we're going very in-depth into the framework and working out its different elements quantitatively and rigorously. As I said, I've also been thinking for about two years of how to extend it to this idea of SVV: specification, verification, and validation. I think that's the next component we're going to be putting into this. And where it goes, I think, gets us to things that then seem to simulate what humans do better. I think that's way down the line, because I think you require clusters. Just having those neural connections, the scale-up has been huge, to have these deep learning networks with billions of parameters. In our brain we have a hundred billion neurons, but we have a hundred trillion synapses between them. We have about 40 million neurons in our gut and about 10,000 in our heart, but even the interaction of the gut neurons with the gut biome is huge in its behavioral consequences and emergent properties. And so now we've got these individual neural connections into the billions. But once we bring in clusters, and look at thousands upon thousands of these agents working together, and we have the compute power for that, I think you'd be hard-pressed to say we won't have some emergent behavior that's analogous to what human animals do.
>> Wow. We need not just a theory of mind but a theory of stomach.
>> That I'm building.
>> This has just been so fascinating. I really appreciate this conversation; I've learned so much, and our listeners have too. Your passion for this field is evident. We will have links to your site and homepage. Anything else you want to leave listeners with as a parting shot, for how they could get involved in this field themselves, or what they should be thinking about as they head into the future with AI?
>> Well, the field is rife and open, and the pace of progress is insane; just keeping up with that is very hard. But it also gives you an opportunity to get in at a relatively ground level. This is, I'd say, still at its nascent stages, so I think this is something that people can come in on and definitely contribute to, and there's a lot to be done. It's giving us insight into those fundamental questions that all of us wrestle with, either late at night or in our private moments: what is this idea of existence, and what is this idea of consciousness, or whatever it might be. Of course, it delves into the physics of things as well, the observer effect, et cetera. But these are fundamental questions that humans have wrestled with for years, and this might be another sort of tool that helps give us some fundamental insight into them.
>> Wrestling with fundamental questions is one of the things that most makes us human. So, thank you for demonstrating that, Ricky Sati, and thank you for coming on AI and You.
>> Thank you so much, Peter. This was a real pleasure. I enjoyed it thoroughly.