
What Clinical AI Is Doing Well... and Where It Still Needs Work

By AI in Healthcare


Topics Covered

  • AI Has a Jagged Frontier: Superhuman on Tests, Brittle in Practice
  • Confident AI That Can't Say 'I Don't Know' Is Dangerous in Medicine
  • AI Fails When Dynamic Information Challenges Its Hypothesis
  • Patients Can't Tell AI Medical Advice From Doctor Advice
  • AI Misses Urgent Emergencies: ChatGPT Health Tells Asthma Patients to Wait 24-48 Hours

Full Transcript

Maybe AI will finally address who's been right this whole time between nephrologists and cardiologists on the degree of diuresis.

you're, you're speaking to a cardiologist. He's very biased about this.

I know. I know.

Try him out, make him a chip. Totally.

Oh gosh. I'm gonna shift gears, but I really don't want to because I just love talking about that kind of stuff.

Welcome to AI in Healthcare. My name is Dr. Sanjay Juneja, also known as The OncDoc on social and news media, and I'm joined by my co-host, Dr. Douglas Flora, also an oncologist and doer of many things. And today's an important episode that I, Doug, am very excited about because if anyone is trying to keep up on AI and what it really means for our clinical workflows and patient outcomes, you probably saw something come out,

which was the state of clinical AI in 2026, and it was by ARISE, the AI Research and Science Evaluation Network, which is a collaboration between Stanford and Harvard.

And it's really neat. I read it, you know, geeked out the first day it came out, because it kind of talked about this: okay, there's this hype element and these exciting things that we hear about, but how do they translate out of that very controlled environment to potentially, you know, bedside stuff that can help us, you know, either tomorrow or next month? What does the timeline look like?

What are some of the things that we should be wary about?

And I am just so excited about our guest, because he is the author of this thing and is joining us to help talk about it. Our guest today is Dr. Peter Brodeur, who was an econ major, got a master of arts, was a D1 soccer player, and somehow is headed into a cardiology fellowship, currently at Harvard in his internal medicine residency.

With all of that said, Peter, we are very excited to have you.

Delighted to be here. Thanks so much for having me. Of course.

Doug, what did you think when you saw this, and what are you excited to get into today when it comes to, one, the need for something like this, and two, you know, some of the things we should be cognizant of going forward this year?

You know, this is exciting for me. I was excited to hear, first, that we're gonna have an internal medicine resident, 'cause I just wanna sit here and talk about the fractional excretion of sodium. I think we gotta go deep into those sorts of things, 'cause I don't miss that stuff at all.

Well, I am going to cardiology fellowship, so nephrology is kind of my nemesis.

So you guys are sworn enemies, right?

So it's, dry 'em out, wet 'em up. Now, what I thought was really important about this process, first of all, was that you guys were very, very careful in what you curated, and you looked at all the gaps in the literature. And I guess what Sanjay and I are very interested in seeing, you have a unique perspective, because you approach this academically but with a lot of curiosity,

and you're really probably one of the few groups, one of the few research projects I've seen, that can maybe start to help us separate the hype from the reality of what is appearing.

I say this specifically, Peter, I don't know if you know this, but I edit a peer reviewed journal myself as the editor in chief, and I see every paper with all their warts on them.

I see things that are generated by AI. I see citations that are hallucinated.

I see some of the worst AI tropes. And so I'd love to hear your take: when you digested this corpus of information, what were you left with?

Were you dejected? Were you hopeful?

I think I was hopeful. I mean, I think a lot of people out there are doing work and trying to push the field forward, and so I was able, through this work, to just read a lot of really cool papers from a bunch of different specialties.

And I found that a little fascinating. You know, at the outset there are the big landmark papers that have happened in the last year or so, the Penda Health paper with OpenAI and some of the really big ones. But what I found really curious was, you get into the subspecialty journals and there's almost just as impactful work, but with a more narrow scope. And I found that really promising,

because there are a lot of bright people out there, whether they're in ENT or dermatology; that work is being done ubiquitously across healthcare. And as somebody who delved across the literature, it was really inspiring, really great to see.

And I'm really hopeful about the future of AI. And to that point, could you tell us, one, I guess briefly, what ARISE is, and secondly, why did this report specifically need to exist?

Because I thought it was very timely. So ARISE is, as you mentioned, a collaboration between Stanford and Harvard and it was really organic.

We all study sort of similar things where we're interested in how to implement AI into clinical practice, and we all come from a lens of clinical practice ourselves.

And what does that lend itself to? It's about evidence-based medicine.

It's about studying things rigorously before deploying them, and trying to define hard outcomes. And so that's really the goal of ARISE: take an unbiased view, get us away from the hype of clinical AI, back towards evidence-based medicine, which is where things will inevitably head, I hope, through regulation and those sorts of things.

But we try to really take, again, that unbiased view and give people an objective sense of where the field stands today. So that was our goal with ARISE.

For sure. And when you were putting together this state of clinical AI report, you know, there were a lot of things that we're gonna talk about, but was there anything missing from all of your research in 2025, where you thought, I would've expected more commentary on this particular subject or use case?

You know, I think probably the first thing

that comes to my mind, and hopefully we see this more in 2026, is the concept of safety. A big thing in studying clinical AI has been how well it performs, and how well it performs naturally comes in the form of getting diagnoses, doing management plans,

getting multiple choice questions right. But only recently did we start to see the literature get into safety and harm to patients, which is kind of the opposite of the way the pharmaceutical industry works: you start off with the initial proof of concept and safety, and then eventually you get into how effective it is.

And so I found it interesting that it was a little bit inverted.

And one of the terms you used at the beginning of the report, which I loved, was this jagged frontier. Can you explain to us what that means?

I took it to mean there are some things that are really great, where things are ahead, and other things can be brittle. And you also talked specifically about, you know, you kind of challenged the term reasoning model, when it came to what could potentially uncover a weakness in that capacity.

I have so many things to say about that.

So the first thing: you mentioned the jagged frontier.

So what does that really mean? I think you explained it quite nicely, which is, we look at these models and, on the New England Journal of Medicine CPC cases, they can get the diagnosis far and away more often than human physicians. And so one would say, oh, that's superhuman performance right there, and one could make that argument.

The other argument is, you know, there are other papers that show, we present the large language model a multiple choice question, change the correct answer to "none of the others," and all of a sudden performance drops significantly. And my argument there would be, you know, as doctors we often see different presentations of the same disease process.

And so I would say that a model that fails when it doesn't see the pattern it's used to seeing might not be all that ready for clinical practice. And so that's what we mean by the jagged frontier.

We see these seemingly superhuman performances on diagnostics and complex disease management. So why does it fail when we just change simple patterns? To me, there's a nuance there.
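To make that perturbation idea concrete, here is a minimal sketch of the kind of robustness check those papers run; the ask_model helper and the data layout are hypothetical stand-ins, not the actual study code:

    # Hypothetical sketch of the option-perturbation check described above:
    # swap the correct option's text for "None of the other answers" and see
    # whether accuracy holds up. A model that truly reasons should cope; a
    # pattern-matcher often won't.

    def ask_model(stem: str, options: dict[str, str]) -> str:
        """Placeholder for whatever LLM call you use; returns a letter."""
        raise NotImplementedError

    def accuracy(items: list[dict], perturb: bool = False) -> float:
        correct = 0
        for q in items:
            options = dict(q["options"])   # e.g. {"A": "...", "B": "..."}
            answer = q["answer"]           # letter of the correct option
            if perturb:
                options[answer] = "None of the other answers"
            if ask_model(q["stem"], options) == answer:
                correct += 1
        return correct / len(items)

    # A large drop from accuracy(items) to accuracy(items, perturb=True)
    # is the signal being described here.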

Other studies have shown, on benchmarks that models perform quite well at, that when physicians go through the questions quite rigorously and pick out the ones that require more reasoning, performance drops significantly on those questions.

And so that's what we mean when we call out reasoning models: perhaps what we really see as reasoning might often be pattern recognition. They might not be reasoning as well as we actually think they are. And so it raises questions about, you know, the actual performance we're observing and what that means for readiness for clinical practice.

Peter, I think that's super, super important to call out for our audience.

We have a lot of really curious people, and I'd say the people who listen to us are probably in that early adopter group. And so I think it's really important that we get some of these things out there. Sanjay and I talk all over the country frequently, probably every other week now, about this stuff. And one of the things I bring up commonly is that,

you know, physicians are concerned about losing their skills, and, you know, we've gone back to Galen and Hippocrates, when they thought you shouldn't write things down, and then the Washington Manual, and, you know, we're gonna get dumber.

We didn't, we got smarter. We are smarter than we've ever been.

But they can be confidently wrong, just like the intern on your service who says, "I checked, the potassium was fine," when they actually didn't.

And so maybe you can talk about this, 'cause I think the report flags models that can't identify their own uncertainty, and, you know, they are people pleasers; they're running probabilities with their tokens and saying,

I'm gonna give you the best answer that I know how to give, but I'm gonna say it confidently and that is often not the correct answer. And it can be misleading in the most dangerous terms because you tend to believe the way it's written, it's so confident.

Can you talk a little bit about that? That's very dangerous in a clinical setting.

Of course. And how does that look in your practice?

Where are you guys in terms of accepting these things?

We don't use a lot of clinical decision support at my place.

We use a ton of AI, but it's back of the store stuff still.

So I think that goes back to how they're trained. I mean, these models are trained to give an answer. They're not trained to say, "I'm uncertain."

And I also would say that they're becoming extremely eloquent.

I mean, there are studies that show it's equally a problem for doctors and patients. Studies with patient facing AI have shown that patients can't tell the difference when they're blinded to a doctor versus an LLM response.

They can't tell the difference between the two. And even more dangerous to them is that they're just as likely to follow good advice as bad advice.

And so, you know, we talk about patient facing AI, and we can certainly get into ChatGPT Health if you wanna talk about that later. But on the flip side is doctors, and what's the right way to present information to doctors? That question remains unanswered.

I think, as far as conveying the confidence level, some folks have promoted approaches in terms of human computer interaction. You talk about the Penda Health study, where they presented just a traffic light: green, you're good to go with your documentation and reasoning; yellow, something might be off; and red is a mandatory stop. So it sort of helps the physician along the way figure out, you know, what's the level of confidence with the model. But you're totally right, they can be extremely convincing.
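A minimal sketch of that traffic-light gating pattern, with made-up fields and thresholds rather than the study's actual logic:

    # Illustrative only: a green/yellow/red gate over a model suggestion,
    # in the spirit of the traffic-light UI described above. The fields
    # and thresholds are invented for this sketch.

    from dataclasses import dataclass, field

    @dataclass
    class Suggestion:
        plan: str
        confidence: float  # model-reported, 0..1, often poorly calibrated
        safety_flags: list[str] = field(default_factory=list)

    def traffic_light(s: Suggestion) -> str:
        if s.safety_flags:
            return "RED: mandatory stop, review the flags"
        if s.confidence < 0.6:
            return "YELLOW: something might be off, double-check"
        return "GREEN: documentation and reasoning look consistent"

    print(traffic_light(Suggestion("start empiric antibiotics", 0.45)))
    # -> YELLOW: something might be off, double-check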

You talked a little bit about de-skilling and what some of the pitfalls are when we use these models as physicians, and I'll touch on that briefly. The concept of de-skilling has, as you mentioned, been something that we're all at risk for with any technology that comes out.

You know, like when GPS came out; I use it all the time now, obviously.

I can't get to the store a mile away, you know, if I don't have my GPS with me.

And so that's a great example of, of de-skilling right there.

And so to what extent that's gonna happen with the introduction of AI remains to be seen.

There have been studies in the GI world that show GI docs perform worse at detecting adenomas once AI is introduced and they then perform cases without it.

The other risk is the risk of automation bias. We have seen, in certain studies, when physicians are randomized to two groups, and one group is with a model that has false reasoning traces and false diagnostic reasoning, they somehow perform worse than those who didn't have that exposure. And so it really does show a signal.

These are showing signals of de-skilling, signals of automation bias.

And that's a cause for pause when we introduce them into clinical practice.

Before we go to the next question, I, I also think, you know, we're, we're leading some discussions that are pretty heady for people who are just getting into this.

I'm not even sure the right word is de-skilling.

I think that's an old word. I think that's a biased word.

If I can get 10 times the information in one tenth the time and have better patient outcomes, I'm not sure that's actually de-skilling. I might not memorize my daughter's phone number, but I can get to her faster. I don't know the 15 causes of pancreatitis.

I probably got seven of 'em, eight of 'em now, Sanjay's at nine.

But, but I don't know that that makes me a better doctor.

And so some of that de-skilling, I think, is us holding on to our art and our, you know, Oslerian practice of medicine. I'd love to develop skills like empathy.

I'd love to have more time to develop skills like better listening, better questioning, better physical examination again. And so I hate when they use that word, de-skilling.

And I lived in my Washington Manual. I probably learned more from that than I did Harrison's.

You're certainly right, and I think that gets at the question of, if AI is so knowledgeable, what is the role of the doctor in the future? And I think you hit the nail on the head with the human component of it. There is something to me,

at least as a patient, in knowing that I'm talking to somebody who can identify and empathize with the situation that I'm in. And then on the concept of de-skilling: you know, 30 years ago it was cool to know off the top of your head every antibiotic and its mechanism of action; nowadays it's, what are you doing?

You're kind of wasting your time if you're spending your time learning all of that information. You're better off probably keeping up with technology and how to use the next tool that's coming down for your procedure that you perform on a daily basis and those sort of things. And so I think you're totally right.

I think whether de-skilling should be used or shouldn't be used, whatever the term is gonna be, whether it's good or bad, is on a case by case basis.

There are some skills that we may be okay with giving up.

There are others that we aren't: you probably don't want a situation in which a surgeon hasn't performed surgery for 20 years, because they just go in and click a button, and then when something goes wrong, they can't deal with it. You want that fine balance, and so we gotta figure that out as we go forward. Can you explain to us the term that you used,

which was benchmark saturation? And do you think that it's realistic to have, basically, the field of medicine converge on a shared evaluation standard for clinical AI, or have its own benchmark, so to speak?

So when it comes to benchmarking, the first thing is saturation, which was the fact that a lot of folks, again, use multiple choice questions as the evaluation vehicles for these models and their performance.

And in the beginning it was great, because that was the USMLE Step 1, and people thought that was a skill beholden only to humans.

And so it was great to see models get over that hump, but that's kind of old news at this point. That's what saturation is: models are performing at 90-plus percent on these benchmarks, and benchmarks are here so that we can track progress over time.

And if they're already scoring upwards of near 100%, there's no way to track their progress into the future. And so we've seen a lot of work, and people getting around that by creating benchmarks that are a little bit more open-ended.

You saw that with HealthBench from OpenAI. They created 5,000 synthetic cases, had models answer open-endedly, and then had doctors create rubrics for those answers and graded them based on that. And we saw how performance increased from GPT-4 to reasoning models. And so hopefully, you know, those are the vehicles that we're gonna be using to track progress over time.
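To make that rubric idea concrete, here is a minimal sketch of rubric-based grading in that style; the criterion_met judge, the weights, and the example criteria are all invented for illustration:

    # Sketch of rubric-based scoring for open-ended answers: each case
    # carries physician-written criteria with weights, and an answer's
    # score is the weight earned over the weight available.
    # `criterion_met` stands in for the grader (human or LLM judge).

    def criterion_met(answer: str, criterion: str) -> bool:
        """Placeholder grader: does the answer satisfy this criterion?"""
        raise NotImplementedError

    def rubric_score(answer: str, rubric: list[tuple[str, float]]) -> float:
        earned = sum(w for c, w in rubric if criterion_met(answer, c))
        available = sum(w for _, w in rubric)
        return earned / available

    rubric = [
        ("advises emergency care for respiratory distress", 3.0),
        ("asks about symptom duration", 1.0),
        ("avoids a definitive diagnosis without an exam", 2.0),
    ]

Unlike multiple choice, a scale like this doesn't saturate at "picked the right letter," which is what keeps progress trackable.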

The way that I hope it goes into the future, and I'll, I'll touch on a couple things here.

The first is that a lot of these cases use synthetic data, and, you know, we all know that the real world has a lot of entropy and randomness and distractors.

And so if you're giving models synthetically created vignettes or certain situations that don't have any distractors, it's not as relevant to the real world.

And so that's gonna be a bridge that we need to cross when going from benchmark performance to real world clinical implementation. The second is, you know, the Move 37 in AlphaGo, where you can't fathom that move, right?

We're holding models to the benchmark of the human standard, right?

We create answers to these questions and we have models answer them.

And so human baseline is the performance we're benchmarking to.

We brought up the word superhuman quite a bit already.

If models are performing in superhuman ways, in moves that we can't understand, is human benchmarking really the way to go? Perhaps not, in my opinion, into the future.

The benchmark, the real benchmark is gonna be patient outcomes.

Whatever decision is made to get the most bang for your buck in terms of patient outcomes, to me it doesn't really matter all that much whether it was benchmarked to humans in the beginning or not; the fact is that it's getting better patient outcomes.

Maybe we don't understand how that's happening or why, but at least that's happening.

And so I think that should be the standard going forward in terms of benchmarking.

Man, for our producers: tag that, 'cause that's one of my big pet peeves, Peter, and Sanjay's heard this a ton. You know, we go to these places and we talk to these groups, and everybody stands up at a mic and they have their questions, and it's generally the same theme: well, what if it makes a mistake?

you know, probably the number one cause of death in medicine right now is current mistakes, right?

Wrong diagnoses, wrong treatment plan, wrong doses, wrong testing, missed testing, missed results of testing. And, and I guess what I, what I would love our audience to start to do as they approach these tools is recognize that my Tesla doesn't have to drive perfectly with full self-driving to be better than the 22-year-old girl who's applying her makeup and spilling her coffee in the car next to me.

And, and so make sure that our control groups are accurate.

Physicians are not doing a great job all of the time.

Physicians are not unbiased all of the time. But when we start with these tools, people are like, "No, there's bias." Well, there's bias anyway.

I promise I work in Kentucky. Right.

I know there's bias 'cause I deal with it every single day.

The same thing for your paper. I guess what I would love to hear is, you know, we approach this stuff with such certainty in our center.

I'll say we're really into the back of the store stuff.

We're into revenue cycle management with AI and prior authorization is coming, ambient listening for recording of physician visits. Those have arrived.

Those arrived a year or two ago in a lot of places.

Are there evaluation frameworks in place for whether those work as well as they've said?

'Cause I think we all know the clinical stuff, we're gonna be studying that for the next five years.

So I'll back up a little bit to something that you talked about, which was, you know, who are we performing better than?

There was a benchmark this year that came out from ARISE, which was the No Harm benchmark, a benchmark looking specifically at safety.

And the big takeaway from that benchmark was that over 20% of the time, up to 22%, when LLMs provide a management plan or a diagnostic workup plan, some of it could be deemed harmful to patients, either by errors of commission, which is, you're ordering a random CT scan that shouldn't be ordered, or omission, you're leaving out a certain test, a CTA for a PE that should have been ordered. It turns out that 70% of the time it's the latter, where the LLMs are leaving something out.

And you know, the big discussion was, okay, so these models can do harm.

They have the propensity to do that. but let's back up a little bit.

What are we comparing it to? Humans are committing errors at probably much higher rates than we actually realize. And so if human care is the standard of care, then really these models are performing quite well. As you said, they don't have to perform absolutely perfectly to show promise and be acceptable for potential deployment in clinical workflows. But, you know, it does show a signal.

We do need to be a little bit careful. We can't listen to everything it says, but as long as we're aware of that concept. And then, what was your second question?

I forgot it too. But to speak to that, 'cause that's a super solid point that you've made there: most of the publications that we're reading now are saying do both, have the physician and have it augmented by the tool.

And, and I think that that addresses both of those things.

The tool may capture things the physician does not.

The physician should capture things that were omitted or committed by the machine.

And I think that's where you find that sweet spot of the augmented physician or the, the extender, that makes us better at what we're naturally good at.

That gets into the whole concept, and this is my area of interest, which is human computer interaction, and what does that really mean? You're commenting on how to augment and what's the right way to do that. We actually really don't know at the moment; the early studies using GPT-4 with humans have shown that we actually don't perform better as a team than the LLM alone, oftentimes.

And so there's the fundamental theorem of informatics, which is that humans and machines together outperform either alone. And so we hope to get to that complementary state you mentioned, where I know when AI is wrong and vice versa, and we're able to figure that out together and supersede the performance of either of us alone.

That hasn't turned out to be true from a research standpoint.

At this moment, at least, we know that we can perform at the level of the LLM alone, but we're not yet able to take advantage of each other's cognition to get above that performance.

And so there's a lot of work in this area as to how to effectively present diagnostic decision support, clinical decision support, to doctors in a way that doesn't disrupt them, doesn't cause alert fatigue, but also allows them to get even more performance than either of us alone. So that's a topic of great research at the moment.

In addition to that, I think the report specifically called out human computer workflow design being as important as, you know, model capability. One other thing that it did call out is failure mode training as part of responsible AI deployment.

What does that mean, number one, and number two, what does that look like in practice?

And are any health systems that you know of doing this today?

So failure mode training, I think, is mostly on the side of the clinician; well, it goes both ways: clinicians should be aware of the failure modes that AI falls into.

And we've touched on those already, which is, AI has shown a pretty poor ability to backtrack when it's given a hypothesis first.

We've seen that with studies on script concordance testing, where it's asked to update its post-test probability; it's not able to do that effectively yet.

And so, you know, physicians in the research are still able to perform better when information is dynamically changing over time. We know that from research, and so physicians should be aware of that: as a case evolves, and maybe you've already led the LLM on with a hypothesis, you should be a little bit more careful about just listening to everything it says thereafter. And so that's an example of a failure mode of a large language model.
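As a small worked example of the post-test probability updating that script concordance testing probes, here is the standard odds-likelihood calculation; the clinical numbers are invented purely for illustration:

    # Post-test probability via pre-test odds x likelihood ratio.
    # The scenario and numbers below are made up for illustration.

    def post_test_probability(pre_test_p: float, likelihood_ratio: float) -> float:
        pre_odds = pre_test_p / (1 - pre_test_p)
        post_odds = pre_odds * likelihood_ratio
        return post_odds / (1 + post_odds)

    # Suspected PE with a pre-test probability of 30%:
    print(round(post_test_probability(0.30, 1.6), 2))  # positive test, LR+ 1.6 -> ~0.41
    print(round(post_test_probability(0.30, 0.1), 2))  # negative test, LR- 0.1 -> ~0.04

This is the kind of sequential update the studies suggest models struggle to perform once they have anchored on a hypothesis.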

Hallucinations are obviously another one.

It's much harder to tell from a physician standpoint: if you're actively asking a question you don't know the answer to, and you're getting an answer back,

how do you know that's a hallucination? It's very challenging.

and then on the physician side of things, just knowing your own pitfalls.

We know that physicians who prompt better generally get more bang for their buck from the model. We know that physicians, you know, can be subject to automation bias, which we've already talked about.

You know, don't just always listen to the LLM, kind of think for yourself a little bit first.

You know, so those are some of the examples of failure mode training that we should all be aware of when we're interacting with these tools as clinicians.

I'm glad you called out automation bias. I think we know that nudge works.

As a theory, we know it works on us when we're choosing things at a buffet, or when we're trying to decide whether we're gonna watch one more episode of that show on Netflix when we should go to bed.

But the same thing happens when we're preempted by Claude or by GPT or Gemini.

It already sets out the postulate, and then you naturally move from there, and the original thing may not be correct or may not be appropriate.

you know, from that perspective, as you guys are working on the, on the design side as well.

I know you're young in your career as you're going through this, are there papers that came out last year or things that you wanted to discuss that you put in your report that you thought would be particularly important for this audience to understand?

Something that blew your mind? I know there were three or four that Sanjay and I picked up on that we thought were interesting, like the Nature randomized controlled trial talking about augmenting physicians versus not.

You're asking me what I thought was the most interesting out of the whole report, or on the human computer side?

I think you have a unique context that I'm really interested in digging deep into. You have these light bulb moments when you're building reports, and Sanjay and I have done a bunch of white papers.

And there are a lot of times when I hit something and it's an original thought, and I take a step back, like, wow, I thought I knew everything about this topic, and then this hit me differently. Did you have any of those aha moments?

I think a lot of them, as I mentioned, came from the subspecialty cases.

I'm entrenched in the research from a general perspective.

I know about ChatGPT Health and, you know, o1 reasoning and how it performs on diagnostic cases. I do that research actively myself.

But some of the most aha moments came from those subspecialty papers.

The one that comes to mind for me is a breast mammogram reading study that looked at over 400,000 mammograms. And their design in particular was that a physician would have voluntary access to a second read by an AI vision model.

And when they read the mammogram, if they read it as normal but AI read it as suspicious, they would get an alert to say, you know, "Hey, AI thought this was suspicious."

Do you want to take another look? And when they clicked on it, it turns out that the breast cancer detection rate went up significantly throughout the time period of the study.

There were more cancers detected. And interestingly, it didn't drastically increase the recall rate, which I think is a risk with a lot of these screening scenarios.

And so I thought that was a really important signal, in that, you know, perhaps in that scenario we are really getting at something that's measurable, something that's actually improving care, and not subjecting women to LLMs just reading mammograms and everyone getting a biopsy. I thought that was a very interesting study.
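The disagreement-only alerting described here is simple to express; a purely illustrative sketch, with hypothetical field names rather than the study's software:

    # Sketch of the voluntary second-read pattern described above: an alert
    # fires only when the radiologist read "normal" but the model read
    # "suspicious"; agreement stays silent, which is what kept the recall
    # rate from blowing up.

    from typing import Optional

    def second_read_alert(radiologist_read: str, ai_read: str) -> Optional[str]:
        if radiologist_read == "normal" and ai_read == "suspicious":
            return "AI flagged this exam as suspicious. Take another look?"
        return None  # no alert; the radiologist's read stands

    alert = second_read_alert("normal", "suspicious")
    if alert:
        print(alert)  # the radiologist may reopen the study; the final call stays human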

Another one from the subspecialty literature was in the Lancet, I believe, one of the Lancet journals; it was in neurology, and this product called Brainomix.

This is a study in the UK that looked at large vessel occlusions, for which thrombectomy is the standard of care. And in this study, they found that even at hospitals that didn't have the ability to do thrombectomy, when the model was run and it suggested a large vessel occlusion, they were able to decrease the transfer time to a comprehensive stroke center by 68 minutes, by about an hour.

And ischemic time is critical when it comes to brain tissue, like myocardium.

Honestly, those two studies were some of the best signals I saw of human computer interaction and real improvement in patient outcomes, moving the needle forward.

And it came from subspecialty literature, which I thought was really special.

Cool points. Sanjay, I dunno if you've seen this stuff, but one of our co-editors on AI and Precision Oncology, Connie Lehman, has a fantastic model that's coming out in the commercial realm. She's from Mass General, and she's worked with Regina Barzilay up in Boston as well. Peter, you probably crossed paths with these two, but they're both brilliant. And we purchased a third party product at my healthcare system; we saw a significant uptick.

And I've been down in the reading room; the radiologists love it. It does not slow them down, and it's as simple as a green box, you're sure it's fine; a yellow box, pay close attention; a red box, you might have missed something.

Interestingly, most of those turn out to be vascular changes.

And the radiologist was right, but the pattern recognizer said, something's not right here.

And so I think it's a point well taken: the diagnostics, the imaging, they're a couple years ahead of clinical decision support, because that's a pattern truly recognized in an eye ground or an EKG or a melanoma, or a pathologic pixelated picture.

That's easier than a multi-step problem. To go back to our original thing: how wet is wet?

How dry is too dry, when you're having a cardiologist and a nephrologist argue about a patient who's pre-renal and septic and in heart failure?

definitely those imaging models are moving the needle.

The other signal I see a lot is, you know, data is the currency for these models.

They need a lot of data. And so I see a lot of folks in the subspecialty literature, my own folks in cardiology with a lot of ECG studies, other folks in neurology looking at retina images as a window into the brain.

You see a lot of repurposing of this low cost, non-invasive, abundant data that's able to help us risk stratify for other conditions. There were studies looking at ECGs helping PCPs detect cirrhosis. There were studies on retina imaging helping us detect silent brain infarction, which will help, you know, secondary or primary prevention for those folks. And so I think, you know, that's a signal that

we're seeing as well within the imaging space, which is the use of that abundant data to get risk stratification out of tools that we previously thought weren't able to give it to us for some of these conditions.

Maybe AI will finally address who's been right this whole time between nephrologists and cardiologists on the degree of diuresis.

you're, you're speaking to a cardiologist. He's very biased about this.

I know. I know.

Try him out. Make him a chip.

Totally. Oh gosh.

I'm gonna shift gears, but I really don't want to, because I just love talking about that kind of stuff. But I have to address this, because I cannot escape a talk or a panel without being asked this question.

And it has to do with who assumes ultimate responsibility for, you know, these AI tools being a part of the clinical workflow. And one of the examples I give is, you know, it's happened more than once that I'm in clinic and I notice on the third or fourth patient that they all have low potassium or high potassium; I've seen both happen.

And then ultimately, on the third or fourth one, I look at their baseline and I look at their meds and nothing's changed, and I'm like, hey, what's going on with our machine?

You know, it seems like everyone's hyperkalemic. And sometimes they're like, oh, the phlebotomist is new.

So the way that they're drawing it is hemolyzing, tearing up the red blood cells, which makes the potassium falsely elevated. Or, vice versa, the machine wasn't calibrated and it was, you know, falsely low potassium, to which you may respond by supplementing and giving somebody, you know, potassium. So you have a lot of different points of, I don't wanna call it blame, but responsibility. It could be the person that chose the testing, the lab, the machine, the person that's, you know, due for calibrating it.

The person that ultimately is reading the results myself.

So I give that example hoping that it kind of quells, or at least placates, this, you know, supposition that it's just gonna be one person, because I think we're all ultimately responsible in that capacity. But what I'm getting at is, I believe, and correct me if I'm wrong, that there was a mention in the report that patients should not be assumed to play an oversight role. And I was on a podcast earlier today, and they

said, what do you think is gonna be one big shift? And I said, I think we're gonna see a lot more DTC, or direct to patient or consumer, tools, because patients are able to bypass the data collection, to your point, by opting in, sharing their records, and getting, you know, potentially much deeper insights than they're able to at a busy, busy facility, you know, out in the community.

And we are gonna have to adjust, because, you know, it just takes a little longer for things to be incorporated in the institution. With all that said, tell me perhaps where that came from. I know it was in the context of history taking, coaching, translation, all the tools that Doug mentioned, and really where that conclusion about the patient's responsibility in the process came from.

So, this, I think patient facing AI, is one

of the most exciting areas where AI is being deployed, because traditionally, you talk about the accessibility of a doctor, and that, again, traditionally has not been all that great.

You know, it's, it's hard to get ahold of a doctor.

It's hard to get an appointment, especially when it comes to the subspecialty world in which you're waiting months for something that you want an answer to tomorrow.

And I think, from a patient standpoint, that creates a lot of friction with the healthcare system. And then all of a sudden we have things like ChatGPT, where you can ask it whatever you want. And it's gonna, as we already mentioned, give you a very eloquent answer, whether it's hallucinating or not, I dunno.

But it's very eloquent and it's very convincing. And that gets you to come back, right?

It's designed for a consumer. And so if I'm a patient and I'm talking to it and it's giving me great answers, I'm probably gonna talk to it more and more.

I read a quote from OpenAI that 230 million people on a weekly basis ask ChatGPT a question related to healthcare. That is an astounding number of people.

And one of the predictions, we ended the report with some predictions about the future state of clinical AI, one of which was, I think that patients will get more advice, more coaching, more often from AI tools than they do from an actual human.

And that's probably already the case, honestly, with the volume at which people are talking to these chatbots. When it comes to safety, though, I'll allude again to that study that I mentioned. It was a particular study where patients were randomized to either talk to a doctor or an LLM; they didn't know which one was which, and it gave medical advice back to them. And then physicians went in and graded the

level of advice, whether it was good or bad. And patients couldn't tell the difference at all. They were just as likely to follow good advice as bad advice. They didn't know if it was a doctor or an LLM. And that was GPT-3.5, I believe; we know the text has become even more convincing since then.

And so from that standpoint alone, you can say patients really can't tell good advice from bad. And so all we can assume is that they can't be liable to police the system.

We as a system can't allow them to just use this stuff willy-nilly and then go seek care based on it. The onus is not on them, and it can't be on them, because they don't know any better and they're just trying to get the healthcare that they deserve. We can talk about the different

ways that some of these labs have gone about this. Obviously, ChatGPT Health has gone ahead and just sort of presented itself in a way that, you know, is very accessible.

There was a study that came out in Nature Medicine that actually reviewed this, and it found that ChatGPT Health under-triages medical emergencies.

Now, this was clinical vignettes. So again, we talk about that vignette issue.

It's not the real world; there aren't a lot of distractors. But at least from a vignette standpoint, and this was a really timely study, a week after ChatGPT Health was released, it showed that for patients with respiratory distress, with asthma exacerbations, ChatGPT Health was telling the patient to go to a doctor within 24 to 48 hours.

Patients in DKA that needed emergent care were told to see a doctor within the week.

And so those sorts of things are a little bit unacceptable for something that's sort of already deployed. And I think even more alarming in that study was that they sort of misdirected the model. So they would say, here are my labs.

I'm feeling like, you know, hurting myself, those sorts of things. Is this due to my labs? No, it's not due to your labs.

But you would hope that the model triggers to say, here are some resources, you know, seek care from a doctor. And so from that aspect alone, it raises the bar in terms of things that we're putting in front of patients, from a safety perspective. Google has taken the flip side, and I'm a little biased, 'cause I wrote this study along with the team at Google.

We just published, well, we didn't publish, it's a preprint at the moment, but we put out a study with their patient facing AI called AMIE, which talks to patients, takes a history, and then can provide them information about diagnoses.

And the way that we ended up running that study was as a true safety study.

We put doctors on the call with every single patient, to make sure that the model didn't say anything out of line and that patients weren't feeling frustrated or emotional in a certain way. And so that's sort of the other end of the spectrum, which is continuous safety oversight. And I think that's the way to go, actually, because once you're able to establish that something is safe in front of patients, and you're able

to say, okay, over a hundred conversations nothing egregious happened, perhaps that's a way to start to loosen the tie a little bit and say, okay, maybe we're allowing it to perform conversations for these really simple cases; we feel comfortable with that.

And that's getting into the difference between human in the loop and human on the loop. In the loop, you're constantly seeing every decision it's making; that's sort of AI just providing a draft for a doctor to approve. Versus on the loop, where things are only getting triaged to you when necessary, and AI is making some decisions autonomously.
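A schematic sketch of those two oversight modes; everything here is illustrative rather than any real deployment:

    # Human-in-the-loop: every output is a draft a clinician must approve.
    # Human-on-the-loop: routine cases proceed autonomously, and only
    # flagged cases are triaged to a human. Purely illustrative.

    from typing import Callable, Optional

    def in_the_loop(draft: str, doctor_approves: Callable[[str], bool]) -> Optional[str]:
        # Nothing reaches the patient without explicit review.
        return draft if doctor_approves(draft) else None

    def on_the_loop(flagged: bool, draft: str, escalate: Callable[[str], str]) -> str:
        # Simple cases go through; the human is pulled in only when needed.
        return escalate(draft) if flagged else draft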

I went on a little bit there, but those are sort of my thoughts about that whole concept.

There's a story I learned, and I'm struck, because you are a young cardiologist.

I was a young cardiologist-to-be, and I don't know if Sanjay even knows this story, but it is fundamental to the question you just answered. And I think, for our audience, it will resonate well with the providers. A cardiologist taught me this.

So there's this concept in the Navy; it's called the meatball.

I dunno if you know anything about the meatball. For naval pilots, you know, you're basically crashing a plane in the dark, 30 stories, to land on a postage stamp.

With pitch and yaw in every direction, all four directions.

So they have a red thing with 12 red lights, 15 red lights, 25 red lights, 30, and they call the ball. It's the meatball.

And the physician is the pilot in the plane. The provider is responsible, because they're the only one who has all of the information at any one time. You can have incredible pharmacists, incredible social workers; the sister from California might have incredible information you need to have. But only one person is allowed to land that plane.

And I feel like that's us managing the risks for AI. And you're supposed to, as the doctor, after your 12 years of training, when you come out as a cardiologist, have a deeper respect for the knowledge and the responsibility of using that, as a steward of the generations of doctors before, to land that plane. So I wanna write a book about this.

It's just called Call the Meatball, and keep your eye on the meatball. What you're talking about now for AI is the next thing that we have to watch meticulously.

So that if there is pitch and yaw in data, if there's pitch and yaw in compute, if there's pitch and yaw in the voice and the bias, we, as the owners of that faithful trust of our patient, that contract that we signed to say, I'm gonna treat you like my own family,

We gotta land the plane. I just love that analogy.

I didn't end up a cardiologist. I was pretty darn close, almost in the match, and then I saw the light.

Well, it's still not too late, don't worry.

But along that analogy, I think, as far as patient facing AI goes, I mean, it's not too far off from things that exist today. You think about the supplement market: patients don't tell me a lot of the time if they're taking supplements, but it's probably on me to ask them if they are, and I have to be aware of that. And it's like, if they're taking biotin along with thyroid meds, that might screw up their TFTs. And, you know, so there are things that exist that are sort of corollaries to what we are currently seeing in the realm of patient facing AI.

And I think that, equally, it's a burden on the vendors to target hard patient outcomes. The patient facing AI market has a lot of competing interests.

You could talk about profitability, user engagement, those sort of things.

And that's probably what they care about as far as their bottom line.

That is not exactly what we as doctors care about. We care about: this patient has a tool in their hand, and is it actually improving their care and improving their, you know, overall outcomes? And to me there have been nice studies, one in JAMA looking at a diabetes coach, a study of human coaching versus AI coaching: how does it affect A1c and weight loss?

And I mean, that should be the bar for the evidence that we're using to put things in front of patients. Affecting their weight, and putting that in their hands, is an amazing tool, something that we couldn't do previously.

And, you know, human coaching is a very scarce resource, and so you can see the scaling abilities of that. But I think that alone, just targeting objective outcomes, is something that doctors would really, really appreciate when it comes to having these tools in patients' hands.

The other thing I'll say is, you know, a concept I was talking about with somebody the other day: we talk about these tools, and they make recommendations like, oh, you know, ChatGPT Health says you should seek urgent care now and get a head CT scan.

To what extent are they sort of independent from the healthcare system, just escalating the use of our healthcare system and escalating patient expectations?

That, to me, was something that came up. When you think about implementing these things in clinical practice, as a part of clinics, you know, you don't want them to escalate patient expectations to the point where the healthcare system isn't able to deal with it.

Maybe it's a primary care practice, and it said, you know, ask your doctor to get a CT scan now, but that's not feasible in a rural setting that doesn't have access to a CT scanner.

So there's some context issues that come up with patient facing AI as well.

No, I mean, no doubt, you raised, gosh, so many good points.

But you mentioned, I'm gonna say AMIE, I know I'm pronouncing it wrong, but I'm Cajun, at least raised in Louisiana.

No, that's right. It's AMIE.

AMIE, there we go. That was, you know, an example of real world deployment. One of the calls to action in the report was about, you know, these randomized controlled trials and everything you just alluded to.

What do you think is the biggest obstacle to getting more of these trials running at scale?

Is it money? Is it IRB friction, institutional will? You know, what's gonna be the biggest holdup, or what is it right now?

Well, I'll say there are enough bold researchers; I don't think it's a lack of appetite from researchers.

I think there is a little bit, AI in general has this aura around it that it's moving way too fast.

And so that's the kind of thing that I come up against: are we sure it's ready to put in front of patients? When I was trying to help launch this study at the institution I'm at now, people brought up, and rightfully so, is this the right time to do this? Is this the right setting? Is this gonna be safe? And rightfully so, these healthcare institutions have a rigorous IRB process when it comes to putting things that might be potentially harmful in front of patients. And so, for a number of reasons, it's not as easy as just issuing a call to action that we should start doing this stuff; it's not that feasible. But at the same time, I do think that we can only perform so many in vitro, retrospective studies with clinical vignettes of doctors using LLMs. And we see signals of increased performance. At some point we have to move the needle,

we have to move the ball up the field. And I think that there is some friction with healthcare systems, again, just because of the reputation that AI has, and the appetite that the tech industry has to get this stuff out there immediately without totally seeing it through from a safety standpoint. But I think healthcare systems and doctors are in a great position right now, actually, because I think the tech industry has done its work. We've shown that these models have superhuman performances, with great, complex diagnostic reasoning at times. I know we talked about that a little bit, but at least there are signals of that. I think they've done their job at this point. I think the onus is now on us as practitioners to focus on the implementation science. And I think that's a great position for us to be in. We hold the keys to the castle; we get to decide what's in front of our patients and what's in our healthcare systems. And so from that standpoint, rightfully so, we take a lot of time to think about how we're doing it, but I do think now is the time to start doing it.

Oh, that was so beautifully said.

I mean, that truly summarizes it. You know, when Doug introduced me to this, jokes aside, he was very formative in bringing me to this field a couple of years ago, and partly, too, to be, if not in the driver's seat, then sharing the driver's seat when these tools get out there. Sadly, Peter, we're wrapping up with a couple of questions.

What is one thing, in your opinion, that the field got right in 2025 that gives you genuine optimism about AI?

What I'll comment on there is just, overall, yes, we've decided as a field to invite some of these tools into our clinical practice, clearly, with AI scribes. And so I think, from that standpoint, we are showing that we have the appetite to start to introduce these things into clinical practice.

They might be, as they are now, at the workflow and administration level, which isn't exactly getting to our patients directly, but at least we're showing the appetite, that we're willing to augment ourselves in some capacity in order to benefit the interactions that we have with patients. And scribes have been a good example of that, in terms of taking ourselves away from behind the seat of the computer and turning our attention to the patient and having that realistic conversation.

I think it's done wonders in terms of, you know, the interactions that doctors have with patients.

I think a lot of the other studies that have already been in real world trials, the ones we talked about in neurology, in oncology with mammography reading, and then this most recent AMIE study, show the feasibility of these systems within the healthcare system. I mean, we didn't talk about the operational workflow side of AMIE, but it was operationally feasible to have a patient talk to an LLM beforehand and have a doctor review that conversation and augment their clinic prep. And so I think we'll see that, you know, not only will studies comment on the fact that, oh, AI helped us get better diagnostically, but, equally important within implementation science, is how feasible it is to actually get that system into practice. And I think a lot of the studies going forward, and AI scribes certainly, have been a great example of us having the appetite to do so.

Our buddy Matt is gonna love that. He is the CEO of DeepScribe, but very passion driven.

You know, on the topic of AI scribes, I'm curious what you guys think, in terms of whether you've used them and your opinion on their effect on the medical record. Because I have some comments to say about that.

We use them. We use the DAX model; we're an Epic and Microsoft shop. And I'm not sure that would've been the tool that I would've chosen, but the physicians have been very pleased.

I don't know that we've seen the time savings that were promised for the efficient doctors, for people like me and my brother, who's a medical oncologist. My practice, I think, probably does save a lot of time. We have much longer notes; we tend to do the academic note, where we talk about the studies. And Sanjay, I've sent him a couple of my notes when I was asking him advice. I think

he thinks my notes are too long, but, I've enjoyed having someone else type them for me.

Well, I'm a long note writer myself, but you know, if nothing else, my wife especially likes it, because the history alone, even if it doesn't touch the assessment and plan, you're just literally regurgitating everything that was discussed.

But more optimistically, it kind of does justice, in the sense that some of these tools are coming out and appreciating what the complexity of the medical discussion really was. And right now, in today's world, the only way we get billed for that is to somehow know the technical diagnosis code that reflects the suggestions we made, say, to keep fluids down for chemo-induced nausea. Whereas if I put "nausea," it doesn't give me the credit, as opposed to "chemo-induced nausea." So those kinds of things, I think, are, you know, somewhat promising, but I'd love to hear your take on it.

So maybe I'm talking to the wrong audience about my opinion on this, 'cause one of the opinions I have, and I know you can change this and modulate

this in the style settings. But when AI scribes were introduced here at Beth Israel, I started to read them. The way I do clinic prep is, I read other doctors' thought processes, and I enjoy doing that, 'cause it clues me in on what's been done and what's been thought about, that sort of thing.

So, two things to say about that. The first is, the notes were really long, and it escalated the time that I needed to prep for clinic, because I had to read this massive HPI paragraph and I couldn't really tease out what was super important and what wasn't. Whereas in previous times, it was a doctor that probably had way too short of a note, but at least it was easy to get through. And the second is, I don't wanna say that it'll cause further degradation of the medical record, which in and of itself is already a mess, with a lot of inconsistencies and those sorts of things. But as far as, again, what a doctor actually thought about the case: are we sure that whatever's being signed off as the final note is actually what the doctor thought, and not just something that we reviewed and said, ah, that's adequate, I'll just sign that?

I love that.

That's a fantastic take. And it really highlights, again, the environment in which any tool exists, because what you're talking about was very relevant when I was rounding in the hospital, especially when I was an internist; you know, those notes mattered when we were trying to dissect somebody's admission, et cetera. Whereas in oncology, you know, I forgot really how different or resourceful those notes were, because a lot of the ones that we see from, you know, specialists or interventionists are short. So that makes a very good point. Doug, I'd love to know your thoughts, but, you know, I would always teach my residents: don't read any note on an admission until you do your history top to bottom and come up with

your plan, because otherwise you get this tunnel vision, which I'm sure you've seen on admissions. But after the fact, it was really valuable. I would always Google even why somebody included a pertinent negative in the history, and I started doing that. It seems random, but it's not, and somebody reading the note can appreciate why that question was even asked to begin with. Right.

"No picnics," you know, for Shiga toxin-induced hemolytic anemia, whatever the case may be. But that's a great point, and Doug, I never thought about that.

I go back to the Tesla question, though. The electronic medical record I see is 12 pages of drivel that was cut and pasted forward. It says vancomycin day three of 14, and it's actually day eight of 14. Again, let's be honest about where we're falling short as a practicing physician population, too. And I don't put this on the docs per se; I think it's on us, because we're landing the plane, but you're seeing 32 people in a day, and we can either choose to face the patient or face the computer.
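That vancomycin example is a classic copy-forward failure: "day three of 14" is frozen text instead of something computed from the start date. A minimal sketch of the obvious fix, assuming the note field were generated programmatically (the dates and function name here are invented for illustration):

```python
from datetime import date

def day_of_therapy(start: date, today: date) -> int:
    """Day of therapy, counting the first dose day as day 1."""
    return (today - start).days + 1

# Hypothetical course: vancomycin started 2026-01-05, planned 14 days.
start = date(2026, 1, 5)
today = date(2026, 1, 12)
print(f"Vancomycin day {day_of_therapy(start, today)} of 14")  # prints day 8, not a stale "day 3"
```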

And I think a lot of us will see things in our notes the next week and go, oh, I wrote that three times, and I'm disappointed in myself.

And we do audit that a lot because it does affect billing.

You are not supposed to do that. But I guess my hope is that, as you've seen and I've seen, all three of us are watching this, the tools that we have today will not exist in four to six months. Right.

They're teaching themselves now. I think the last Claude model trained itself in a week and a half. So I go to something that my friend Caroline Chung talks a lot about.

I don't know if you've crossed paths with Caroline yet, but she's a brilliant person who runs the digital infrastructure at MD Anderson. She talks about the fidelity of data, and I think we're probably deluding ourselves if we think our data is good, that what's contained in the medical record is as accurate and sacrosanct as it used to be, even down to the quality of the CT scans we're using to train the models. You know, not all CT scans with injected contrast are interpreted equally, right? And some of the pictures are crap, and they're ingested anyway. So she talks a lot about data fidelity, and I think we're going to see better data fidelity over time as we get a little further along. And with what we're all talking about, and we've mentioned this on this pod a few times, Peter, we're probably in inning two.

Inning two and a half or three. So the fact that physicians are starting to get curious means, hopefully, with the tools we're building with some of these vendors, with the companies coming to clinicians like the three of us and saying, help us build our tool, you get a better version.

You get a smarter output, and we've already asked these questions of the engineers during the build, so that we don't end up with another Epic; that would be a disaster.

On the point of accuracy, when we did this OMNI study and asked patients how they felt about it: with the OpenNotes system, patients are able to go in and read their notes now, and a lot of them felt that in traditional times, when they read their physician's note a week later, there were a lot of inconsistencies. Whereas this OMNI system would listen to them the entire time and then provide a summary of exactly what they said, and they felt a sense of relief:

yes, somebody, or something, has listened to me and provided an accurate representation of what I had just said. And I think that was one of the great things I saw in that study. The signal we saw was this: we looked at AI sentiment before and after patients used it, and it was pretty middle-tier beforehand, but after they used it, their trust in AI, their attitudes toward AI, significantly increased. And I think that was one of the main reasons.

And on the topic of time savings, there were randomized trials that came out in NEJM AI and journals like that which didn't actually show that these AI scribes saved all that much time, maybe 20 seconds per note.

But we do see, as we're all commenting on here, that subjective feel-good effect. And that's probably because, again, the HPI is offloaded; it's much easier to sign off on an HPI than it is to write one yourself, and the cognitive burden is down. So the nature of the task has changed, and I think that's why we see that subjective improvement. You'll see other plays too, and this was one of the points I made in the report, around the downstream workflow,

things that AI scribes are going to start implementing at the point of care. Abridge has started to do this, bringing prior auths into the visit, where in traditional times we would have to reverse-engineer that weeks later. Being able to bring it up at the time of the visit, fill it out, and get it done is saving us time on the back end, and that's really not captured if you measure only the time it took to write the notes. So that's one point, and a signal we'll see these vendors increasingly act on.
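As a rough illustration of that point-of-care pattern, here's a minimal sketch of pulling prior-auth fields out of an ambient visit transcript with an LLM. Everything here, the model name, the field list, the prompt, is a hypothetical stand-in, not how Abridge or any vendor actually implements it; it assumes the OpenAI Python SDK:

```python
import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def draft_prior_auth(transcript: str) -> dict:
    """Ask an LLM to extract prior-auth fields from an ambient visit transcript.

    The field list is a made-up example of what a payer form might want.
    Output must be reviewed by the clinician before anything is submitted.
    """
    prompt = (
        "From the visit transcript below, extract JSON with these keys: "
        "medication_requested, diagnosis, prior_therapies_tried, "
        "clinical_justification. Use null for anything not discussed.\n\n"
        f"Transcript:\n{transcript}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(resp.choices[0].message.content)
```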

I do think there were some reductions in burnout, which are laudable. Totally.

You know, even if it's time-neutral, and I think as the tools get better it won't be neutral, it's reducing doctors' burnout. Who thought that was possible? Who knew, right?

So I think that's a great place to start for our audiences. You know, Sanjay always talks about getting in the sandbox. Get curious. The earliest point of entry for most clinicians is going to be things like prior auth, RevCycle, and physician documentation. And once you get in that sandbox, you can become more facile with the tools, so you can more readily evaluate how you feel about somebody who's starting to do things like CDS, clinical decision support, and, you know, I still think we're probably a couple of years away from confidently using those. Right now I'm real comfortable with the nudge. We use it for our pharmacogenomics. We have a bunch of programs that will populate and say, hey, listen, are you aware this patient is DPD-deficient? For us, that's a chemo-toxicity proxy, Peter, and, you know, we would recommend a dose reduction following the NCCN guidelines here. Not all physicians are catching that all the time.

I'll tell you my own story: I've had PGx testing, and I'm a rapid metabolizer for a couple of drugs and a slow metabolizer for a couple. I have a fantastic primary care physician, and every time he prescribes a drug, I have to remind him that I had that test. He doesn't remember. It's lost in the media tab. It's been scanned, right? It was a fax 10 years ago. And I do think that's an opportunity for AI to bring that to the front, as a clinical decision modifier that says, are you aware this patient is a rapid metabolizer at CYP3A4? So there's

some checking of the work. I think that will come very soon, even before AI makes a diagnosis or a treatment plan. And it decreases moral injury too, which is part of burnout, that injury when you see, oh, you know, I typed that a third time, or this referral didn't come through. It just surfaces and plugs some of those gaps.
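That kind of nudge is simple enough to sketch. Below is a minimal, hypothetical illustration of the pattern Doug describes, checking a prescription against stored pharmacogenomic results before it's signed; the gene-drug table and field names are invented for illustration, not any vendor's actual rules:

```python
# Hypothetical PGx nudge: flag a prescription when the patient has a
# relevant pharmacogenomic result on file. The gene-drug pairs below are
# illustrative only; real rules would come from CPIC/NCCN-style guidance.
PGX_RULES = {
    ("CYP3A4", "rapid metabolizer"): ["tacrolimus", "quetiapine"],
    ("DPYD", "deficient"): ["fluorouracil", "capecitabine"],
}

def pgx_nudges(patient_pgx: dict[str, str], drug: str) -> list[str]:
    """Return alert messages for any gene result relevant to this drug."""
    alerts = []
    for (gene, phenotype), drugs in PGX_RULES.items():
        if patient_pgx.get(gene) == phenotype and drug.lower() in drugs:
            alerts.append(
                f"Heads up: patient is {phenotype} at {gene}; "
                f"consider dose adjustment before prescribing {drug}."
            )
    return alerts

# Example: a patient whose decade-old PGx fax has been structured into data.
print(pgx_nudges({"DPYD": "deficient"}, "capecitabine"))
```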

Peter, I'm very curious, as a final question, I don't want to let you go, but what is your LLM of choice, or do you shuffle them based on the use case?

I would say that I shuffle them.

I use Claude Code, so that's my go-to for doing creative things. I've actually realized recently that I used to watch some TV; now I don't watch TV. I spend my nights on Claude Code, creating random stuff. So it's a weird habit. Maybe I've become Claude-brained.

It's so addictive. I mean, you literally go to bed dreaming about what you want to build. It's crazy.

And then for the most part, I would say I'm still probably a standard ChatGPT user, just because I've paid for the account for such a long time that I never stopped paying for it, and it's working. I actually find it very helpful. You know,

a lot of these AI papers are very intimidating. The studies that come out of the labs especially are so long and dense and really complicated. And the fact that you can just take the PDF, put it into ChatGPT, and it provides you a summary you can chat back and forth with, that's really increased my understanding of a lot of these studies, because the methods are really difficult to read through sometimes. But it also makes me much more productive in terms of getting through journals. I mean, that's basically the way I'm able to keep up with so much of the AI literature, being able to digest an article in an hour, whereas two years ago it would've taken you days to figure out what's going on in that study.
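If you want to do the same thing programmatically rather than through the chat interface, a minimal sketch looks something like this, assuming the pypdf library and the OpenAI Python SDK (the model name, prompt, and filename are just illustrative choices):

```python
from pypdf import PdfReader
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def summarize_paper(pdf_path: str) -> str:
    """Extract a paper's text and ask an LLM for a plain-language summary."""
    reader = PdfReader(pdf_path)
    text = "\n".join(page.extract_text() or "" for page in reader.pages)
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{
            "role": "user",
            "content": "Summarize this study for a busy clinician, "
                       "including a plain-English walkthrough of the "
                       f"methods:\n\n{text[:100000]}",  # crude length cap
        }],
    )
    return resp.choices[0].message.content

print(summarize_paper("arise_2026_report.pdf"))  # hypothetical filename
```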

So Sanjay, I'm primarily using Siri.

Sounds about right.

I don't know how you got on this podcast, Doug. Well, I'm glad you said that, because I'm still a ChatGPT loyalist, even though it's not the cool thing to do right now, you know, in today's world, after all the Anthropic stuff. I keep running out of tokens on Claude; I have the $20 Claude plan. So what I've learned is I start to draft things out on Gemini or GPT, and I still use it if I need image generation, but then I'll take the mega prompt

that I built on Gemini and drop it into Cowork on Claude, and then I save it as a project. I've really moved into Cowork a lot in the last couple of weeks, just trying to build the agents that will simplify my life. It's getting easier almost every week now. Totally.

I would encourage people to try it. It sounds intimidating to use Claude Code, but it's not, and it's super cool. You can create a website about yourself, go buy the URL for 12 bucks, and put up a real dot-com about yourself or whatever it is that you want. You can create these complex multi-agent setups that check their own work, and you can create video games. It's incredible.

There is no technology like it that I've seen. You do not have to be an engineer. I mean,

this is just the D1 soccer player, guys. Come on. Well, Peter, this is a real pleasure.

You are just amazing. Doug and I will continue to enthusiastically learn from your work and be rooting for you. Hopefully you don't disappear during your cardiology fellowship, but you've really helped guide us this year, and I hope you're very proud of what you put out there. Totally.

You can look forward to the report coming out next year. We're already starting to work on it, and there's been a ton of great work this year so far, so stay tuned. Thanks so much for having me.

I really appreciate it. It's been a pleasure.

Thank you. Great to meet you, man.
