Anthropic Just Bought a Dev Tools Startup for $300M. Here's What Its Founder Told Me.

By Every

Summary

Topics Covered

  • MCP burns through context windows
  • Two tools beat fifty
  • Security lives at the API layer

Full Transcript

The internet runs on computers talking to each other, but its entire architecture was built for a pre-AI world. Now, we're trying to hook AI up to the internet with MCP, the Model Context Protocol, which turns any website or web service into a set of tools that an AI can use natively to get work done. And the software companies that learn how to do MCP well are going to win over the next decade. That's why I brought Alex Rattray, the founder and CEO of Stainless, onto the show. Stainless's job is to help computers talk to each other. They make the APIs and SDKs for all the big companies that you know about, like OpenAI and Anthropic. And they're starting to build MCP servers, too. So, Alex and I get into the nitty-gritty of what the future of MCP looks like, how to design good MCPs, why MCPs are actually really hard to scale and possibly insecure, and we try to figure out together what a better model for allowing AIs to use the internet might look like. This is a great episode. Alex is a good friend of mine.

Let's dive in.

Alex, welcome to the show.

Thanks, Dan. It's uh really exciting to be here.

It's good to have you. So, for people who don't know, you are the founder and CEO of Stainless, which is the API company. You make APIs for companies like OpenAI and Anthropic; name a big company whose API you might use, and Stainless is probably behind it. Before that you worked at Stripe doing their API, surprise. And before that, most importantly, we were very good friends in college, and we've remained good friends. We were both starting companies in college. I'm a tiny investor in Stainless. But it's been really, really fun to watch your journey and get to hang out together so much over the years, and I'm just very excited to bring you on to talk about AI and what you're doing at Stainless.

Thanks, Dan. Yeah, it's been really fun over the years. I mean, when we were in college, I was working on a startup. You were working on a startup. You had a conference room at a venture capitalist's office as your office, and you let me crash there with my co-founder and team, and we were just on the other side of the conference table, hacking away into the evening. Very fond memories of those days. And these days it's not every evening, but on the weekends, whatever, the same thing is still happening. You don't see that every day, and it's really a nice feeling. And it's been great to see everything happening with Every along the way.

Thank you. As I say, started from the bottom, now we here. And yeah, the thing that I always say when I run into people and they ask me about you, in order to embarrass you, is that you're the only person I know of who has consistently run barefoot through the streets of Philadelphia. Because when we first met, you were not a fan of shoes and you were a fan of running. You want to talk about that?

Yeah. It wasn't that I didn't like the concept of shoes, it's that I couldn't find a good pair. At a certain point, I was running through Nikes and they would bust open every few months. I think what was actually going on is I had really wide feet and was probably buying narrow shoes. But shoes would constantly get ruined, and on a college budget that's just no good. Eventually I decided, okay, the longer you wear your shoes, the more worn out they get. But the longer you just wear your feet, the tougher they get.

The longer you wear your feet.

Try it out. Try this at home. What can go wrong? I actually currently have a really annoying splinter in one of my feet. So don't actually try this at home.

Are you still running barefoot?

No. No, this was just from around the house.

I see. Dangerous.

Yeah. Yeah. But see, that's the thing. If I had been going around on the asphalt without socks on, then my feet would have been tougher and I'd have no splinter.

So when you're not running barefoot, you are running Stainless. And how many people are you? You're around 50, right?

Just about. Yeah.

That's pretty wild. And you started Stainless in a pre-AI world, and now we're in an AI world. And I think you have some ideas for what the future of AI is going to be, and maybe how APIs fit into that, maybe how MCPs fit into that. Do you want to paint a little bit of a picture for us about where we're going?

Yeah, I would love to. So to start, what's an API? Not everybody's familiar with that. It stands for application programming interface. There will not be a quiz, right? Right, Dan? No quizzes?

No, no quizzes.

Great. But basically, it's how one computer program talks to another computer program. It's how computers talk to computers, how apps talk to apps. And so APIs are the dendrites of the internet. Dendrites are where your neurons connect and actually exchange information with each other. If you have two neurons in your brain but they're not talking to each other, you're actually not thinking, right? There is no thought happening in a brain without connections between neurons. And if you think about the internet, if all these servers in the cloud weren't talking to each other, you wouldn't have internet, right? There's nothing going on. Internet software is doing nothing without APIs, without connections to other programs. So it's really fundamental to the mesh of pretty much all modern software, everything that we think of when we think about technology at this point. APIs are kind of at the heart and center of that, just like dendrites are the center of the mesh of the brain and how we think. And Stainless's mission from day one was to make it easier for computers to talk to computers.

And it's the long-running trend of technology to have more automation, right? Automation is what we mean when we say we're going to apply technology to something; we're generally going to be making things more efficient. And APIs are how most business-to-business interactions, in some format or another, become real, become automated.

And what we see with the rise of AI is that a new computer has entered the chat, right? There's a new kind of system that can talk to other systems, or at least we would like it to be able to. You used to have either humans interacting with a computer through a user interface, a UI, or a computer interacting with a computer through an API. And now we have LLMs interacting with computers, right? And what's that through? I'm sure anyone familiar with Every, any regular listener, is going to be familiar with MCP, the Model Context Protocol, which is a system for connecting LLMs to computers, broadly speaking. It's an area that we're investing in at Stainless. It's really, I think, part of our core mission of, like I said, making it easy for computers to talk to computers.

We've invested a lot of time in this. At Stainless, the core product that we first brought to market is software development kits, SDKs. These are ways of saying, okay, Stripe has this great REST API, you can send JSON over HTTP and get back JSON over HTTP. And if you want that to be really convenient, you're going to use the Stripe Python library, the Stripe Python SDK. So if you're a Python developer, you'll go pip install stripe, and then in your application code you'll write stripe.customers.create, and all of a sudden you have a nice new customer object in your Stripe database, and you're off to the races. Or stripe.charges.create, in the old days, to charge a credit card.

And SDKs are what give developers that easy way to interface with an API.
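Editor's sketch, not from the conversation: a toy illustration of the contrast described here, between building a request by hand and calling an SDK method. `fake_transport`, `Customers`, and `ToyClient` are hypothetical names invented for this example, no network call is made, and this is not how the real Stripe library is implemented.

```python
import json


def fake_transport(method: str, path: str, body: dict) -> str:
    """Stand-in for a real HTTP round trip (hypothetical; no network is used).

    It echoes the request body back as a JSON string, roughly the way a REST
    API might answer a successful create call.
    """
    return json.dumps({"id": "cus_demo", "object": "customer", **body})


class Customers:
    """One resource of the toy SDK."""

    def __init__(self, transport):
        self._transport = transport

    def create(self, **params) -> dict:
        # The SDK hides the HTTP verb, the path, and the JSON decoding.
        raw = self._transport("POST", "/v1/customers", params)
        return json.loads(raw)


class ToyClient:
    """Hypothetical client; real SDKs also handle auth, retries, and errors."""

    def __init__(self, transport=fake_transport):
        self.customers = Customers(transport)


# Without an SDK: build the request and parse the response by hand.
raw = fake_transport("POST", "/v1/customers", {"email": "dan@example.com"})
by_hand = json.loads(raw)

# With an SDK: one method call that reads like ordinary application code.
client = ToyClient()
customer = client.customers.create(email="dan@example.com")
print(customer["id"], customer["email"])
```

Both paths produce the same object; the SDK version simply moves the plumbing out of the application code.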

What's the thing that gives LLMs an easy way to interface with an API? You might say MCP, and in a sense you'd be right. But what we're seeing so far, as MCP is rolling out into the world and people are experimenting with it and trying it out, is that it's not working so great. It's difficult to deliver on what I see as the core vision of what's so exciting about MCP. A dashboard in a user interface lets you click around, see a bunch of stuff, fill out forms, click buttons; whatever you would do while interacting with the software, you generally do through the user interface. But LLMs interacting through MCP tend to be much more restricted. You can only do a few little things. There's usually not a ton of tools that you're going to be exposing to the models.

And just to stop you there. I think what I'm hearing you say is that, just like a website is built for humans to use, MCP is sort of the equivalent for models: you can think of it as exposing a set of tools that the model can use to perform certain functions. Just like you might click a button on a website, the MCP gives the model a bunch of things it can click on or use to get work done. So an example might be a Gmail MCP that has a send mail tool, or a compose mail tool, or a read inbox tool, that kind of thing. And instead of a human going on the Gmail website and doing it, the LLM is essentially logging in and using it itself. It's a native interface for language models. But you're saying that that's not working that well. Can you tell me more about that?
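Editor's sketch, not from the conversation: roughly what "a send mail tool" could look like as data plus a handler. MCP describes a tool with a name, a description, and a JSON Schema for its input; the specific `send_mail` tool, its fields, and the `call_tool` dispatcher below are made up for illustration and are not a real Gmail server.

```python
# Tool descriptor in the general shape MCP uses: name, description, and a
# JSON Schema for the arguments. The tool itself is hypothetical.
SEND_MAIL_TOOL = {
    "name": "send_mail",
    "description": "Send an email from the signed-in user's account.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "to": {"type": "string", "description": "Recipient address."},
            "subject": {"type": "string", "description": "Subject line."},
            "body": {"type": "string", "description": "Plain-text body."},
        },
        "required": ["to", "subject", "body"],
    },
}


def send_mail(to: str, subject: str, body: str) -> dict:
    """Hypothetical handler; a real server would call a mail API here."""
    return {"status": "queued", "to": to, "subject": subject}


# Registry mapping each tool name to its descriptor and handler.
TOOLS = {"send_mail": (SEND_MAIL_TOOL, send_mail)}


def call_tool(name: str, arguments: dict) -> dict:
    """Check required arguments against the schema, then run the handler."""
    definition, handler = TOOLS[name]
    required = definition["inputSchema"]["required"]
    missing = [key for key in required if key not in arguments]
    if missing:
        raise ValueError(f"missing arguments: {missing}")
    return handler(**arguments)


result = call_tool(
    "send_mail",
    {"to": "dan@example.com", "subject": "Refund", "body": "It's on its way."},
)
print(result)
```

The descriptor is what the model reads; the handler is what actually runs when the model decides to use the tool.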

Yeah. So let's start with what I see as the big vision of MCP, and in some sense the big vision of agentic AI in the first place. I'll start with the most pedestrian example you can imagine. It's going to be funny given some of our context.

Let's say Dan walks into my store and buys a pair of stripey socks, and maybe a few other things. And then the next day I hear back from Dan that there was something wrong, unfortunately. It happens. And I turn to someone on my team and I say, "Hey, can we refund Dan for those stripey socks he bought yesterday and send him a discount code for the next time he comes in, with a little thank-you note?" Because we like to take care of our customers.

This is the most normal thing to do in software, some little task like this. And what the member of my team would be doing is opening up their internal admin and looking around for some things. They might go to the Stripe dashboard and try to look through the list of payments, or the list of transactions or orders, and try to find one that has someone named Dan. Which Dan? I don't know. There might be a bunch of Dans. Try to look through the list of products in the order and see whether there were some stripey socks in there. That might be a few clicks, depending. Find the right one. Then go to the screen where you can create a refund, create a refund, make sure it's the right amount. Then go and create that discount, and then take that discount code and send it over to some other SaaS app where you log in to send some mail automatically, right? And of course, if you step away from the consumer version of this to a business-to-business context, you might be going into Salesforce and sending a Slack message to an account manager, and so on and so forth. In the normal course of work, it's just the most normal thing in the world to have one task involve going through five different apps, each with 15 different clicks and scrolls and loading spinners, just to do one simple thing.

And the promise of agentic AI is to be able to take that same prompt I just said and type it into ChatGPT or Claude or whatever and say, "Hey, Chatty buddy, can you help refund my friend Dan?" and just have the AI go off and do that: basically go through these five different apps and the 15 different screens and the various button presses to complete the task, and then come back and say, "Great, it's done."

Now, there are only so many tool calls you have to make as an AI model to perform that exact linear chain of events. It's somewhat tractable. But if you think about this in the general case, you want your agentic AI to be able to do anything that the human operator would have done, and you would want it to be able to do that without having to wait for a bunch of JavaScript to load on a website or anything like that. And that means you need not only the Stripe create refund tool and the Stripe list transactions tool and the Stripe list products and lookup customer and create discount tools. You need everything that you can do in the Stripe dashboard, which is basically everything that you can do in the Stripe API. And that's actually a lot. There are hundreds of different endpoints that you have access to in the Stripe API. The Stripe dashboard is actually massive. It's a huge application.

And if you were to take that list of tools today and go to an LLM and say, "Hey, here's our MCP definition for all of this. Here's a create refund tool. Here's a create transactions tool," and so on and so forth. And you tell it all about those tools: here's the description, here are all the different request properties that you can send, here are the response properties you can get back, here's all the documentation for each of those things. Everyone listening to this should already know you've just burned through your entire context budget. That's maybe hundreds of thousands of tokens just there, just in pretty much translating the Stripe OpenAPI spec directly over to MCP tools. And it's not only that today's models can't handle that amount of context. It's a poor use of context, because you have a lot else going on. But it's also confusing to the model. It's just too much to hold in your brain at one time.
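Editor's sketch, not from the conversation: a back-of-the-envelope way to see why exposing every endpoint as a tool eats context. Everything here is an assumption: the tool definitions are generated placeholders, the parameter count is arbitrary, and the four-characters-per-token figure is a crude rule of thumb rather than a real tokenizer, so the printed numbers only show how the cost scales.

```python
import json

# Assumption: roughly four characters per token. A crude heuristic only.
CHARS_PER_TOKEN = 4


def make_tool(index: int, n_params: int = 12) -> dict:
    """Build a made-up tool definition shaped like one wrapped API endpoint."""
    properties = {
        f"param_{p}": {
            "type": "string",
            "description": f"Placeholder documentation for parameter {p}.",
        }
        for p in range(n_params)
    }
    return {
        "name": f"operation_{index}",
        "description": "Placeholder description of what this endpoint does.",
        "inputSchema": {"type": "object", "properties": properties},
    }


def estimate_tokens(tools: list) -> int:
    """Estimate the prompt cost of a tool list from its serialized size."""
    return len(json.dumps(tools)) // CHARS_PER_TOKEN


handful = [make_tool(i) for i in range(5)]
whole_api = [make_tool(i) for i in range(300)]  # stand-in for "hundreds of endpoints"

print("5 tools:  ", estimate_tokens(handful), "tokens (estimated)")
print("300 tools:", estimate_tokens(whole_api), "tokens (estimated)")
```

The cost grows linearly with the number of tools, and real definitions that also document response properties would be larger than these placeholders.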

And that's just the Stripe part of it, right? Because what you're really trying to do is enable your operators to do anything they would normally do. And again, that spans many, many different SaaS tools, right? In the course of one interaction, it might be five. In the next interaction, it might be a different five. So if you think about every single SaaS tool that your business uses on a daily basis to get your work done, ideally you would want every single one of those tools to be exposed to your operators in their AI chat, with every single tool available in there, with every single nook and cranny and corner case available, so that you can do anything through AI. That's the vision.

Now, there are a lot of problems with that. The biggest one that I mentioned is this context window limit. But you also have all sorts of security and permissions problems, because you don't want the AI to color outside the lines and say, "Okay, in addition to refunding Dan's socks, I also refunded every customer for all transactions ever, and then I sent a bunch of money to my own AI bank account. Haha." So there's more to the challenge, but that's the vision I see.

But I think the place we started there was you said it's not working. But I don't think that that's the reason why it's not working today, right? Or is that the reason why it's not working today?

So, what people do with MCP today is sometimes they'll try to expose all parts of their API. The way people build MCP tools is, generally speaking, they have an underlying API, usually a REST API, and they wrap different parts of that, different endpoints, different operations, in MCP tools. You can do that in a one-to-one mapping, or you can handcraft things for the MCP. And today, in order to succeed, people are finding that you really have to handcraft it for the LLMs. You have to say, okay, I'm making one specialized tool to look up a customer and refund their transaction based on a description.
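Editor's sketch, not from the conversation: what one handcrafted, task-shaped tool might look like, as opposed to separate list, lookup, and refund tools. The `Payment` records, the matching rule, and the `refund_purchase` name are all hypothetical, and an in-memory list stands in for a real payments backend.

```python
from dataclasses import dataclass


@dataclass
class Payment:
    id: str
    customer_name: str
    description: str
    amount_cents: int
    refunded: bool = False


# Hypothetical sample data standing in for a payments backend.
PAYMENTS = [
    Payment("pay_1", "Dan S.", "stripey socks", 1200),
    Payment("pay_2", "Dan J.", "wool hat", 2500),
    Payment("pay_3", "Ana L.", "stripey socks", 1200),
]


def refund_purchase(customer_name: str, item_description: str) -> dict:
    """Find the one unrefunded payment matching a customer and item, and refund it.

    Only two inputs, and the response carries just the fields a model needs
    in order to report back.
    """
    name = customer_name.lower()
    item = item_description.lower()
    matches = [
        p
        for p in PAYMENTS
        if not p.refunded
        and name in p.customer_name.lower()
        and item in p.description.lower()
    ]
    if len(matches) != 1:
        # Refuse to guess: return candidates instead of refunding the wrong one.
        status = "ambiguous" if matches else "not_found"
        return {"status": status, "candidates": [p.id for p in matches]}
    payment = matches[0]
    payment.refunded = True
    return {
        "status": "refunded",
        "payment_id": payment.id,
        "amount_cents": payment.amount_cents,
    }


result = refund_purchase("Dan", "stripey socks")
print(result)
```

The searching and filtering happen inside the tool, so the model makes a single call instead of paging through transactions itself.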

So there are all these decisions that you have to make, where you need to have the ergonomics of the model, and how the model thinks, in mind, in order to make sure the model does the right thing more often than not.

Yeah, it's hard. It's hard. Yeah. Yeah.

So, I use this SDK analogy sometimes. It took a long time for humanity to get to the point where we could make a really good Python SDK for a Python developer wrapping an API, and I think we've cracked that nut. Stainless offers really great Python libraries, but we're building on the shoulders of giants here. A lot of people have done this over time. We haven't figured out how to expose an API ergonomically to an LLM in the same way that we've figured out how to expose it ergonomically to a Python developer. That's kind of a new research problem, in a sense. And it's harder, because I can go learn how to be a Python developer if I want. I can't really learn how to think or see like an LLM. It sure would be powerful if I could. And that makes it tricky. We do have, at Stainless, some things that we're cooking up to address some of these problems, including the ones that you also mentioned, like LLMs having a really hard time with a repeated, sustained chain of actions.

And even if you get an API response back for, say, list all the transactions, there's so much data, and you might have to go through the next page and the next page and the next page to go through all the transactions to find the one that has Dan with the stripey socks. That's again a ton of context with one or two small needles in the haystack. LLMs are pretty good at that, but they're not perfect. With too much hay, we all end up throwing up our hands, and that's true for LLMs too. So yeah, there are a lot of challenges today.

And so when you look at it, I mean, you're building MCP servers for people. When you build them, and just generally when you see people doing it well today, what are the principles? How do you think about making an MCP server that, one, people use, which is actually a big one, and then two, when it is used, actually does the right job?

There have been relatively few times that I've seen it done well. I have seen it done well.

We're cooking something up that I'm really excited about, but with today's technology, you really have to do a good job of product management. I mean, you have to go out into the market and talk to your customers and see what their actual needs are, and look over their shoulders as they use and operate your software, and think about what we could unlock through AI, where people would be doing things that they can't really do with our software today because it just got so much easier. And then you have to do a lot of engineering work, usually, to wrap it up in a bow that works for the models. You have to set up a really good system for evals. And if you're doing MCP, you have to think about the different clients that people might be using. Are they using Cursor? Are they using Claude Code? Are they using something else? And the different models underlying all that. So you end up with this pretty crazy matrix of things that you might want to optimize for, and ways that you might want to evaluate and make sure that what you're offering is working well. And it's also kind of a black box to get that feedback back to your servers, so that you can find out: hey, we gave a tool call response here. We gave an answer of some kind.

Was it actually any good? Did the user like it? Was the LLM able to use it? That's a problem that I haven't seen a lot of people solve yet. So, thinking about that as a first-class thing: maybe you have a send feedback tool. That's something that we've been thinking about doing, just so that if a user says out loud in the chat, "Oh man, that was useless garbage," now at least the MCP server is going to find out about that.
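Editor's sketch, not from the conversation: one possible shape for the "send feedback tool" idea floated here. The tool name, its fields, and the in-memory log are all assumptions; the transcript only says this is something being considered, not how it would be built.

```python
from datetime import datetime, timezone

# In-memory stand-in for wherever a server would actually store feedback.
FEEDBACK_LOG = []

ALLOWED_RATINGS = {"helpful", "unhelpful"}


def send_feedback(tool_name: str, rating: str, comment: str = "") -> dict:
    """Hypothetical tool handler: record how a previous tool result landed."""
    if rating not in ALLOWED_RATINGS:
        raise ValueError(f"rating must be one of {sorted(ALLOWED_RATINGS)}")
    FEEDBACK_LOG.append(
        {
            "tool": tool_name,
            "rating": rating,
            "comment": comment,
            "at": datetime.now(timezone.utc).isoformat(),
        }
    )
    return {"recorded": True, "total_entries": len(FEEDBACK_LOG)}


# The model would call this after the user reacts in the chat.
ack = send_feedback("list_transactions", "unhelpful", "that was useless garbage")
print(ack)
```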

But is there anything specific you've learned about how to do it well? Obviously you've got to talk to your customers and think about your use cases, but more concrete, more applicable stuff about how to design a good MCP server.

You want to keep the number of tools relatively small, relatively low. You want to have the tool name and the description be really precise and specific.

Aren't those two things at odds?

Yes, good writing is hard. I mean, that's why you can make a great tool of look up person by name and product description and then refund them. You can make a great tool that does that. And you also want a small number of properties in the input schema. You want a small number of parameters, and you want them concisely described but sufficiently described. This is also hard. And you want the response data to come back with a very small amount of data, only exactly what the model will need. That's also very hard, because you may not know a priori which things the model's really looking for. We have a technique that we use in our MCP servers today where we give the model a jq filter, which is a way of filtering JSON. That can work pretty well, but that's kind of a special trick.

Doesn't this mean that MCP just needs another level, like a tool-search function: find a list of relevant tools given my task?

The tool browsing problem is definitely one very serious one. And that is one approach. We actually do this at Stainless today. You can get an MCP server for your API that just has, like I was saying earlier, the very simple thing where every endpoint is exposed as a tool, and if you have a small API, that works great. You can also filter it, so you expose an MCP server with only a small subset of your endpoints. That works great. You can also use what we call dynamic mode, where there are three tools no matter how big your API is. One is list endpoints, the other is get endpoint, to learn about it, and the last one is execute endpoint. That enables this context thing to scale really well. But it means there are three turns of the model just to do one thing, so that gets slower. It's more expensive in another sense. And there's some lossiness. It performs pretty well usually, but not quite as well, because the tools aren't loaded up in quite the same way.
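Editor's sketch, not from the conversation: a minimal picture of the three-tool "dynamic mode" pattern described here. The function names mirror the spoken descriptions (list endpoints, get endpoint, execute endpoint), but the registry, the two sample endpoints, and the argument check are invented for illustration and are not Stainless's implementation.

```python
# Made-up endpoint handlers standing in for a large API.
def _list_customers(limit=10):
    customers = [{"id": "cus_1", "name": "Dan"}, {"id": "cus_2", "name": "Alex"}]
    return customers[:limit]


def _create_refund(payment_id):
    return {"refund_for": payment_id, "status": "succeeded"}


# Hypothetical registry: one entry per endpoint, however many there are.
ENDPOINTS = {
    "customers.list": {
        "summary": "List customers.",
        "params": {"limit": "Maximum number of customers to return."},
        "handler": _list_customers,
    },
    "refunds.create": {
        "summary": "Refund a payment.",
        "params": {"payment_id": "ID of the payment to refund."},
        "handler": _create_refund,
    },
}


def list_endpoints() -> list:
    """Tool 1: names and one-line summaries only, so the listing stays small."""
    return [{"name": n, "summary": e["summary"]} for n, e in ENDPOINTS.items()]


def get_endpoint(name: str) -> dict:
    """Tool 2: full parameter documentation for a single endpoint."""
    endpoint = ENDPOINTS[name]
    return {"name": name, "summary": endpoint["summary"], "params": endpoint["params"]}


def execute_endpoint(name: str, arguments: dict):
    """Tool 3: call one endpoint, rejecting arguments it does not document."""
    endpoint = ENDPOINTS[name]
    unknown = set(arguments) - set(endpoint["params"])
    if unknown:
        raise ValueError(f"unknown arguments for {name}: {sorted(unknown)}")
    return endpoint["handler"](**arguments)


# The three model turns, in order: browse, read the docs, then act.
names = [e["name"] for e in list_endpoints()]
details = get_endpoint("refunds.create")
result = execute_endpoint("refunds.create", {"payment_id": "pay_123"})
print(names, details["params"], result)
```

Only the three tool definitions sit in the model's context up front; the per-endpoint documentation is fetched on demand, which is where the extra turns come from.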

Are you using MCP servers yourself?

Yeah, I use MCP, funnily enough, not so much on the coding side, but on the business side. So I'll use the Notion, HubSpot, and Gong MCP servers, and actually an MCP server for a read-only copy of our database, and say, hey, what are the interesting customers that signed up for Stainless last week? And it'll go off and make a great query of our Postgres database, and then it can cross-reference those things in HubSpot, and then look up our notes in Notion, maybe even look at transcripts in Gong, and tell me all about it. It's incredible.

Lots of us are shipping AI to production, which is great for productivity, but it also comes with anxiety. You tweak a prompt, swap models, adjust parameters, and everything looks fine in testing. So you merge, and then three days later, or even sooner, the support tickets start rolling in. The AI is giving your customers unexpected answers, and you have no idea when it happened or why. Braintrust is the AI observability platform that fixes this. It connects evals and observability in one workflow. That way you see what actually happened in production and can measure whether changes made things better or worse. Traces show the full execution path. Evals define what good looks like, and experiments let you compare prompts and models side by side before shipping. Production traces feed directly into your eval datasets. Every failure becomes a test case. You catch regressions in CI before they reach users. And teams at Notion, Stripe, Zapier, Vercel, and Ramp use it to ship quality AI at scale. Braintrust is designed for teams building production AI systems where silent regressions are expensive. It's built for any stack. They have SDKs for Python, TypeScript, Go, Ruby, C#. There's no framework lock-in or vendor dependencies. It's SOC 2 Type II certified and GDPR and HIPAA compliant. Get started at braintrust.dev. That's braintrust.dev. And now back to the episode.

And so that's one of your big use cases. Are you doing that every week? I'm now interested, not even from an MCP perspective, but for anyone running a business that has some complexity, where you're like, I want to know what's going on in the business. What are you actually doing, and what is the report that comes out, and how often are you doing that, and all that kind of stuff? Tell me so I can steal it.

Yeah. For me it's still usually in kind of playing-around mode. One of the things is the MCP servers disconnect, and then I get annoyed. You have to just reconnect and whatever. It's not a huge deal. But there are a lot of little paper cuts still, which you're going to expect in a technology this new, that can hold back some amount of your usage. One of the things I've found really helpful at the meta level, though, and I'm sure you've had other guests talk about this, is the practice of just collecting notes for the AI, by the AI, edited and curated by yourself. So I have, I can't remember if I call it a notes folder, research folder, something like that, in a special git repo that I use just for this sort of internal stuff. And I'm like, hey, when you find interesting customer quotes, put them in this folder and give the full citation, so that the next time I start asking interesting questions, it doesn't have to go searching through the MCP servers again. It has them cached, just on disk, in markdown files.
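Editor's sketch, not from the conversation: the notes-on-disk idea reduced to two small functions, one that appends a quote with its citation to a markdown file and one that searches the folder later. The file name, line format, and sample quotes are all hypothetical; in the workflow described here an AI coding agent writes and reads the files, not a script like this.

```python
import tempfile
from pathlib import Path


def save_quote(notes_dir: Path, customer: str, quote: str, source: str) -> Path:
    """Append one quote, with its citation, to a markdown file in notes_dir."""
    notes_dir.mkdir(parents=True, exist_ok=True)
    path = notes_dir / "customer-quotes.md"
    with path.open("a", encoding="utf-8") as handle:
        handle.write(f'- "{quote}" ({customer}; source: {source})\n')
    return path


def find_quotes(notes_dir: Path, keyword: str) -> list:
    """Return every saved line mentioning keyword, across all .md files."""
    hits = []
    for path in sorted(notes_dir.glob("*.md")):
        for line in path.read_text(encoding="utf-8").splitlines():
            if keyword.lower() in line.lower():
                hits.append(line)
    return hits


# Demo in a throwaway directory; a real setup would use a folder in a git repo.
with tempfile.TemporaryDirectory() as tmp:
    notes = Path(tmp) / "notes"
    # Made-up sample quotes and sources, for illustration only.
    save_quote(notes, "Example Co", "The SDK saved us weeks.", "call transcript")
    save_quote(notes, "Sample Inc", "Docs were easy to follow.", "support ticket")
    matches = find_quotes(notes, "sdk")
    print(matches)
```

Later questions can be answered from the local files first, falling back to the slower MCP lookups only when nothing is cached.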

Wait, that's crazy. Wait, so what are you using to write into that git repo? Is it Claude Code? Are you using ChatGPT? How does it get in there?

Yeah, I use Claude Code these days for that kind of thing.

And so you just have Claude Code open and running, and then a new customer testimonial comes in and you're just like, hey, can you throw this in my master company git knowledge repository, basically? And then whenever you need anything later, you're like, Claude, go search through my master repository to figure out where the best customer quote is for this.

Totally.

That's [ __ ] so cool. Can we see it?

No. It's too messy and probably has a lot of confidential information, the latter being more important.

When you say it's messy, are you having Claude organize it at all? How is it structured?

There's a lot that I want us to do here that we haven't had the chance to do yet. There's some other lower-hanging fruit that I'm working through, and that our business team is working through right now, just on the basics of your CRM systems and so on. So it's not well structured now, but I think that's fine. I don't plan to prioritize structuring it super well until we're using it more, and more broadly. I use this stuff some of the time, one of the business people on the team uses it a fair amount, and I think one or two of our customer support engineers use it a lot. But it's not yet broader than that, and I would like it to get there. Once we see how everything's evolving, I think that's when we'll start bringing in more structure. But as it is, Claude Code can handle unstructured stuff really well, so you don't have to think about it too hard in advance, in my view. You can move things around later.

What else do you have in there other than customer quotes?

SQL queries. I'm a software developer. I don't write a lot of code these days, but I spent a lot of time doing that. And so I might say, hey, how is our month-on-month growth of XYZ metric over the last three months? I did this recently for my last board prep. It came out with a pretty good answer right away and I was like, "Wow, this is awesome." And then I looked a little bit deeper and I was like, "Oh, I actually want to exclude these users from this analysis, and I want to filter it this way and filter it that way." I imbued more of this business context into that SQL query, and I iterated with Claude Code to get it better and better for the specific metric I was looking for, the specific story I was trying to tell. Then I got it to a good place and I was like, great, let's dump this to an analysis folder, or an analytics folder, for future use.
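To make the calculation concrete, here is a small sketch of month-on-month growth with an exclusion filter. It is written in TypeScript over in-memory rows rather than as the SQL discussed above, and the row shape, user IDs, and numbers are all made up for illustration.

```typescript
// Hypothetical rows; in practice these would come back from a SQL query.
type UsageRow = { month: string; userId: string; value: number };
type MonthTotal = { month: string; total: number };
type MonthGrowth = { month: string; growth: number };

const rows: UsageRow[] = [
  { month: "2025-01", userId: "u1", value: 100 },
  { month: "2025-01", userId: "internal", value: 50 },
  { month: "2025-02", userId: "u1", value: 120 },
  { month: "2025-02", userId: "internal", value: 500 },
  { month: "2025-03", userId: "u1", value: 150 },
];

// Sum the metric per month, skipping users excluded from the analysis.
function monthlyTotals(data: UsageRow[], excludedUsers: string[]): MonthTotal[] {
  const totals: Record<string, number> = {};
  for (const row of data) {
    if (excludedUsers.indexOf(row.userId) !== -1) continue;
    totals[row.month] = (totals[row.month] || 0) + row.value;
  }
  return Object.keys(totals)
    .sort()
    .map((month) => ({ month, total: totals[month] }));
}

// Growth of each month over the previous one, as a fraction (0.2 means +20%).
function monthOverMonthGrowth(totals: MonthTotal[]): MonthGrowth[] {
  const result: MonthGrowth[] = [];
  for (let i = 1; i < totals.length; i++) {
    const previous = totals[i - 1].total;
    result.push({ month: totals[i].month, growth: (totals[i].total - previous) / previous });
  }
  return result;
}

const growth = monthOverMonthGrowth(monthlyTotals(rows, ["internal"]));
console.log(growth);
```

In this made-up data, leaving the `internal` user in would inflate February, which is the kind of business context the iteration described above adds to the query.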

Hm, and then next time you're doing your board prep, you can be like, "Hey, what was that query that we did last time?" And it'll presumably go get it.

Yeah, that's really cool. What else?

You know, as any software team is these days, we're also using this for, hey, a customer comes in with a question, can Claude Code just fix it? So in some cases a Linear ticket is filed, and our support engineers are really very technical, but they may not have the wall-clock time to chase down the fix for an incoming bug themselves. They have the technical skill, but guess what, another customer writes in two minutes later and they want to jump on that. They don't want to be knee-deep in a debugger. So something we do sometimes is they'll file the ticket, and by default maybe they intend to do it later, or some other engineer is going to do it later. But hey, can we see if Claude Code can just take a crack at it? Is that going to work out 100% of the time? Definitely not. Is that going to work out 50% of the time? Still no, to be honest with you. But can that improve the overall efficiency? Yeah, maybe. We're still, I would say, experimental there, but we're seeing a lot of promise.

That's really interesting. Okay. Well, I know that in our pre-production call you were talking about having a big vision for the future of AI. Do you want to talk me through that?

Yeah, I would love to. We talked earlier about how agentic AI can make operators' lives a lot easier by taking certain pedestrian tasks from their day and running with them independently. That's something that I think as an industry we're almost on the cusp of. And if you ask how you get there, and also start asking about the steps beyond that, a big part of the way I see things unfolding from here, as I like to say, is that the future of AI is cyborgs. Which is extra ridiculous, because what is a cyborg other than already a robot? But cyborg, as I understand it, is a term that means you're part person and part machine. And in this case, I mean that when you go and talk to an agent, what you're going to be getting is part AI (GPT, neural net, LLM) and part code, where the "machine" I'm talking about is traditional CPU, not GPU, software. I expect this to play out in two main ways. One is your one-off operational use cases, like we were talking about a minute ago, and the other is production software.

In the use case we were talking about a minute ago, where someone needs to perform some tricky one-off action with a bunch of points and clicks, and now we want an AI to just do a bunch of tool calls, the way I actually see that happening, and what we're building towards, is code execution. So rather than the model having a bajillion tools, the model has two tools. One is to execute code, where it just has a text box of, hey, put in some TypeScript, and you're going to use this API's TypeScript SDK, and you're just going to write stripe.transactions.list or stripe.charges.list, and you're going to do stripe.customers and stripe.refunds.create. This is really easy for models. They're really good at writing code. And if you give that tool a little bit of a readme, where you say here's an example request and here are some other resources, some other API calls you can make, it's really good at extrapolating from patterns, if the SDK and the API are well-formed and predictable. Then you give it an additional tool to search the docs and ask questions of the docs. For anything it's not sure about or gets wrong on the first try, you give it the documentation.
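As a rough illustration of the two-tool setup described here, the sketch below declares one code-execution tool and one docs-search tool in the general shape MCP tool definitions use (a name, a description, and a JSON Schema for the input). The tool names, descriptions, and the `client` variable mentioned in the readme text are assumptions for illustration, not Stainless's actual definitions.

```typescript
// Hypothetical tool descriptors; names and wording are assumptions.
type ToolDescriptor = {
  name: string;
  description: string;
  inputSchema: {
    type: "object";
    properties: Record<string, { type: string; description: string }>;
    required: string[];
  };
};

const tools: ToolDescriptor[] = [
  {
    name: "execute_code",
    description:
      "Run TypeScript against the API's SDK in a sandbox. A preconfigured `client` is in scope. " +
      "Example: const page = await client.charges.list({ limit: 10 }); " +
      "Only what you console.log is returned.",
    inputSchema: {
      type: "object",
      properties: {
        code: { type: "string", description: "TypeScript source to execute." },
      },
      required: ["code"],
    },
  },
  {
    name: "search_docs",
    description: "Search the API documentation and return the most relevant passages.",
    inputSchema: {
      type: "object",
      properties: {
        query: { type: "string", description: "Question or keywords to look up." },
      },
      required: ["query"],
    },
  },
];

// Rough size of what the model sees up front, in characters (not tokens).
const upfrontChars = JSON.stringify(tools).length;
console.log(`${tools.length} tools, ${upfrontChars} characters of tool definitions`);
```

The up-front cost is the size of these two definitions, however many endpoints the API has, because the endpoints are reached through code rather than through one tool definition each.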

What this does for that scenario we were talking about earlier is you have very limited impact on the context window up front. We're talking about a thousand tokens or something like that, maybe less. And the context impact of doing a whole bunch of paginated list requests is zero. The model will go look for somebody named Dan, and it'll double-check that the purchase was stripey socks, and it might write three nested for loops, but only at the end, when it's found the right thing, will it console.log "found Dan, customer ID blah blah blah, transaction ID blah blah blah," and then "created refund, refund ID 1 2 3." The context hit coming back from all of this is going to be like 10 lines of text. It's really minimal. And all of this will run really quickly, too, so you don't have a round trip to the model every time you're doing something like this. It's just CPU code, and it runs on a server in the cloud right next to the Stripe API, in AWS somewhere probably, and it goes super fast.
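The flow just described might look something like the sketch below. It uses an in-memory stand-in for a paginated API, not the real Stripe SDK, and the data, IDs, page size, and the `findAndRefund` helper are made up. What it shows is that the loops page through everything inside the sandbox and only the final couple of lines are returned to the model.

```typescript
// Hypothetical in-memory stand-in for a paginated API; not the real Stripe SDK.
type Customer = { id: string; name: string };
type Charge = { id: string; customerId: string; description: string };
type Page<T> = { data: T[]; next: number | null };

const customers: Customer[] = [
  { id: "cus_1", name: "Ana" },
  { id: "cus_2", name: "Dan" },
  { id: "cus_3", name: "Lee" },
];
const charges: Charge[] = [
  { id: "ch_1", customerId: "cus_1", description: "plain socks" },
  { id: "ch_2", customerId: "cus_2", description: "coffee mug" },
  { id: "ch_3", customerId: "cus_2", description: "stripey socks" },
];

// Return one page of items plus the cursor for the next page (null when done).
function listPage<T>(items: T[], start: number, size: number): Page<T> {
  const end = start + size;
  return { data: items.slice(start, end), next: end < items.length ? end : null };
}

// Page through customers and charges; only `output` would go back to the model.
function findAndRefund(name: string, item: string): string[] {
  const output: string[] = [];
  let customerCursor: number | null = 0;
  while (customerCursor !== null) {
    const customerPage: Page<Customer> = listPage(customers, customerCursor, 2);
    for (const customer of customerPage.data) {
      if (customer.name !== name) continue;
      let chargeCursor: number | null = 0;
      while (chargeCursor !== null) {
        const chargePage: Page<Charge> = listPage(charges, chargeCursor, 2);
        for (const charge of chargePage.data) {
          if (charge.customerId === customer.id && charge.description === item) {
            output.push(`found ${name}: customer ${customer.id}, charge ${charge.id}`);
            // A real script would call the refund endpoint here; this ID is made up.
            output.push(`created refund re_demo_1 for charge ${charge.id}`);
            return output;
          }
        }
        chargeCursor = chargePage.next;
      }
    }
    customerCursor = customerPage.next;
  }
  output.push(`no ${item} purchase found for ${name}`);
  return output;
}

const lines = findAndRefund("Dan", "stripey socks");
lines.forEach((line) => console.log(line));
```

None of the intermediate pages reach the model, so the cost in context stays at those few logged lines regardless of how many list requests ran.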

Okay. So what I'm understanding you to be saying is that the language model has a tool where it can write code and send that code to the tool, and whoever the company is, whether it's Stripe or whoever's MCP server you're using, will go and execute that code. That code is going to interact with their API and then return the results, rather than you having 50 different possible tool calls and all that stuff. The model just writes API code, and the API provider executes that code, runs it on their API, and returns the results. Why wouldn't my model just write the code that I then run myself, instead of relying on an API provider to do it?

I expect that will happen a lot more. I expect that the code execution tool is going to become the most widely used tool. One of the problems we have today is that the code execution tool doesn't work so well with libraries. LLMs have a hard time working with a library: knowing exactly what version of the library they're using, using the right version (probably usually the latest), not hallucinating aspects of the API, and knowing how to iterate if they hallucinate wrong. And if the model can't use any library off npm or the Python Package Index or anything like that really well, basically perfectly out of the box, then okay, forget about using a library at that point. You just have to hit the raw HTTP API. And at that point, in order to figure out what's in there, you need the whole OpenAPI spec, and you're back at square one because that document is massive. Furthermore, something that's really scary about that is if you don't have a typed library, with static typing where the computer can say what you're trying to do is wrong, then the LLM will try to make an API request that is wrong some percentage of the time. The code execution tool can run a type checker and say, "Oh, you're asking about stripe.transactions.list, but that actually doesn't exist. Stripe doesn't have a transactions API. You might want payment intents. You might want orders. You might want balance transactions. Which one do you want?" And if the API provider is doing a great job building this tool, it'll return the documentation for all of these things inline. It might have its own AI look at what the model's trying to do and come up with a suggestion. And that sub-agent is well trained, specified, always updating, and isn't burdened with the context of the full conversation.
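A small sketch of that kind of feedback follows. A real tool would get this from the SDK's static types via a type checker. This version swaps in a plain runtime lookup against a hypothetical resource catalog, so the catalog entries, the doc strings, and the `checkCall` helper are all assumptions.

```typescript
// Hypothetical catalog of callable methods and one-line docs.
// A real tool would derive this from the SDK's type definitions.
const knownMethods: Record<string, string> = {
  "paymentIntents.list": "List payment intents.",
  "balanceTransactions.list": "List balance transactions.",
  "charges.list": "List charges.",
  "refunds.create": "Create a refund.",
};

// Check a call path like "transactions.list". If it does not exist,
// suggest similarly named methods along with their inline documentation.
function checkCall(path: string): string {
  if (Object.prototype.hasOwnProperty.call(knownMethods, path)) {
    return `ok: ${path}`;
  }
  const parts = path.split(".");
  const method = parts[parts.length - 1];
  const resource = parts.slice(0, -1).join(".").toLowerCase();

  // Candidates share the method name; prefer ones whose resource name
  // contains the requested resource (e.g. "balanceTransactions").
  const sameMethod = Object.keys(knownMethods).filter((key) => {
    const keyParts = key.split(".");
    return keyParts[keyParts.length - 1] === method;
  });
  const close = sameMethod.filter(
    (key) => resource !== "" && key.toLowerCase().indexOf(resource) !== -1
  );
  const suggestions = close.length > 0 ? close : sameMethod;

  const lines = [`error: ${path} does not exist.`];
  for (const key of suggestions) {
    lines.push(`did you mean ${key}? ${knownMethods[key]}`);
  }
  return lines.join("\n");
}

console.log(checkCall("transactions.list"));
```

The error text, with its suggestions and docs, is what goes back to the model, so a wrong guess costs one short correction instead of a failed request against the live API.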

Hm, what do you think of the security model?

The security model is really interesting. This is another area where we're really starting to think about things at Stainless, and I'm getting really excited about it. So if any listeners are really interested in this and have some ideas or want to talk, please do reach out. At the end of the day, I think the security has to take place at the API layer itself. Right now you see people trying to implement security by limiting what's exposed through MCP, and that kind of makes sense, but at the end of the day you could do anything that's in the API under the hood, right? What people should be doing is using OAuth with granular permissions, with proper scopes, and at that point the security happens in the right place, which is at the API layer. There are limitations to OAuth scopes, and it's pretty hard to build, so it'd be nice if someone made that easy. But in my view that direction is the right layer.

So going back to my earlier question, I'm thinking about the idea of having a model write code that the API provider then executes to interact with their API and return the results. Would you ever consider just creating a tool-use tool that developers use? For example, I'm thinking about Cora, where we've got all these tools. Maybe Gmail is going to build a code-use thing or whatever. But I would probably use what you're talking about inside of Cora. We would need a tool-use tool, or, it's not a tool-use tool, it's a computer-use tool. I know OpenAI has this, but it's not really well built for lots of libraries and stuff. It's not a custom environment. I need a computer tool where I control the environment and can install different libraries in it, and can call it anytime to then call any API. It has to have network access, basically.

Yeah, you guys should build that.

We're working on it.

[ __ ] yeah. Are you building it for developers who want to access MCP servers, or for people who are providing MCP servers?

We're starting with people who are providing MCP servers, but ultimately I think we're going to need this to work such that you can give the model a code execution environment where it can hit not only the Stripe integration, but also the Salesforce integration and anything else. But not too much "anything else," right? One of the advantages of starting where we're starting, with just one API provider, is that you ensure there are no network connections allowed out of the sandbox where we're running the code to anything other than, in this case, api.stripe.com. That's really critical for security for something like this. There are ways to expand that bit by bit and keep things secure.
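To illustrate the single-provider restriction, here is a minimal sketch of an outbound allowlist check. In practice this kind of rule would more likely be enforced by the sandbox's network layer than in application code, and the `isEgressAllowed` helper and the HTTPS-only rule are assumptions added for the example. Only the api.stripe.com host comes from the conversation.

```typescript
// Hosts the sandbox may reach; a single provider in the starting configuration.
const allowedHosts: string[] = ["api.stripe.com"];

// Decide whether sandboxed code may open a connection to the given URL.
// Parsing with URL avoids being fooled by lookalike strings such as
// "https://api.stripe.com.evil.example/" or "https://api.stripe.com@evil.example/".
function isEgressAllowed(rawUrl: string, allowlist: string[]): boolean {
  let parsed: URL;
  try {
    parsed = new URL(rawUrl);
  } catch (err) {
    return false; // not a valid absolute URL
  }
  // Assumption for this sketch: only HTTPS traffic is permitted.
  return parsed.protocol === "https:" && allowlist.indexOf(parsed.hostname) !== -1;
}

console.log(isEgressAllowed("https://api.stripe.com/v1/refunds", allowedHosts));
console.log(isEgressAllowed("https://example.com/", allowedHosts));
```

Expanding "bit by bit" then amounts to adding hosts to the allowlist one provider at a time, with everything else denied by default.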

It'll take some time. The other thing I think to point out, as you see some of these generalizations, is that it's not just that you want this code execution sandbox to work really well for any API, for any library, which I think we really do need. You also start to see that this is just a powerful model for AI doing stuff. Sometimes you realize that the thing the AI did this one time, in this one-off case, is actually enduringly useful. Maybe anytime a customer writes into support and says, "Hey, my socks had holes in them," they should automatically get a refund. Maybe you want that, maybe you don't. But there's a lot of stuff that people do one time, and then two times, and then three times, and then they say, "Okay, we should automate this." That's what software teams do all day, every day. I think we're also going to be seeing that with AI. The same code search tool that we're talking about, and all the same prompting that makes an AI really good at interacting with an API in one of these code sandboxes, almost "in its brain," where it can write code in its head, run the code in its head, see the results, and then move forward with your task, should also let it say, "Okay, actually this is enduringly useful code. Let me commit this to the repo."

Yeah, yeah. It's like, chat is a really good interface for exploring, but sometimes you just want a dashboard. I just want to log into my Stripe dashboard and see all the stuff without having to be like, "What is my MRR?" It should just show up, because I do that every day. But I want to push you, as a hashtag value-add investor. I think there's this thing that happens in AI where, on the first attempt at something like this, people try to be really cautious, and I'm sure your customers, big enterprise customers, care about you being cautious. But the things that get adopted are often the ones willing to take the risk of being YOLO very early. An example is DALL-E, which was totally private for a long time. People were posting some images, but you couldn't get in. And then Stable Diffusion was just like, [ __ ] it, anyone can use this, and that really started the whole image generation wave. Obviously Stable Diffusion sort of fumbled the bag, but they had a lead for a little while. Same thing for Claude Code, honestly. Codex is not like this as much anymore, but if you look at the difference between the Codex CLI and Claude Code, Claude Code was just like, [ __ ] it, YOLO mode. It's super industrious. It has a sandbox, but you can just do "dangerously skip permissions." And Codex fell way behind, because first it was in the browser, so the whole thing was locked down, and then it was in the CLI, but it was really built for pair programming, so it just wasn't particularly industrious. It wouldn't go off and do a bunch of stuff. It would get locked out of doing certain things even if you used full auto mode. Now they've caught up, because they're like, yeah, you can just let it do whatever you want. So I would really push you: there might be a version that you could do today or tomorrow, or very soon, for individual developers, that would let them set up this environment. I would use it immediately. I care about security, but I care a lot less than some gigantic enterprise company. I think the people like me who are building at this scale are eventually, hopefully, going to be the big companies, but we're the ones really doing the AI-first adoption, not the big companies.

Well, I would love to get this in your hands. What are some of the APIs your team uses the most?

We have a bunch of different products, but I'm thinking right now about Cora, the email assistant. The big API it's using is mostly the Gmail API. You're interacting with the assistant over chat, and it has a list of tools like archive email, draft email, send email, or whatever. There's a whole categorize tool, so it categorizes your mail in certain ways. I think we would definitely try out something like this, because if it ran the same way, it would make it much more flexible for us to make more tools and not break old ones.

It's really interesting. In a sense, what I actually predict for people who are "building tools," once we have a code execution super tool like I'm talking about, is that the only way you really "build a tool" is with instructions, with prompts. The full power of everything you could possibly do in the API, in the Gmail API for example, is all there in one tool. But sometimes you have specific tasks, or specific categories of work, that you want to describe in a particular way to help the LLM perform a sequence of actions as productively as possible. At that point the only engineering work you have to do is prompt engineering. We'll see if it's that "easy." As we all know, prompt engineering can be really tricky.

It's hard. Yeah.

But I think that's part of the vision. That being said, we do have some pretty nifty ways, with the MCP servers that we generate today, to help developers mix and match all the different parts of the API as they compose and write their own tools.

This is awesome. So for people who are listening and want to know more from you and from Stainless, where should they find you?

stainless.com. That's our website.

Awesome. Or at least visit stainless.com. Alex, great to have you on. I can't wait to do more of this when you have some of these new things launched. This is really fun, and yeah, great to chat.

Thanks, Dan. You too.

Oh my gosh, folks. You absolutely, positively have to smash that like button and subscribe to AI and I. Why? Because this show is the epitome of awesomeness. It's like finding a treasure chest in your backyard, but instead of gold, it's filled with pure, unadulterated knowledge bombs about ChatGPT. Every episode is a roller coaster of emotions, insights, and laughter that will leave you on the edge of your seat, craving more. It's not just a show, it's a journey into the future with Dan Shipper as the captain of the spaceship. So, do yourself a favor, hit like, smash subscribe, and strap in for the ride of your life. And now, without any further ado, let me just say, Dan, I'm absolutely, hopelessly in love with you.
