
Everyone Is Sleeping on Composer 2.5

By Web Dev Cody

Summary

Topics Covered

  • Context Window Burns Fast on Simple Tasks
  • Accuracy Trumps Speed and Cost
  • The Real Cost Difference Between Models
  • Use Cheap Models for Simple Tasks Only

Full Transcript

So about a week ago, Composer 2.5 came out, and I wanted to give you my opinions on using it. I've been building with it off and on for adding features to Mission Control and also kicking it off for doing refactorings on my existing projects. So over here in Mission Control, I do have Cursor CLI as an option, and I will default to Composer 2.5 for a lot of tasks because it's very fast and for the most part it's very accurate, right? It's not on par with GPT 5.5 or Opus 4.7. I think it's a little bit below, but overall, if I need to just do a quick little bug fix, this is a really great model to at least try out. If you haven't tried it out before, just try it out. You're going to be blown away with the speed and the accuracy of some of these requests.

Now, like I mentioned, I'm not going to hype up this model. I think it's a great model, and for most coding tasks, like UI-related tasks, this thing works very well. I will say that for the more esoteric and complex bugs or features, the ones that really span multiple different files and really have complex conditional logic and stuff like that, I have found that GPT 5.5 is still my favorite model to default to, the one I would recommend. Honestly, if you only had one model to pick, this is it as of today. This is the model you want. This is the subscription that you want. But if you do have some additional funds, maybe you're using Cursor or Cursor CLI, Composer 2.5 is great. Check it out, play around with it.

I find myself using Opus 4.7 a lot less these days. I do use it at work all the time. I do think it's a really great model that does understand the code base. It has a million-token context window, which is also very good, but GPT 5.5 has a million-token context window as well.

In terms of benchmarks, it looks like Composer 2.5 is almost on par with Opus 4.7 on Terminal-Bench, and GPT 5.5 still blows them out of the water. Now for SWE-bench Multilingual, it's beating GPT 5.5, which honestly, I don't think is accurate. Many times I have to use GPT 5.5 to fix bugs and implement the hard stuff, and then I fall back on Composer 2.5 if it's something that's kind of simple. But I will say these benchmarks are often pretty accurate about how good the models are. I mean, I code with the models for like a couple days straight to get a good feel for them.

And overall, I do like this model. If I only had Composer 2.5 to code with, I would be pretty happy. The issue with Composer 2.5 is the context window.

I find the context window to be very small. So, if you do have a very simple bug or a simple new feature you want to add in, it's great for that. Like, if you just want to add a new button, and that button has to go and change some back-end code, maybe it modifies some schema, maybe it needs to run some tests or write some tests, it's a good model to pick. But the context window does have a limitation. I find that after just one prompt, I typically hit 50% of my context window, whereas with GPT 5.5 or Opus I have a much larger window. I can keep on prompting it, and then at some point I can either compact that or throw it away. And so, I do leave that for the harder models.

Now, just to kind of demo this live, I'm going to switch over to my Mission Control project. I do use Mission Control to build out Mission Control. And then I'm going to load up a new session over here. I can switch these to Cursor CLI. We can load that up. I have a bunch of extra things I should probably delete. And now we have a Composer 2.5 model ready to go.

Okay. So, there's a small bug in this application. When I click on the settings icon, which is behind my head, you'll notice that it loads the settings just fine, but when I collapse it or press escape, it kicks me back to the main dashboard.

"There's a small bug. When I load up the settings panel, it works fine, but when I press escape or collapse it, it seems to redirect me back to my main dashboard route. When the settings panel closes, we should not be redirecting the user anywhere."

Okay, again, this is a really simple bug. Any of the models will probably be able to take this and basically knock it out of the park. And I don't even need to tap into my skills. There's a ship skill that I typically use for all my prompts, so I would say something like "use the ship skill." It just helps analyze the code base a little bit better, and all the code that's written is going to be covered with tests. Maybe I'll kick it off just to kind of demo that. But for the most part, you can one-prompt many requests just like this. For example, I used Composer to add in this GitHub branch switcher and to add in the worktrees. I do have worktree support now in Mission Control.

Now, this was a really stupid example, because honestly, the fix was what? We did two two-line changes. I can go ahead and just look at the diff over here. We can kind of just see, you know, it just changed the settings and it changed the route.

That's it. So, now I can actually just refresh this page and we can test this out. All right, so let's just click on settings. I'm going to go ahead and collapse it, and then everything is working as it did. Now, I will say there's another bug. If I hide my head again, when the settings panel is open, I can't click on this button to collapse it again. Okay, so that's another bug.

I'm going to go ahead and just reuse that same session, because it already kind of dived into my code base and it knows about what we just did.

"There's also another small bug. When I click on the settings button to collapse the settings menu, it doesn't seem to work anymore."

Now, I don't think Composer broke this. I think this was just already a bug in my code base. But we can just go ahead and kick this prompt off real quick, and I think that would probably fix it real fast. But, as you can see down here, we're already at 28% of the context window.
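To put the 28% figure in perspective, here is a rough back-of-the-envelope sketch. It assumes every prompt uses about the same share of the context window as the one just shown, which is a simplification (real usage varies by prompt), and the function name `prompts_until_full` is made up for illustration.

```python
import math


def prompts_until_full(share_per_prompt: float, budget: float = 1.0) -> int:
    """Estimate how many similar prompts fit before the context window is used up.

    share_per_prompt: fraction of the window one prompt consumes (0.28 means 28%).
    budget: fraction of the window we are willing to fill (1.0 means all of it).
    Assumption: every prompt costs roughly the same share, which is a simplification.
    """
    if share_per_prompt <= 0:
        raise ValueError("share_per_prompt must be positive")
    return math.floor(budget / share_per_prompt)


# 28% per fix (the number on screen) and 50% per prompt (the typical figure mentioned earlier).
print(prompts_until_full(0.28))
print(prompts_until_full(0.50))
```

Under that assumption, a session like this one has room for only a few small fixes before it needs to be compacted or thrown away.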

From one small little bug fix, it had to dive through all my code base, it had to truly understand what was going on, and I'm already at like, you know, a fourth of the context window used up. So you can tell that with a larger feature or a larger refactoring, it kind of burns through that context window extremely fast.

All right, so let's just do a refresh. Looks like it's done. I'm going to go ahead and just click on the settings button. Now, unfortunately, it's still broken. Like, if I click on the settings and click the button, it doesn't actually collapse it. So, this is a realistic example of what I'm seeing with Composer 2.5. For some things, it just doesn't get it right, right? And you do have to come back in here and you have to keep on re-prompting it. And I'll probably tell it again, "Hey, this still doesn't work."

"This still doesn't work. When I click on the settings button, it does load the settings menu, but then when I click it again, it doesn't collapse it. I will state that using escape does properly collapse it, and it shows me the same route I was on previously, but for some reason, manually clicking the settings button does not properly hide the settings route/panel anymore."

These models do not one-shot everything. This is something that GPT-5.5, I guarantee you, probably would have fixed on the first or second prompt. But the cool thing about Composer 2.5 is that it's very cheap. It's almost like 10 times cheaper than using Opus, but it has like the same benchmarks as Opus. So, you can do a lot more requests at a fraction of the cost, which is really great. Especially if you want to do refactoring and you have a lot of tests in place, Composer can kind of go through your whole code base and quickly refactor things. But I've noticed that I do have to re-prompt it a lot.

Sometimes it will get my application to a broken state, and I do have to re-prompt it to get it to be fixed.

Whereas something like GPT-5.5 often just does it correctly the first time.

And I would say that I tend to lean on models that are just accurate. If I have to wait an extra 2, 3, or 5 minutes, I'm okay with that. If the end result means I don't have to go and re-prompt it or fix something that it broke, I'm okay with waiting a little bit longer. Accuracy at this point is the most important thing.

Okay, so now it is totally fixed. It took about three prompts to fix this. I do want to press escape to verify that works. Okay, that works fine. I will state that the animations don't work when I click the button to close it. So, that's something we can also fix.

"This is great, you fixed it, but I noticed when I click on the button the animations don't slide the settings menu away like it does when I press escape. Can you debug and fix why that's happening?"

And so, the fourth prompt. Let's go ahead and just refresh one more time and then we're going to click on this.

Look at that, it does slide away. That's good.

Pressing escape works. Make sure it works on different pages. Yeah, this is all looking pretty good now. Okay, so it did take four prompts to fix it. Now, I can't really tell you how much this cost me for doing these prompts, because I think right now it's still in the free period, which is unfortunate, because it'd be nice to actually see how much this would have cost me if this was not included already. But I will say that the model is significantly cheaper.

If you look at this graph over here, the orange is Composer 2.5, the Y axis is how accurate the model is, and the X axis is the cost. So, you'll notice here Opus 4.7 extra high, which is like the default most of us are using in Claude Code, is about $7 per task.

Versus Composer 2.5 is around a dollar, maybe even 75 cents per task. And it's saying that it's even more competent than Opus 4.7, which I find great for kicking off CLI tools, for example doing background tasks or asking Composer to create a pull request. For all that, you know, minutia that happens in our day-to-day, you use a cheap model that's accurate just to run a terminal command and get it done, and you're going to save a lot of money and not spend a bunch of time waiting for it. With Opus, you're going to spend a bunch of money on a simple task, and you're going to sit there waiting 30 seconds or a minute for it to just, you know, do a grep over your code base to find a single file for you. Composer can get it done extremely fast, and it's also pretty accurate in terms of its performance.
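The per-task numbers from the graph (about $7 for Opus 4.7 extra high, about $0.75 to $1 for Composer 2.5) can be combined with the re-prompting seen in the demo. The sketch below assumes, purely for illustration, that every re-prompt costs about as much as one benchmark "task"; the graph does not say that, and the function names are made up.

```python
def cost_with_retries(cost_per_task: float, attempts: int) -> float:
    """Total spend if each attempt costs roughly one 'task' (an assumption, not a measured figure)."""
    return cost_per_task * attempts


def break_even_attempts(cheap_cost: float, premium_cost: float) -> int:
    """How many attempts with the cheap model fit inside the price of one premium task."""
    return int(premium_cost // cheap_cost)


# Figures quoted from the graph in the video: Opus at about $7, Composer at about $0.75 to $1.
print(cost_with_retries(1.00, 4))       # four Composer prompts, as in the settings-panel demo
print(break_even_attempts(1.00, 7.00))  # Composer attempts per Opus task at $1 each
print(break_even_attempts(0.75, 7.00))  # Composer attempts per Opus task at $0.75 each
```

Under that assumption, even four Composer prompts stay below the price of a single Opus task. The arithmetic leaves out the time spent re-prompting and manually testing between attempts.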

But going back to my original point, I would probably pay the premium if every single request, like a large feature request or a plan, is implemented end to end and works perfectly the first time, right? I would pay a premium to make sure that it is accurate. And I would say that Codex, or GPT-5.5, is the model for doing that. It is about, you know, $4.50 per task, but if your tasks are actually large tasks, then I think this is still the best model. But I do like using Composer. I find myself using it all the time when I'm adding new features. I'm like, "Oh, I just need to fix a small thing," or "Oh, I need to add in a new feature that does X, Y, and Z." I will probably kick it off in Composer first.

Now, later on in this article, they do talk about how this is basically built. Like, they used Kimi K2.5, and they did some reinforcement learning to kind of make it better.

I don't want to talk about that stuff. I just want to kind of do a benchmark against a simple bug fix like we just did. Okay, so we've verified that this is fixed now. It took us four prompts, or maybe four or five, right? So, I went ahead and committed those fixes to my branch, but what I also want to do is go back and find the commit from before I had that fix. And we're just going to go ahead and check it out here.
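One way to get the pre-fix code without disturbing the current branch is a separate Git worktree. This is a sketch of that idea, not what was done in the video: the SHA `abc1234` and the directory name are hypothetical placeholders, and the helper names are made up. The function only builds the command, so nothing touches a repository until `run_commands` is called.

```python
import subprocess


def pre_fix_worktree_commands(fix_sha: str, worktree_dir: str) -> list[list[str]]:
    """Build the git command that checks out the code as it was just before the fix commit."""
    return [
        # "<sha>^" names the first parent of the fix commit, i.e. the state before the fix.
        # --detach avoids creating a branch for this throwaway checkout.
        ["git", "worktree", "add", "--detach", worktree_dir, f"{fix_sha}^"],
    ]


def run_commands(commands: list[list[str]]) -> None:
    """Run each command inside the current repository, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)  # raises CalledProcessError if git exits non-zero


if __name__ == "__main__":
    # Hypothetical SHA and directory, shown only to illustrate the command shape.
    for cmd in pre_fix_worktree_commands("abc1234", "../mission-control-before-fix"):
        print(" ".join(cmd))
```

A worktree shares history with the main repository, so a model running inside it can still read the log and find the later fix. Telling it not to look at Git logs is still needed with this approach.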

And then I'm going to try to do the exact same prompt as we did before. I

might have to go find it.

Uh let's go here. There's a small bug.

I'm going to grab all this code.

Or sorry, this text. I'm going to go and make a Codex session.

And we're going to try to fix it with the exact same prompt using GPT 5.5 extra high. Now, I'm going to paste in the exact same prompt. I do have to specify, though, "do not look at Git logs," because we already fixed it, right? And these models are smart enough to go and look at Git logs to figure out if maybe something broke along the way. This is not 100% a good comparison, because, you know, you should have the exact same prompt. I should have been able to check out this SHA in isolation. But, you know, it is what it is, right? We're just trying to do a little bit of testing to make sure that these models actually do what they're supposed to.

All right, so let's just do a hard refresh and load the settings, and then I'm going to close them out. Now, the animations still are kind of broken when I close it. I think it's also when I just click on it. Let's see.

Yeah.

"When the settings panel is open and I press escape, the panel does animate away, which is great. But when the settings panel is open and I manually click on the settings button with my mouse, it doesn't animate away. It just disappears. Can you fix the animations to make it work consistently for both?"

Now, one thing I wish I did a little bit better was track how long that original prompt took. I will say GPT 5.5 took maybe 4 minutes to do its first bug fix, and then the second prompt may take another minute or two. But with Composer, you have to kind of prompt it more along the way. And so, I don't know, maybe at the end of the day it's still the same amount of aggregate time spent waiting for these models to run. But one requires a human in the loop much more, whereas GPT 5.5 seems to just get it done.

All right, so now it's just running a bunch of checks. I'm actually going to stop this, because I have to end this video. I'm going to go ahead and just refresh.

We're going to see if the animations are fixed now. And they are. Okay. So again, that was a really simple bug. Composer took four prompts, and GPT-5.5 took two prompts. Overall, the amount of time elapsed was probably equivalent. I will say that Composer is probably a lot cheaper. But you do have to be more in the loop and kind of re-prompt it and do much more manual testing. So that could add to the actual engineering time, especially if you're not really familiar with how to verify these things manually. You have to keep on clicking through the UI to make sure it works. Whereas I found GPT-5.5 is still super accurate. And if you do want a big feature, you kick it off as a background task, you come back, and it usually kind of works.

Okay. So that's what I wanted to talk about. I wanted to give a realistic scenario of a really simple bug. Maybe I'll make some more complex test scenarios in the future. We can kind of talk about that. I can make a video on it. But that's about it. If you guys enjoyed this, leave a comment. Have a good day. And happy coding.
