Claude Dropped Opus 4.7 and It's Not Even Close
By Chase AI
Summary
Topics Covered
- Resolution Revolution: 3x Visual Clarity Transforms AI
- Default Effort Shift: Claude Code Now Thinks Harder
- Token Bloat Warning: Budget Risk on Medium Settings
Full Transcript
So, Opus 4.7 just released, and by the numbers this is a massive upgrade, so let's dive in. First things first: the benchmarks. Now, they do show Mythos over here on the right just to tease us with things that do exist, but what I really want to pay attention to is 4.7 versus 4.6, because who knows when Mythos is going to be available. And by the numbers, this is a very solid leap forward, especially in things like coding. If we take a look at agentic coding, we see jumps from 53 to 64, from 80 to 87, and from 65 to 69 on the three big tests, those being SWE-bench Pro, SWE-bench Verified, and Terminal-Bench 2.0. The only place we see the Opus 4.7 benchmarks not on top of all the other models (except for Mythos) is agentic
search, where GPT 5.4 sits at 89.3 versus Opus 4.7's score, which, oddly enough, has dropped versus 4.6. You know, when you see things like that, where they show benchmarks that have gone down from Opus 4.6, you wonder if they insert those on purpose; it's like, "Oh no, these benchmarks are actually legit, guys. We wouldn't lie about this. See this thing?" But GPT 5.4 is ahead in agentic search, and you also see it ahead in graduate-level
reasoning. Now, another area where we see a massive improvement is visual reasoning. We jump from 69 to 82, and that might have something to do with the fact that this model has way better vision. They are telling us that the images you put into Opus 4.7 are at 3x the resolution now, which is huge if you're doing anything with diagrams or small text. And we see those same sorts of numbers reflected in these graphs: improvements in knowledge work and vision, and a huge jump in document reasoning, from 57.1 to 80.6, which is a big plus if you're someone who uses something like co-work, you're in an office scenario, and all you do all day is feed it documents. Long context reasoning is also a big one. We
constantly harp on this channel about context rot and the idea that we need to be very focused on session management, and I don't think that changes at all. I mean, going from 71 to 75 is great, but I don't think you should change how aggressively you clear, i.e. anytime you're at 20% or 25% of the context window, you should be clearing. Still, this is an improvement. We
love to see this. And this one is also interesting: the coding benchmark that has to do with multimodal. So they're coding, but it also includes cases where they're throwing it context that has stuff like images. And I don't think this is any surprise; I think a lot of that has to do with the resolution.
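As a rough sketch of the session-management rule of thumb mentioned above, here's what "clear at 20% or 25% of the context window" looks like in code. The threshold and token counts are illustrative assumptions from this channel's habit, not anything Anthropic publishes:

```python
def should_clear(used_tokens: int, context_window: int, threshold: float = 0.20) -> bool:
    """Return True once a session has consumed `threshold` of the context window.

    The 20-25% threshold is the rule of thumb from the transcript, not an
    official recommendation; pick whatever fraction works for your sessions.
    """
    return used_tokens / context_window >= threshold

# Example: a 200k-token window with 45k tokens used is at 22.5%, so time to clear.
print(should_clear(45_000, 200_000))  # True at the default 20% threshold
```

The point isn't the arithmetic, it's the discipline: a better long-context score moves where degradation starts, not whether it happens, so the clearing habit stays the same.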
Now, besides the model itself, they had a few more updates. The biggest one is more effort control: there is now a level, X high, between high and max. They probably stole that from OpenAI. And on top of that, Claude Code now defaults to extra high. I think that's probably in response to a lot of people claiming that Opus 4.6 was nerfed, after which Boris Cherny, the creator of Opus, well, not the creator of Opus, the creator of Claude Code, came out and said, "Well, actually, we moved the default reasoning level, the default effort level, to medium." So, the fact they came out with X high, I think, is a response to that, in order to make it quote unquote better and try harder
yet not pushing people to max, because then it swings to the other side and everyone complains that their usage is filling up. And remember, if you want to change that, all you need to do is run /effort and then set your level. The higher resolution is also on the API. And then they've also released the new /review command, so it gets a dedicated review session. On top of that, they've extended auto mode as well, and if you don't know about auto mode, it's basically just an alternative to --dangerously-skip-permissions. Now,
one thing they note here is that Opus 4.7 is going to use more tokens than 4.6. They explicitly state that Opus 4.7 uses an updated tokenizer that improves how it processes text, but that this increases the number of input tokens by roughly 1 to 1.35 times depending on the content type. And secondly, Opus 4.7 thinks more at higher effort levels. So remember that: they're setting the default effort to extra high when before it was on medium, and Opus 4.7 uses more tokens. So, if you've been on medium this whole time, you never changed it, and you were already hitting usage limits on 4.6, be wary of this. Understand that you could definitely run into usage issues, because now it's going to use even more tokens.
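To make that budget risk concrete, here's a quick back-of-the-envelope calculation using the 1 to 1.35x input-token range they state. The session sizes are made-up numbers for illustration, and this only covers the tokenizer change; extra thinking tokens at higher effort levels come on top:

```python
def projected_input_tokens(old_tokens: int, multiplier: float) -> int:
    """Scale a 4.6-era input-token count by the stated tokenizer multiplier.

    The stated range is roughly 1.0-1.35x depending on content type. This does
    NOT account for additional thinking tokens at higher effort settings.
    """
    if not 1.0 <= multiplier <= 1.35:
        raise ValueError("multiplier outside the stated 1.0-1.35x range")
    return round(old_tokens * multiplier)

# A session that used 100,000 input tokens on 4.6, in the worst case on 4.7:
print(projected_input_tokens(100_000, 1.35))  # 135000
```

So even before the effort-level change, a heavy document-processing workload could see roughly a third more input tokens, which is exactly why medium-plan users near their limits should watch this.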
What's also interesting is that they've removed extended thinking as well. And if you want to read more and get a deep dive on this migration, they put out an entire write-up in the documentation. So, all in all, this looks like a really solid upgrade, and I'm excited to jump in there and test it.