Is Mythos too Dangerous?
By The PrimeTime
Summary
Topics Covered
- Mythos: AI's Dangerous New Frontier
Full Transcript
Here we are. Claude did it again. Dropped a new version of itself. Okay. But this one has a very special name. Okay. It's much better. We're not on the old Sonnet or Opus or Haiku. No, we've been upgraded to Mythos. The greatest model to ever be dropped. In fact, it's so great, it's so fantastic, that you, yeah, you sitting there, yeah, you right now, you can't have that. Okay? Hey, you're not allowed to touch that.

Apparently, this model is finding bugs and is able to crack out of sandboxes like nobody's business. We're talking about being able to take down computers just by connecting to them. It's the Chuck Norris, God rest his soul, of all the models, okay? It's able to just destroy everything, apparently. Okay, you've got to hide your kids, hide your Raspberry Pis, 'cause they're taking everybody out here.
So, let's talk about this new model for a second. They released a bunch of stats for it, and then they released the part that would be considered the scary part. The part you always see Anthropic do, right? Because this is pretty typical of Anthropic: they have a new model, and then what do they do with it? They're like, "Dude, by the way, AI, super scary. The most scary ever. So scary. US government. Hey, government, so scary. You'd better put some regulation in place and help us control it, because, man, it's scary."

So, first let's just go with the least interesting of the items. Honestly, I don't care about any of these numbers, 'cause they really mean nothing to me.
But here we go. On SWE-bench Pro, Mythos Preview, the new model, scores 77.8% versus 53.4% for Opus 4.6. So, as you can see, it's dramatically better: more than 20 points better. Now, what does that actually mean for you or me? Well, it doesn't really mean anything, because you're not going to touch this model. You know, you're not allowed to.
Nobody's allowed to. Only a few people at Amazon, Google, and Apple, a couple of other top companies, and the US government are allowed to touch this model. And you can see that on the rest of the benchmarks it just seems to perform, you know, much, much better than Opus 4.6. On the reasoning side, on GPQA Diamond, Mythos Preview dominates Opus 4.6. On Humanity's Last Exam, Mythos Preview without tools still gets an F, but I mean, we're getting near D territory. And you know what? D's earn degrees at some of these places. And Mythos with tools actually does get a D. Okay, it is passing some colleges. This is some serious PhD-level intelligence going on here.

The actually interesting part about the model is security research. I've already released a video about this: how Daniel Stenberg, the lead maintainer of curl, has said, "Hey, AI reporting has gotten a lot better. It's actually starting to show real issues." For a long time, AI inside the security field has been a security issue itself, because it inundates maintainers with so many fake reports that it's actually impossible for them to operate on their own repositories.
But then a shift, a big shift, happened with 4.6. We started to see AI being like, oh wow, no, this is actually serious now. Now it can seriously find things. But this new one, Mythos, is apparently real good:

"During our testing, we found that Mythos Preview is capable of identifying and then exploiting zero-day vulnerabilities in every major operating system and every major web browser when directed by a user to do so. The vulnerabilities it finds are often subtle and difficult to detect. Many of them are 10 or 20 years old, with the oldest we have found so far being a now-patched 27-year-old bug in OpenBSD, an operating system known primarily for its security. Mythos Preview wrote a web browser exploit that chained together four vulnerabilities, writing a complex JIT heap spray that escaped both the renderer and OS sandboxes. It autonomously obtained local privilege escalation exploits on Linux and other operating systems by exploiting subtle race conditions and KASLR bypasses. It autonomously wrote a remote code execution exploit on FreeBSD's NFS server that granted full root access to unauthenticated users by splitting a 20-gadget ROP chain over multiple packets."
It even found a 16-year-old vulnerability in FFmpeg, the hand-artisanally crafted library. So if this is all to be believed, and this is actually what is happening, and we are literally entering the most impressive era for AI ever, to the point where releasing the model publicly would result in every system that has ever existed being hacked, well, we've got ourselves a bit of a problem now, don't we? And that is why Anthropic has said the following: "We do not plan to make Claude Mythos Preview generally available. We plan to launch new safeguards with an upcoming Claude Opus model, allowing us to improve and refine them with a model that does not pose the same level of risk as Mythos Preview."

So that 20-plus-point improvement on SWE-bench, baby, you're never going to taste that. Okay? You're never going to get your sweet hands on that one. But you might get a smarter Claude. Does that mean we're entering into the nation of geniuses on a GPU, stored in a warehouse that Anthropic owns, and you're now able to create everything you've ever wanted with just a simple, quick text description? Well, it doesn't necessarily sound like it. It sounds like some people might have it, but I don't think you're going to have it anytime soon, and I'm probably not going to have it anytime soon either. See, the thing is, they're going to release it to a few select tech cartel leaders, and who knows when it's actually going to happen.

So, is it as big of a deal as we're seeing, or is it not? Obviously, we can see the receipts, with FFmpeg saying, "Hey, thanks for the patch." But some aren't buying it. You've got Boris saying, "Hey, it's very powerful and should feel terrifying," kind of continuing to push the same narrative. But never forget, the exact same narrative was pushed with GPT-2: it's really dangerous, you've got to be super careful, it's honestly too dangerous to release.
Well, the best we can hope for is that OpenAI also happens to have ChatGPT 6 or something, or ChatGPT Cosmos, coming out, and that will force Anthropic to catch up and release their super powerful model. Which is also just a weird place to be in, that we're... What did I just say there? Me, rooting for OpenAI? Oh my gosh, something got into my head there for a second. But I think Lowle said it best: they called it Mythos because no one's ever going to see it. They're literally trying to rage bait us right now. I'm feeling it. I'm feeling the baiting.

You know, it's hard not to look at all this and realize that some part of my skills is becoming more and more irrelevant every year. You know, the ability to hammer out all those Vim shortcuts: kind of a dying skill, right? It's a little sad. I mean, I personally think it's pretty dang sad, but it's an ending skill. It's a skill that I don't think the younger kids, them young fellas, are going to really learn, because they don't really have to. And it's becoming more and more apparent that people would rather just hammer on a model than actually learn any of these tasks, or these really fine, difficult things, anyways. And so here we are.

So, the things that I have defined myself with over the last 20 years... See, while you guys went out smoking cigarettes, staying up too late, probably experimenting with mind-altering drugs, I, on the other hand, was sharpening my skills. And now those skills, maybe they're a little bit more useless. Every single year, a little bit more useless. But honestly, I'm okay with it. I know that might be strange to say, but I am okay with it. I'm okay if these things do turn out to be fantastic and I don't have to identify myself as the greatest Neovim user of all time. It's cool. I can still use Neovim and I can still enjoy it, but it doesn't have to be my identity. And also, I'm just happy I've put in all those years of trying to understand how to make good software, because now, even if I do AI-generate something, I can go, "Oh yeah, here's why this is wrong." I can understand things at a level that people who've never even touched software have no idea about. So, hey, am I happy about that still? Sure. And maybe, you know what, one day even those skills could become invalidated. And if they are, I guess I'll have to be okay with that.
That's it. I just kind of wanted to yap about this because, you know, it's been an interesting time, and I genuinely appreciate that I still have the chance to just yap to you guys, you know, to kind of talk about these things, 'cause I know a lot of people feel really unsure about everything. They feel kind of worried about everything, especially with all the crazy talk from the hype beasts being like, "Oh, it's the end of the universe." Even this report right here by Anthropic, saying it knows how to take advantage of every single browser, every single operating system. It's finding bugs 27 years old. You're absolutely going to get destroyed if we let this thing out. It's just constant fear-instilling, you know, just attacks on you at all times.

And you know, I see these things and I'm like, "Okay, hey, if it really is that, I'm glad Anthropic is making quote-unquote steps toward Amazon and Google and all this nonsense to be able to patch all these problems." But at the same time, I don't want to have to live under this intense pressure and this constant barrage of negativity. I can look at it like, wow, I now have the ability to accomplish things that before would have taken me a lot longer. They would have been a lot harder. I would have been less likely to even start them, just because I can only have so many side projects. Now I get the benefit of being able to abandon several side projects. Like, I have been able to abandon more projects than ever before in my lifetime, thanks to the power of AI. And honestly, that feels pretty amazing.
Hey, the name is the Primeagen.

Hey, is that HTTP? Get that out of here. That's not how we order coffee. We order coffee via ssh terminal.shop. Yeah, you want a real experience. You want real coffee. You want awesome subscriptions so you never have to remember again. Oh, you want exclusive blends with exclusive coffee and exclusive content? Then check out CRON. You don't know what SSH is? Well, maybe the coffee is not for you.
Living the dream.