GPT-5.2 is a total monster
By AI Search
## Summary
## Key takeaways

- **Bees Enter Single Hive Entrance**: GPT 5.2's beehive simulation has all bees coming from a single entrance, which is more correct than Gemini 3 Pro or the previous GPT-5, where bees dispersed everywhere. [01:34], [02:07]
- **Full Photoshop Clone Works**: GPT 5.2 created a Photoshop clone with brushes, layers, edit history, filters, and blending options that all work out of the box, better than Gemini 3, which had non-working features. [03:10], [05:46]
- **Dual Spheres Reflect Each Other**: GPT 5.2 simulated two metallic spheres that accurately reflect each other and update the reflections when moved, the first model to handle this correctly. [09:05], [09:20]
- **Working Excel in Windows Clone**: GPT 5.2's Windows 11 clone includes a functional Excel with formulas like SUM and multiplication that compute correctly, while Gemini 3 Pro could not create a working Excel. [12:26], [13:12]
- **Finds Waldo After 13 Minutes**: GPT 5.2 took 13 minutes using Python code, zooming, and red-stripe detection to correctly circle Waldo in a crowded image. [19:10], [20:55]
- **Beats Humans on GDPval**: GPT 5.2 is the first model to beat expert-level human workers over 50% of the time on the GDPval benchmark, which spans 44 real-world jobs across top US industries. [27:45], [28:12]
## Topics Covered
- Bees Simulate Realistically from Single Entrance
- Photoshop Clone Masters All Tools
- Spheres Reflect Each Other Accurately
- AI Beats Human Experts Half the Time
- Benchmarks Hide True Comparisons
## Full Transcript
OpenAI just dropped their latest and best model, GPT 5.2. So, in this video I'm going to go over all the cool things that it can and cannot do and compare it with the leading models out there, like Gemini 3. Plus, we're going to go over its specs, performance, and benchmarks, and some really important details that you might have missed. Let's jump right in. First, let's just start with some demos. So, I'm on ChatGPT, and I'm on the Plus plan. And at the top here, I can select GPT 5.2 Thinking, which is the most performant variant. Note that all the top AI models out there can already do simple stuff like writing emails or blog posts. So, in this video, I'm really trying to test its limits with some really tricky prompts that involve multi-step coding, reasoning, or visual analysis tasks.
For my first prompt, let's try this: Make a visual simulation of beehive construction, showing hexagonal cells forming, worker bee paths, and honey storage. Include sliders for colony size and resource availability. Put everything in a standalone HTML file. Let's press run. All right, that was pretty quick. It only thought for 19 seconds. Let me expand its thought process so you can see how it thinks first. So, it's first planning the HTML for the colony construction. Then, it's refining resource effects and worker bee movement. Then, simplifying bee task selection and movement. And that's pretty much it. So, it's pretty quick. It definitely doesn't waste your tokens if it doesn't need to. And then afterwards, it gave me this HTML code. So, here is all the code. It's a really long snippet. I'm just going to copy this and open it in a new HTML file. All right. And here's what we get.
Now, this is actually very different from the other bee colony simulations I've created before with the other leading models, in that for this one, all the bees come from a single entrance, which is actually more correct. A bee colony actually creates its cells inside the hive, right? And there's only like one main entrance where the bees fly in and out. So, this is more correct than what I got from Gemini 3 Pro or the previous version, GPT-5, where the bees just kind of disperse everywhere. You can see that the bees all look very nice. Plus, they're all filling up these cells with honey. The animations look pretty much perfect. And if I increase the colony size, we do get a lot more bees dispersing. And then if I increase the resource availability, the cells do tend to fill up faster. If I decrease the resource availability to, let's say, 7%, then the cells fill up a bit slower. You can also see some stats up here, plus the build rate and the number of workers. So everything just works and the animation looks great. This is actually the most realistic animation of beehive construction I've seen so far.
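For a feel of what the model had to implement here: the core of any honeycomb layout is mapping hexagonal cell coordinates to pixel centers. Here's a minimal sketch of that in Python, using axial coordinates for pointy-top hexagons (my own illustration of the standard technique, not the model's actual code, which would be JavaScript):

```python
import math

def hex_center(q, r, size):
    """Pixel center of a pointy-top hex cell at axial coords (q, r)."""
    x = size * math.sqrt(3) * (q + r / 2)
    y = size * 1.5 * r
    return x, y

def hive_cells(radius):
    """Axial coordinates of all cells within `radius` rings of the center."""
    cells = []
    for q in range(-radius, radius + 1):
        for r in range(max(-radius, -q - radius), min(radius, -q + radius) + 1):
            cells.append((q, r))
    return cells

# A 2-ring hive has 1 + 6 + 12 = 19 cells.
print(len(hive_cells(2)))  # 19
```

Once you have the cell centers, the bees, honey fill, and sliders are all animation state layered on top of this grid.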
All right, my next prompt is even trickier. Let's get it to create a clone of Photoshop with all the basic tools, including brushes, layers, edit history, filters, blending options, and more. And then what I can do with ChatGPT is, down here, I can click on Canvas so that the code can be previewed in a right-side window, as you'll see in a second. Let's press generate. All right, here's what we get. And again, it only thought for 19 seconds, which is surprisingly fast for creating an entire clone of Photoshop. But here is basically its thinking. It's very short.
And let me click on preview, so you can see the Photoshop app in this right-side window. Everything looks good to me so far. Let's start drawing over this. Let me draw a few lines here. So, the brush works. Let's increase the brush size and then change the color to something like this. So, increasing the brush size also works. Let's also add a new layer here. And then let me decrease the brush size and draw some more stuff. Perfect. Let me also try this opacity slider. All right. Very nice. So opacity works. And then brush hardness. Let me decrease the brush hardness, and then let's make a yellowish brush this time. Let me increase the opacity. So the hardness also works. You can see that the edges are a bit softer. And then let's also change the layer opacity. So this also works. Very nice. Over here, I can also click on this eye icon to hide this layer. And I can do the same for the layer behind it. Very nice. And by the way, all the history is actually recorded over here, so the edit history also works. Next, let me try the eraser. Let me decrease the brush size and erase over some of this. And this also works very well. Let me try move. And move also works. Very nice.
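An edit history panel like the one in this clone is typically just two stacks, undo and redo. A minimal sketch of the pattern in Python (my own illustration, not the generated app's code):

```python
class History:
    """Minimal undo/redo: push snapshots, walk back and forward."""
    def __init__(self, initial):
        self.undo_stack = [initial]
        self.redo_stack = []

    def push(self, state):
        self.undo_stack.append(state)
        self.redo_stack.clear()  # a new edit invalidates any redo states

    def undo(self):
        if len(self.undo_stack) > 1:
            self.redo_stack.append(self.undo_stack.pop())
        return self.undo_stack[-1]

    def redo(self):
        if self.redo_stack:
            self.undo_stack.append(self.redo_stack.pop())
        return self.undo_stack[-1]

h = History("blank")
h.push("stroke 1")
h.push("stroke 2")
print(h.undo())  # stroke 1
print(h.redo())  # stroke 2
```

Real editors store commands or layer diffs instead of full snapshots, but the two-stack structure is the same.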
All right. Next, let's play with some filters. Let's try grayscale and apply that. So, grayscale works. And note that it's only applying this to the layer I selected, which is what it should do. Next, let's try invert and apply that. So, invert also works. In fact, let me press Ctrl+Z to undo those, so it's not in grayscale. And then, let me try invert again. So that works. And then let's try sepia. That also works. Let me press undo again. And then let's try blur. So blur also works. You can see it is getting blurrier. Very nice. And then let's try sharpen. So sharpen also works. And then let's try edge detect. That also works. Brightness. Let's make this super bright and press apply. So that also works. Let's try contrast, and let's increase the contrast. So that also works. And then afterwards we have hue rotate. You can see, as I adjust the slider, the colors are changing in hue in the background. So that works. Let me press undo. And then finally, saturation. Let's make this a bit less saturated and then press apply. And that also works. So all these filters just work out of the box. Let me press undo.
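Filters like grayscale and invert are just per-pixel math on the layer's pixel buffer. A hedged sketch of the idea in Python (illustrative only; the generated app would do this on a canvas `ImageData` array in JavaScript):

```python
def grayscale(pixels):
    """Luminance grayscale using the common Rec. 601 weights."""
    out = []
    for r, g, b in pixels:
        y = round(0.299 * r + 0.587 * g + 0.114 * b)
        out.append((y, y, y))
    return out

def invert(pixels):
    """Invert each channel: 255 minus the value."""
    return [(255 - r, 255 - g, 255 - b) for r, g, b in pixels]

img = [(255, 0, 0), (0, 0, 0)]        # one red pixel, one black pixel
print(grayscale(img))  # [(76, 76, 76), (0, 0, 0)]
print(invert(img))     # [(0, 255, 255), (255, 255, 255)]
```

Sepia, brightness, and contrast follow the same shape: loop over pixels, apply a small formula, clamp to 0..255.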
And next, let's also play with the layer blending. So first we have normal. Next, we have multiply. So that works. Then we have screen. That also works. Overlay, darken, lighten, color dodge, color burn, hard light, soft light. All of these blend modes just work right out of the box. This is pretty incredible. This is the most thorough Photoshop clone that I've seen so far. Much better than Gemini 3, which looks like this; some of the features don't really work. And also better than GPT-5, which was not bad, but this new one just has a lot more features and everything works right out of the box. Super impressive.
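For reference, the simplest of these blend modes are one-line formulas on channel values normalized to [0, 1]: multiply is a·b and screen is 1 − (1 − a)(1 − b). A quick illustrative check in Python:

```python
def multiply(a, b):
    """Multiply blend: always darkens; white (1.0) is the identity."""
    return a * b

def screen(a, b):
    """Screen blend: always lightens; black (0.0) is the identity."""
    return 1 - (1 - a) * (1 - b)

# Mid-gray blended over mid-gray:
print(multiply(0.5, 0.5))  # 0.25 (darker)
print(screen(0.5, 0.5))    # 0.75 (lighter)
```

Overlay, hard light, and the dodge/burn modes are piecewise combinations of these same two formulas.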
All right, next, let's also test how good it is at generating a 3D scene from an image. So, I'm going to copy this image and paste it in here, and then ask it to create a beautiful 3D scene inspired by this image, using Three.js in a single HTML file. Three.js is a library that's used to create 3D animations online. All right. And here's what I got. I forgot to press the Canvas button, so I can't preview this in a side window, but let me just copy this code, paste it in an HTML file, and then open it up here. So, here is the 3D render from this model. It's not bad. I mean, all the elements are there. Let me also split-screen this so you can view the render from Gemini 3 Pro. It seems like GPT 5.2 does have more detail, especially with the cherry blossoms. And then for the pagoda, both of them are not great. So, I don't know. It's a really close call between both models. Both of them are not perfect. I would say it's a tie.
All right. Next, let's try this prompt, which no AI model has gotten correct so far: Develop a simulation of two metallic spheres suspended above a street scene. Notice that GPT-5 could already do this for one metallic sphere, but none of the models could do two. Use any publicly available 3D street environment. Allow adjustable parameters such as reflectivity, roughness, and other material properties of the spheres. And then here are some key phrases I like to use to make sure everything is working. Let's press generate. Here's what I got. And this time it thought for a lot longer, so let me expand the thinking process for you. Here is how it thought through this prompt. It's assembling the simulation. It's choosing a 3D street view, etc. And then it's actually going online to search for publicly available street panoramas, and then it's opening this one. Then it's implementing ray tracing with accumulation in WebGL 2, implementing path tracing pseudocode and controls, yada yada yada. And then finally, it's creating and finalizing the HTML, and also working on the ray generation and shaders. Afterwards, here is the code that I got. I forgot to select Canvas, so I can't actually preview this in a side window, but what I'm going to do is copy the code, paste it in a blank HTML file, and then open that up. All right, so
here is what we get. Let me expand the ray tracing controls. Very interesting. So it does load up a street map, plus this ground here. Let's play around with these settings. Here is exposure. The exposure works; it increases the exposure of the environment. And then environment intensity. That also works. For sphere A, let's change the height of this. So that works. And then here are X and Z. Those also work. First, let me change the base color to just white, because I don't like the colors of the spheres currently. They look kind of ugly. And then let's change the roughness all the way to zero. Same with sphere B: I'm going to change this color to white and the roughness to zero. Now, the really awesome thing about this, and why I prompted it to make two spheres, is because I wanted to see if the spheres reflect each other as well. And as you can see here, indeed, each sphere does reflect in the other. None of the other AI models could actually get this, so this is super impressive. Note that if I move this sphere, the reflection in the other sphere is also accurate. That is so cool. This is the first model that can handle this.
All right. Next, let me change the base color to something super ugly, like red. So, that also works. Let me also increase the radius. That also works. Notice that as I increase the size, the reflection here also grows. Very nice. And then for roughness, let's play around with the slider. I'm going to increase it, and that's what happens. For metalness, I'm going to increase it all the way to one, which does this. If I decrease it, that's what happens. And then for reflectivity, again, I'm going to decrease it and then increase it. So, it looks like the settings do work. Then let's change this back to white. And then for the other sphere, let me also quickly play around with the settings. The X, Y, and Z sliders work. The base color also works. Very nice. And then the roughness and metalness also work. And reflectivity also works. Notice that when I change it, it also changes the reflection on the other sphere. Very nice. And then we also have this ground color. So, let's change the color to something super ugly, like red. That also works. And then for roughness, let's also change this a bit. If we decrease the roughness to zero, then it basically reflects the sky. Let me change this back to white. I mean, everything just works. And the reflections on the spheres are physically correct. Super cool. If you compare this with Gemini 3, there were a lot of errors with that generation, and it could not even generate a reflection of one sphere on the other. So this one from GPT 5.2 is the most accurate generation I've seen so far. Very impressive.
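The physics the model got right here is mirror reflection: a view ray with direction d that hits a surface with unit normal n bounces as r = d − 2(d·n)n, and a path tracer follows that bounced ray to whatever it hits next, including the other sphere. A tiny illustrative sketch in Python (my own, not the generated WebGL shader):

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def reflect(d, n):
    """Mirror-reflect direction d about unit normal n: r = d - 2(d.n)n."""
    k = 2 * dot(d, n)
    return tuple(di - k * ni for di, ni in zip(d, n))

# A ray heading straight down hits a floor whose normal points up:
print(reflect((0, -1, 0), (0, 1, 0)))  # (0, 1, 0): it bounces straight back up
```

Getting two spheres right just means applying this recursively: the reflected ray from sphere A is intersected against the scene again, and if it hits sphere B, B's shading (including B's own reflections) ends up in A's surface.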
All right. Next, let's try something equally challenging: Make a clone of Windows 11. Use the original wallpaper. On the desktop, there should be icons for MS Word, Paint, Calculator, and Chrome. But wait a minute, Gemini 3 can already do these programs. Paint and Calculator are way too easy, so instead let's try Excel and PowerPoint. This is going to be way harder. And then each program should work. The Start menu should also work. Put everything in a standalone HTML file. And let me remember to select Canvas so we can preview this in a side window. Let's press generate. All right, here's what we get. And this was super quick. It only thought for 10 seconds.
And that's basically its thinking process. Its thoughts are definitely shorter and more concise than Gemini 3 or DeepSeek, which tend to think a lot. Anyways, let's click on preview. And here's what we get. So, it doesn't really look like the Windows 11 desktop, and the Start menu also looks kind of weird, but nevertheless, let's try out each app. I'm going to click on Word, and this does open up Word. Let's try typing something in. So, that works. Let me try to bold this. Very nice. And then italics, and then underline. So, all three of these settings work. Let me also try to change the font. So the font works as well. Let's try clicking save. It does say saved here, but I don't think it actually saved anything. And then let me try to move this window around. That works. Let me try to minimize this. That also works. And then let me maximize this. That also works. And then finally, let's exit out of this and try Excel. So here
we have Excel. Let's actually see if this works. So let's enter 1 and then 2 here. And then for the formula, let's try something like =SUM(A1:A2). And that also works. Very cool. Or let's try this. Let me add a cell here, and then let's set this equal to A3 minus A2, which gives me 1, which is correct: 3 - 2 is 1. All right, let's add another number here, and then let's set this cell equal to A3 * B3. And indeed, it gives me 15, which is correct. Very cool. So this is actually a very basic but working version of Excel. I actually tried the same prompt in Gemini 3 Pro, but it could not create a working Excel. You can see that I can't even click on any of these cells or do anything. So, very impressive.
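Supporting formulas like these means parsing cell references and evaluating them against the grid. Here's a deliberately tiny Python sketch of how =SUM(A1:A2) and =A3*B3 might be handled; this is illustrative only (the generated app implements its own version in JavaScript), and it assumes SUM ranges stay within one column:

```python
import re

def evaluate(grid, formula):
    """Evaluate a tiny formula subset: =SUM(A1:A2), =A3-A2, =A3*B3."""
    expr = formula.lstrip("=").upper()
    m = re.fullmatch(r"SUM\((\w)(\d+):(\w)(\d+)\)", expr)
    if m:  # single-column range sum
        col, lo, hi = m.group(1), int(m.group(2)), int(m.group(4))
        return sum(grid[f"{col}{row}"] for row in range(lo, hi + 1))
    # Replace each cell reference with its value, then evaluate the arithmetic.
    expr = re.sub(r"[A-Z]\d+", lambda m: str(grid[m.group(0)]), expr)
    return eval(expr)  # fine for a toy; a real app would parse properly

grid = {"A1": 1, "A2": 2, "A3": 3, "B3": 5}
print(evaluate(grid, "=SUM(A1:A2)"))  # 3
print(evaluate(grid, "=A3-A2"))       # 1
print(evaluate(grid, "=A3*B3"))       # 15
```

A real spreadsheet also tracks which cells depend on which, so edits to A3 automatically recompute every formula that references it.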
Next, let's try PowerPoint. And this also seems to work.
So, let's try changing the text here. And then, let me add a new text block. I can also move the text block around and edit the text. Very nice. And then here is a new slide. Here's the second slide. And I can also add a new slide here. That's pretty much it, though. I can't change the background or add images, so it's just a very basic render of PowerPoint. Anyways, let me also press present. And present actually works. I can use the arrow keys to navigate to the next or previous slide, and then I can press Escape to exit. Everything just works. This is super impressive. Again, if I try the same prompt in Gemini 3 Pro, it gave me this, which looks a lot more like PowerPoint, but I can't even insert any text. If I click on this, there is no dropdown. I can't insert a new slide. If I click here, notice it's not doing anything. None of these buttons actually work. All right.
Next, let's also try Google Chrome. So here's what Chrome looks like. Here it says this is a standalone HTML demo, so it can't embed arbitrary websites. Let's just try Google and see if that works. Yeah, it seems like none of these work, for security reasons. Nevertheless, it's still impressive that it's able to code a working version of Word, Excel, and PowerPoint all in this one desktop clone. Let me also check the Start menu. Let me open up Word again. And that also works. So the Start menu works. Let's search for something like Excel. And that also works. So it's not perfect. The icons look kind of weird, and it doesn't use the default Windows 11 wallpaper, but most things are functional. I'm sure with a few more prompts, we could get this to look even better.
All right, final coding example, and then we're going to move on to some other cool stuff. Anyways, let me also try to get it to generate an interactive 3D night sky viewer with labeled constellations. Most of the top AI models can't really do this in just one shot. And here's what we get. It thought for 41 seconds. Here is its thinking process, and here's the code. Let's click on preview. And here's what we get. I'm actually really impressed that it was able to get this in one go. And the constellations do look mostly correct. Let me also try to find the Big Dipper, which should be in Ursa Major. And here we go. Here is Ursa Major, which contains the Big Dipper. Now, let me auto-rotate this. And then let me also play around with the sliders. This one basically increases the number of background stars. This is the star size. So, increasing the star size also works. And then increasing the label size also works, as you can see here. And the line opacity also works. Very nice. Everything just works. Again, it's not perfect. Some of the lines of the constellations are missing. But I mean, you can try this with Gemini 3 Pro or GPT-5, and often they're not able to code up a working demo in just one prompt. So, again, I'm really impressed by the coding abilities of GPT 5.2. Everything just works.
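For a sense of what a viewer like this does under the hood: each star's catalog position (right ascension α, declination δ) gets mapped onto a unit celestial sphere before rendering, and constellation lines just connect those points. A small sketch in Python (my own illustration; the generated demo would do this in JavaScript):

```python
import math

def star_to_xyz(ra_deg, dec_deg):
    """Map right ascension/declination (degrees) to a point on the unit sphere."""
    ra = math.radians(ra_deg)
    dec = math.radians(dec_deg)
    x = math.cos(dec) * math.cos(ra)
    y = math.cos(dec) * math.sin(ra)
    z = math.sin(dec)
    return x, y, z

# The north celestial pole (dec = +90 degrees) maps to the top of the sphere:
x, y, z = star_to_xyz(0.0, 90.0)
print(round(x, 6), round(y, 6), round(z, 6))  # 0.0 0.0 1.0
```

Place the camera at the origin looking outward and you get exactly the planetarium-style view the demo shows.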
If you want to supercharge your productivity, definitely check out Skywork Super Agents, the sponsor of this video. Think of this as an army of advanced AI agents right at your fingertips. You can get it to autonomously do research and create reports, spreadsheets, slides, web pages, and even podcasts. They have a new AI poster feature which allows you to create posters with zero design skills. It integrates the best image model out there, Nano Banana Pro, to make incredibly beautiful, clean, and accurate posters. For example, let's get it to make a poster of the top five tourist destinations in Thailand. And here's our result. I can prompt it to make this more minimalist, like this, or more detailed, like this. Notice that this poster agent integrates deep research, so you can be sure that all the content is factually correct. I can also click anywhere in the photo to edit it further. For example, let's turn this photo into a sunset shot. And here's what we get. Or here's another example. I can drag and drop any product photos or people into here and then get it to create a poster with these assets. For example, let's get it to make a flash sale poster with these items at up to 80% off. The store is called Aura, located at this address. And here's my result. I can prompt it further to give me this poster in different styles. In addition to the poster agent, Skywork also has agents for creating reports, slides, sheets, websites, and more. So whether you want to make sales decks, research reports, educational materials, marketing pitches, or anything else, Skywork Super Agents is the best platform to use, with the highest quality and accuracy. Try it today via the link in the description below.
The awesome thing about GPT 5.2 is that it's multimodal, so it can also take in images. So what I'm going to do is feed it this image of a ton of Demon Slayer characters. Let me just drop it in here and ask it to label each character with a bounding box and their name. Let's press generate. And here's what I got. It thought for a long time. Here's its thinking process, and it's actually pretty interesting. It was able to identify that this is from Demon Slayer, and it seems to just know all these characters right away. Next, it's proceeding to write the code to label the characters with bounding boxes. Then it's analyzing the image. It's defining the bounding box for this character, etc. And then it's doing all of this. So notice GPT 5.2 is really good with tool use: it's pulling up its Python tool to code this up and then generate the final image. Afterwards, it gives me a link to download the labeled image. And here is what I got. You can pause the video to confirm, but it was able to correctly label all the characters. Pretty cool how it has this knowledge of even some uncommon anime characters built in.
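Drawing a labeled bounding box, as the model's Python tool did, is just writing the rectangle's border pixels into the image array. A minimal sketch in Python on a toy 2D grid (illustrative; the model presumably used an imaging library rather than raw loops):

```python
def draw_box(img, x0, y0, x1, y1, value=1):
    """Draw a rectangle outline into a 2D list-of-lists image (in place)."""
    for x in range(x0, x1 + 1):
        img[y0][x] = value  # top edge
        img[y1][x] = value  # bottom edge
    for y in range(y0, y1 + 1):
        img[y][x0] = value  # left edge
        img[y][x1] = value  # right edge
    return img

img = [[0] * 6 for _ in range(5)]
draw_box(img, 1, 1, 4, 3)
for row in img:
    print(row)
# [0, 0, 0, 0, 0, 0]
# [0, 1, 1, 1, 1, 0]
# [0, 1, 0, 0, 1, 0]
# [0, 1, 1, 1, 1, 0]
# [0, 0, 0, 0, 0, 0]
```

The name label is then just text rendered near the box's top-left corner (x0, y0).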
All right, here's an even trickier test. I'm going to plug in this image and ask it to find Waldo, or Wally, depending on where you're from. So again, I'm going to plug this image into here and then ask it to find and circle Waldo. Let's press generate. All right. Now, this time it took way longer. It thought for 12 minutes and 59 seconds. So, let's expand its thinking process. First, it's writing Python code to open the image and inspect it. It first opens the full image, and then it actually zooms into specific sections to find Waldo. So, it's scanning this first, and then it's scanning another location, and another, and another, etc. And then it says it spotted a character with a red-and-white striped shirt and a red hat. It zooms in further, and then it says it's not sure if this is Waldo. So, it proceeds to zoom into some other places, yada yada yada. It's also searching at the top here, and it could not find Waldo in the upper balcony. And you know, here's the crazy thing: it's actually coding up a way to detect Waldo using red-stripe analysis. It's creating a mask which runs through the image and detects any instances of red and white stripes. How crazy is that? And then afterwards, I guess it has found these filtered regions. So, it's doing a lot of really complicated stuff. That is some crazy math here, just to find Waldo. And then here it actually ran into an error with this Python code. But you can see it automatically fixes the error itself and moves on. It thinks for a really long time, so I'm just going to fast forward to the end. It's looking through all these different sections. Eventually it focuses on this striped man in tile 11, which looks like this. And now it's confirming Waldo's identity, and it says it's confident that this is Waldo. So finally, it has honed in on this dude over here. This is definitely Waldo. And finally, after 13 minutes, it's done. It gives me an image to download. And as you can see here, it has indeed correctly circled Waldo. This is a really tricky prompt, considering that there are so many other really close calls, like this woman over here.
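The red-stripe trick it came up with is essentially: classify each pixel as red-ish or white-ish, then look for strips where the two alternate vertically. A toy sketch of that idea in Python (my own reconstruction of the concept, not the model's actual code; the thresholds are made up):

```python
def classify(pixel):
    """Label a pixel as 'red', 'white', or 'other' using rough thresholds."""
    r, g, b = pixel
    if r > 180 and g < 90 and b < 90:
        return "red"
    if r > 200 and g > 200 and b > 200:
        return "white"
    return "other"

def looks_striped(column, min_alternations=3):
    """True if a vertical strip of pixels alternates red/white enough times."""
    labels = [classify(p) for p in column]
    alternations = 0
    for a, b in zip(labels, labels[1:]):
        if {a, b} == {"red", "white"}:
            alternations += 1
    return alternations >= min_alternations

shirt = [(255, 0, 0), (255, 255, 255), (255, 0, 0), (255, 255, 255)]
sky   = [(120, 180, 255)] * 4
print(looks_striped(shirt))  # True
print(looks_striped(sky))    # False
```

Running a check like this over every column of the image yields candidate regions, which is presumably why the model then only had to zoom into a handful of spots to confirm which striped figure was actually Waldo.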
Now, because it has vision capabilities, let's also test it on some OCR tasks. I wanted to parse this really complicated table and turn it into a spreadsheet. This is really complicated because it has columns nested within other columns. Plus, some of the cells are missing. Plus, there aren't even lines for each row. So, it's super complicated; it has to figure out a lot of different things. I'm going to plug this into here and then simply write: turn this into a spreadsheet. Let's press generate. All right. So, here again, it thought for a really long time, 4 minutes and 7 seconds. Here is its thinking process. You can see it's first opening up some Python code to scan the image, but afterwards it gives me an Excel spreadsheet to download. So, let me download this and open it up over here. In fact, let me put the original table in the corner somewhere. And it does seem like everything is correct. Even though this is a super tricky table with nested columns and missing cells, it's able to parse everything into a spreadsheet like this. This is super impressive.
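Nested column headers like these are usually handled by flattening the header rows into one composite header per column, carrying each parent label across the child columns it spans. A small sketch of the idea in Python (illustrative only, with hypothetical data, not the table from the video):

```python
def flatten_headers(top, sub):
    """Combine a spanned top header row with a sub-header row.

    Empty strings in `top` mean 'same group as the cell to the left'.
    """
    flat, current = [], ""
    for t, s in zip(top, sub):
        if t:
            current = t
        flat.append(f"{current} / {s}" if s else current)
    return flat

top = ["Region", "Sales", "", "Costs", ""]
sub = ["",       "2023",  "2024", "2023", "2024"]
print(flatten_headers(top, sub))
# ['Region', 'Sales / 2023', 'Sales / 2024', 'Costs / 2023', 'Costs / 2024']
```

With the headers flattened, missing cells just become empty values in an otherwise rectangular grid, which is exactly what a spreadsheet export needs.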
All right, here's an even trickier example, because this time it's a flowchart and the arrows are all over the place. I'm going to copy this image and paste it in here, and let's ask it to turn this into a canvas where I can drag and move the nodes around and scroll to zoom in and out. Preserve the text and colors, and put everything in one HTML file. All right, so here it thought for 3 minutes and 39 seconds. And here is its thinking process. Let's just copy the code and paste it in an HTML file. So, here's what we have. Let me also open up the original flowchart for your reference. It seems like all the text is correct. Let me double-check the arrows. All the arrows at the top here are correct. Same with the arrows at the bottom here. However, this is again a really tricky flowchart, because as you can see, there's actually no arrow connecting these two, so this arrow here is wrong. What should happen is there should be an arrow connecting this bottom green one to this top pink one, but we don't see that arrow over here. So that's an error. And then same with over here: this bottom pink one should connect to this top yellow one, but we don't see that either. So, that's a flaw. It could not get the arrows 100% correct, but it was pretty close, and at least the text and the colors were correct. Again, note that this is a really tricky prompt that was meant to test its limits. For a basic flowchart or table, I'm sure it can handle it very well. Now, in my previous video testing Gemini 3 Pro, I asked it to find the cat in this photo, and it was able to get it. That's way too easy.
So, let's step it up a notch. There's a
frog hidden in this photo, and I fed this to Gemini 3 Pro for many times, and it was not able to identify the frog.
Let me try the same thing with GPT 5.2.
So, I'm going to paste the image in here, and I'm not even going to say that it's a frog. I'm just going to write there's an animal hidden in this photo.
Find and circle it. So, here it thought for 1 minute and 15 seconds. Let's
expand its thinking. And let's open this up, here., At least, it, identified, that
up, here., At least, it, identified, that this is a frog. It appears to have zoomed in on this spot. It circled this spot which is not correct. But then
again, even Gemini 3 Pro was not able to get this correct. You know
interestingly, they both kind of narrowed it down to the same point, which is over here. But I mean, I double-checked this, and there's definitely no frog at this point. So, at least for this photo, currently none of the top
models could actually find the frog.
Maybe we'll need to wait for GPT6 or Gemini 4. Now, Gemini 3 Pro is known to do really well at analyzing medical images.
So, let's see if GPT 5.2 is also this good. I'm going to upload this image and then ask it to circle the lesions in these images, if any. Let's press
generate. All right. So, after thinking for a minute and 42 seconds, it proceeds to analyze each image and then circle any lesions that it has identified. So
let me download its result. And here's
what it gave me. Now, if I compare this to the actual answer, it got most of it wrong. So, for slide one, this is not correct. The lesion should be over here.
For the second slide, it didn't even circle the lesion in the right place. It
just circled this blank area. For slide
three, it got it correct. And then for slide four, it kind of got it correct. I
think it just circled the wrong place.
It should be this dot over here. So
overall, not too impressive for medical image analysis. I definitely would not use this to identify lesions or tumors in medical scans. All right, next. Let's
also see if it's any good at determining the location of a photo. Now, this is a really blurry photo that I took from a hidden beach somewhere in Hong Kong.
This is not a popular location, and I've never uploaded this photo online before.
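As an aside, this kind of metadata stripping can be done in a few lines with the Pillow imaging library. The sketch below is my own illustration, not the workflow used in the video, and the filenames are hypothetical; it attaches a fake EXIF tag first so the strip step has something to remove:

```python
from PIL import Image

# Stand-in for a real photo: a tiny image with a fake EXIF tag attached.
img = Image.new("RGB", (10, 10), "blue")
exif = Image.Exif()
exif[0x010E] = "secret location note"  # 0x010E is the ImageDescription tag
img.save("photo.jpg", exif=exif)

# Reopen and confirm the metadata is present.
with_meta = Image.open("photo.jpg")
assert with_meta.getexif(), "expected EXIF data on the original"

# Strip: copy only the pixel data into a fresh image, which carries no EXIF.
clean = Image.new(with_meta.mode, with_meta.size)
clean.putdata(list(with_meta.getdata()))
clean.save("photo_clean.jpg")

print(dict(Image.open("photo_clean.jpg").getexif()))  # -> {}
```

Copying pixels into a fresh image drops every metadata block, not just GPS tags, which is the safe default before handing a photo to a geo-guessing test.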
I've stripped away all the EXIF data or metadata from it. There are no clues about where this photo was taken. I'm going to paste it in here and then ask it what's the exact location of this. And here's
what I got. It was not able to get an exact pin. However, it has identified that, you know, this section is this, which is actually correct. So, here it has identified that it's most likely
this point, which is close, but not entirely correct. However, if I run the same prompt in Gemini 3 Pro, it was way off. So, it guessed this location, which is a completely different place. Let me
actually map this out for you. So I
actually took the photo at this location facing southeast and GPT predicted that it was over here which is actually also looking at roughly the same direction.
So GPT was at least able to get you know this part correct. However, Gemini
predicted it to be all the way over here which is just completely wrong. So both
of them were wrong, but I would say GPT was a lot closer. Again, this is a really tricky photo of quite a private location. So, I did not expect it to get it correct. I'm sure if you feed it a normal photo with more clues, then it would be really good at actually guessing that location. All right, so that sums up my quick tests on all the
cool things that GPT 5.2 can and cannot do. Overall, after spending a few days
do. Overall, after spending a few days testing it, it is definitely state-of-the-art. I would say it's like
state-of-the-art. I would say it's like tied with Gemini 3 Pro. They're very
similar, and it's really hard to pick a winner. All right, next up, let's go
winner. All right, next up, let's go over its specs and performance and benchmarks and some really important things that you should be aware of. So
here is their official announcement page. First of all, they were very specific about what GPT 5.2 is good at here. They say it's the most capable model for professional knowledge work, and specifically in this announcement page, they really focused on this benchmark, GDP Val. If you scroll down a
bit here, it explains what this GDP Val benchmark actually is. This
basically tests an AI model's performance on some realworld work tasks spanning 44 jobs from the top nine industries contributing to the US GDP.
So this includes things like sales presentations, accounting spreadsheets, urgent care schedules, manufacturing diagrams, or short videos. And as you can see, GPT 5.2, both the Pro and Thinking
models are the first-ever models to actually beat an expert-level human worker over 50% of the time. The previous version, GPT5 Thinking, still wasn't able to beat an expert-level human over half the time. So this is quite a profound chart. This might imply that, you know, AI is indeed going to
replace most traditional job roles. I
mean, why hire a human when you can just get an AI, which is cheaper, can respond to you instantly, and is even better than a human expert? Now, like I showed you, GPT 5.2 is actually very good
at coding things. It was able to create a ton of stuff that was not possible with the other top models. So in terms of agentic coding, here's another relevant benchmark that they focused heavily on, which is SWE-bench Pro. Now,
here's something really important.
SWE-bench Pro is different from SWE-bench Verified, which is the benchmark that's used by other top models. For example, you can see here, for this official benchmark card for Gemini 3 Pro when it was first released, they used SWE-bench Verified, not Pro.
Same with Claude Opus 4.5 when it was released. They also used SWE-bench Verified, not Pro. But here, OpenAI is saying, "No, we should not use SWE-bench Verified, which only tests Python, but instead use SWE-bench Pro,
which tests four languages and aims to be more contamination-resistant." Now, I hope you can see where we're going with this. Of course, OpenAI would prefer to use SWE-bench Pro, because GPT 5.2 actually performs a lot better there than competitors like Opus 4.5 and Gemini 3 Pro. However, it's not true for SWE-bench Verified, in which case GPT 5.2 actually doesn't even perform as well as
Gemini 3 or Opus 4.5. So make sure you don't just, you know, read the benchmarks from OpenAI. Now, for GPQA Diamond, which is like graduate-level reasoning questions, it does perform better than Gemini 3 Pro. Same with FrontierMath and competitive math. It also performs
very well on CharXiv reasoning, which tests how good it is at analyzing scientific figures. And then here's another crazy achievement, for ARC-AGI-2.
It scores 52.9%.
So if you plot this out against the other competitor models, then GPT5 Pro is indeed one of the best for this benchmark. Now, this is extremely important, because ARC-AGI-2, if you're not familiar with this benchmark, basically tests an AI model's ability to learn new patterns. You see, on this
test, the model is basically given some question and answer pairs like this, where it needs to figure out the underlying pattern. Now, for a human, it's pretty easy to do. So, we basically need to color these gray shapes based on how many holes each one has. But for an AI,
technically an AI model doesn't learn new things after training. So if we feed it new data it has never seen before, it has a really hard time actually answering correctly. So this ARC-AGI-2 benchmark is basically testing whether an AI can still, like, take in new data and learn underlying patterns from it.
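To make the "count the holes" rule concrete: the pattern a human spots in that puzzle can be written down as an ordinary flood fill. The sketch below is my own illustration of the idea, not code from the benchmark:

```python
from collections import deque

def count_holes(grid):
    """Count empty regions fully enclosed by a shape (1 = filled, 0 = empty).

    Flood-fill every empty region; a region that never touches the
    grid border is a hole.
    """
    h, w = len(grid), len(grid[0])
    seen = [[False] * w for _ in range(h)]
    holes = 0
    for sy in range(h):
        for sx in range(w):
            if grid[sy][sx] == 0 and not seen[sy][sx]:
                # Flood-fill this empty region, tracking border contact.
                queue = deque([(sy, sx)])
                seen[sy][sx] = True
                touches_border = False
                while queue:
                    y, x = queue.popleft()
                    if y in (0, h - 1) or x in (0, w - 1):
                        touches_border = True
                    for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
                        if 0 <= ny < h and 0 <= nx < w and grid[ny][nx] == 0 and not seen[ny][nx]:
                            seen[ny][nx] = True
                            queue.append((ny, nx))
                if not touches_border:
                    holes += 1
    return holes

# A 5x5 donut shape: one enclosed hole.
donut = [
    [0, 1, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [1, 0, 0, 0, 1],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 1, 0],
]
print(count_holes(donut))  # -> 1
```

The point of ARC-AGI-2 is that the model is never told a rule like this; it has to infer it from a handful of input-output examples, which is what makes the benchmark a test of in-context learning rather than memorization.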
And as you can see, GPT5 Pro, at least the high mode, is definitely state-of-the-art, much better than Gemini 3 or even Gemini 3 Deep Think, which is all the way over here, and which is
more expensive, but less performant.
Here on this announcement page, they also say that GPT 5.2 performs much better in long context reasoning. So
you can feed it a ton of information at once, like some really long documents or an entire codebase, even up to like 256,000 tokens, which is roughly 200,000 words. You can see that the accuracy of GPT 5.2, which is the top line here, remains very close to 100%. So, even if you feed it a ton of information at once, it's able to kind of remember
everything and answer you correctly.
That being said, note that the maximum context window of GPT 5.2 is only 400,000 tokens. So that's roughly 300,000 words. If you want to fit even more information into your prompt, then you'll have to go with Gemini 3, which has a context window of 1 million tokens. So you can fit way more info into your prompt at once using Gemini.
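The token-to-word figures quoted here follow a common rule of thumb of roughly 0.75 English words per token; a quick sketch (the ratio is an assumption, not an official figure, and actual tokenization varies by model and text):

```python
# Rough rule of thumb: ~0.75 English words per token (an assumption,
# not an official OpenAI or Google figure).
WORDS_PER_TOKEN = 0.75

def tokens_to_words(tokens):
    """Convert a token count to an approximate English word count."""
    return int(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(256_000))    # -> 192000, i.e. "roughly 200,000 words"
print(tokens_to_words(400_000))    # -> 300000, GPT 5.2's stated window
print(tokens_to_words(1_000_000))  # -> 750000, Gemini 3's window
```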
Here's another interesting fact.
They say that GPT 5.2 has a knowledge cutoff of August 2025. This is
relatively recent, and that means this model has a lot more recent data built into it compared to other competitors, which might have a much earlier knowledge cutoff date. Final thing to
note here is that currently you can only use GPT 5.2 if you're on any of these paid plans. It's not available on a free plan yet. Now, those were just some of OpenAI's self-reported benchmarks. It's
also important to actually look at some independent third-party leaderboards to get an objective sense of how GPT 5.2 actually compares. So, here's an independent leaderboard called Artificial Analysis. And for GPT 5.2 extra high, notice that this is like the max thinking mode, it's basically tied with Gemini 3 Pro. And this is kind of
how I feel as well. Both of these models are very similar in performance. Note
that they haven't actually added, you know, just the regular high or the medium thinking models of GPT 5.2 to this leaderboard, but you can probably assume that they perform a bit worse. If you
look at pricing, then it's actually slightly more expensive than Gemini 3 Pro at $4.80 per million tokens, but it's still way cheaper than Anthropic's
Opus 4.5, which to be honest is kind of a ripoff for me. If you look at LiveBench, which is another independent and private leaderboard, surprisingly
GPT 5.1 Codex Max High is ranked number one, with the highest reasoning and coding averages as well. And then GPT 5.2 high is all the way down here, below Opus and Gemini 3. If you look at this
other leaderboard called SimpleBench, which basically tests an AI's performance on some common-sense tasks, surprisingly, GPT 5.2 is all the way down here at eighth place. Note that this is the extra high thinking model. So
it's even below GPT 5 Pro and also below Gemini 3 Pro. Now, after some initial testing, I get the feeling that GPT 5.2 is actually a bit better than Gemini 3 in
terms of geo guessing or basically guessing the location of a photo.
However, for this GeoBench leaderboard, it doesn't seem like they've added GPT 5.2 here yet. Now, here's another leaderboard for OCR, which tests a model's ability to recognize and parse
text in images. So, this would be for tasks like turning an image of a table into a spreadsheet, like I showed you in the demos. And over here, GPT 5.2 is also not ranked number one. It's all the way down in fifth place. Now, here they haven't added the high or extra high versions of this yet for some reason.
So, maybe those variants would rank a bit higher. Now, it's also important to look at the hallucination rate of an AI model, or basically how often it just makes stuff up. And you can see that
GPT 5.2 extra high is all the way in the middle. So, it's not as bad as Gemini 3 Pro, which is over here. It got stuff wrong 88% of the time according to this test, whereas GPT 5.2 is only 78%.
However, if you're really concerned about the factual accuracy of its outputs, then there are more accurate models for that, like Kimi K2 Thinking or Grok 4.1 or even Grok 4. So, that sums up my review and tests on this latest GPT 5.2. It's a very capable model. I
was especially impressed by its coding capabilities, and it is definitely among the state-of-the-art models out there. I
mean, things are moving so fast and with all these benchmarks and metrics, it's really hard to actually get an objective sense of how good a model is. At least
for me, its intelligence or performance feels on par with Gemini 3 Pro. But let
me know in the comments what you think.
What other impressive things were you able to get it to do? As always, I will be on the lookout for the top AI news and tools to share with you. So, if you enjoyed this video, remember to like
share, subscribe, and stay tuned for more content. Also, there's just so much happening in the world of AI every week.
I can't possibly cover everything on my YouTube channel. So, to really stay up to date with all that's going on in AI, be sure to subscribe to my free weekly newsletter. The link to that will be in the description below. Thanks for
watching, and I'll see you in the next one.