
GPT-5.2 is a total monster

By AI Search

Summary

Key takeaways

- **Bees Enter Single Hive Entrance**: GPT 5.2's beehive simulation has all bees coming from a single entrance, which is more correct than Gemini 3 Pro or the previous GPT-5, where bees dispersed everywhere. [01:34], [02:07]
- **Full Photoshop Clone Works**: GPT 5.2 created a Photoshop clone with brushes, layers, edit history, filters, and blending options that all work out of the box, better than Gemini 3, which had non-working features. [03:10], [05:46]
- **Dual Spheres Reflect Each Other**: GPT 5.2 simulated two metallic spheres that accurately reflect each other and update the reflections when moved; it is the first model to handle this correctly. [09:05], [09:20]
- **Working Excel in Windows Clone**: GPT 5.2's Windows 11 clone includes a functional Excel with formulas like SUM and multiplication that compute correctly, while Gemini 3 Pro could not create a working Excel. [12:26], [13:12]
- **Finds Waldo After 13 Minutes**: GPT 5.2 took 13 minutes, using Python code, zooming, and red stripe detection, to correctly circle Waldo in a crowded image. [19:10], [20:55]
- **Beats Humans on GDPval**: GPT 5.2 is the first model to beat expert-level human workers over 50% of the time on the GDPval benchmark, which spans 44 real-world jobs across top US industries. [27:45], [28:12]

Topics Covered

  • Bees Simulate Realistically from Single Entrance
  • Photoshop Clone Masters All Tools
  • Spheres Reflect Each Other Accurately
  • AI Beats Human Experts Half the Time
  • Benchmarks Hide True Comparisons

Full Transcript

OpenAI just dropped their latest and best model, GPT 5.2. So, in this video, I'm going to go over all the cool things that it can and cannot do and compare it with the leading models out there, like Gemini 3. Plus, we're going to go over its specs, performance, and benchmarks, and some really important details that you might have missed. Let's jump right in.

First, let's just start with some demos. I'm on ChatGPT on the Plus plan, and at the top here I can select GPT 5.2 Thinking, which is the most performant variant. Note that all the top AI models out there can already do simple stuff like writing emails or blog posts, so in this video I'm really trying to test its limits with some tricky prompts that involve multi-step coding, reasoning, or visual analysis tasks.

For my first prompt, let's try this: make a visual simulation of beehive construction, showing hexagonal cells forming, worker bee paths, and honey storage. Include sliders for colony size and resource availability. Put everything in a standalone HTML file. Let's press run. All right, that was pretty quick; it only thought for 19 seconds. Let me expand its thought process so you can see how it thinks first. It's first planning the HTML for the colony construction, then refining resource effects and worker bee movement, then simplifying bee task selection and movement. And that's pretty much it. It's pretty quick, and it definitely doesn't waste your tokens if it doesn't need to. Afterwards, it gave me this HTML code. It's a really long snippet, so I'm just going to copy it and open it in a new HTML file. All right. And here's what we get.


Now, this is actually very different from the other bee colony simulations I've created before with the other leading models, in that for this one, all the bees come from a single entrance, which is actually more correct. A bee colony creates its cells inside the hive, and there's only one main entrance where the bees fly in and out. So, this is more correct than what I got from Gemini 3 Pro or the previous GPT-5, where the bees just disperse everywhere. You can see that the bees all look very nice, plus they're all filling up these cells with honey. The animations look pretty much perfect. If I increase the colony size, we do get a lot more bees dispersing, and if I increase the resource availability, the cells tend to fill up faster. If I decrease the resource availability to, say, 7%, the cells fill up a bit slower. You can also see some stats up here, plus the build rate and the number of workers. So, everything just works and the animation looks great. This is actually the most realistic animation of beehive construction I've seen so far.
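The slider behavior described here (resource availability driving how fast cells fill) can be sketched as a one-line update rule. This is an illustrative model of my own, not the HTML the model actually generated:

```python
# Illustrative sketch: how a resource-availability slider could drive the
# honey fill rate per tick, matching the demo's faster/slower fill behavior.
def step_fill(levels, availability, rate=0.25):
    """Advance each cell's honey level (0..1) by one simulation tick.

    availability is the slider value in 0..1; higher values fill faster.
    """
    return [min(1.0, lvl + rate * availability) for lvl in levels]

cells = [0.0, 0.5, 0.9]
print(step_fill(cells, availability=1.0))  # [0.25, 0.75, 1.0]
```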

All right, my next prompt is even trickier. Let's get it to create a clone of Photoshop with all the basic tools, including brushes, layers, edit history, filters, blending options, and more. And what I can do with ChatGPT is click on Canvas down here, so that the code can be previewed in a right-side window, as you'll see in a second. Let's press generate. All right, here's what we get. And again, it only thought for 19 seconds, which is surprisingly fast for creating an entire clone of Photoshop. Here is its thinking; it's very short.

Let me click on preview, so you can see the Photoshop app in this right-side window. Everything looks good to me so far. Let's start drawing over this. Let me draw a few lines here. So, the brush works. Let's increase the brush size and then change the color to something like this. Increasing the brush size also works. Let's also add a new layer here, and then let me decrease the brush size and draw some more stuff. Perfect. Let me also try this opacity slider. All right, very nice. So, opacity works. And then brush hardness: let me decrease the brush hardness, and then let's make a yellowish brush this time. Let me increase the opacity. So the hardness also works; you can see that the edges are a bit softer. Then let's change the layer opacity. This also works. Very nice. Over here, I can also click on this eye icon to hide this layer, and I can do the same for the layer behind it. Very nice. And by the way, all the history is recorded over here, so the edit history also works. Next, let me try the eraser. Let me decrease the brush size and erase over some of this. This also works very well. Let me try move. Move also works. Very nice.

All right, next, let's play with some filters. Let's try grayscale and apply that. Grayscale works, and note that it only applies to the layer I selected, which is what it should do. Next, let's try invert and apply that. Invert also works. In fact, let me press Ctrl+Z to undo those, so it's not in grayscale, and then let me try invert again. That works. Then let's try sepia. That also works. Let me press undo again, and then let's try blur. Blur also works; you can see it is getting blurrier. Very nice. Then let's try sharpen. Sharpen also works. Then edge detect. That also works. Brightness: let's make this super bright and press apply. That also works. Let's try contrast and increase it. That also works. Afterwards, we have hue rotate, and you can see that as I adjust the slider, the colors in the background change in hue. That works. Let me press undo. And then finally, saturation. Let's make this a bit less saturated and press apply. That also works. So, all these filters just work out of the box. Let me press undo.

Next, let's also play with the layer blending. First we have normal. Next we have multiply. That works. Then we have screen. That also works. Overlay, darken, lighten, color dodge, color burn, hard light, soft light: all of these blend modes just work right out of the box. This is pretty incredible.
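For reference, the multiply and screen modes tested here have simple per-channel definitions (channel values normalized to 0..1). This is the standard blend math, not the clone's actual code:

```python
# Standard per-channel blend math behind two of the modes in the demo.
def multiply(a, b):
    return a * b                   # darkens: white (1.0) is neutral

def screen(a, b):
    return 1 - (1 - a) * (1 - b)   # lightens: black (0.0) is neutral

print(multiply(0.5, 0.5))  # 0.25
print(screen(0.5, 0.5))    # 0.75
```

The other modes (overlay, hard light, and so on) are piecewise combinations of these two.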

This is the most thorough Photoshop clone I've seen so far. It's much better than Gemini 3, which looks like this; some of its features don't really work. It's also better than GPT-5, which was not bad either, but this new one has a lot more features and everything just works right out of the box. Super impressive.

All right, next, let's test how good it is at generating a 3D scene from an image. I'm going to copy this image, paste it in here, and ask it to create a beautiful 3D scene inspired by the image, using Three.js in a single HTML file. Three.js is a library that's used to create 3D animations online. All right, here's what I got. I forgot to press the Canvas button, so I can't preview this in a side window, but let me just copy the code, paste it in an HTML file, and open it up here. So, here is the 3D render from this model. It's not bad; all the elements are there. Let me also split-screen this so you can view the render from Gemini 3 Pro. It seems like GPT 5.2 has more detail, especially with the cherry blossoms. For the pagoda, both of them are not great. So, I don't know, it's a really close call; neither is perfect. I would say it's a tie.

All right, next, let's try this prompt, which no AI model has gotten right so far: develop a simulation of two metallic spheres suspended above a street scene. Note that GPT-5 could already do this for one metallic sphere, but no model could do two. Use any publicly available 3D street environment, and allow adjustable parameters such as reflectivity, roughness, and other material properties of the spheres. And then here are some key phrases I like to use to make sure everything is working. Let's press generate. Here's what I got. This time it thought for a lot longer, so let me expand the thinking process for you. Here is how it thought through this prompt: it's assembling the simulation and choosing a 3D street view; then it's actually going online to search for publicly available street panoramas and opening this one; then it's implementing ray tracing with accumulation in WebGL 2, implementing path-tracing pseudocode and controls, and so on. Finally, it's creating and finalizing the HTML, and it's also working on the ray generation and shader.

Afterwards, here is the code that I got. I forgot to select Canvas, so I can't preview this in a side window, but what I'm going to do is copy the code, paste it in a blank HTML file, and open that up. All right, so here is what we get. Let me expand the ray tracing controls. Very interesting. It does load up a street map, plus this ground here. Let's play around with these settings. Here is exposure. The exposure works; it increases the exposure of the environment. Then environment intensity. That also works. For sphere A, let's change the height. That works. And here are X and Z. Those also work. First, let me change the base color to just white, because I don't like the current colors of the spheres; they look kind of ugly. Then let's change the roughness all the way to zero. Same with sphere B: I'm going to change its color to white and its roughness to zero.

Now, the really awesome thing about this, and why I prompted it to make two spheres, is that I wanted to see if the spheres can reflect each other as well. And as you can see here, indeed, each sphere does reflect in the other. None of the other AI models could get this, so this is super impressive. Note that if I move this sphere, the reflection in the other sphere is also accurate. That is so cool. This is the first model that can handle this.

All right, next, let me change the base color to something super ugly, like red. That also works. Let me also increase the radius. That also works; notice that as I increase the size, the reflection here also grows. Very nice. Then for roughness, let's play around with the slider. I'm going to increase it, and that's what happens. For metalness, I'm going to increase it all the way to one, which does this; if I decrease it, that's what happens. And for reflectivity, again, I'm going to decrease it and then increase it. It looks like the settings do work. Then let's change this back to white. For the other sphere, let me also quickly play around with the settings. The X, Y, and Z sliders work. The base color also works. Very nice. The roughness and metalness also work, and reflectivity also works. Notice that when I change it, it also changes the reflection on the other sphere. Very nice. Then we also have this ground color. Let's change it to something super ugly, like red. That also works. And for the ground roughness, let's change this a bit: if we decrease the roughness to zero, it basically reflects the sky. Let me change this back to white. I mean, everything just works, and the reflections on the spheres are physically correct. Super cool. If you compare this with Gemini 3, there were a lot of errors in that generation, and it could not even render the reflection of one sphere on the other. So this one from GPT 5.2 is the most accurate generation I've seen so far. Very impressive.
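The sphere-on-sphere reflections come down to one standard ray-tracing identity: a ray direction d bounces off a surface with unit normal n as r = d - 2(d.n)n. This is the textbook math, not the generated shader code:

```python
# Standard mirror-reflection formula used by any ray tracer:
# r = d - 2 * (d . n) * n, where n is a unit surface normal.
def reflect(d, n):
    dot = sum(di * ni for di, ni in zip(d, n))
    return [di - 2 * dot * ni for di, ni in zip(d, n)]

# A ray heading straight down bounces straight back up off a floor (n = +y):
print(reflect([0.0, -1.0, 0.0], [0.0, 1.0, 0.0]))  # [0.0, 1.0, 0.0]
```

To get a sphere reflecting in a sphere, the tracer simply applies this formula again when the reflected ray hits the second sphere, which is why moving one sphere updates the reflection in the other.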

All right. Next, let's try something equally challenging: make a clone of Windows 11. Use the original wallpaper, and on the desktop there should be icons for MS Word, Paint, Calculator, and Chrome. But wait a minute: Gemini 3 can already do these programs. Paint and Calculator are way too easy, so instead let's try Excel and PowerPoint. This is going to be way harder. Each program should work, and the Start menu should also work. Put everything in a standalone HTML file. And let me remember to select Canvas so we can preview this in a side window. Let's press generate. All right, here's what we get. And this was super quick: it only thought for 10 seconds. That's basically its thinking process; its thoughts are definitely shorter and more concise than Gemini 3 or DeepSeek, which tend to think a lot. Anyways, let's click on preview. And here's what we get. It doesn't really look like the Windows 11 desktop, and the Start menu also looks kind of weird, but nevertheless, let's try out each app. I'm going to click on Word, and this does open up Word. Let's try typing something in. That also works. Let me try to bold this. Very nice. And then italics, and then underline. All three of these settings work. Let me also try to change the font. The font works as well. Let's try clicking save. It does say saved here, but I don't think it actually saved anything. Then let me try to move this window around. That works. Let me try to minimize this. That also works. Let me expand this. That also works. And then finally, let's exit out of this and try Excel.

So, here we have Excel. Let's see if this actually works. I'll enter 1 and then 2 here, and then for the formula, let's try something like =SUM(A1:A2). That also works. Very cool. Or let's try this: let me add a cell here and set it equal to A3 minus A2, which gives me 1, which is correct (3 - 2 is 1). All right, let's add another number here and set this cell equal to A3 * B3. And indeed, it gives me 15, which is correct. Very cool. So this is actually a very basic but working version of Excel. I actually tried the same prompt in Gemini 3 Pro, but it could not create a working Excel; you can see that I can't even click on any of the cells or do anything. So, very impressive.
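The two formula behaviors tested here (a SUM over a range and plain cell arithmetic) can be sketched with a tiny evaluator over a cell dictionary. This is my own minimal sketch, not the clone's actual implementation:

```python
# Minimal sketch of the two formula behaviors from the demo:
# =SUM(a1:a2) over a single-column range, and arithmetic like =a3*b3.
import re

def evaluate(cells, formula):
    body = formula.lstrip("=").upper()
    m = re.fullmatch(r"SUM\(([A-Z])(\d+):\1(\d+)\)", body)
    if m:  # single-column range, e.g. SUM(A1:A2)
        col, lo, hi = m.group(1), int(m.group(2)), int(m.group(3))
        return sum(cells[f"{col}{row}"] for row in range(lo, hi + 1))
    # otherwise substitute cell references and evaluate the arithmetic
    expr = re.sub(r"[A-Z]\d+", lambda mm: str(cells[mm.group(0)]), body)
    return eval(expr)  # fine for a sketch; real code would parse safely

cells = {"A1": 1, "A2": 2}
cells["A3"] = evaluate(cells, "=SUM(a1:a2)")
cells["B3"] = 5
print(cells["A3"], evaluate(cells, "=a3-a2"), evaluate(cells, "=a3*b3"))  # 3 1 15
```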

Next, let's try PowerPoint. And this also seems to work.

Let's try changing the text here. Then let me add a new text block. I can also move the text block around and edit the text. Very nice. And here is a new slide; here's the second slide. I can also add a new slide here. That's pretty much it, though. I can't change the background or add images, so it's just a very basic render of PowerPoint. Anyways, let me also press present. Present actually works: I can use the arrow keys to navigate to the next or previous slide, and I can press Escape to exit. Everything just works. This is super impressive. Again, if I try the same prompt in Gemini 3 Pro, it gives me this, which looks a lot more like PowerPoint, but I can't even insert any text. If I click on this, there is no dropdown, and I can't insert a new slide. If I click here, notice it's not doing anything. None of these buttons actually work. All right.

Next, let's also try Google Chrome. Here's what Chrome looks like. It says this is a standalone HTML demo, so it can't embed arbitrary websites. Let's just try Google and see if that works. Yeah, it seems like none of these work, for security reasons. Nevertheless, it's still impressive that it's able to code a working version of Word, Excel, and PowerPoint all in this one desktop clone. Let me also check the Start menu. Let me open up Word again. That also works, so the Start menu works. Let's search for something like Excel. That also works. So, it's not perfect: the icons look kind of weird, and it doesn't use the default Windows 11 wallpaper, but most things are functional. I'm sure with a few more prompts, we could get this to look even better.

All right, one final coding example, and then we'll move on to some other cool stuff. Let me try to get it to generate an interactive 3D night sky viewer with labeled constellations. Most of the top AI models can't really do this in just one shot. And here's what we get: it thought for 41 seconds. Here is its thinking process, and here's the code. Let's click on preview. And here's what we get. I'm actually really impressed that it was able to get this in one go, and the constellations do look mostly correct. Let me also try to find the Big Dipper, which should be in Ursa Major. And here we go: here is Ursa Major, which contains the Big Dipper. Now, let me auto-rotate this, and then let me also play around with the sliders. This one increases the number of background stars. This is the star size; increasing it also works. Increasing the label size also works, as you can see here, and the line opacity also works. Very nice, everything just works. Again, it's not perfect (some of the constellation lines are missing), but you can try this with Gemini 3 Pro or GPT-5, and often they're not able to code up a working demo in just one prompt. So, again, I'm really impressed by the coding abilities of GPT 5.2. Everything just works.
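A night-sky viewer like this relies on one standard piece of math: converting a star's right ascension and declination into a 3D point on the celestial sphere that a renderer such as Three.js can plot. This is the textbook conversion; the star's coordinates below are approximate:

```python
# Standard spherical-to-Cartesian conversion for plotting stars on a
# unit celestial sphere (RA/Dec in degrees).
import math

def star_to_xyz(ra_deg, dec_deg, radius=1.0):
    ra, dec = math.radians(ra_deg), math.radians(dec_deg)
    return (radius * math.cos(dec) * math.cos(ra),
            radius * math.cos(dec) * math.sin(ra),
            radius * math.sin(dec))

# Dubhe, one corner of the Big Dipper's bowl (roughly RA 165.9, Dec +61.75):
x, y, z = star_to_xyz(165.9, 61.75)
print(round(x, 3), round(y, 3), round(z, 3))
```

Constellation lines are then just segments drawn between the converted endpoints of each star pair.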

If you want to supercharge your productivity, definitely check out Skywork Super Agents, the sponsor of this video. Think of it as an army of advanced AI agents right at your fingertips. You can get it to autonomously do research and create reports, spreadsheets, slides, web pages, and even podcasts. They have a new AI poster feature which allows you to create posters with zero design skills. It integrates the best image model out there, Nano Banana Pro, to make incredibly beautiful, clean, and accurate posters. For example, let's get it to make a poster of the top five tourist destinations in Thailand. And here's our result. I can prompt it to make this more minimalist, like this, or more detailed, like this. Notice that this poster agent integrates deep research, so you can be sure that all the content is factually correct. I can also click anywhere in the photo to edit it further; for example, let's turn this photo into a sunset shot. And here's what we get. Or here's another example: I can drag and drop any product photos or people in here and get it to create a poster with these assets. For example, let's get it to make a flash sale poster with these items up to 80% off. The store is called Aura, located at this address. And here's my result. I can prompt it further to give me this poster in different styles. In addition to the poster agent, Skywork also has agents for creating reports, slides, sheets, websites, and more. So whether you want to make sales decks, research reports, educational materials, marketing pitches, or anything else, Skywork Super Agents is the best platform to use, with the highest quality and accuracy. Try it today via the link in the description below.

The awesome thing about GPT 5.2 is that it's multimodal, so it can also take in images. What I'm going to do is feed it this image of a ton of Demon Slayer characters. Let me just drop it in here and ask it to label each character in a bounding box with their name. Let's press generate. And here's what I got. It thought for a long time. Here's its thinking process, and it's actually pretty interesting. It was able to identify that this is from Demon Slayer, and it seems to just know all these characters right away. Next, it proceeds to write the code to label the characters with bounding boxes. Then it's analyzing the image, defining the bounding box for this character, and so on. Notice that GPT 5.2 is really good with tool use: it's pulling up its Python tool to code this up and generate the final image. Afterwards, it gives me a link to download the labeled image. And here is what I got. You can pause the video to confirm, but it was able to correctly label all the characters. Pretty cool how it has this knowledge of even some uncommon anime characters built in.
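The labeling step itself is simple once names and boxes are known. Here is a hypothetical sketch (the names and coordinates are made up, and the model's Python tool drew on the image directly rather than emitting SVG) of rendering a rectangle plus a name label per detection:

```python
# Hypothetical sketch of the labeling step: emit an SVG overlay with one
# rectangle and one text label per detected character. Coordinates are
# invented for illustration; box format is (x, y, width, height).
def svg_overlay(width, height, boxes):
    parts = [f'<svg xmlns="http://www.w3.org/2000/svg" '
             f'width="{width}" height="{height}">']
    for name, (x, y, w, h) in boxes.items():
        parts.append(f'<rect x="{x}" y="{y}" width="{w}" height="{h}" '
                     'fill="none" stroke="red"/>')
        parts.append(f'<text x="{x}" y="{y - 4}" fill="red">{name}</text>')
    parts.append("</svg>")
    return "\n".join(parts)

boxes = {"Tanjiro": (40, 30, 80, 120), "Nezuko": (150, 40, 70, 110)}
print(svg_overlay(640, 480, boxes))
```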

All right, here's an even trickier test. I'm going to plug in this image and ask it to find Waldo (or Wally, depending on where you're from). So again, I'm going to plug this image in here and ask it to find and circle Waldo. Let's press generate. All right. Now, this time it took way longer: it thought for 12 minutes and 59 seconds. So, let's expand its thinking process. First, it's opening up Python code to load the image and inspect it. It first opens the full image, and then it actually zooms into specific sections to find Waldo. It scans this section first, then another location, and another, and another. Then it says it spotted a character with a red-and-white striped shirt and a red hat. It zooms in further, and then it says it's not sure if this is Waldo, so it proceeds to zoom into some other places. It's also searching at the top here, and it could not find Waldo in the upper balcony. And here's the crazy thing: it's actually coding up a way to detect Waldo using red stripe analysis. It's creating a mask which runs through the image and detects any instances of red and white stripes. How crazy is that? Afterwards, it has found these filtered regions. So, it's doing a lot of really complicated stuff; that is some crazy math just to find Waldo. At one point it actually ran into an error with its Python code, but as you can see, it can automatically fix the error itself and move on. It thinks for a really long time, so I'm just going to fast forward to the end. It's looking through all these different sections. Eventually, it focuses on this striped man in tile 11, which looks like this. Now it's confirming Waldo's identity, and it says it's confident that this is Waldo. So finally, it has honed in on this dude over here. This is definitely Waldo. And finally, after 13 minutes, it's done. It gives me an image to download, and as you can see here, it has indeed correctly circled Waldo. This is a really tricky prompt, considering there are so many other really close calls, like this woman over here.
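The "red stripe analysis" from the thinking trace can be sketched roughly like this: classify each pixel as red, white, or neither, then flag pixel columns where red and white alternate vertically, as on Waldo's shirt. The thresholds and toy "image" below are my own assumptions, not the model's actual code:

```python
# Rough sketch of red/white stripe detection: classify pixels, then count
# alternating red/white runs down a pixel column. Thresholds are invented.
def is_red(p):
    r, g, b = p
    return r > 180 and g < 100 and b < 100

def is_white(p):
    return all(c > 200 for c in p)

def has_stripes(column, min_runs=4):
    runs, last = 0, None
    for p in column:
        cur = "r" if is_red(p) else "w" if is_white(p) else None
        if cur and cur != last:
            runs, last = runs + 1, cur
    return runs >= min_runs

red, white, blue = (220, 40, 40), (250, 250, 250), (30, 30, 200)
shirt = [red, red, white, white, red, red, white, white]
sky = [blue] * 8
print(has_stripes(shirt), has_stripes(sky))  # True False
```

A real implementation would run this over every column of the image and cluster the flagged columns into candidate regions to zoom into.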

Now, because it has vision capabilities, let's also test it on some OCR tasks. I wanted to parse this really complicated table and turn it into a spreadsheet. It's really complicated because it has columns nested within other columns, some of the cells are missing, and there aren't even lines for each row. So it's super complicated; it has to figure out a lot of different things. I'm going to plug this in here and simply write: turn this into a spreadsheet. Let's press generate. All right. Here again, it thought for a really long time, 4 minutes and 7 seconds. Here is its thinking process. You can see it first opens up some Python code to scan the image, but afterwards it gives me an Excel spreadsheet to download. So, let me download this and open it up over here. In fact, let me put the original table in the corner somewhere. And it does seem like everything is correct. Even though this is a super tricky table with nested columns and missing cells, it was able to parse everything into a spreadsheet like this. This is super impressive.
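The hard part of a nested-header table is flattening the two header rows into single spreadsheet column names, carrying each parent group forward across the blank cells beneath it. Here is an illustrative sketch with invented column names, not the actual table from the video:

```python
# Illustrative sketch: flatten a two-row nested header into single column
# names, inheriting the parent group across blank top-row cells.
def flatten_headers(top, bottom):
    names, group = [], ""
    for t, b in zip(top, bottom):
        group = t or group  # blank top cell inherits the group to its left
        names.append(f"{group} - {b}" if b else group)
    return names

top =    ["Region", "Sales", "",     "Costs", ""]
bottom = ["",       "2023",  "2024", "2023",  "2024"]
print(flatten_headers(top, bottom))
```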

All right, here's an even trickier example, because this time it's a flowchart and the arrows are all over the place. I'm going to copy this image and paste it in here, and let's ask it to turn this into a canvas where I can drag the nodes around and scroll to zoom in and out, preserving the text and colors, with everything in one HTML file. All right, it thought for 3 minutes and 39 seconds, and here is its thinking process. Let's copy the code and paste it in an HTML file. So, here's what we have. Let me also open up the original flowchart for reference. It seems like all the text is correct. Let me double-check the arrows. All the arrows at the top here are correct, same with the arrows at the bottom. However, this is again a really tricky flowchart, because as you can see, there's actually no arrow connecting these two, so this arrow here is wrong. What should happen is there should be an arrow connecting this bottom green node to this top pink one, but we don't see that arrow here. So that's an error. Same over here: this bottom pink node should connect to this top yellow one, but we don't see that either. So, that's a flaw. It could not get the arrows 100% correct, but it was pretty close, and at least the text and the colors were correct. Again, note that this is a really tricky prompt that was meant to test its limits; for a basic flowchart or table, I'm sure it can handle it very well. Now, in my previous video testing Gemini 3 Pro, I asked it to find the cat in this photo, and it was able to get it. That's way too easy.

So, let's step it up a notch. There's a

frog hidden in this photo, and I fed this to Gemini 3 Pro for many times, and it was not able to identify the frog.

Let me try the same thing with GPT 5.2.

So, I'm going to paste the image in here, and I'm not even going to say that it's a frog. I'm just going to write there's an animal hidden in this photo.

Find and circle it. So, here it thought for 1 minute and 15 seconds. Let's expand its thinking, and let's open this up here. At least it identified that this is a frog. It appears to have zoomed in on this spot, but the spot it circled is not correct. Then again, even Gemini 3 Pro was not able to get this right. Interestingly, they both kind of narrowed it down to the same point, which is over here. But I double-checked this, and there's definitely no frog at this point. So, at least for this photo, currently none of the top models could actually find the frog.

Maybe we'll need to wait for GPT 6 or Gemini 4.

Now, Gemini 3 Pro is known to do really well at analyzing medical images, so let's see if GPT 5.2 is also this good. I'm going to upload this image and then ask it to circle the lesions in these images, if any. Let's press generate. All right. So, after thinking for a minute and 42 seconds, it proceeds to analyze each image and then circle any lesions that it has identified. Let me download its result, and here's what it gave me. Now, if I compare this to the actual answer, it got most of it wrong. For slide one, this is not correct; the lesion should be over here. For the second slide, it didn't even circle the lesion in the right place; it just circled this blank area. For slide three, it got it correct. And for slide four, it kind of got it correct; I think it just circled the wrong place. It should be this dot over here. So overall, not too impressive for medical image analysis. I definitely would not use this to identify lesions or tumors in medical scans.

All right, next. Let's also see if it's any good at determining the location of a photo. Now, this is a really blurry photo that I took from a hidden beach somewhere in Hong Kong.

This is not a popular location, and I've never uploaded this photo online before.

I've stripped away all the EXIF data, or metadata, from it, so there are no clues about where this photo was taken. I'm going to paste it in here and then ask it what the exact location of this is. And here's what I got. It was not able to get an exact pin. However, it has identified that this section is this, which is actually correct. So, here it has identified that it's most likely this point, which is close, but not entirely correct. However, if I run the same prompt in Gemini 3 Pro, it was way off. It guessed this location, which is a completely different place. Let me actually map this out for you. I actually took the photo at this location facing southeast, and GPT predicted that it was over here, which is also looking in roughly the same direction.
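To put "closer" in numbers, the gap between a guessed pin and the true spot is just the great-circle distance between two coordinates. A standard haversine sketch in Python (I'm deliberately not plugging in the real coordinates, since the location is private):

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometres."""
    R = 6371.0  # mean Earth radius, km
    p1, p2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(p1) * cos(p2) * sin(dlmb / 2) ** 2
    return 2 * R * asin(sqrt(a))
```

Run this once for each model's guess against the true location and the smaller number wins, which is a fairer comparison than eyeballing pins on a map.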

So GPT was at least able to get this part correct. However, Gemini predicted it to be all the way over here, which is just completely wrong. So both of them were wrong, but I would say GPT was a lot closer. Again, this is a really tricky photo of quite a private location, so I did not expect it to get it correct. I'm sure if you feed it a normal photo with more clues, then it would be really good at actually guessing the location.

All right, so that sums up my quick tests on all the cool things that GPT 5.2 can and cannot do. Overall, after spending a few days testing it, it is definitely state-of-the-art. I would say it's tied with Gemini 3 Pro: they're very similar, and it's really hard to pick a winner.

All right, next up, let's go over its specs, performance, and benchmarks, and some really important things that you should be aware of. So here is their official announcement page. First of all, they were very specific about what GPT 5.2 is good at. They say it's the most capable model for professional knowledge work, and in this announcement page they really focused on one benchmark: GDP Val. If you scroll down a bit, it explains what this GDP Val benchmark actually is. It basically tests an AI model's performance on real-world work tasks spanning 44 jobs from the top nine industries contributing to the US GDP.

This includes things like sales presentations, accounting spreadsheets, urgent care schedules, manufacturing diagrams, and short videos. And as you can see, GPT 5.2, both the Pro and Thinking models, are the first models ever to beat an expert-level human worker over 50% of the time. The previous version, GPT-5 Thinking, still wasn't able to beat an expert-level human over half the time. So this is quite a profound chart. It might imply that AI is indeed going to replace most traditional job roles. I mean, why hire a human when you can just get an AI that is cheaper, can respond to you instantly, and is even better than a human expert?

Now, as I showed you, GPT 5.2 is actually very good at coding; it was able to create a ton of stuff that was not possible with the other top models. So in terms of agentic coding, here's another relevant benchmark that they focused heavily on, which is SWE-bench Pro. Now, here's something really important.

SWE-bench Pro is different from SWE-bench Verified, which is the benchmark used by the other top models. For example, you can see here on the official benchmark card for Gemini 3 Pro when it was first released: they used SWE-bench Verified, not Pro. Same with Claude Opus 4.5 when it was released; they also used SWE-bench Verified, not Pro. But here, OpenAI is saying, "No, we should not use SWE-bench Verified, which only tests Python, but instead use SWE-bench Pro, which tests four languages and aims to be more contamination-resistant." Now, I hope you can see where we're going with this. Of course, OpenAI would prefer to use SWE-bench Pro, because there GPT 5.2 actually performs a lot better than competitors like Opus 4.5 and Gemini 3 Pro. However, the same isn't true for SWE-bench Verified, where GPT 5.2 actually doesn't perform as well as Gemini 3 or Opus 4.5. So make sure you don't just read the benchmarks from OpenAI.

Now, for GPQA Diamond, which consists of graduate-level reasoning questions, it does perform better than Gemini 3 Pro. Same with FrontierMath and competition math. It also performs very well on CharXiv reasoning, which tests how good it is at analyzing scientific figures. And then here's another crazy achievement: on ARC-AGI-2, it scores 52.9%.

So if you plot this out against the other competitor models, then GPT 5.2 Pro is indeed one of the best on this benchmark. Now, this is extremely important, because ARC-AGI-2, if you're not familiar with this benchmark, basically tests an AI model's ability to learn new patterns. You see, on this test, the model is given some question-and-answer pairs like this, where it needs to figure out the underlying pattern. For a human, it's pretty easy to do: we basically need to color these gray shapes based on how many holes each one has. But for an AI, this is actually really tricky, because technically an AI model doesn't learn new things after training. So if we feed it new data it has never seen before, it has a really hard time answering correctly. This ARC-AGI-2 benchmark is basically testing whether an AI can still take in new data and learn the underlying patterns from it.

And as you can see, GPT 5.2 Pro, at least in high mode, is definitely state-of-the-art, much better than Gemini 3 or even Gemini 3 Deep Think, which is all the way over here: more expensive, but less performant.

Here on this announcement page, they also say that GPT 5.2 performs much better at long-context reasoning. So you can feed it a ton of information at once, like some really long documents or an entire codebase, even up to 256K tokens, which is roughly 200,000 words. You can see that the accuracy of GPT 5.2, the top line here, remains very close to 100%. So even if you feed it a ton of information at once, it's able to remember everything and answer you correctly.

That being said, note that the maximum context window of GPT 5.2 is only 400,000 tokens, which is roughly 300,000 words. If you want to fit even more information into your prompt, then you'll have to go with Gemini 3, which has a context window of 1 million tokens, so you can fit way more info into your prompt at once.
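As a sanity check on those numbers, both word counts follow the common rule of thumb of roughly 0.75 English words per token. This is an approximation, not an exact conversion; real token counts depend on the tokenizer and the text:

```python
WORDS_PER_TOKEN = 0.75  # rough rule of thumb for English text

def tokens_to_words(tokens: int) -> int:
    """Approximate word count for a given token budget."""
    return round(tokens * WORDS_PER_TOKEN)

print(tokens_to_words(256_000))    # 192000, i.e. "roughly 200,000 words"
print(tokens_to_words(400_000))    # 300000 words
print(tokens_to_words(1_000_000))  # Gemini 3's window: 750000 words
```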

Here's another interesting fact: they say that GPT 5.2 has a knowledge cutoff of August 2025. This is relatively recent, and it means this model has a lot more recent data built into it compared to other competitors, which might have a much earlier knowledge cutoff date. The final thing to note here is that currently you can only use GPT 5.2 if you're on one of these paid plans; it's not available on the free plan yet.

Now, those were just some of OpenAI's self-reported benchmarks. It's also important to look at some independent third-party leaderboards to get an objective sense of how GPT 5.2 actually compares. So, here's an independent leaderboard called Artificial Analysis. For GPT 5.2 Extra High, which is like the max thinking mode, it's basically tied with Gemini 3 Pro. And this is kind of how I feel as well: both of these models are very similar in performance. Note that they haven't actually added the regular High or Medium thinking modes of GPT 5.2 to this leaderboard, but you can probably assume those perform a bit worse. If you look at pricing, then it's actually slightly more expensive than Gemini 3 Pro at $4.80 per million tokens, but it's still way cheaper than Anthropic's Opus 4.5, which to be honest is kind of a rip-off to me. If you look at LiveBench, which is another independent and private leaderboard, surprisingly GPT 5.1 Codex Max High is ranked number one, with the highest reasoning and coding averages as well, while GPT 5.2 High is all the way down here, below Opus and Gemini 3. If you look at this other leaderboard called SimpleBench, which basically tests an AI's performance on common-sense tasks, surprisingly GPT 5.2 is all the way down here in eighth place. Note that this is the Extra High thinking model, so it's even below GPT 5 Pro and also below Gemini 3 Pro. Now, after some initial testing, I get the feeling that GPT 5.2 is actually a bit better than Gemini 3 in terms of geoguessing, or basically guessing the location of a photo.

However, for this GeoBench leaderboard, it doesn't seem like they've added GPT 5.2 yet. Now, here's another leaderboard for OCR, which tests a model's ability to recognize and parse text in images. This would be for tasks like turning an image of a table into a spreadsheet, like I showed you in the demos. And over here, GPT 5.2 is also not ranked number one; it's all the way down in fifth place. Here they haven't added the High or Extra High versions yet for some reason.

So, maybe those variants would rank a bit higher. Now, it's also important to look at the hallucination rate of an AI model, or basically how often it just makes stuff up. You can see that GPT 5.2 Extra High is right in the middle. So it's not as bad as Gemini 3 Pro, which is over here: it got stuff wrong 88% of the time according to this test, whereas GPT 5.2 is at only 78%.

However, if you're really concerned about the factual accuracy of its outputs, then there are more accurate models for that, like Kimi K2 Thinking, Grok 4.1, or even Grok 4.

So, that sums up my review and tests of the latest GPT 5.2. It's a very capable model. I was especially impressed by its coding capabilities, and it is definitely among the state-of-the-art models out there. I mean, things are moving so fast, and with all these benchmarks and metrics, it's really hard to get an objective sense of how good a model is. At least for me, its intelligence and performance feel on par with Gemini 3 Pro. But let me know in the comments what you think.

What other impressive things were you able to get it to do? As always, I will be on the lookout for the top AI news and tools to share with you. So, if you enjoyed this video, remember to like, share, subscribe, and stay tuned for more content. Also, there's just so much happening in the world of AI every week that I can't possibly cover everything on my YouTube channel. So, to really stay up to date with all that's going on in AI, be sure to subscribe to my free weekly newsletter. The link to that will be in the description below. Thanks for watching, and I'll see you in the next one.
