Gemini 3 Flash: Better Than 2.5 Pro? A Full Test of Flash | Coding + Multimodal + Writing

By NiceKate AI

Summary

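## Key takeaways

- **Flash benchmarks surpass Gemini 3 Pro**: On the ARC-AGI-2 and MMMU-Pro benchmarks, Gemini 3 Flash scores higher than Gemini 3 Pro; on the SWE-bench Verified coding benchmark it reaches 78%, above Gemini 3 Pro's 76.2%. [00:47], [01:09]
- **Flash's thinking mode saves 30% of tokens**: At the highest thinking level, Gemini 3 Flash adjusts its thinking depth, using on average 30% fewer tokens than 2.5 Pro on everyday tasks. [01:12], [01:27]
- **Precise, detailed multimodal recognition**: Flash counts 14 flamingos in an image and points out the white one in the middle; it can crop and zoom to recognize a small Shanghai Museum logo, and it reads a blurry Muji receipt. [10:32], [11:07]
- **Generated web pages are lively and interactive**: The sheep shearing shop Flash generates has a spinning fan and hover tooltips; the typewriter gives real-time typing speed feedback; the SVG radio control panel is rich in detail. [02:14], [02:55]
- **Pages generated from video and audio**: Given a 30-second Shadcn video, Flash generates a detailed web page introducing the core engine, visual styles, and presets; given audio, it summarizes Li Mu's three-pass method for reading papers and recognizes a famous Ma Baoguo quote. [03:18], [04:18]
- **Claude's evaluation: Flash beats DeepSeek**: In poetry writing, translation, and small tasks judged by Claude Opus, Gemini 3 Flash's answers were better than DeepSeek V3.2's. [12:04], [12:26]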

Topics Covered

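  • Flash benchmarks surpass Pro
  • Lively, realistic web page generation
  • Smart video and audio summarization
  • Plan-then-execute improves coding
  • Leading multimodal understanding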

Full Transcript

Hello everyone, I'm Kate.

Gemini 3 Flash has finally been released. In the Gemini app, when we choose "thinking," it means Flash with thinking, which is slightly different from before.

This time, Flash is on par with 2.5 Pro on multiple benchmarks, and it's very fast. We can use it on multiple platforms starting today, such as Gemini CLI and Google's Antigravity.

Google is really emphasizing Gemini 3 Flash this time.

It's fast, cheap, and smart. The price of Gemini 3 Flash is a bit higher than that of the previous 2.5 Flash.

On the ARC-AGI-2 benchmark, its score actually surpasses Gemini 3 Pro. On MMMU-Pro, its score also surpasses Gemini 3 Pro. One benchmark closely related to programming is SWE-bench Verified, where its score reached 78%, exceeding Gemini 3 Pro's 76.2%. This came as a surprise to many people.

When processing at the highest thinking level, Gemini 3 Flash can intelligently adjust how much it thinks. For everyday tasks, it uses on average 30% fewer tokens than 2.5 Pro. Even though it has "Flash" in its name, it's much smarter now than it used to be.

We can focus on multimodal understanding and reasoning. It uses code execution, so it can zoom in on images. Let's look at the discussion on Hacker News. Some users have said this is their new favorite because it's very fast and its world knowledge coverage is extensive. We can use this table to get a more intuitive look.

From 1.5 to 2.0, then to 2.5, and now to 3.0, Flash's price has gradually increased. Flash supports four levels of thinking, which makes it more flexible than the Pro version. This is a neo-grotesque webpage it made. If you look at its web design, it's truly beautiful. And here, I'm just using its fast mode.

This is the sheep shearing shop generated by Flash. We can see the fan above, and it spins very well. There's also a cloth here, which looks very good. When I hover over an element, it shows a tooltip. This is something other models haven't handled. The only slight regret is that a sheep barber is missing here.

This is the typewriter it generated. When I type, it gives real-time feedback above on typing speed and accuracy. The overall page design is quite fresh and clean.
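The typing-speed and accuracy feedback described above can be sketched as follows. This is only an illustration, not the code Flash generated: the function name `typing_stats`, the five-characters-per-word convention, and the position-by-position accuracy measure are all assumptions.

```python
def typing_stats(target: str, typed: str, elapsed_seconds: float) -> dict:
    """Return words per minute and accuracy for one typing session.

    Assumptions (not from the video): a "word" is counted as five typed
    characters, and accuracy is the share of typed characters that match
    the target text at the same position.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")

    # Speed: typed characters converted to 5-character "words" per minute.
    words = len(typed) / 5
    wpm = words * 60 / elapsed_seconds

    # Accuracy: compare typed characters with the target, position by position.
    correct = sum(1 for expected, actual in zip(target, typed) if expected == actual)
    accuracy = correct / len(typed) if typed else 1.0

    return {"wpm": wpm, "accuracy": accuracy}


if __name__ == "__main__":
    stats = typing_stats("hello world", "hello wurld", elapsed_seconds=12)
    print(f"{stats['wpm']:.1f} WPM, {stats['accuracy']:.0%} accurate")
```

A real page would call something like this on every keystroke, passing the time elapsed since the first key press.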

This is an SVG of a radio-style control panel generated by Flash.

I've also had Gemini 3 Pro do this before. You can clearly see that, compared to Gemini 3 Pro, Flash is missing some details. Google's models, first of all, have a relatively long context of 1 million tokens. More importantly, they support video and audio, so they have many uses.

For example, I'll give it a 30-second video here, then have it describe the content of the video and generate a visual page for me. We can see that this video is about Shadcn's new create feature. On the right, you can see that it gave me a detailed introduction in the form of a webpage: the core engine used, the visual style design, the features, and the final product. Below, it also displays presets for visual styles.

This is where it gets very interesting.

When I select different presets on the right, the page on the left changes accordingly.

I uploaded Li Mu's audio on how to read papers and had it introduce the content of the audio. It summarizes quite well. The first pass is to determine the paper's category, background, and relevance. The second pass is to understand the general meaning, grasping the paper's main content and logic. The third pass is deep understanding, reading word by word and sentence by sentence. In effect, it organized our notes for us very well.

I also uploaded a fun audio clip. It quickly identified it as the opening line of Ma Baoguo's famous quotes, and all the subtitles match up.

It can also connect to the internet. But when I upload an image like this to it, the content it replicates is completely unrelated to what I originally gave it.

Google Gemini also has a very good feature: it can directly generate editable slides. The layout style is still very fresh. Compared to the slides generated by NotebookLM, it's a completely different concept.

The slides generated here are relatively concise.

Now, let's take a look at coding. I used plan mode in Cursor to have Gemini 3 Flash generate some pages.

First, I looked at the price comparison. The pricing information it gave is very basic, so Cursor's own search may not be very powerful. The second thing is the visualization it creates, which feels pretty good. And in terms of speed, it's relatively fast right now.

This is the periodic table it generated.

The overall page is much worse than the one I made with Gemini 3 Pro before.

When I click, I can choose any two elements and get a very intuitive comparison.

I should mention that when I had Flash create a periodic table like this on the Gemini official website, it had some errors. Here in Cursor, where it plans first and then executes, the result is often better than the one from the official website.

Next, let's look at the future careers data analysis and visualization. I downloaded a CSV file from Kaggle and had Flash perform a data visualization analysis for me in plan mode. But I still feel that it doesn't cover many dimensions; it's still a bit too simple.

Next is the Terracotta Warriors dance: click once to awaken the Qin Dynasty dance king. Of all the models I've tested, this dance effect is very, very interesting.

Look at this hypnotic math page. This effect is something I've never seen before. Let's switch to another one. This effect is also very cool. Let's switch again to the Mandelbrot fractal.

When I don't click, it's a fractal like this, and it's still very beautiful. It also automatically hides the interaction panel on the left, which makes it easier for us to be hypnotized by it. Then I'll switch again to dynamic fluid. These are also very good effects.

Let's look at the digital calligraphy and painting application made with Flash.

On the left, you can see that the menu is relatively simple.

The style is still quite aesthetically pleasing. You can choose different patterns, and the rice-patterned grids are quite good.

You can also lay down a fresh sheet of Xuan paper and affix a seal. There's a bit of a problem here: the seal can't be stamped.

This is a Minecraft-style scene made by Flash: eating dumplings and watching the Spring Festival Gala at a New Year's party. But we can see that these little green figures are embedded in the table, so there's definitely a bit of a problem with the spatial layout here.

The dumplings on the table are presented quite well. The most interesting thing is the TV across from the table, which displays the 2026 Spring Festival Gala. It's red, with colorful particles. Very, very interesting.

It also has an upside-down "fu" character, which means good fortune is arriving. Very good; it's very familiar with our traditional culture.

Now let's look at the 3D domino simulation. I can click to push the first one. The force here is not enough, so I click again. Now it starts to knock them down one by one.

Now look at this console.

When I click the button below, it triggers a red alert. The simulation is quite realistic, and it also has a flickering effect.

The "History of Humanity" page made with Flash is also very good.

First, a timeline.

Then, an area chart. Next is a line chart, which meets the requirement I initially set for it to make more charts.

This is the minimalist hair salon it made. I especially like the picture it used and the text below it about these professional services. Because it uses a hat, it feels a bit outdated here.

Next, I told it I want to use Remotion.js to display the different pages, and that I need to make a video. So it helped me plan first, and soon it started to render. But what it first gave me was like a slide show: a colorful slide with a piece of text pasted on it. So I needed it to revise the video based on the existing projects, so that it truly presents them.

This is the modified result: the dinosaur tree, then the 3D dominoes, then DNA, then Monica's bedroom, and the last one is the periodic table. Here, it was a bit lazy and just used numbers directly to represent them. On the other pages, we can see that it uses screenshots, and it uses Playwright to take them.

Overall, I think it did a decent job It's a bit better than I expected.

Finally, let's take a look at Flash's image recognition and writing capabilities.

This one asks it to recognize butterflies. As I mentioned at the beginning, Flash's image understanding and reasoning capabilities are superior to Gemini 3 Pro's.

Gemini 3 Pro can easily answer questions like these.

Flash, for its part, answered these questions basically correctly.

This one asks it to count the flamingos.

It answered that there are 14, and it noted that there is a white one in the middle.

It specifically pointed that out, which is very interesting.

This is a lead type case.

Previously, in my tests, only Gemini 3 Pro could recognize the text on it. When I sent it to Flash, it quickly told me where the content came from. So Flash is really strong in multimodal tasks.

When I give it an image like this, where the front is quite reflective, the image actually includes the Shanghai Museum logo, but the logo is very small. No model could identify it before. As I just mentioned, the Flash model has a code execution tool, so it can crop and zoom. That lets it recognize that the logo at the top is from the Shanghai Museum.

Let's give it a receipt next.

It recognized this as a Muji receipt. The original image is actually quite blurry. Some previous models would misread "Meituan" as "America."

Let's look at another picture.

This is an image taken at a tilted angle.

A lot of the content it identified is very accurate. Later, I looked at the reference link below and found that it had also called the image search tool here.

Finally, let's look at its performance on other tasks. I compared it with DeepSeek V3.2. Answer 1 is Gemini and Answer 2 is DeepSeek. For this small poetry-writing task, Opus believes Gemini's answer is better.

For a translation task like this, Answer 1 is Gemini and Answer 2 is DeepSeek. Opus believes both answers completed the task, but with different focuses.

This is the last question. Here, too, Opus thinks Gemini's answer is better.

That concludes today's introduction to Gemini 3 Flash. Overall, it still gave me a lot of surprises in terms of coding.

Everyone, remember to build a good framework: first make a plan, and then execute it. I recommend everyone use
