
Why Chinese AI Is Suddenly So Good (ft. DeepSeek, SeeDance 2.0) | AB Explained

By Asian Boss

Summary

Topics Covered

  • AlphaGo's Move 37 Triggered a National AI Awakening
  • Chip Embargoes Cut China Off From the AI Engine
  • Constraint Drove China's Software Breakthrough
  • DeepSeek Built a World-Class Model for Under $6 Million
  • Data Is the Ultimate Weapon in the AI Race

Full Transcript

Do you remember by any chance what you were doing on March 10th, 2016?

I know it sounds pretty specific, like a completely random date roughly ten years ago, but on that day something happened that would go on to change, or rather redefine, humanity's relationship with AI forever.

You see, on March 10th, 2016, there were two opponents sitting across from each other in a quiet room in Seoul, South Korea.

I say quiet, but it was actually a fairly large room filled with cameras, technicians, and journalists from all over the world.

Yet, despite all that attention, the atmosphere inside was dead silent and tense.

It had to be because the two opponents were about to begin the second match in the game of Go.

Now, if you've never heard of Go before, it's widely considered the oldest strategy board game in the world, originating in China more than 2,500 years ago.

And unlike chess, which most people are familiar with, Go is vastly more complex.

The number of possible board configurations in Go is so large that some experts say it even exceeds the number of atoms in the observable universe.

That's like trillions upon trillions upon trillions of possible board configurations.

Meaning the number of ways a single game could unfold is pretty much unimaginable.

Which is exactly why for decades, up until that point, the general consensus was that machines could never beat a human master at the game of Go.

And even if they were capable one day, that would have been years, if not decades away.

Anyway, on one side of the board that day sat Lee Sedol, one of the greatest Go players in history and a national hero in Korea.

He wasn't ranked number one at the time, but he had already won 18 international titles, making him pretty much a global Go legend known for his creative and aggressive playing style.

So who was sitting across from him as his opponent?

It was a computer program called AlphaGo, developed by the British AI company DeepMind, which had recently been acquired by Google.

Before the match began, which was a best-of-five series, many Go experts believed that Lee would defeat AlphaGo fairly comfortably.

After all, humans had dominated the game of Go for over 2000 years.

But in game one, after a competitive match that lasted nearly four hours, something unexpected happened.

AlphaGo defeated Lee Sedol.

Most spectators in the room looked on absolutely stunned.

Well, except for two people who looked as if they had expected the result all along.

They were Demis Hassabis, the founder of DeepMind, the company that built AlphaGo, and Sergey Brin, the co-founder of Google, who had flown to Seoul to watch the historic match in person.

But what happened in the next match, game two of the series on March 10th of 2016, would shock the entire Go world, and particularly China, even more.

Midway through the game, AlphaGo made a move that at first looked completely bizarre to the professional commentators watching the match live.

It placed a stone on the board in a way that almost no top human player would normally do in that situation.

For several minutes, even the commentators paused their explanation, struggling to make sense of what they were seeing and wondering if the machine had made a mistake.

Lee himself was so stunned that he left the room for about 15 minutes to take a break, which, by the way, players are actually allowed to do during a match as long as their clock keeps running.

That single move, later known simply as move 37, wasn't a mistake, and it would go on to completely reshape the way professional players thought about the game of Go.

AlphaGo would go on to win the second match as well, and at that moment, something suddenly dawned on everybody watching.

Not only had a machine beaten one of the best human players in the world two games in a row, it had also revealed an entirely new strategy that no human had even considered in more than 2000 years.

By the way, Lee Sedol did manage to beat AlphaGo once in game four, using an extraordinarily creative move that many later referred to as the divine move.

But in the end, he still lost the series 4 to 1, and ultimately it was AlphaGo's sheer dominance, especially that alien-like move 37 that really set China into motion.

In the West, many saw it simply as another impressive milestone in a clever algorithm beating a human at a board game.

But to many Chinese scientists and policymakers, the AlphaGo matches served as a powerful wake up call that dramatically accelerated China's push into AI.

In the following year, the Chinese government released a national strategy declaring AI a top priority, with an explicit goal.

To become the world leader in AI by the year 2030.

Fast forward less than a decade and suddenly we are seeing Chinese AI tools like DeepSeek and Seedance going viral across the internet.

So how did China pull it off?

Well, as I'll explain later in the video, the ultimate weapon in the AI race is data.

Big tech companies are ruthlessly scraping the entire internet, collecting every piece of text, video, and personal information they can find.

But these tech giants aren't the only ones harvesting your digital footprint.

And that brings me to today's sponsor, Incogni.

In this new age of automated AI web scraping, your personal information is more exposed than ever.

Right now, an entire industry of data brokers, people-search websites, and background check companies are quietly scraping the internet to build highly detailed profiles on you.

And that's exactly where scammers and identity thieves go to buy your personal data, use it to impersonate you, open credit cards in your name, and steal your money.

Imagine someone else taking out a loan under your stolen identity, and you don't even realize it until it's too late.

That's why you need Incogni.

Incogni doesn't just remove you from data broker sites, they go way beyond that.

Through their Custom Removals feature, Incogni targets the specific websites and platforms where your personal data is being sold and traded.

They contact these companies on your behalf, demand your information be deleted, and actually verify that it's been removed.

They handle everything so you don't have to spend months doing it yourself.

Think of it as taking yourself off the internet.

The same internet where your data is being bought and sold every single day without your permission.

I actually use Incogni myself because of my relatively public profile, and the results have been eye-opening.

So if you want to protect your data from being exploited, try Incogni by scanning this QR code or using the link in the description.

You can use the code ASIANBOSS at incogni.com/asianboss to get 60% off an annual plan.

That's 60% off to protect your identity and remove yourself from the internet.

I highly recommend taking advantage of this offer.

Now let's get back to the deep dive.

So before we can answer how China suddenly got this good at AI, why don't we clear up a really basic question that almost nobody in these conversations seems interested in clarifying?

What exactly is AI or artificial intelligence?

Because when most people hear that term, maybe they think of ChatGPT or Gemini writing emails or essays.

Maybe they imagine all the ever-more realistic-looking fake videos on social media.

Or you might even be thinking about those humanoid robots dancing or doing kung fu with their crazy somersault kicks.

But here's the thing.

AI is not really one single technology.

It's really more like a stack: multiple layers of technology sitting on top of one another, with each one depending on the layers below it to function.

And if you want to understand why the United States and China are locked in what is arguably the most consequential technological rivalry in modern history, you have to understand that stack from the very bottom up, because this race is happening simultaneously across the hardware layer, the model layer, and the application layer that billions of people use every single day.

So let's start at the very foundation.

At the absolute bottom of the AI stack is the hardware layer.

This is essentially the physical infrastructure that turns AI from super complex math operations into reality.

We are talking about massive data centers, cooling systems, GPUs, and most importantly, the microchips, which, as the name implies, are incredibly small.

If you've never actually seen how tiny these chips are, take a look at this mind-blowing footage of a microchip under a microscope.

It's so insanely small that when I watched this for the first time, it made me wonder how human beings are even capable of creating something like this.

To put it simply, a microchip is a highly engineered product carved out of a raw material called a semiconductor.

And the most common semiconductor material used today is silicon.

So physically, a microchip is literally just a tiny piece of silicon that has billions of microscopic electrical transistors, or switches, built directly into it.

But what are these switches actually doing?

They're simply flipping electricity on and off to create the language that computers understand.

If the switch is off, it's a zero.

If the switch is on, then it's a one.

String enough of these together and you have computer code.
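The jump from on/off switches to meaningful code can be sketched in a few lines of Python; this is a toy illustration of binary encoding, not of how actual chip circuitry works:

```python
# Toy illustration: a row of on/off switches is just a binary number,
# and binary numbers are how computers encode everything, including text.
switches = [0, 1, 0, 0, 0, 0, 0, 1]  # eight switches: off, on, off, off, ...

# Read the switches as the bits of one byte, most significant bit first.
value = 0
for bit in switches:
    value = value * 2 + bit

print(value)       # the pattern 01000001 is the number 65...
print(chr(value))  # ...which is the letter 'A' in the ASCII encoding
```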

But here's the cool part.

And it's why we call it a "semiconductor" in the first place.

If you tried to build a computer chip out of copper wire, it wouldn't work, because copper is an excellent conductor.

The electricity would just flow through it nonstop.

So the switch would always be stuck on one.

On the other hand, if you tried to build a chip out of rubber, it wouldn't work either, because rubber is an excellent insulator.

It completely blocks electricity.

So the switch would always be stuck on zero.

But silicon is special in that it naturally sits right in the middle.

Under normal conditions, it blocks electricity like rubber, but if you hit it with a tiny electrical charge, it suddenly transforms and lets electricity flow through like copper.

And that's the magic.

Because silicon can instantly switch back and forth between blocking power and conducting power, human engineers can actually control it.

We can treat those billions of microscopic switches like tiny gates, commanding the silicon to flip between 0 and 1 billions of times per second.

A single advanced microchip today can contain tens of billions of these switches, all crammed into a piece of silicon roughly the size of your fingernail.

And that's just for a regular microchip.

If you start talking about the chips used to power AI, everything is much bigger and much more expensive.

The single most important type of chip in the AI world right now is something called a GPU, which stands for Graphics Processing Unit.

Now, a GPU is still technically a microchip, but physically it's built completely differently from the normal chip you would find inside your laptop.

If you were to look at a GPU, the surface of the silicon is divided up into a massive grid of thousands of tiny brains all packed tightly together.

Originally, GPUs were designed for one very specific purpose, rendering the complex graphics in video games.

Things like shadows, reflections, and three dimensional environments.

But researchers eventually discovered that the exact same architecture that made GPUs so good at rendering images also made them incredibly powerful for running the massive, parallel mathematical calculations that AI systems require.

So when you're text prompting an AI tool to generate a video clip, what's actually happening is that the system is performing hundreds of billions, sometimes trillions of mathematical operations in a fraction of a second, pattern-matching against everything it was trained on to generate an output that seems coherent.

To do all of that math, you need an engine, and that engine is the GPU.
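Those massive, parallel calculations are overwhelmingly matrix multiplications. Here's a minimal Python sketch, with tiny made-up sizes, of why the work parallelizes so well: each output element is an independent dot product, which is exactly what a GPU's thousands of cores compute simultaneously.

```python
# Toy sketch: the core operation inside an AI model is the matrix multiply.
# Every element of the output is an independent dot product, which is why
# a GPU with thousands of small cores can compute them all in parallel.
# (Sizes here are tiny and invented; real models multiply matrices with
# thousands of rows and columns, repeated across many layers.)

def matmul(a, b):
    rows, inner, cols = len(a), len(b), len(b[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):          # on a GPU, each (i, j) pair...
        for j in range(cols):      # ...would be handled by its own core
            out[i][j] = sum(a[i][k] * b[k][j] for k in range(inner))
    return out

activations = [[1.0, 2.0]]            # a 1 x 2 "input" vector
layer = [[0.5, -1.0, 0.0],            # a 2 x 3 layer of learned weights
         [0.25, 0.5, 2.0]]

print(matmul(activations, layer))     # -> [[1.0, 0.0, 4.0]]
```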

The absolute newest, most advanced GPU architecture currently on the market is called the Blackwell B200.

It's so massive that it's not even a single piece of silicon.

It is actually two separate chips seamlessly stitched together to function as one, packing a mind-bending 208 billion microscopic switches.

And the technology is moving so fast that in January 2026, it was unveiled that an even more advanced GPU architecture is already on the way, called the Rubin platform.

Because these chips are essentially the engine of the entire AI revolution, a single one of these GPUs costs between $30,000 and $40,000.

And to train a massive AI model like ChatGPT, you don't just need one, you need tens of thousands of them, all wired together in massive, multi-billion dollar data centers.

By the way, if you've seen the viral clip of the GPU that Nvidia CEO Jensen Huang gifted to Elon Musk, you might be thinking, "Wait, that doesn't look like a tiny chip, it looks like some kind of a heavy hard drive."

And you'd be right.

What he handed to him wasn't just a GPU, but an entire mini supercomputer that has massive cooling fans built in.

The actual piece of silicon doing the math, the GPU itself is only about the size of a playing card.

But here's where things start to get deeply geopolitical, because who actually controls the production of these GPUs?

As of March 2026, Nvidia is the most valuable company on the entire planet, worth roughly $4.5 trillion.

They design the architecture, and they essentially control the global supply of these top tier GPUs.

But here's the thing.

Nvidia doesn't actually physically manufacture the chips themselves.

They design them and then they hand those blueprints over to be manufactured by one specific company in Taiwan, TSMC.

TSMC, or the Taiwan Semiconductor Manufacturing Company, is the only factory on Earth capable of mass producing these hyper-advanced chips reliably.

Today, they control nearly 70% of the entire global chip manufacturing market and over 90% of the market for the most advanced chips used in AI.

Now, you might be thinking, if TSMC is a Taiwanese company and China claims Taiwan is part of its territory, why can't China just force TSMC to give them the chips?

The answer comes down to leverage.

Even though TSMC is a Taiwanese company, their factories rely heavily on American software, American patents and American-made machinery.

And under U.S. law, any foreign company that uses American technology to build a product is strictly banned from selling advanced AI chips to China.

If TSMC were to break that rule, the U.S. would instantly cut them off from the tools they need to survive, effectively shutting the company down.

So then you might wonder if these chips are so incredibly valuable, why can't China just spend billions of dollars to build their own version of TSMC from scratch?

Because manufacturing these chips is arguably the most complex physical process in human history.

It requires decades of accumulated engineering know-how, and it relies on extremely specialized equipment, like $200 million laser machines built in the Netherlands and ultra pure chemicals from Japan.

Even massive tech giants like South Korea's Samsung have spent billions trying to catch up to TSMC, but they still struggle to get their advanced AI chips to work as reliably.

The global bottleneck is so severe that Elon Musk recently announced that Tesla is launching its own massive AI chip factory in the U.S., called Project Terafab.

He claimed that he has no other choice because third-party suppliers, like TSMC, simply cannot manufacture enough chips to meet his future demands.

Remember when AlphaGo lit a fire under China's ass to come up with a sweeping national strategy on AI?

That strategy was called the Next Generation Artificial Intelligence Development Plan, and part of China's plan at the time, in order to become the global leader in AI by 2030, was to dominate the hardware layer as well.

China's initial plan was to invest in their local tech companies like Huawei and SMIC to start producing their own chips, but China likely did not anticipate that a few years later they would be completely cut off from the global supply of top-tier GPUs.

So it seems like catching up on the hardware front is virtually impossible for China right now, because the U.S. and its allies have essentially blocked China from accessing TSMC, blocked them from buying Nvidia's best GPUs, and blocked them from buying the specialized European and Japanese machines needed to manufacture them domestically.

You can see how this creates a massive bottleneck for China.

Sure, China is perfectly capable of pouring the concrete and building the giant data centers and cooling systems needed to house an AI supercomputer.

That's the easy part.

But getting their hands on the actual engine to power that supercomputer, that's a completely different story.

Because China had been locked out of the most critical hardware supply chain in the world, most Western experts assumed their AI ambitions were dead in the water.

No one thought that China would be able to catch up to the U.S.

But what the West underestimated was how much China could innovate on the software front.

Instead of brute-forcing the hardware, Chinese engineers found a way to rewrite the rules of the game.

And that blind spot is exactly why the entire Western tech world was caught completely off guard when China announced an AI model called DeepSeek.

Okay, so hopefully by now it makes sense that AI is really a stack of different technologies.

We talked about the hardware layer at the very bottom, which is mainly those insanely powerful GPUs.

Now, sitting directly on top of that hardware is what I referred to earlier as the software layer.

Now, to be more accurate, we should really call this the model layer, though there's a software element involved.

And this is where the actual brain of modern AI lives.

And arguably the layer where China proved that it could compete.

Now, to be fair, the hardware layer is still super important.

You cannot build your model on nothing.

Yes, China was cut off from the most advanced AI chips coming out of the United States, but Chinese researchers could still work with older, less capable GPUs from Nvidia that they had stockpiled before the U.S. export bans.

Because they were limited on the hardware front, they had to be more innovative and squeeze far more performance out of those chips through smarter engineering and lower cost training strategies.

They became obsessed with efficiency, constantly asking questions like, "How do we get better results from weaker hardware?"

Or, "How do we reduce wasted computation?"

And that brings us to the model layer.

This is where terms like foundation model come in.

Take ChatGPT for example.

The ChatGPT app is not actually the model itself.

It is the consumer-facing app wrapped around the foundation model or the brain that ordinary people can interact with.

And the latest underlying brain powering ChatGPT as of March 2026 is a model called GPT 5.4.

But you might be wondering, how do engineers even build the foundation model?

The answer traces back to a massive engineering breakthrough in 2017 by American researchers.

They introduced a brand new architectural blueprint for AI called the Transformer.

Think of the Transformer like a revolutionary new engine design.

Before 2017, AI processed text strictly one word at a time, left to right, the same way a human reads a book.

It was slow, and if a sentence was too long, the AI literally forgot the context of what it was reading by the time it reached the end.

The Transformer completely changed the game because it was mathematically wired to look at an entire, massive block of text all at once.

More importantly, it drew invisible mathematical connections between every single concept, so it never lost the context.

And when you take this transformer engine and design it specifically to process an absolute mountain of human text, that's what we call a Large Language Model or LLM.

Now, at its absolute core, an LLM is basically just playing the ultimate game of guess the next word, kind of like the simple autocomplete feature on your smartphone when you text somebody, right?

But thanks to the Transformer architecture, the foundation model never loses context.

And because it gets trained on grammar, logic, physics, math, and basically everything else on the internet, its predictions go way beyond simple autocomplete.

When it guesses the next word, it starts looking and acting exactly like complex human reasoning.
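The guess-the-next-word game can be made concrete with a toy autocomplete that just counts word pairs; real LLMs learn billions of weights instead of keeping raw counts, but the objective, predicting the next token, is the same. The tiny corpus below is invented for illustration.

```python
# Toy "guess the next word" model: count which word follows which in a
# tiny corpus, then predict the most frequent follower.
from collections import Counter, defaultdict

corpus = "the cat sat on the mat and the cat slept".split()

# Tally every (word, next word) pair in the corpus.
followers = defaultdict(Counter)
for current, nxt in zip(corpus, corpus[1:]):
    followers[current][nxt] += 1

def predict_next(word):
    # Return the word most often seen right after `word` in the corpus.
    return followers[word].most_common(1)[0][0]

print(predict_next("the"))  # -> cat  ("the" is followed by "cat" twice, "mat" once)
```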

Almost every major AI company in the world, including OpenAI, built their model on top of this exact Transformer blueprint.

But building a massive Transformer brain requires thousands of top-tier, state-of-the-art GPUs, which again, the U.S. had banned China from buying.

So Chinese companies like DeepSeek had to take that Transformer blueprint and fundamentally alter it to make it cheaper and more efficient.

And they pulled off two massive breakthroughs to do it.

First, they pushed an architecture called Mixture of Experts, or MoE, to an absolute extreme.

An expert here is literally just a cluster of artificial neurons inside the model that specializes in recognizing specific patterns.

So if you ask a math question, the model routes it to the cluster of neurons that recognizes math equations.

Now, in older, dense AI models, every time you ask a question, the entire massive brain would have to light up to calculate the answer, which consumes enormous GPU computing power.

To address this, American companies like OpenAI pioneered the use of MoE, which essentially divides the model into multiple specialized expert clusters.

However, DeepSeek engineers took this concept to an engineering extreme.

Instead of dividing the model into dozens of expert clusters, they sliced it into 256 tiny, hyper-specialized experts.

So when you ask DeepSeek to solve a coding problem, an ultra efficient router instantly kicks in and activates just eight of those tiny experts, leaving the vast majority of the brain completely asleep.
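The routing idea can be sketched in a few lines of Python. The 256-expert and top-8 figures match what's described above; the random scoring and everything else below are invented purely for illustration:

```python
# Toy sketch of Mixture-of-Experts routing: 256 tiny "experts", and a
# router that scores each one for the incoming request and wakes up
# only the top 8, leaving the other 248 asleep.
import random

NUM_EXPERTS = 256
TOP_K = 8

random.seed(0)

def route(request):
    # In a real model the router is a small learned layer that produces
    # one score per expert; here we fake the scores with random numbers.
    scores = [(random.random(), expert_id) for expert_id in range(NUM_EXPERTS)]
    scores.sort(reverse=True)
    return [expert_id for _, expert_id in scores[:TOP_K]]

active = route("solve this coding problem")
print(len(active))                # 8 experts run...
print(len(active) / NUM_EXPERTS)  # ...about 3% of the "brain" is awake
```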

But the Chinese didn't stop there.

They paired this extreme fragmentation with a brand new technique they invented called Multi-head Latent Attention, or MLA.

To understand what this does, think about having a conversation with an AI.

When you ask it a long, complicated question, the AI has to constantly hold the previous parts of the conversation in its head so it doesn't lose context.

In engineering, this is called the key value cache, but you can just think of it as the AI's short-term memory.

Normally, storing all that short-term memory takes up a massive amount of GPU memory, but DeepSeek's new MLA technology acts like an extreme memory compression tool.

It essentially shrinks the AI's short-term memory footprint by over 90%.

This allows the model to perfectly keep track of what it is thinking about while using dramatically less memory, which makes the entire system incredibly cheap and highly efficient to run.
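The memory arithmetic behind that claim is easy to sketch. The dimensions below are made up, and the crude block-averaging merely stands in for MLA's learned compression, which is far more sophisticated:

```python
# Toy sketch of the idea behind Multi-head Latent Attention: instead of
# caching a big key/value vector per token, cache a much smaller
# "latent" vector and expand it back when needed.

FULL_DIM = 1024    # size of the key/value vector per token (invented)
LATENT_DIM = 64    # size of the compressed latent per token (invented)

def compress(kv_vector):
    # Real MLA uses a learned down-projection; as a stand-in, we just
    # average each block of 16 numbers down to 1.
    block = FULL_DIM // LATENT_DIM
    return [sum(kv_vector[i:i + block]) / block
            for i in range(0, FULL_DIM, block)]

tokens_in_context = 1000
full_cache = tokens_in_context * FULL_DIM      # numbers stored without MLA
latent_cache = tokens_in_context * LATENT_DIM  # numbers stored with MLA

savings = 1 - latent_cache / full_cache
print(f"cache shrunk by {savings:.1%}")        # cache shrunk by 93.8%
```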

Second, they also made profound optimizations to the way the hardware communicates. Because DeepSeek had to use older, less powerful Nvidia GPUs, they couldn't just rely solely on Nvidia's default software.

Usually, every AI company in the world relies on Nvidia's industry standard software, called CUDA, or Compute Unified Device Architecture, to run their chips.

CUDA is like the automatic transmission of AI development.

Fast, reliable, and easy to use.

But what DeepSeek did was dig deeper, writing code at an intermediate assembly layer within the CUDA ecosystem called PTX, which stands for Parallel Thread Execution.

By writing highly customized, low-level code using PTX, they shifted from automatic into manual.

They forced those older chips to communicate and process calculations much more efficiently than Nvidia's default general purpose software allowed.

Now, could American tech companies do this, too?

Absolutely.

OpenAI, Anthropic, and even Nvidia itself regularly develop highly optimized, low-level operators.

The difference here is all about necessity.

DeepSeek had no choice but to pursue extreme software optimization because they couldn't access the latest high-end GPUs.

For companies like OpenAI, who have virtually unlimited budgets and unlimited access to top-tier chips, it is often faster and more cost-efficient to just buy more hardware than it is to spend months meticulously optimizing low-level code.

So these incredibly complex software tricks, along with their version of MoE and memory compression technology, are exactly how DeepSeek managed to build a world-class AI model for reportedly just under $6 million, while OpenAI was spending hundreds of millions to train GPT models.

But the ultimate killer feature of DeepSeek's model layer wasn't just the architecture itself, it was distribution.

What I mean by that is DeepSeek made the model open source.

In practice, that means researchers, startups, and developers around the world could inspect it, run it, fine-tune it, and build on top of it themselves.

This is also how the general public was able to understand the key design choices inside the model in the first place, including details about its mixture of expert system and exactly how many experts they divided the brain into.

That is the exact opposite of the closed model approach used by companies like OpenAI.

The bottom line is that DeepSeek effectively turned its model into a platform and started handing out the secret recipe behind it.

And once that happens, progress no longer depends on one company alone.

Thousands of outside engineers can start experimenting, refining, and extending the system in parallel.

Anyway, this incredibly structured model, whether open source or not, is still completely useless by itself without one thing to get it going.

It's pretty much just an empty brain at this point, and it needs fuel.

And that fuel is data.

And this is exactly where China's AI story gets interesting, because once you understand that you need not just the hardware and the model or the brain, but the massive amounts of data to fuel it, you realize that China holds a structural advantage that the U.S. fundamentally cannot match.

And that brings us to the third and most visible layer of the entire AI stack, the consumer app layer.

And the unique way China actually generates and collects its data.

To say that the arrival of DeepSeek, the Chinese open source foundation model, sent shockwaves through Silicon Valley and the broader Western tech community would be an understatement.

For many Western engineers and investors, it felt like a Sputnik moment.

If you aren't familiar with that term, it is a reference to 1957, when the Soviet Union unexpectedly launched the first satellite into space and completely stunned the United States.

In tech, a Sputnik moment is a sudden, shocking reminder that a strategic rival is catching up much faster than expected and doing so through a completely different path.

But if you thought DeepSeek's advantages stopped at software tricks and model architecture, you're missing the final piece of the AI stack.

We covered the hardware layer, right?

And then I just talked about the model layer.

Now comes arguably the most important layer of all and the true bottleneck of the entire AI industry.

The data layer.

An AI model, no matter how elegant its design, begins as an empty brain.

It has no inherent understanding of the world.

To become intelligent, it has to be trained on an enormous amount of human knowledge and behavior.

You've probably heard the term big data, right?

We are talking about text from the entire internet.

Books, news articles, academic papers and so on and so forth.

During the training process, the model is repeatedly exposed to billions of patterns inside the scraped data.

Every time it reads a sentence, it adjusts the invisible connections inside its brain over and over again, until it becomes extraordinarily good at predicting exactly what the most coherent, logical answer to your question should be.

And for years, the assumption in the West was that the smartest model would come from the companies that could buy the most human expertise.

They hired huge teams of annotators, researchers and domain specialists to create premium training data.

Step by step math solutions, carefully written coding examples and highly structured explanations designed to teach the model how to reason.

That approach still worked, but it was also brutally expensive because expert knowledge is expensive.

Imagine paying not just one lawyer, consultant, or software engineer by the hour, but thousands of them at scale to handcraft the AI's study material.

But DeepSeek pushed much harder into a different approach, reinforcement learning.

Instead of constantly showing the model the correct reasoning process written by a human, you let the model generate many possible answers on its own.

Then a scoring system checks the result.

If the answer is correct, clear, or useful, the model gets rewarded.

If it's wrong, sloppy, or inconsistent, it gets penalized.

So the model is not learning the way a student memorizes an answer.

It is learning more like a person solving practical problems over and over again, slowly discovering which strategies lead to success.

Over time, the system starts reinforcing the patterns that produce better reasoning.
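That reward loop can be sketched as a trivial bandit-style experiment. The strategies, success rates, and update rule below are all invented for illustration and are not DeepSeek's actual algorithm:

```python
# Toy sketch of the reinforcement-learning idea: the model tries answers,
# a scorer checks them, and strategies that earn rewards get reinforced.
import random

random.seed(42)

strategies = ["guess", "work step by step"]
weights = {s: 1.0 for s in strategies}  # how much the model "trusts" each

def attempt(strategy):
    # Hypothetical scorer: careful reasoning succeeds 90% of the time,
    # blind guessing only 20%.
    success_rate = 0.9 if strategy == "work step by step" else 0.2
    return random.random() < success_rate

for _ in range(2000):
    # Pick a strategy in proportion to current trust, try it, and
    # reinforce it only when the answer checks out.
    strategy = random.choices(strategies,
                              weights=[weights[s] for s in strategies])[0]
    if attempt(strategy):
        weights[strategy] += 1.0  # reward: lean on this strategy more

print(max(weights, key=weights.get))  # careful reasoning ends up dominant
```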

Now, at this point, you might be asking, "If this is so efficient, why didn't OpenAI, Google, or Meta just take this approach themselves?"

The truth is, American AI companies do use reinforcement learning, but they rely on a hybrid approach.

They spend hundreds of millions of dollars on human expertise to guarantee that the AI is highly controllable, safe, polite, and commercial ready.

And then they layer reinforcement learning on top of that.

DeepSeek, on the other hand, didn't have billions of dollars to spend on human tutors.

So out of pure financial necessity, they leaned into pure reinforcement learning much harder and much earlier in the training process.

But there is a downside to this.

When DeepSeek first tested this pure reinforcement learning approach, they ran into a massive problem.

The AI became brilliant at logic, math and coding, but it became terrible at communicating with humans.

Because the AI was only being rewarded for getting the right mathematical answer, it stopped caring about how it sounded.

DeepSeek's engineers admitted that the model started suffering from what they call "language mixing."

It would start thinking in a bizarre, unreadable hybrid of English and Chinese.

It basically turned into a genius mathematician who mumbles to himself and doesn't know how to speak to normal people.

So to make the model useful for the general public, DeepSeek still had to go back and use a small amount of that expensive human-labeled data just to teach the AI how to format its answers clearly.

But still, DeepSeek proved that you could build a core reasoning engine of a world-class AI for a fraction of the cost.

Alright, are you still with me?

I know this is all very technical, and I'm trying my best to explain this in the simplest way possible based on our research.

If you made it this far, you now understand the two crucial parts of the AI race.

The chips and the reasoning.

But solving the reasoning problem is really only the beginning.

I remember when ChatGPT first came out.

I was so blown away that I kept experimenting with it nonstop.

But fast forward to today and you hear people complaining about how models like Claude or Gemini just aren't good enough anymore, despite how amazing they are at reasoning.

And that is because, number one, people's expectations can never be satisfied, and number two, the real world doesn't just operate based on text.

There is a hard limit to these large language models like ChatGPT and Gemini.

Because no matter how advanced they are or how much data they were trained on, they're fundamentally based on language.

But how do you teach an AI to understand the physical world?

You cannot just describe a sunset or the physics of water splashing in text, right?

You need images.

You need audio. You need video.

Basically, you need a massive amount of high definition video, audio, and images to train the next generation of AI.

In the industry, this is called multimodal data.

If a language model is like a brain trapped in a dark room that only knows how to read text, a multimodal model is a brain that has been given eyes and ears.

It can process multiple modes of information at the same time, meaning it can look at a photo, listen to an audio clip and understand exactly what is happening in the physical world.
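The "brain with eyes and ears" idea can be sketched as follows: each modality gets its own encoder into a shared embedding size, and the model reasons over the fused result. This is a minimal toy, not any real model's architecture; the tiny hash-style "encoders" and concatenation fusion are stand-ins for learned neural encoders and attention.

```python
# Toy sketch of multimodal fusion: per-modality encoders into a shared
# embedding size, fused by simple concatenation. Real models learn these.

EMBED_DIM = 4

def encode_text(text: str) -> list[float]:
    # Stand-in text encoder: fold character codes into a fixed-size vector.
    vec = [0.0] * EMBED_DIM
    for i, ch in enumerate(text):
        vec[i % EMBED_DIM] += ord(ch) / 1000.0
    return vec

def encode_pixels(pixels: list[int]) -> list[float]:
    # Stand-in image encoder: fold normalized pixel values into a vector.
    vec = [0.0] * EMBED_DIM
    for i, p in enumerate(pixels):
        vec[i % EMBED_DIM] += p / 255.0
    return vec

def encode_audio(samples: list[float]) -> list[float]:
    # Stand-in audio encoder: fold raw samples into a vector.
    vec = [0.0] * EMBED_DIM
    for i, s in enumerate(samples):
        vec[i % EMBED_DIM] += s
    return vec

def fuse(text: str, pixels: list[int], samples: list[float]) -> list[float]:
    # Simplest possible fusion: concatenate the per-modality embeddings,
    # so the downstream model sees all three modes at once.
    return encode_text(text) + encode_pixels(pixels) + encode_audio(samples)
```

The point of the sketch is only the shape of the pipeline: three input types, one combined representation, which is what lets a multimodal model connect a caption, a frame, and a sound into a single picture of what is happening.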

And that is precisely where the US AI industry is currently hitting a massive brick wall.

American AI companies trained their early models by scraping the open internet, often without permission and probably illegally.

I'm talking about platforms like YouTube, Reddit, X, and public websites.

But from our research, as of 2026, they've essentially run out of high quality multimodal data.

Whatever they could scrape off the internet, they already have.

Even Elon Musk has publicly warned about this, stating that because the industry is running out of real human data, AI companies will have no choice but to rely on what is called synthetic data.

That is essentially AI generating its own data to train itself.

Very similar to the self-teaching reinforcement learning loops we covered earlier.

Worse yet, the real human data they have managed to scrape is often heavily compressed, highly fragmented, and increasingly locked behind strict copyright lawsuits and privacy laws.

But in China, the ecosystem is entirely different.

And that brings us right back to those crazy viral AI videos you're seeing all over the internet.

These hyper-realistic videos are being generated by Chinese AI tools like Seedance 2.0.

To understand why a Chinese company is suddenly dominating this space, you have to understand the consumer app layer, which is the final piece of the AI stack.

In China, the digital economy is dominated by the so-called super apps like WeChat and video platforms like Douyin, the Chinese equivalent of TikTok.

And companies like ByteDance, the parent company of both Douyin and TikTok, don't just host videos.

These are all consumer-facing apps, and they operate the most efficient, high-volume video data pipelines ever engineered.

Every single day, hundreds of millions of Chinese citizens use these apps to upload ultra-high definition videos covering every conceivable aspect of human life.

From cooking and dancing to complex mechanical repairs, drone footage, and daily vlogs.

So guess who owns Seedance 2.0?

That's right, ByteDance.

Because ByteDance literally owns the platform, they possess the native, uncompressed video files directly on their own servers.

More importantly, that video is perfectly categorized and paired with exact user engagement metrics.

When their AI is training on this data, it isn't just looking at a video of a person walking.

It has access to the metadata.

It knows the exact camera angle, the lighting conditions, and the exact millisecond a human viewer lost interest and swiped away.

This is a perfectly labeled, infinitely growing database that exists entirely behind the Chinese internet wall.

And that is precisely why Seedance can blow away other AI video generators like OpenAI's Sora.

Because Sora still struggles with physical consistency, audio synchronization and visual hallucinations, largely because OpenAI hit the limits of the multimodal data they could legally and cleanly scrape.

In fact, ByteDance's AI chatbot Doubao has already surpassed DeepSeek in users and is currently the number one AI chatbot in China.

And the main reason for that is simple.

While DeepSeek is great at text and reasoning, it cannot handle images or video.

Doubao, on the other hand, taps into ByteDance's massive data engine to seamlessly generate AI images, cinematic videos, and realistic voices, all in one place.

So now you know why Seedance is fundamentally better than any of its American equivalents, due to the sheer quality and structure of its training data.

Engineers call this natural motion synthesis.

When Seedance generates a video of a person walking through a puddle, the water splashes correctly.

The reflection matches the environment, and the sound of the splash syncs perfectly with the visual.

And when you look at it from that perspective, China's advantage starts to look much bigger.

Because if DeepSeek showed that China could compete in reasoning, companies like ByteDance showed that China also sits on one of the deepest reservoirs of multimodal consumer data in the world.

And of course, this is all possible because China has a population of 1.4 billion people and Douyin has over 750 million daily active users constantly feeding the machine.

That being said, the flip side of the coin is that these Chinese AI tools might eventually struggle with accessing multimodal data outside of China.

If they want their models to perfectly understand Western culture, Western physics, or Western cityscapes, they just cannot rely on Douyin, the Chinese video platform, right?

They will eventually bump into the exact same data wall problem as the American AI companies.

To me, this means that the AI race is far from over and we are barely scratching the surface.

It's just shifting to a new battlefield.

When it comes to the future, specifically the rise of AI agents and physical robots, which is a whole other topic, my intuition, and I could be wrong here, is that there's still a massive amount of real-life data waiting to be discovered and trained on.

It just might not be on the internet.

For example, at Asian Boss, we've been conducting real-life street interviews for over a decade, collecting and curating people's honest opinions on social and cultural trends.

What if we had the capacity to do a lot more of these street interviews?

Or better yet, what if we created our own app layer to help represent ordinary people's voices in a video format, and show the world why people think the way they do?

If the future of AI requires understanding real human beings, then maybe the most valuable data won't come from scraping websites.

Maybe it'll come from actually talking to people.

If you found this video insightful and felt like you learned something new, I also want to let you in on something that would be far easier to understand than what I've just explained.

Our analytics show that only about 23% of you watching right now are actually subscribed to our channel.

And I'll tell you why that matters.

Because we put so much time and effort into our research, we cannot really upload on a fixed schedule, like the same day every week.

So unless you actually subscribe to our channel and turn on the notification bell, the algorithm will often just bury our latest video whenever we upload it.

What that means is that when a new video comes out, sometimes you won't even see it.

Even if you're a regular viewer of our channel.

So if you regularly watch our videos, not just because of the interesting topics, but because of how we deep dive and break things down, please do me a huge favor.

Subscribe to Asian Boss and turn on the notifications.

It'll help us beat the algorithm, reach more viewers, and build a brand big enough to truly rival legacy media outlets.

You can also participate by leaving comments, emailing us, or filling out the form in the description if you'd like to join our future live streams. The goal here is to build a real community of culturally curious people and future leaders who help power the content that we create, so that we can be the go-to source for authentic, nonpolitical insights on all things Asia.

I really hope you'll be a part of it.

Of course, I'm Stephen Park.

Thank you for watching all the way to the end.

And as always, stay curious.
