
Gemini Can Now Write You a Song

By The AI Daily Brief: Artificial Intelligence News

Summary

Key takeaways

  • **Lyria 3 Generates Music from Multimodal Inputs**: Google has launched Lyria 3, the latest version of DeepMind's music generation model, which lets users generate music clips from text, image, or video inputs, unique compared to Suno's text-only input. [00:16], [00:22]
  • **30-Second Clips for Fun Social Use**: Lyria 3 produces 30-second clips and can't build entire songs; it's aimed at background music for YouTube Shorts or fun personal song messages, with Google's stated goal being a fun way to express yourself rather than musical masterpieces. [00:45], [01:13]
  • **Anthropic Clarifies OAuth Token Policy**: Anthropic updated its terms to prohibit using OAuth tokens from Free, Pro, or Max accounts in other products like OpenClaw agents, but clarified it's a docs cleanup that doesn't change Agent SDK or Max subscription use; the intent is to make third-party businesses pay via the API. [02:49], [03:51]
  • **All Labs Restrict Modular AI Use**: Anthropic is late to restricting OAuth tokens for non-Claude apps; Google had already banned the practice for OpenClaw, and the OpenAI and Google Gemini terms show similar policies against modular AI use cases. [04:17], [04:39]
  • **Meta Revives AI Smartwatch Malibu 2**: Meta has revived plans for a smartwatch code-named Malibu 2 with health tracking and a built-in Meta AI assistant; the original design featured cameras and nerve-signal reading, and release is planned for this year amid focused wearable bets. [05:25], [05:41]
  • **Chinese Models Lag in Real-World Tasks**: Chinese models post high benchmarks but fall extremely short in real-life agentic behavior outside coding, attributed to distilling frontier models (yielding shallower intelligence) and training for evals; test them yourself beyond benchmarks. [07:49], [08:20]

Topics Covered

  • Google's Multimodal Music Flexes Video Alignment
  • AI Labs Tighten Terms Against Free Token Abuse
  • Meta Revives AI Smartwatch with Neural Control
  • 16 Sub-Agents Debate for Deeper AI Answers
  • Chinese Models Fail Real-World Agentic Tasks

Full Transcript

Welcome back to The AI Daily Brief headlines edition: all the daily AI news you need in around 5 minutes. Today we kick off with Google's continuing quest to have AI products in every single multimodal category. The latest news is that the company has launched an AI music generator called Lyria 3. It's the latest version of DeepMind's music generation model and allows users to generate music clips based on text, images, or video inputs, which is pretty unique compared to something like Suno, which is of course just text-based input.

Lyrics can be generated in eight different languages, including German, French, Spanish, and Hindi. The feature

can be accessed directly in the Gemini app by switching to a musical output.

It's also being added to YouTube's Dream Track tool to allow creators to quickly generate soundtracks for YouTube shorts.

Each track is accompanied by custom cover art generated by Nano Banana. Now, previous versions of Lyria have only been available through Google Cloud's Vertex AI platform, so this is a big expansion in access. However, there is a pretty significant limitation, which is that these are 30-second clips. The model itself isn't really capable of building on top of the initial generation, so this feature won't be useful for generating entire songs.

However, and it's pretty clear that this is the use case they're imagining initially, this could be extremely useful for generating background music for YouTube Shorts or fun, interactive, personal song messages. Indeed, this appears to be what Google had in mind, with the company writing, "The goal of these tracks isn't to create a musical masterpiece, but rather to give you a fun, unique way to express yourself."

And really, while it would be tempting to compare this to Suno, this is actually more of a social feature than anything else. We've talked in the past about how one of the really interesting things about Suno is the extent to which it's used not for any sort of professional or work music generation, but just as a fun interactive mode, and Lyria really seems to be doubling down on that. Google has also embedded their SynthID audio watermarks into the music, so tracks are easily flagged as AI-generated. A lot of the discourse around the first tries is that this is indeed not at Suno's level, and that Suno's generations feel much more polished and musically complex. On the flip side, others point out how Google just keeps adding new arrows to its multimodal quiver. Aaron Upright comments, "People talking about OpenAI versus Anthropic, and Gemini's just over here quietly getting more powerful. People underestimate the importance of an easily accessible multimodal platform when it comes to adoption." Chaien Xhiao sees the future, writing, "Video-to-audio alignment is the real flex here. Generating lyrics and vocals that actually sync with visual cues in real time is a massive multimodal serving challenge. Lyria 3 probably relies on some crazy high-throughput infra to keep the latency low enough for creative workflows." Ultimately, I think we are just scratching the surface on what role generated music is going to play, and Google is now firmly in that game as well.
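Since SynthID watermarking comes up here, a toy sketch can illustrate the general idea of inaudible provenance marks. To be clear, SynthID's actual algorithm is not public and is far more robust than this; the seed, strength, and threshold below are made-up illustration values for a generic spread-spectrum "embed a keyed signature, detect it by correlation" scheme:

```python
import numpy as np

SAMPLE_RATE = 48_000
KEY = 42  # shared secret seed between embedder and detector

def signature(n: int) -> np.ndarray:
    # Pseudorandom signature derived from the shared key.
    return np.random.default_rng(KEY).standard_normal(n)

def embed(audio: np.ndarray, strength: float = 0.05) -> np.ndarray:
    # Add the signature at low amplitude, quiet relative to the audio.
    return audio + strength * signature(audio.shape[0])

def detect(audio: np.ndarray, threshold: float = 5.0) -> bool:
    # Correlate with the keyed signature; watermarked audio scores high,
    # unmarked audio hovers near zero.
    score = audio @ signature(audio.shape[0]) / np.sqrt(audio.shape[0])
    return bool(score > threshold)

# One second of a 440 Hz test tone.
tone = 0.5 * np.sin(2 * np.pi * 440 * np.arange(SAMPLE_RATE) / SAMPLE_RATE)
```

The key property, shared with real systems, is that detection requires only the secret key, not the original audio.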

Next up, a bit of a controversy that ended up being less of a controversy than it seemed, but still taught us some interesting things about the state of competition. A change in Anthropic's terms of service triggered a tinderbox of complaints from those using Claude to power their OpenClaw agents. This week, Anthropic changed their policies, now stating that using OAuth tokens obtained through Claude Free, Pro, or Max accounts in any other product, tool, or service, including the Agent SDK, is not permitted. Now, to clarify, OAuth tokens are kind of like API keys for regular Anthropic subscriptions, allowing users to access AI models through third-party apps. And of course, a lot of the attention is around the people who have been using their Claude Max accounts to power their OpenClaw agents. Indeed, Alex

Finn writes, "This is going to piss off a lot of OpenClaw users paying $200 a month." Tweets like this one were too numerous to count. Hubert Leiki

writes, "Anthropic is in active self-destruction mode now. First, they went after tokens you already paid for, blocking use in non-Claude Code apps. Then they sent their lawyers after developers for supposed branding infringement. And now this. OpenCode, Gemini CLI, and Codex CLI are all legitimate coding agents with comparable features and abilities, but Anthropic are behaving like they're still the only player on the block." Now,

all of this caused Anthropic's Thariq Shihipar to comment, writing, "Apologies. This was a docs cleanup we rolled out that's caused some confusion. Nothing is changing about how you can use the Agent SDK and Max subscriptions." He added that the intention isn't to block personal tinkering, but rather to force third-party businesses to pay for usage through the API. Unfortunately, the

confusion only continued with that unclear clarification. Podcaster Felix Javin wrote, "Brother, can you just tell us whether we can use OpenClaw or not?" And it seems like if you're using it to build your own personal agents, the answer is yes, but the incident raised a ton of discussion about how long the big AI labs will continue to support these modular AI use cases. Some tried

switching providers, only to find that Google had already banned OAuth for the use case. Richard Hulcom wrote, "I feel like getting banned by Google for using Antigravity OAuth with OpenClaw is a rite of passage. I was already not impressed with Gemini, preferring Anthropic and OpenAI, but now I really have a bad taste in my mouth." Colin Darling, however, pointed out, "Everyone upset about Anthropic's update to their terms would be wise to read the OpenAI and Google Gemini terms while they're at it. I'm bummed out, too, but Anthropic is late to this party, not leading it."

In any case, the controversy quickly faded, but there is a lingering question about walled gardens and what they're going to mean for AI going forward.
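For context on the mechanics of this dispute, the two credential styles can be sketched roughly like this. The `x-api-key` and `anthropic-version` headers match Anthropic's documented API; the OAuth bearer path is a simplified, hypothetical stand-in for what subscription-backed clients do, not a documented interface:

```python
def build_headers(credential: str, kind: str) -> dict:
    """Return HTTP headers for a model request, depending on credential type."""
    if kind == "api_key":
        # Metered, pay-per-token access: the path Anthropic wants
        # third-party businesses to use.
        return {"x-api-key": credential, "anthropic-version": "2023-06-01"}
    if kind == "oauth":
        # Subscription-backed bearer token (Free/Pro/Max): what the updated
        # terms say may not be used in other products, tools, or services.
        return {"Authorization": f"Bearer {credential}"}
    raise ValueError(f"unknown credential kind: {kind}")
```

The whole controversy boils down to which of these two branches a third-party agent is allowed to take.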

Next up, more news in the AI wearables category. Meta has revived plans to release a smartwatch as part of their AI device lineup. Rumors of a Meta smartwatch started circulating in late 2021, complete with leaked photos of a prototype. The device was given the internal code name Malibu and featured two cameras: one in the dial for video conferencing and another on the underside of the watch. The idea was that users could quickly remove the watch to take a photo. Another big part

of the design brief was the ability to read nerve signals in the wrist, allowing the device to be used as a controller. This concept has since gone on to feature in Meta's haptic control wristbands for their Orion smart glasses prototype, which was unveiled in late 2024. That said, by the summer of 2022, Project Malibu was killed off and Meta shifted focus to smart glasses as their big wearable play. Now, The Information reports that Meta has revived the smartwatch under the code name Malibu 2.

The watch is said to include health tracking features and a built-in Meta AI assistant. Sources said the revival effort came out of a product strategy meeting late last year. Executives are reportedly concerned about a bloated product lineup for augmented reality glasses, so they have delayed some products to focus on a limited number of concentrated bets. Among them is a new version of the Ray-Ban Display, which is expected later this year, as well as a pair of AR glasses that could arrive in 2027. The smartwatch is planned for release this year, putting Meta in direct competition with Apple and Google in the category. Now, one thing to watch for

will be how far each company goes in making the smartwatch an integral part of their wearable AI stack. Earlier this week, we covered rumors that Apple was working on a trio of new AI-enabled devices, namely smart glasses, a pendant, and a camera-equipped version of the AirPods. That report mentioned that a camera-equipped version of the Apple Watch had been passed over as an AI device, with testers reportedly finding the prototype impractical due to clothing sleeves obscuring the camera.

Ultimately, we don't know how Meta is thinking about the Malibu 2, but they are very clearly focused on this wearable category as a place for their AI strategy.

Next up, another follow-up on the Grok 4.2 public beta. xAI has announced a new version of Grok Heavy, and this one goes to 16. The big innovation with Grok 4.2 was the inclusion of four sub-agents that debate responses before providing a final answer. Opinions were a little mixed on whether this was actually a useful feature, but it's an interesting experiment if nothing else. Grok Heavy turns the sub-agent count all the way up to 16 in a bid to either get better answers or at least burn through a ton of tokens getting an output. xAI

community promoter Ted Suo shared an output from the query, "How does chaos birth cosmic order?" The agents debated the response for a little over a minute and then delivered a 700-word report using almost 900 references. It's difficult to judge accuracy or usefulness based on such a strange, subjective question, but the output certainly has a ton of detail and is an interesting read. If nothing else, these continue to be interesting experiments and worth watching for that reason alone.
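The pattern described above — fan out one question to many sub-agents, then reconcile their drafts into a final answer — can be sketched like this. `ask_model` is a hypothetical stand-in for a real model API call, and this is a generic sketch of the debate-then-aggregate idea, not xAI's actual implementation:

```python
import concurrent.futures

def ask_model(prompt: str) -> str:
    # Hypothetical stand-in for a real LLM API call.
    return f"draft answer to: {prompt}"

def heavy_answer(question: str, n_agents: int = 16) -> str:
    """Fan the question out to n_agents sub-agents, then aggregate."""
    with concurrent.futures.ThreadPoolExecutor(max_workers=n_agents) as pool:
        drafts = list(pool.map(ask_model, [question] * n_agents))
    # A single "judge" pass reconciles the competing drafts.
    transcript = "\n".join(f"Agent {i}: {d}" for i, d in enumerate(drafts))
    return ask_model(f"Reconcile these {n_agents} drafts:\n{transcript}")

final = heavy_answer("How does chaos birth cosmic order?")
```

The token-burn concern in the transcript falls straight out of this structure: every query costs n_agents drafts plus a judge pass over all of them.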

Lastly today, Chinese models. Lindy founder Flo Crivello recently shared a thread about the difference between Chinese models on benchmarks and Chinese models in the real world. He wrote, "By far our biggest cost at Lindy is inference, so believe me when I say we've looked at these models very closely and continue doing so. Them actually delivering on the claims would make a material difference to the business. But every time we've evaluated them, we've found the same thing: their real-life performance for agentic behavior, outside of coding use cases, falls extremely short of what they show on the evals. I think the industry consensus is right." He continues: "These Chinese labs are, one, distilling frontier models (duh), which leads to a more shallow intelligence; two, training for

evals; and three, potentially stealing weights. Not saying these models will always be bad or that these labs are completely incompetent. They're doing a fine job, but it's delusional to think they're actually at Sonnet and Opus level. They're still at least one generation behind. Take the evals with a huge grain of salt." That, I think, is a lesson that is relevant not just for Chinese labs, but whenever you see a new Western model with high benchmarks as well. Ultimately, you've got to

just dive in and test these things out for yourself. And with that, we will end today's headlines. Next up, the main episode.
