I Tried Gemini CLI Security Extension for my Vibe-Coded App — Here’s What Happened

By Software Testing Trends

Summary

Topics Covered

You're Trusting Everyone Else's Code
AI Apps Need AI-Specific Security Tools
High Precision Means You Can Actually Trust the Output
Skip Boilerplate, Focus on Your Code
No Tool Catches Zero-Day Supply Chain Attacks

Full Transcript

Last week, something quite alarming happened in the JavaScript ecosystem.

Axios, the HTTP client that something like a 100 million developers download every week, was compromised in a supply chain attack. Someone got hold of a

chain attack. Someone got hold of a maintainer's credentials and pushed two poison versions to npm. Axio's version

1.14.1 and version 030.4.

Those versions injected a hidden dependency called plain crypto.js. And

when that package installed, it silently dropped a remote access Trojan onto the user's machine, including Mac OS, Windows, and Linux. The really

disturbing part, the malware cleaned up after itself. After installation,

after itself. After installation, inspecting the package directory showed nothing suspicious. No audit tool would

nothing suspicious. No audit tool would have flagged it. So if you pulled one of those versions during that window, you might not even know. If you did, treat

your machine as compromised. Rotate all

your secrets, cloud keys, deploy tokens, npm credentials, everything. This is the reality of modern development. Your

dependencies have dependencies. And

somewhere in that chain, a compromised credential or a bad actor is all it takes. You're not just trusting your own

takes. You're not just trusting your own code, you're trusting everyone else's.

So the question is what can you actually do about it? You can audit every transitive dependency by hand, but you can get a lot smarter about scanning what changes in your own code and what

lands in your lock file before it ships.

That's what we're looking at today. The

Gemini CLI security extension, a free open- source tool that adds AI powered security analysis directly into your poll request workflow. It was developed by Google to find security

vulnerabilities and risks in code changes. The tool works directly from

changes. The tool works directly from the terminal or within CI/CD pipelines.

You can install the extension using the Gemini CLI extension manager. Before you

do, make sure you're running Gemini CLI version 0.4.0 or newer. That's the

minimum version you'll need. Head over

to the extensions GitHub page. Copy the

install command and run it in your terminal. After installation, a set of

terminal. After installation, a set of new security commands will be available for you to use. Scandups checks your dependencies against Google's open-source vulnerability database.

Analyze command scans your branch's diff and produces an AI generated security report. Analyze full command analyzes

report. Analyze full command analyzes the entire repository for common security vulnerabilities. Analyze GitHub

security vulnerabilities. Analyze GitHub PR command works with the rungemini CLI GitHub action and analyzes code changes on your PR.

Let's see it in action. I'll start by running a security scan on my vibecoded test case generator app.

After running the dependency scan command, it prompts me for permission to use the scan vulnerable dependencies tool. Once I approve, it gets to work

tool. Once I approve, it gets to work and starts analyzing the dependencies.

It takes a couple of minutes to complete the process. Once the analysis is

the process. Once the analysis is complete, it wraps up with a clean summary report of everything it found. I

then asked it to save the findings to a markdown file for easy reference. And

here's what it generated with all the details laid out. It's a solid summary overall, but it doesn't cover all 36 vulnerabilities and the specific versions that would fix them. Since it

uses OSV scanner under the hood, let's see what we get by running OSV scanner directly. I already have OSV scanner

directly. I already have OSV scanner installed. You can install it with this

installed. You can install it with this simple command. Brew install OSV

simple command. Brew install OSV scanner. Now run the OSV scanner command

scanner. Now run the OSV scanner command to scan the package lock JSON file.

Here's the result and it was impressively fast. The table lists all

impressively fast. The table lists all affected versions and also the exact version that resolves each issue. Some

packages like XLSX don't have a fix yet, but it still flags them for your awareness. If you're comfortable reading

awareness. If you're comfortable reading raw output, I'd recommend just running OSV scanner directly.

Next, I'll use the analyze command to scan some newly added code. I've put

together a sample file packed with hard-coded secrets, API keys, private keys, passwords, and connection strings, all embedded directly in the source. To

put the extension to the test, let's run the security analyze command. This

command scans code changes on my current branch for common security vulnerabilities. The security extension

vulnerabilities. The security extension starts by setting up the security analysis environment and defining the audit scope to create the necessary folder and files. It asks for my

permission. Once I grant the permission,

permission. Once I grant the permission, it creates a Gemini_curity folder and within the folder, it creates two files.

a draft security report and a security analysis to-do file. In the to-do file, it defines the audit scope. It then runs a reconnaissance scan across the

identified files and once complete, compiles a summary report of all findings, correctly, flagging every hard-coded secret in the demo file. It

then prompts me to clean up the temporary files. I choose to keep the

temporary files. I choose to keep the draft security report, so I ask it not to delete the draft security report.

Next, it offers three options for addressing the identified vulnerabilities. I go ahead and choose

vulnerabilities. I go ahead and choose to remediate all of them as recommended.

It then asks how I'd like to handle the selected vulnerabilities. I choose to

selected vulnerabilities. I choose to patch them directly, meaning it will modify the code to fix each issue in place. It went through the code and fix

place. It went through the code and fix all the hard-coded values and move them out in av example file. It also retained the draft security report just as I had

requested rather than deleting it as part of its default cleanup workflow.

In addition to secrets, this extension also scans for insecure data handling and injection vulnerabilities like cross-sight scripting, SQL injection, and other types of injections. It also

looks for security issues specific to LLMs, like unsafe prompts, passing LLM output into eval, rendering it as raw HTML, or using overly permissive tool

settings. If you're building AI powered

settings. If you're building AI powered apps, this category is especially useful.

You can also guide the analyze command with plain natural language, something like analyze all the route files. And if

you need the output in JSON, just ask for it naturally. Or you can use the dash d-json flag directly with a command.

And here's the JSON report it generated after scanning all the route files.

Let's take a look at the types of issues it reported.

The first issue flagged is a prompt injection vulnerability. For each

injection vulnerability. For each finding, it includes the exact line of code, a description of the issue, and a recommended fix. For this vulnerability,

recommended fix. For this vulnerability, it clearly mentioned that untrusted user input from requirements and context is directly concatenated into the prompt

sent to the LLM via the build user prompt function. An attacker can provide

prompt function. An attacker can provide malicious instructions to override the system prompt, potentially leading to unauthorized data extraction or manipulation of the LLM's behavior. It

recommends to sanitize and validate all user supplied input before including it in prompts. Use structured prompt

in prompts. Use structured prompt templates and implement guardrails or LLM based scanners to detect and block potential prompt injection attacks. The

next issue is an insecure direct object reference. It flags that the app

reference. It flags that the app performs get, put, and delete operations on test cases using a user supplied ID without verifying whether the authenticated user actually has

permission to access or modify that resource, leaving the door open for attackers to iterate through IDs and tamper with other users data. The next

reported vulnerability is denial of service attack where it mentions that the API endpoints lack limits on the size of the request body or the number of items that can be processed. An

attacker can submit an excessively large array of test cases or a long list of IDs for export consuming significant CPU and memory potentially leading to a

service outage.

The last vulnerability is a potential PII leak in logs. This is an important one that developers often overlook. It

flags that error messages containing unsanitized user input. Things like

requirements or contexts that may include PI are logged directly to the console via console. If those logs are improperly stored or accessed, that data

is exposed. For the fix, it recommends

is exposed. For the fix, it recommends sanitizing error messages before logging, avoiding logging entire error objects or raw user input and using a

structured logging library that supports masking sensitive fields. These are all real vulnerabilities I had no idea existed in my vibe coded app. And just

like before, it offered to fix these issues.

On the Open SSF CV benchmark, a real world data set of TypeScript and JavaScript vulnerabilities. The

JavaScript vulnerabilities. The extension hits 90% precision and 93% recall. That means nine out of 10

recall. That means nine out of 10 flagged issues are real. For context,

most commercial SCSD tools run in the 70 to 85% precision range before tuning.

These are competitive numbers and they matter for a specific reason. Low

precision means your team spends more time chasing false positives than fixing real issues. High precision means you

real issues. High precision means you can actually trust the output.

I also tried the analyze all command on the same repo and it took way too long.

It ended up scanning everything including shad CN components and CSS files. So here are three things I

files. So here are three things I learned that'll save you time. First,

skip the boilerplate. If you're using Shad CN UI, Material UI, or similar libraries, don't skin those files.

They're generated, widely audited, and rarely the source of real vulnerabilities. Scanning them just

vulnerabilities. Scanning them just dilutes your findings. Focus on API routes, custom library files, O and session handling, and anything

constructing prompts. Quick pass on

constructing prompts. Quick pass on config files and skip UI primitives, CSS, and build configs entirely. For

dependency scanning, if you're comfortable with raw output, just run OSV scanner directly. The scandups

command calls it internally and passes the results through Gemini for a narrative summary. That's great for

narrative summary. That's great for sharing with stakeholders, but unnecessary overhead if you're already familiar with CV IDs. Third, ask for your report file at the start of the

session, not the end. The extension

creates working files during analysis and cleans them up when the session ends. So add this text to your prompt up

ends. So add this text to your prompt up front. When complete, save the final

front. When complete, save the final report to security_report markdown file and do not delete any generated files.

If you're already using Runge Gemini CLI GitHub actions workflows, the integration is minimal effort. You

replace your existing workflow file with the one from the extension repo and it starts running on every pull request and posting its findings as a PR comment automatically. If you want to go deeper,

automatically. If you want to go deeper, especially around CI/CD integration and setting up O between GitHub and Google Cloud, the official Google code lab is a

great place to start.

Coming back to the Axios incident, the extensions dependency scanner wouldn't have caught that specific attack because the malicious versions weren't in any vulnerability database when they

dropped. No tool would have caught it.

dropped. No tool would have caught it.

That's the nature of a zeroday supply chain attack. But the broader pattern, a

chain attack. But the broader pattern, a dependency introducing unexpected code changes, a post install script running something it shouldn't. That's exactly

the kind of thing OSV scanner and careful diff are designed to surface over time. All right, that's it. Now

over time. All right, that's it. Now

it's your turn to give this security extension a try. If you enjoyed this video, please like and subscribe for more real world AI tips for software testers. Happy testing.

testers. Happy testing.

Loading...

Loading video analysis...