I Built a Deck With AI, Then Made a Second AI Attack It.

By AI News & Strategy Daily | Nate B Jones

Summary

Topics Covered

Agents at the Heart, Not the Edge
A Prompt Asks for Output, A Workflow Defines Trust
Build the Spec Before the Artifact
Flip the Task to Enumerate, Not Fix
Keep Your Brain Turned On

Full Transcript

So, finally, this is the Excel spreadsheet conversation. Finally, we get to talk about the difference all of these agents have made when it comes to work and Office files, because everything still lives in Word.

It lives in Excel. It lives in PowerPoint. And when I did all of my prompting guides and everything like that last year, they were hugely popular around Excel and PowerPoint. You can still find them. They're still useful for building individual assets, but we have moved past individual asset territory with what models can do now.

It just isn't the same thing. I'm not kidding you. I can now draft eight documents at once. I don't think that's the ceiling. That's just what I happened to do this week. And I did it by focusing on the structure of the data around the document so that what I got was high quality and very powerful. You want to think beyond that. You want to think in terms of how you build an entire infrastructure around agents that drives successful artifact creation, whether that's Excel or PowerPoint or something else. And that mindset parallels a lot of what we're seeing in the rest of the workforce as we wrestle with what it means to have agents in production in 2026.

Because the real, you know, n-dimensional move, the big brain move right now, is not to think of it as "I bolt this onto my workflow," but to think of it as "agents are at the heart of the new workflow." I therefore need to torque myself around, change my process, adjust everything inside so that agents are centralized and agents are first. And so a lot of what I'm going to talk about is essentially how you rebuild an office document workflow agentically. And that's the theme. That's what makes this really exciting, because if you do that, that's when you get to these massive increases. If you think about it, eight documents, 10 documents, whatever it is, you're talking about an order of magnitude increase in productivity on just knowledge work. That's huge. And I don't see that happening. In fact, I know that's not happening because I have talked to folks at hyperscalers about what I'm doing and they're like, "Oh, wow. That's actually really cool. We didn't know that these models could do that." Well, they can. And we're going to show you how.

Claude can build Excel and PowerPoint files. ChatGPT analyzes any spreadsheet you drop into it. So does Codex. Microsoft Copilot for Office hit general availability in April. Everybody watching this video has access to AI that can build a PowerPoint deck in minutes. What you don't have is a way to know that the deck is correct, accurate, and complete. And I know this because I've lived it. Last quarter, I opened an Excel file that looked like a financial model.

Assumption inputs at the top, revenue projections, valuation outputs rolling up very cleanly. There was a written guide attached saying the model had been validated. And then I opened the revenue growth row and the formula was incorrect, and it was copied across every future year from the same two cells, year after year. Excel didn't tell me that. There was no #REF! error. The valuation output still looked good, but it was a financial model in a costume. It wasn't the real thing. The layout was right. The labels were right. The cells were incorrect. And that's obviously the only thing that matters. And the same thing is happening in a lot of docs that I see time after time in real production environments.

And in this video, I want to fix that for you. I want to get into how we build workflows with agents at the center that help us to build these heavy knowledge work documents in ways that ensure reliability and accuracy. The key fix is a mental shift. A prompt asks for an output. A workflow defines the stages the output has to pass through before it can be trusted. And you need to be in a workflow world, not a prompt world.

So for Office files, that workflow has four stages. And we're going to go through all four. One, you have to prepare your sources. You do not ask for the deck or the spreadsheet. You ask for an inventory of what the model has to work with, and you make sure that it's organized. Two, structure. Before any slide or formula gets created, AI needs to produce a file specification. For a deck, that's a narrative spine and a slide list with claim headlines. For a workbook, it should have a tab architecture and a calculation flow. Three, you go from producing that spec to building the artifact. Now you build the artifact constrained by the source packet of information and by the spec that you've built. You're not freestyling from whatever the model thinks the answer should be. And then four, verification.

The file isn't done when it opens. It's done when it has survived a reviewer that looks at it really aggressively. I use another model for this. I actually use Codex to build Office artifacts and then I use Claude Opus 4.7 to review them aggressively and generate edit loops. And it's like my own Ralph loop, right? Where it finishes and Opus looks at it and says, "Oh, you're not finished. Here's your edit loop to make it better." And I just keep running that until I get a very high quality set of docs. And I can run it at scale if my document source repositories are clear.
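A minimal sketch of that build-and-review loop in Python. The `build` and `review` callables are hypothetical wrappers you would write around whatever models you use; nothing here calls a real API, and the round limit is an assumption added so the loop always stops.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class LoopResult:
    artifact: str            # the last draft produced (text or a file path)
    rounds: int              # how many review rounds ran
    open_issues: List[str]   # issues the reviewer still had with that draft


def review_loop(
    build: Callable[[str, List[str]], str],
    review: Callable[[str], List[str]],
    spec: str,
    max_rounds: int = 5,
) -> LoopResult:
    """Alternate a builder and a reviewer until the reviewer finds nothing.

    build(spec, issues) returns a new draft; review(draft) returns a list of
    issues. Both are placeholders for your own model calls.
    """
    if max_rounds < 1:
        raise ValueError("max_rounds must be at least 1")
    artifact = build(spec, [])                 # first draft, no edit list yet
    for round_number in range(1, max_rounds + 1):
        issues = review(artifact)              # reviewer lists problems only
        if not issues or round_number == max_rounds:
            return LoopResult(artifact, round_number, issues)
        artifact = build(spec, issues)         # builder gets the edit list back
    raise AssertionError("unreachable")


# Stand-in functions so the sketch runs without any model behind it.
def fake_build(spec: str, issues: List[str]) -> str:
    return spec + "+fixed" if issues else spec


def fake_review(artifact: str) -> List[str]:
    return [] if artifact.endswith("+fixed") else ["slide 2 claim has no source"]


result = review_loop(fake_build, fake_review, "deck")
print(result.rounds, result.open_issues)
```

Returning the open issues along with the draft keeps the human in the picture: if the loop stops at the round limit, you can see exactly what the reviewer was still unhappy about.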

I'm increasingly thinking of knowledge work as if it was code. So the capability to generate these files is absolutely everywhere and it's far outpacing our ability to scale quality unless we have these kinds of systems like I'm talking about. I wrote about the upstream piece of this on Substack earlier this month, about how to organize your sources before you ever ask AI to write anything. It's critical for this kind of work to actually scale. Call that the before, right? This video is the after, how you build with it. So that link to the previous Substack is in the description. If you're interested, feel free to dive in. I think it's a nice pair with this video. Let's get back to it.

The failures I worry about aren't dramatic. They're ordinary enough to survive a quick review, right? A board deck pulled from a folder containing a Q3 actuals export and a Q4 plan file. The slide headline says, "Revenue is ahead of plan," and the chart looks really clean, and then you look at the underlying numbers and they're blending actuals and plan data because nobody labeled the difference. And that error travels every time the deck gets shared. It traveled out of the spreadsheet or wherever it came from originally, and it's a problem. So you are now in a world where you can have a great PowerPoint with sharp headlines and executive language, but there's no way to show that it has a foundation. It may look finished, but it has no foundation. And I want to talk to you about how you build documents reliably in a pipeline so that they are well-founded, because that's how you actually start to build momentum with this knowledge work.

Otherwise, all of this AI tooling becomes a massive trust breaker. And I want to be clear, I'm not cherry-picking edge cases here. Models are goal-oriented. If you tell them to build something on a bad foundation, they'll try to do it. And that's the normal result of asking AI to jump straight from messy sources to a finished file.

The model is optimizing for the artifact you asked for. If you ask for a deck and don't give it choices, it'll try to make a deck. Same thing with a spreadsheet. Unless you explicitly define the evidence structure, the tool treats source discipline as optional.

And that's backwards. Source discipline is a big part of the work. And I want to call out that this is not because the models are getting worse. In fact, the models have had a lot of work put into them to care about sources. And if you give them the chance to, they will. Part of why we have to have this conversation now is that the models are trained to check claims and look at sources. And if you don't take the same care as the models do, the models will try to take that care, try to look at sources. And that very goal-orientedness to build completely will betray you because you have no clean sources for it to rely on.

It will just have to work its way through and guess, but it's been trained to try and find it. So it will try and do that and it will guess, and then you'll be in trouble. And so we have to be aware of where the hyperscalers are taking the models. The hyperscalers are basically saying models need to do better work by checking sources. They're right, but we have to do the work of making sure the sources and the workflow pipelines are there so the models can build these office documents appropriately. And that's what serious knowledge work looks like in 2026. The goal is to make every important claim and calculation something that can be checked and that invites that scrutiny.

And so you have two stages here, right? You have source prep. That's stage one. Before you ask AI for the deck or the workbook, you need to ask AI to look at what it can see. What's in the folder? Find out what your work packet has. Does everything in the work packet have an owner and a date and a file type? And can you create an index of evidence that has all of that? Does it have a status? Has the AI said if it's current data or superseded? If it's an estimate, if it's a transcript, if it's raw data? Have you removed sensitive material that would need to be removed before any public-facing artifact gets generated? That does happen, too. Have you checked your facts and your estimates based on research on the net? This one move changes the process. A messy folder can become a controlled work packet. The AI can't blend a transcript and a deck and a spreadsheet and a half-remembered assumption into a confident answer if you've organized everything and if there's an index of what's in there.
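One way to picture that index of evidence is as a small table of records plus a check that flags the gaps. This is only a sketch: the field names, the `SRC-001` style IDs, the status spellings, and the example file names are assumptions for illustration, not the template from the Substack.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional

# Status labels drawn from the questions above; exact spellings are assumptions.
STATUSES = {"current", "superseded", "estimate", "transcript", "raw_data"}


@dataclass
class SourceEntry:
    source_id: str                  # hypothetical ID scheme, e.g. "SRC-001"
    path: str
    file_type: str                  # e.g. "xlsx", "pptx", "txt"
    owner: Optional[str] = None
    as_of: Optional[date] = None    # the date the content describes
    status: Optional[str] = None
    sensitive: bool = False


def index_problems(entries: List[SourceEntry]) -> List[str]:
    """Return one message per gap that should be fixed before any drafting."""
    problems: List[str] = []
    seen = set()
    for entry in entries:
        if entry.source_id in seen:
            problems.append(f"{entry.source_id}: duplicate ID")
        seen.add(entry.source_id)
        if not entry.owner:
            problems.append(f"{entry.source_id}: no owner")
        if entry.as_of is None:
            problems.append(f"{entry.source_id}: no date")
        if entry.status not in STATUSES:
            problems.append(f"{entry.source_id}: status missing or unknown")
        if entry.sensitive:
            problems.append(f"{entry.source_id}: sensitive, keep out of public-facing files")
    return problems


# Made-up example packet: one fully labeled source, one bare file.
packet = [
    SourceEntry("SRC-001", "q3_actuals.xlsx", "xlsx", "Finance", date(2025, 9, 30), "current"),
    SourceEntry("SRC-002", "q4_plan.xlsx", "xlsx"),
]
for message in index_problems(packet):
    print(message)
```

The point of the check is that an unlabeled file gets flagged before drafting starts, which is exactly where a blended actuals-and-plan chart would otherwise come from.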

Stage two is structure. Before any file gets created, you need to produce a file specification. At least if you're doing serious work. I'm not saying every single time you have a conversation with ChatGPT. Don't misunderstand me. I'm saying when you're doing serious knowledge work that has implications. So for PowerPoint, that's the narrative in English in a way that you understand.

And do insist on plain English. One of the issues with the current models is they like cute language. They like shorthand. Insist on plain English. Who is the audience? What decision do they need to make? What do they need to believe before they can make that decision? And then look at the list of slides. What are the claims? What are the supporting source IDs? Where are charts needed? What are the assumptions? What are the open questions?

And for Excel, you need to look at your tab architecture. Where does raw data live in the tab? Where do the assumptions live? Where are calculations performed? Where are checks recorded? Where does the user see the summary and how is it driven? So the file spec is like a blueprint for a serious doc. If the blueprint doesn't explain where the truth lives, the finished file won't do that either. So, if you're wondering how to get started, there is a full source packet template on the Substack, including an ID schema, a status taxonomy, and a conflict log format. If you want to copy it for your team for the next big project you do, I've linked it in the description.

Let's move on here, though. I want to talk about file creation. This is the part we all want, right? How can we create this stuff? So only now does AI actually build the artifact. After you prep, for PowerPoint, I would do it in two passes. First pass is the storyboard. You want slide titles and claims and evidence and notes. Don't design or render yet. No charts laid out, just the argument and the evidence trail. Second pass, render the deck.
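The storyboard pass can be treated as data that gets checked before anything is rendered. The sketch below assumes each slide carries a title, a claim, a list of source IDs, and notes; the class name, the ID format, and the example slides are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class StoryboardSlide:
    title: str
    claim: str                                        # the one thing the slide asserts
    source_ids: List[str] = field(default_factory=list)
    notes: str = ""


def unsupported_claims(slides: List[StoryboardSlide], known_ids: Set[str]) -> List[str]:
    """List slides whose claim has no source, or cites a source not in the index."""
    findings: List[str] = []
    for number, slide in enumerate(slides, start=1):
        if not slide.source_ids:
            findings.append(f"slide {number} ({slide.title}): claim has no source")
        unknown = [sid for sid in slide.source_ids if sid not in known_ids]
        if unknown:
            findings.append(
                f"slide {number} ({slide.title}): unknown source {', '.join(unknown)}"
            )
    return findings


# Made-up storyboard checked against a made-up source index.
storyboard = [
    StoryboardSlide("Q3 results", "Revenue is ahead of plan", ["SRC-001"]),
    StoryboardSlide("Outlook", "Growth continues in Q4"),
    StoryboardSlide("Costs", "Costs are flat", ["SRC-009"]),
]
for finding in unsupported_claims(storyboard, {"SRC-001", "SRC-002"}):
    print(finding)
```

Only a storyboard with an empty findings list would move on to the render pass.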

This split keeps visual polish from hiding a weak argument. And you catch unsupported claims before they become beautiful and difficult to edit. Right now I am doing all of the argumentation in Codex and I'm moving to Claude Opus 4.7 for the render of the deck because that front-end polish is just beautiful in Claude.

For Excel, do it in three layers. Layer one, load the raw data exactly. Layer two, build the assumptions and the calculation logic. And layer three, produce the output views. A workbook should be able to answer a very simple question. If I change an assumption, does the relevant output change for the right reason? And that question matters a lot more than whether the workbook itself looks good. A spreadsheet that cannot recalculate isn't a model. A model whose formulas can't be inspected isn't ready for decisioning. Right now, what I am doing is I am using Codex to build Excel models and then I am having Claude take a pass at making them pretty afterward. But I find that Codex is really, really good at completeness in Excel files, and that's really important when you're trying to build serious models.

One more piece I want to call out here. You need to understand your task risk gradient. Where is AI highest risk or lowest risk? AI is lowest risk for formatting, layout exploration, chart drafts, summary wording, and consistency checks. It's medium risk for source attribution and data extraction. And it's highest risk for numerical synthesis, for financial calculations, for any kind of regulatory language or compliance language, and for claims that will travel up to senior leadership for a decision. Make sure you check those.
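For the financial-calculation end of that gradient, some checks can be mechanical. The sketch below is one possible approach, not a method from the video: it rewrites each formula's references as offsets from its own cell, so a row that was filled correctly ends up with one repeated shape and any cell that breaks the pattern stands out. The parser is deliberately simple and assumes plain A1-style references; sheet-qualified references and text inside string literals are not handled.

```python
import re
from collections import Counter
from typing import Dict, List, Tuple

# Matches A1-style references such as B5, $B$2 or C$4 (simplified, see above).
REF = re.compile(r"(?<![A-Za-z0-9_$])(\$?)([A-Z]{1,3})(\$?)(\d+)(?![A-Za-z0-9_(])")


def column_number(letters: str) -> int:
    """Convert column letters to a 1-based number: A -> 1, Z -> 26, AA -> 27."""
    number = 0
    for letter in letters:
        number = number * 26 + ord(letter) - ord("A") + 1
    return number


def split_address(cell: str) -> Tuple[int, int]:
    match = re.fullmatch(r"([A-Z]{1,3})(\d+)", cell)
    if match is None:
        raise ValueError(f"not a cell address: {cell!r}")
    return column_number(match.group(1)), int(match.group(2))


def relative_shape(formula: str, cell: str) -> str:
    """Rewrite references as offsets from `cell`; anchored ($) parts stay absolute."""
    home_col, home_row = split_address(cell)

    def rewrite(match):
        col_abs, col, row_abs, row = match.groups()
        col_num, row_num = column_number(col), int(row)
        row_part = f"R{row_num}" if row_abs else f"R[{row_num - home_row}]"
        col_part = f"C{col_num}" if col_abs else f"C[{col_num - home_col}]"
        return row_part + col_part

    return REF.sub(rewrite, formula)


def inconsistent_cells(formulas: Dict[str, str]) -> List[str]:
    """Return cells whose formula shape differs from the most common shape."""
    if not formulas:
        return []
    shapes = {cell: relative_shape(text, cell) for cell, text in formulas.items()}
    # On a tie, the shape seen first wins; a human still decides which is right.
    expected, _ = Counter(shapes.values()).most_common(1)[0]
    return sorted(cell for cell, shape in shapes.items() if shape != expected)


# Made-up revenue row: F5 points back at B5 instead of the prior year, E5.
revenue_row = {
    "C5": "=B5*(1+$B$2)",
    "D5": "=C5*(1+$B$2)",
    "E5": "=D5*(1+$B$2)",
    "F5": "=B5*(1+$B$2)",
}
print(inconsistent_cells(revenue_row))
```

A row where every year reads from the same unanchored cells, like the broken revenue growth row described earlier, shows up here as almost every cell disagreeing with the first one. The check only says the row is not uniform; deciding which formula is correct is still a human call.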

So yes, let the model help everywhere. No matter what the task risk gradient is, it's faster with the model.
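The gradient above can be written down as a simple lookup so that each task type carries its review requirement with it. The task names follow the list in the talk; the review wording for each tier is an assumption for illustration, and so is the choice to treat unknown tasks as high risk.

```python
from enum import Enum


class Risk(Enum):
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


# Task types taken from the gradient described above.
TASK_RISK = {
    "formatting": Risk.LOW,
    "layout_exploration": Risk.LOW,
    "chart_draft": Risk.LOW,
    "summary_wording": Risk.LOW,
    "consistency_check": Risk.LOW,
    "source_attribution": Risk.MEDIUM,
    "data_extraction": Risk.MEDIUM,
    "numerical_synthesis": Risk.HIGH,
    "financial_calculation": Risk.HIGH,
    "regulatory_language": Risk.HIGH,
    "leadership_claim": Risk.HIGH,
}

# Review burden per tier: illustrative wording, not from the talk.
REVIEW = {
    Risk.LOW: "spot check",
    Risk.MEDIUM: "model review plus sampled human check",
    Risk.HIGH: "model review plus human sign-off",
}


def review_for(task: str) -> str:
    """Look up the review a task needs; unknown tasks get the strictest tier."""
    return REVIEW[TASK_RISK.get(task, Risk.HIGH)]


print(review_for("formatting"))
print(review_for("financial_calculation"))
```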

Now, don't give every task the same review burden. It depends on the risk. If you want to understand how to dig into that further, there's more on that in the Substack. I broke down the full file specification format. There's a PowerPoint narrative spine template. There's an Excel tab architecture. There are assumption log fields. And there are sort of checks-tab smoke alarms you can set in Excel that I put together. So, if you want the full version both for PowerPoint and Excel, the link is in the description.

We're going to move on, though, to one of the things I think is most important, and that's verification with a hostile reviewer prompt. So verification asks whether the artifact can be trusted.

Sources, dates, formulas, assumptions, charts, and unsupported claims. Every one of these gets inspected. It's a different job from proofreading, and most teams tend to skip it because the file can look done much sooner than it's actually done. The useful pattern is to make the model itself act as a hostile reviewer. You can use the same model. I use different models for this. I like to play them off against each other. So I use Opus 4.7 to review what 5.5 does. I think that's super fun. And I'm going to give you the prompt right here. So, you know, pay attention.

So, the prompt is this. Read this deck or workbook as a skeptical reviewer who suspects every claim and every number. For each slide or sheet, identify claims without source attribution, numbers without a data source, charts whose underlying data isn't traceable, formulas inconsistent across parallel rows or columns, and assumptions presented as facts. Produce a written list of every issue found. Don't fix anything, just enumerate.

That last instruction, don't fix, just enumerate, is what makes it work. The model is just trying to find the problems. It's not trying to solve them. Different task, different output, different value. A model can catch a lot of its own mistakes when you flip the task from generation to enumeration. And the human gate can stay on the consequential claims, the numbers that travel, the calls that are going to become an important decision.

And one of the things I want to call out here is my personal workflow is very much a Ralph loop for this. I will take a doc that is created by Codex and I will give it to Opus 4.7 and I will ask Opus 4.7 to do that hostile review and generate an extremely detailed edit list. I will pipe that back to Codex and I will ask Codex to fix everything in there with a new version in the same folder. Then I will go back to the same thread in Opus 4.7 and say, "Check the work. Did they do the job?" And then I will do it again and I will do it again. And toward the end, I will actually introduce a language check. Opus is actually very, very good at checking for LLM-isms like "you're absolutely right" in the doc, right? Something like that. And you can get it into plain English that actually works better for humans to read if you introduce that polish step later. And the overall goal is that you have an autonomous loop between Codex and Opus that gets you to A-level work without having to invest a ton of time along the way, so that your time is best focused on reading A-level material and saying, "Okay, this is where I agree. This is where I disagree. This is where I would edit. Here's some final polish." And that's what it looks like to do heavy knowledge work in 2026.

If we step back, AI has made it much easier to create Office files, right? And that matters a lot. PowerPoint and Excel are still where an enormous amount of business judgment becomes visible inside companies. And traditionally, they have taken real hours out of everyone's week in millions and millions of cases. If AI helps people build those artifacts faster, the productivity upside is real.

It's measured in weeks a year for all of us. The upgrade is that you can now build a repeatable production system around the file. Source prep and structure and constrained creation and verification. The file is just an output of that knowledge work system. It's not the whole thing. Drag in the sources, ask for a deck, hope the output is right: that's the version of the workflow that loses you a meeting.

Instead, if you prepare your sources and define your structure and create the file carefully and verify it, you're going to be trusted more and more. So, the next deck you build with AI should not start with a deck. It should start with the moves that I described. Right?

Write out a narrative spine before you open the AI tool. Drop in your source materials and ask for a source inventory. Ask for a conflict log before you try and make any slides. Generate the storyboard with claims and notes before any visual rendering. And then run a hostile reviewer prompt before you decide to share it out. The model can help everywhere, but you have to keep owning the truth. This is the year polish stopped meaning trust. The companies that build a truth layer around their AI office files are going to ship a lot faster and be wrong much less. The ones that don't are going to put out a really nice looking deck next quarter with a number that nobody can defend. And by the way, if you think that's not true, I want you to remember that the team at OpenAI did that with a chart that ChatGPT presumably helped make in the GPT-5 launch. And the chart was famously wrong and everyone had a good chuckle, and that happens to the best of us. We can do better than that.

I want to answer one question to close us off, and I think a number of you will ask this, and it's simple. Nate, why is it this hard? I have to build this entire, effectively, knowledge work harness on my own. Why do I have to do that? Why hasn't someone made this for me so it's easier and it's just push-a-button simple? I have a really clear answer for you. Knowledge work is profoundly contingent on domain knowledge. That's why it's knowledge work. If you are going to do knowledge work that is specific and memorable and useful and deep, you have to be deep enough in the informational context that you can custom assemble the pieces of information you need to make a useful artifact. It's not something you can generically turn into a workflow.

It's not that simple. It's not like you can assume there are five slots for evidence for a PowerPoint deck and good luck if you have more than five. No one who does serious knowledge work thinks like that. And so I think part of the reason we have to build our own is that it's sort of like Luke Skywalker making his own lightsaber. You have to understand how the system works to become a master at that system. And I am sure that there are startups out there that are working on making parts of this simpler: parts of gathering the information, parts of communicating that review step, parts of checking the work and checking the calculations in Excel. All of those pieces are problems that we can get better at solving. But I don't want to lose sight of the fact that good deep knowledge work is tremendously detailed. Reality has a surprising amount of detail. Good deep knowledge work is extremely detailed, and it's difficult to generalize an abstraction like an abstracted knowledge work harness over that level of detail. So that's the honest answer for you. That's why we can't just get a push-button answer from Microsoft that makes Excel superpowered for this. There you go. If wishes were fishes, right? I wish it was that way.

But instead, we get to work at this. And by the way, for everyone saying we'll lose our brains when we use AI, this is a great example of why you got to keep your brain turned on when you use AI.
