The New Code — Sean Grove, OpenAI

By AI Engineer

Summary

## Key takeaways - **Code is only 10-20% of value; communication is 80-90%.**: The most valuable skill in software development isn't writing code, but structured communication. This includes understanding user challenges, ideating solutions, planning, sharing, and verifying outcomes, which constitute 80-90% of an engineer's contribution. [02:45], [03:59] - **Specifications align humans and models, code doesn't.**: Unlike code, which is a lossy projection of intent, written specifications serve as the definitive artifact for aligning human teams and communicating goals. They are the source of truth that can be used to generate code, documentation, and other outputs. [06:09], [07:01] - **OpenAI's Model Spec: Markdown for human and AI alignment.**: OpenAI's Model Spec, written in human-readable Markdown, functions as a living document to align intentions and values. It includes specific clauses with IDs linked to challenging prompts, serving as executable success criteria for AI models. [09:42], [11:01] - **Sophistication in AI erodes trust; specs prevent it.**: AI models exhibiting 'sophistry' can erode user trust by prioritizing praise over impartial truth. A clear specification, like OpenAI's clause against being 'sophistic,' acts as a trust anchor, defining expected behavior and flagging deviations as bugs. [11:55], [13:36] - **Specs are the new superpower: executable, testable, composable.**: Specifications are becoming the fundamental unit of programming, analogous to code. They are executable, testable, composable, and can be used to generate various outputs, making spec-writing the critical skill for future programmers. [15:34], [19:07] - **The US Constitution: A national model specification.**: The US Constitution serves as a national model specification with clear policy, versioned amendments, and judicial review acting as a grader. This structure aligns citizens and adjudicates compliance, demonstrating how specifications can evolve safely over time. [16:48], [17:55]

Topics Covered

Code isn't the value; communication is the bottleneck.
Why source specifications are more valuable than generated code.
How specifications align both humans and AI models.
All professionals are programmers, using universal specifications.
Future IDEs will clarify thought, not just code.

Full Transcript

[Music]

[Music]

Hello everyone. Thank you very much for

having me. Uh it's a very exciting uh

place to be. very exciting time to be uh

second uh I mean this has been like a

pretty intense couple of days I don't

know if you feel the same way uh but

also very energizing so I want to take a

little bit of your time today uh to talk

about what I see is the coming of the

new code uh in particular specifications

which sort of hold this promise uh that

it has been the dream of the industry

where you can write your your code your

intentions once and run them everywhere

uh Quick intro. My name is Sean. I work

at uh OpenAI uh specifically in

alignment research. And today I want to

talk about sort of the value of code

versus communication and why

specifications might be a little bit of

a better approach in general.

Uh I'm going to go over the anatomy of a

specification and we'll use the uh model

spec as the example. uh and we'll talk

about communicating intent to other

humans and we'll go over the 40 syphency

issue uh as a case study.

Uh we'll talk about how to make the

specification executable, how to

communicate intent to the models uh and

how to think about specifications as

code even if they're a little bit

different. Um and we'll end on a couple

of open questions. So let's talk about

code versus communication

real quick. Raise your hand if you write

code and vibe code counts.

Cool. Keep them up if your job is to

write code.

Okay. Now, for those people, keep their

head up if you feel that the most

valuable professional artifact that you

produce is code.

Okay. There's quite a few people and I

think this is quite natural. We all work

very very hard to solve problems. We

talk with people. We gather

requirements. We think through

implementation details. We integrate

with lots of different sources. And the

ultimate thing that we produce is code.

Code is the artifact that we can point

to, we can measure, we can debate, and

we can discuss. Uh it feels tangible and

real, but it's sort of underelling the

job that each of you does. Code is sort

of 10 to 20% of the value that you

bring. The other 80 to 90% is in

structured communication. And this is

going to be different for everyone, but

a process typically looks something like

you talk to users in order to understand

their challenges. You distill these

stories down and then ideulate about how

to solve these problems. What what is

the goal that you want to achieve? You

plan ways to achieve those goals. You

share those plans with your colleagues.

uh you translate those plans into code.

So this is a very important step

obviously and then you test and verify

not the code itself right no one cares

actually about the code itself. What you

care is when the code ran did it achieve

the goals did it alleviate the

challenges of your user. You look at the

the effects that your code had on the

world. So talking, understanding

distilling ideulating

planning sharing translating testing

verifying, these all sound like

structured communication to me. And

structured communication is the

bottleneck.

knowing what to build, talking to people

and gathering requirements, knowing how

to build it, knowing why to build it

and at the end of the day, knowing if it

has been built correctly and has

actually achieved the intentions that

you set out with.

And the more advanced AI models get, the

more we are all going to starkly feel

this bottleneck.

Because in the near future, the person

who communicates most effectively is the

most valuable programmer. And literally

if you can communicate effectively, you

can program.

So, let's take uh vibe coding as an

illustrative example. Vibe coding tends

to feel quite good. And it's worth

asking why is that? Well, vibe coding is

fundamentally about communication first.

And the code is actually a secondary

downstream artifact of that

communication.

We get to describe our intentions and

our the outcomes that we want to see and

we let the model actually handle the

grunt work for us. And even so, there is

something strange about the way that we

do vibe coding. We communicate via

prompts to the model

and we tell them our intentions and our

values and we get a code artifact out at

the end and then we sort of throw our

prompts away they're ephemeral

and if you've written TypeScript or Rust

once you put your your code through a

compiler or it gets down into a binary

no one is happy with that binary. That

wasn't the purpose. It's useful. In

fact, we always regenerate the binaries

from scratch every time we compile or we

run our code through V8 or whatever it

might be from the source spec. It's the

source specification that's the valuable

artifact.

And yet when we prompt elements, we sort

of do the opposite. We keep the

generated code and we delete the prompt.

And this feels like a little bit like

you shred the source and then you very

carefully version control the binary.

And that's why it's so important to

actually capture the intent and the

values in a specification.

A written specification is what enables

you to align humans on the shared set of

goals and to know if you are aligned if

you actually synchronize on what needs

to be done. This is the artifact that

you discuss that you debate that you

refer to and that you synchronize on.

And this is really important. And so I

want to nail this this home that a

written specification effectively aligns

humans

and it is the artifact that you use to

communicate and to discuss and debate

and refer to and synchronize on. If you

don't have a specification, you just

have a vague idea.

Now let's talk about why specifications

are more powerful in general than code.

Because code itself is actually a lossy

projection from the specification.

In the same way that if you were to take

a compiled C binary and decompile it

you wouldn't get nice comments and uh

well-n named variables. You would have

to work backwards. You'd have to infer

what was this person trying to do? Why

is this code written this way? It isn't

actually contained in there. It was a

lossy translation. And in the same way

code itself, even nice code, typically

doesn't embody all of the intentions and

the values in itself. You have to infer

what is the ultimate goal that this team

is trying to achieve. Uh when you read

through code

so communication, the work that we

establish, we already do when embodied

inside of a written specification is

better than code. It actually encodes

all of the the necessary requirements in

order to generate the code. And in the

same way that having a source code that

you pass to a compiler allows you to

target multiple different uh

architectures, you can compile for ARM

64, x86 or web assembly. The source

document actually contains enough

information to describe how to translate

it to your target architecture.

In the same way, a a sufficiently robust

specification given to models will

produce good TypeScript, good Rust

servers clients documentation

tutorials, blog posts, and even

podcasts.

Uh, show of hands, who works at a

company that has developers as

customers?

Okay. So, a a quick like thought

exercise is if you were to take your

entire codebase, all of the the

documentation, oh, so all of the code

that runs your business, and you were to

put that into a podcast generator, could

you generate something that would be

sufficiently interesting and compelling

that would tell the users how to

succeed, how to achieve their goals, or

is all of that information somewhere

else? It's not actually in your code.

And so moving forward, the new scarce

skill is writing specifications that

fully capture the intent and values. And

whoever masters that again becomes the

most valuable programmer

and there's a reasonable chance that

this is going to be the coders of today.

This is already very similar to what we

do. However, product managers also write

specifications. Lawmakers write legal

specifications.

This is actually a universal principle.

So with that in mind, let's look at what

a specification actually looks like. And

I'm going to use the OpenAI model spec

as an example here. So last year, OpenAI

released the model spec. And this is a

living document that tries to clearly

and unambiguously

express the intentions and values that

OpenAI hopes to imbue its models with

that it ships to the world.

and it was updated in in uh February and

open sourced. So you can actually go to

GitHub and you can see the

implementation of uh the model spec and

surprise surprise it's actually just a

collection of markdown files just looks

like this. Now markdown is remarkable.

It is human readable. It's versioned.

It's change logged and because it is

natural language everyone in not just

technical people can contribute

including product legal safety research

policy they can all read discuss debate

and contribute to the same source code.

This is the universal artifact that

aligns all of the humans as to our

intentions and values inside of the

company.

Now, as much as we might try to use

unambiguous language, there are times

where it's very difficult to express the

nuance. So, every clause in the model

spec has an ID here. So, you can see

sy73 here. And using that ID, you can

find another file in the repository

sy73.mmarkdown

or md uh that contains one or more

challenging prompts

for this exact clause. So the document

itself actually encodes success criteria

that the the model under test has to be

able to answer this in a way that

actually adheres to that clause.

So let's talk about uh syphy. Uh

recently there was a update to 40. I

don't, know if, you've, heard, of, this., Uh

there uh caused extreme syphy. uh and we

can ask like what value is the model

spec in this scenario and the model spec

serves to align humans around a set of

values and intentions.

Here's an example of syphy where the

user calls out the behavior of being uh

syphants uh or sophantic at the expense

of impartial truth and the model very

kindly uh praises the user for their

insight.

There have been other esteemed

researchers uh who have found similarly

uh

similarly uh concerning examples

and this hurts. Uh shipping sycopency in

this manner erodess trust.

It hurts.

So and it also raises a lot of questions

like was this intentional? you could see

some way where you might interpret it

that way. Was it accidental and why

wasn't it caught?

Luckily, the model spec actually

includes a section dedicated to this

since its release that says don't be

sick of fantic and it explains that

while syopency might feel good in the

short term, it's bad for everyone in the

long term. So, we actually expressed our

intentions and our values and were able

to communicate it to others through this

So people could reference it and if we

have it in the model spec specification

if the model specification is our agreed

upon set of intentions and values and

the behavior doesn't align with that

then this must be a bug.

So we rolled back we published some

studies and some blog post and we fixed

it.

But in the interim, the specs served as

a trust anchor, a way to communicate to

people what is expected and what is not

expected.

So if just if the only thing the model

specification did was to align humans

along those shared sets of intentions

and values, it would already be

incredibly useful.

But ideally we can also align our models

and the artifacts that our models

produce against that same specification.

So there's a technique a paper that we

released uh called deliberative

alignment that sort of talks about this

how to automatically align a model and

the technique is uh such where you take

your specification and a set of very

challenging uh input prompts and you

sample from the model under test or

training.

You then uh take its response, the

original prompt and the policy and you

give that to a greater model and you ask

it to score the response according to

the specification. How aligned is it? So

the document actually becomes both

training material and eval material

and based off of this score we reinforce

those weights and it goes from you know

you could include your specification in

the context and then maybe a system

message or developer message in every

single time you sample and that is

actually quite useful. a prompted uh

model is going to be somewhat aligned

but it does detract from the compute

available to solve the uh problem that

you're trying to solve with the model.

And keep in mind, these specifications

can be anything. They could be code

style or testing requirements or or

safety requirements. All of that can be

embedded into the model. So through this

technique you're actually moving it from

a inference time compute and actually

you're pushing down into the weights of

the model so that the model actually

feels your policy and is able to sort of

muscle memory uh style apply it to the

problem at hand.

And even though we saw that the model

spec is just markdown it's quite useful

to think of it as code. It's quite

analogous.

uh these specifications they compose

they're executable as we've seen uh they

are testable they have interfaces where

they they touch the real world uh they

can be shipped as modules

and whenever you're working on a model

spec there are a lot of similar sort of

uh problem domains so just like in

programming, where, you, have a, type

checker the type checker is meant to

ensure consistency where if interface A

has a dependent uh module B they have to

be consistent in their understanding of

one another. So if department A writes a

spec and department B writes a spec and

there is a conflict in there you want to

be able to pull that forward and maybe

block the publication of the the

specification as we saw the policy can

actually embody its own unit tests and

you can imagine sort of various llinters

where if you're using overly ambiguous

language you're going to confuse humans

and you're going to confuse the model

and the artifacts that you get from that

are going to be less satisfactory.

So specs actually give us a very similar

tool chain but it's targeted at

intentions rather than syntax.

So let's talk about lawmakers as

programmers. Uh

the US constitution is literally a

national model specification. It has

written text which is aspirationally at

least clear and unambiguous policy that

we can all refer to. And it doesn't mean

that we agree with it but we can refer

to it as the current status quo as the

reality. Uh there is a versioned way to

make amendments to bump and to uh

publish updates to it. There is judicial

review where a a grader is effectively

uh grading a situation and seeing how

well it aligns with the policy. And even

though the again because or even though

the source policy is meant to be

unambiguous sometimes you don't the

world is messy and maybe you miss part

of the distribution and a case falls

through and in that case the there is a

lot of compute spent in judicial review

where you're trying to understand how

the law actually applies here and once

that's decided it sets a precedent and

that precedent is effectively an input

output pair that serves as a unit test

that disambiguates and rein reinforces

the original policy spec. Uh it has

things like uh chain of command embedded

in it and the enforcement of this over

time is a training loop that helps align

all of us towards a shared set of

intentions and values. So this is one

artifact that communicates intent. It

adjudicates compliance and it has a way

of uh evolving safely.

So it's quite possible that lawmakers

will be programmers or inversely that

programmers will be lawmakers in the

future.

And actually this apply this is a very

universal concept. Programmers are in

the business of aligning silicon via

code specifications. Product managers

align teams via product specifications.

Lawmakers literally align humans via

legal specifications. And everyone in

this room whenever you are doing a

prompt it's a sort of proto

specification. You are in the business

of aligning AI models towards a common

set set of intentions and values. And

whether you realize it or not you are

spec authors in this world and specs let

you ship faster and safer. Everyone can

contribute and whoever writes the spec

be it a

uh a PM uh a lawmaker an engineer a

marketer is now the programmer

and software engineering has never been

about code. Going back to our original

question a lot of you put your hands

down when you thought well actually the

thing I produced is not code.

Engineering has never been about this.

Coding is an incredible skill and a

wonderful asset, but it is not the end

goal. Engineering is the precise

exploration by humans of software

solutions to human problems. It's always

been this way. We're just moving away

from sort of the disperate machine

encodings to a unified human encoding uh

of how we actually uh solve these these

problems. Uh I want to thank Josh for

this uh credit. So I want to ask you

put this in action. Whenever you're

working on your next AI feature, start

with the specification.

What do you actually expect to happen?

What's success criteria look like?

Debate whether or not it's actually

clearly written down and communicated.

Make the spec executable. Feed the spec

to the model

and test against the model or test

against the spec. And there's an

interesting question sort of in this

world given that there's so many uh

parallels between programming and spec

authorship.

I wonder what is the what does the IDE

look like in the future. you know, an

integrated development environment. And

I'd like to think it's something like an

inte like integrated thought clarifier

where whenever you're writing your

specification, it sort of ex pulls out

the ambiguity and asks you to clarify it

and it really clarifies your thought so

that you and all human beings can

communicate your intent to each other

much more effectively and to the models.

And I have a a closing request for help

which is uh what is both amenable and in

desperate need of specification. This is

aligning agent at scale. Uh I love this

line of like you then you realize that

you never told it what you wanted and

maybe you never fully understood it

anyway. This is a cry for specification.

Uh we have a new agent robustness team

that we've started up. So please join us

and help us deliver safe AGI for the

benefit of all humanity.

And thank you. I'm happy to chat.

[Music]

Loading...

Loading video analysis...