The New Code — Sean Grove, OpenAI
By AI Engineer
Summary
## Key takeaways - **Code is only 10-20% of value; communication is 80-90%.**: The most valuable skill in software development isn't writing code, but structured communication. This includes understanding user challenges, ideating solutions, planning, sharing, and verifying outcomes, which constitute 80-90% of an engineer's contribution. [02:45], [03:59] - **Specifications align humans and models, code doesn't.**: Unlike code, which is a lossy projection of intent, written specifications serve as the definitive artifact for aligning human teams and communicating goals. They are the source of truth that can be used to generate code, documentation, and other outputs. [06:09], [07:01] - **OpenAI's Model Spec: Markdown for human and AI alignment.**: OpenAI's Model Spec, written in human-readable Markdown, functions as a living document to align intentions and values. It includes specific clauses with IDs linked to challenging prompts, serving as executable success criteria for AI models. [09:42], [11:01] - **Sophistication in AI erodes trust; specs prevent it.**: AI models exhibiting 'sophistry' can erode user trust by prioritizing praise over impartial truth. A clear specification, like OpenAI's clause against being 'sophistic,' acts as a trust anchor, defining expected behavior and flagging deviations as bugs. [11:55], [13:36] - **Specs are the new superpower: executable, testable, composable.**: Specifications are becoming the fundamental unit of programming, analogous to code. They are executable, testable, composable, and can be used to generate various outputs, making spec-writing the critical skill for future programmers. [15:34], [19:07] - **The US Constitution: A national model specification.**: The US Constitution serves as a national model specification with clear policy, versioned amendments, and judicial review acting as a grader. This structure aligns citizens and adjudicates compliance, demonstrating how specifications can evolve safely over time. [16:48], [17:55]
Topics Covered
- Code isn't the value; communication is the bottleneck.
- Why source specifications are more valuable than generated code.
- How specifications align both humans and AI models.
- All professionals are programmers, using universal specifications.
- Future IDEs will clarify thought, not just code.
Full Transcript
[Music]
[Music]
Hello everyone. Thank you very much for
having me. Uh it's a very exciting uh
place to be. very exciting time to be uh
second uh I mean this has been like a
pretty intense couple of days I don't
know if you feel the same way uh but
also very energizing so I want to take a
little bit of your time today uh to talk
about what I see is the coming of the
new code uh in particular specifications
which sort of hold this promise uh that
it has been the dream of the industry
where you can write your your code your
intentions once and run them everywhere
uh Quick intro. My name is Sean. I work
at uh OpenAI uh specifically in
alignment research. And today I want to
talk about sort of the value of code
versus communication and why
specifications might be a little bit of
a better approach in general.
Uh I'm going to go over the anatomy of a
specification and we'll use the uh model
spec as the example. uh and we'll talk
about communicating intent to other
humans and we'll go over the 40 syphency
issue uh as a case study.
Uh we'll talk about how to make the
specification executable, how to
communicate intent to the models uh and
how to think about specifications as
code even if they're a little bit
different. Um and we'll end on a couple
of open questions. So let's talk about
code versus communication
real quick. Raise your hand if you write
code and vibe code counts.
Cool. Keep them up if your job is to
write code.
Okay. Now, for those people, keep their
head up if you feel that the most
valuable professional artifact that you
produce is code.
Okay. There's quite a few people and I
think this is quite natural. We all work
very very hard to solve problems. We
talk with people. We gather
requirements. We think through
implementation details. We integrate
with lots of different sources. And the
ultimate thing that we produce is code.
Code is the artifact that we can point
to, we can measure, we can debate, and
we can discuss. Uh it feels tangible and
real, but it's sort of underelling the
job that each of you does. Code is sort
of 10 to 20% of the value that you
bring. The other 80 to 90% is in
structured communication. And this is
going to be different for everyone, but
a process typically looks something like
you talk to users in order to understand
their challenges. You distill these
stories down and then ideulate about how
to solve these problems. What what is
the goal that you want to achieve? You
plan ways to achieve those goals. You
share those plans with your colleagues.
uh you translate those plans into code.
So this is a very important step
obviously and then you test and verify
not the code itself right no one cares
actually about the code itself. What you
care is when the code ran did it achieve
the goals did it alleviate the
challenges of your user. You look at the
the effects that your code had on the
world. So talking, understanding
distilling ideulating
planning sharing translating testing
verifying, these all sound like
structured communication to me. And
structured communication is the
bottleneck.
knowing what to build, talking to people
and gathering requirements, knowing how
to build it, knowing why to build it
and at the end of the day, knowing if it
has been built correctly and has
actually achieved the intentions that
you set out with.
And the more advanced AI models get, the
more we are all going to starkly feel
this bottleneck.
Because in the near future, the person
who communicates most effectively is the
most valuable programmer. And literally
if you can communicate effectively, you
can program.
So, let's take uh vibe coding as an
illustrative example. Vibe coding tends
to feel quite good. And it's worth
asking why is that? Well, vibe coding is
fundamentally about communication first.
And the code is actually a secondary
downstream artifact of that
communication.
We get to describe our intentions and
our the outcomes that we want to see and
we let the model actually handle the
grunt work for us. And even so, there is
something strange about the way that we
do vibe coding. We communicate via
prompts to the model
and we tell them our intentions and our
values and we get a code artifact out at
the end and then we sort of throw our
prompts away they're ephemeral
and if you've written TypeScript or Rust
once you put your your code through a
compiler or it gets down into a binary
no one is happy with that binary. That
wasn't the purpose. It's useful. In
fact, we always regenerate the binaries
from scratch every time we compile or we
run our code through V8 or whatever it
might be from the source spec. It's the
source specification that's the valuable
artifact.
And yet when we prompt elements, we sort
of do the opposite. We keep the
generated code and we delete the prompt.
And this feels like a little bit like
you shred the source and then you very
carefully version control the binary.
And that's why it's so important to
actually capture the intent and the
values in a specification.
A written specification is what enables
you to align humans on the shared set of
goals and to know if you are aligned if
you actually synchronize on what needs
to be done. This is the artifact that
you discuss that you debate that you
refer to and that you synchronize on.
And this is really important. And so I
want to nail this this home that a
written specification effectively aligns
humans
and it is the artifact that you use to
communicate and to discuss and debate
and refer to and synchronize on. If you
don't have a specification, you just
have a vague idea.
Now let's talk about why specifications
are more powerful in general than code.
Because code itself is actually a lossy
projection from the specification.
In the same way that if you were to take
a compiled C binary and decompile it
you wouldn't get nice comments and uh
well-n named variables. You would have
to work backwards. You'd have to infer
what was this person trying to do? Why
is this code written this way? It isn't
actually contained in there. It was a
lossy translation. And in the same way
code itself, even nice code, typically
doesn't embody all of the intentions and
the values in itself. You have to infer
what is the ultimate goal that this team
is trying to achieve. Uh when you read
through code
so communication, the work that we
establish, we already do when embodied
inside of a written specification is
better than code. It actually encodes
all of the the necessary requirements in
order to generate the code. And in the
same way that having a source code that
you pass to a compiler allows you to
target multiple different uh
architectures, you can compile for ARM
64, x86 or web assembly. The source
document actually contains enough
information to describe how to translate
it to your target architecture.
In the same way, a a sufficiently robust
specification given to models will
produce good TypeScript, good Rust
servers clients documentation
tutorials, blog posts, and even
podcasts.
Uh, show of hands, who works at a
company that has developers as
customers?
Okay. So, a a quick like thought
exercise is if you were to take your
entire codebase, all of the the
documentation, oh, so all of the code
that runs your business, and you were to
put that into a podcast generator, could
you generate something that would be
sufficiently interesting and compelling
that would tell the users how to
succeed, how to achieve their goals, or
is all of that information somewhere
else? It's not actually in your code.
And so moving forward, the new scarce
skill is writing specifications that
fully capture the intent and values. And
whoever masters that again becomes the
most valuable programmer
and there's a reasonable chance that
this is going to be the coders of today.
This is already very similar to what we
do. However, product managers also write
specifications. Lawmakers write legal
specifications.
This is actually a universal principle.
So with that in mind, let's look at what
a specification actually looks like. And
I'm going to use the OpenAI model spec
as an example here. So last year, OpenAI
released the model spec. And this is a
living document that tries to clearly
and unambiguously
express the intentions and values that
OpenAI hopes to imbue its models with
that it ships to the world.
and it was updated in in uh February and
open sourced. So you can actually go to
GitHub and you can see the
implementation of uh the model spec and
surprise surprise it's actually just a
collection of markdown files just looks
like this. Now markdown is remarkable.
It is human readable. It's versioned.
It's change logged and because it is
natural language everyone in not just
technical people can contribute
including product legal safety research
policy they can all read discuss debate
and contribute to the same source code.
This is the universal artifact that
aligns all of the humans as to our
intentions and values inside of the
company.
Now, as much as we might try to use
unambiguous language, there are times
where it's very difficult to express the
nuance. So, every clause in the model
spec has an ID here. So, you can see
sy73 here. And using that ID, you can
find another file in the repository
sy73.mmarkdown
or md uh that contains one or more
challenging prompts
for this exact clause. So the document
itself actually encodes success criteria
that the the model under test has to be
able to answer this in a way that
actually adheres to that clause.
So let's talk about uh syphy. Uh
recently there was a update to 40. I
don't, know if, you've, heard, of, this., Uh
there uh caused extreme syphy. uh and we
can ask like what value is the model
spec in this scenario and the model spec
serves to align humans around a set of
values and intentions.
Here's an example of syphy where the
user calls out the behavior of being uh
syphants uh or sophantic at the expense
of impartial truth and the model very
kindly uh praises the user for their
insight.
There have been other esteemed
researchers uh who have found similarly
uh
similarly uh concerning examples
and this hurts. Uh shipping sycopency in
this manner erodess trust.
It hurts.
So and it also raises a lot of questions
like was this intentional? you could see
some way where you might interpret it
that way. Was it accidental and why
wasn't it caught?
Luckily, the model spec actually
includes a section dedicated to this
since its release that says don't be
sick of fantic and it explains that
while syopency might feel good in the
short term, it's bad for everyone in the
long term. So, we actually expressed our
intentions and our values and were able
to communicate it to others through this
So people could reference it and if we
have it in the model spec specification
if the model specification is our agreed
upon set of intentions and values and
the behavior doesn't align with that
then this must be a bug.
So we rolled back we published some
studies and some blog post and we fixed
it.
But in the interim, the specs served as
a trust anchor, a way to communicate to
people what is expected and what is not
expected.
So if just if the only thing the model
specification did was to align humans
along those shared sets of intentions
and values, it would already be
incredibly useful.
But ideally we can also align our models
and the artifacts that our models
produce against that same specification.
So there's a technique a paper that we
released uh called deliberative
alignment that sort of talks about this
how to automatically align a model and
the technique is uh such where you take
your specification and a set of very
challenging uh input prompts and you
sample from the model under test or
training.
You then uh take its response, the
original prompt and the policy and you
give that to a greater model and you ask
it to score the response according to
the specification. How aligned is it? So
the document actually becomes both
training material and eval material
and based off of this score we reinforce
those weights and it goes from you know
you could include your specification in
the context and then maybe a system
message or developer message in every
single time you sample and that is
actually quite useful. a prompted uh
model is going to be somewhat aligned
but it does detract from the compute
available to solve the uh problem that
you're trying to solve with the model.
And keep in mind, these specifications
can be anything. They could be code
style or testing requirements or or
safety requirements. All of that can be
embedded into the model. So through this
technique you're actually moving it from
a inference time compute and actually
you're pushing down into the weights of
the model so that the model actually
feels your policy and is able to sort of
muscle memory uh style apply it to the
problem at hand.
And even though we saw that the model
spec is just markdown it's quite useful
to think of it as code. It's quite
analogous.
uh these specifications they compose
they're executable as we've seen uh they
are testable they have interfaces where
they they touch the real world uh they
can be shipped as modules
and whenever you're working on a model
spec there are a lot of similar sort of
uh problem domains so just like in
programming, where, you, have a, type
checker the type checker is meant to
ensure consistency where if interface A
has a dependent uh module B they have to
be consistent in their understanding of
one another. So if department A writes a
spec and department B writes a spec and
there is a conflict in there you want to
be able to pull that forward and maybe
block the publication of the the
specification as we saw the policy can
actually embody its own unit tests and
you can imagine sort of various llinters
where if you're using overly ambiguous
language you're going to confuse humans
and you're going to confuse the model
and the artifacts that you get from that
are going to be less satisfactory.
So specs actually give us a very similar
tool chain but it's targeted at
intentions rather than syntax.
So let's talk about lawmakers as
programmers. Uh
the US constitution is literally a
national model specification. It has
written text which is aspirationally at
least clear and unambiguous policy that
we can all refer to. And it doesn't mean
that we agree with it but we can refer
to it as the current status quo as the
reality. Uh there is a versioned way to
make amendments to bump and to uh
publish updates to it. There is judicial
review where a a grader is effectively
uh grading a situation and seeing how
well it aligns with the policy. And even
though the again because or even though
the source policy is meant to be
unambiguous sometimes you don't the
world is messy and maybe you miss part
of the distribution and a case falls
through and in that case the there is a
lot of compute spent in judicial review
where you're trying to understand how
the law actually applies here and once
that's decided it sets a precedent and
that precedent is effectively an input
output pair that serves as a unit test
that disambiguates and rein reinforces
the original policy spec. Uh it has
things like uh chain of command embedded
in it and the enforcement of this over
time is a training loop that helps align
all of us towards a shared set of
intentions and values. So this is one
artifact that communicates intent. It
adjudicates compliance and it has a way
of uh evolving safely.
So it's quite possible that lawmakers
will be programmers or inversely that
programmers will be lawmakers in the
future.
And actually this apply this is a very
universal concept. Programmers are in
the business of aligning silicon via
code specifications. Product managers
align teams via product specifications.
Lawmakers literally align humans via
legal specifications. And everyone in
this room whenever you are doing a
prompt it's a sort of proto
specification. You are in the business
of aligning AI models towards a common
set set of intentions and values. And
whether you realize it or not you are
spec authors in this world and specs let
you ship faster and safer. Everyone can
contribute and whoever writes the spec
be it a
uh a PM uh a lawmaker an engineer a
marketer is now the programmer
and software engineering has never been
about code. Going back to our original
question a lot of you put your hands
down when you thought well actually the
thing I produced is not code.
Engineering has never been about this.
Coding is an incredible skill and a
wonderful asset, but it is not the end
goal. Engineering is the precise
exploration by humans of software
solutions to human problems. It's always
been this way. We're just moving away
from sort of the disperate machine
encodings to a unified human encoding uh
of how we actually uh solve these these
problems. Uh I want to thank Josh for
this uh credit. So I want to ask you
put this in action. Whenever you're
working on your next AI feature, start
with the specification.
What do you actually expect to happen?
What's success criteria look like?
Debate whether or not it's actually
clearly written down and communicated.
Make the spec executable. Feed the spec
to the model
and test against the model or test
against the spec. And there's an
interesting question sort of in this
world given that there's so many uh
parallels between programming and spec
authorship.
I wonder what is the what does the IDE
look like in the future. you know, an
integrated development environment. And
I'd like to think it's something like an
inte like integrated thought clarifier
where whenever you're writing your
specification, it sort of ex pulls out
the ambiguity and asks you to clarify it
and it really clarifies your thought so
that you and all human beings can
communicate your intent to each other
much more effectively and to the models.
And I have a a closing request for help
which is uh what is both amenable and in
desperate need of specification. This is
aligning agent at scale. Uh I love this
line of like you then you realize that
you never told it what you wanted and
maybe you never fully understood it
anyway. This is a cry for specification.
Uh we have a new agent robustness team
that we've started up. So please join us
and help us deliver safe AGI for the
benefit of all humanity.
And thank you. I'm happy to chat.
[Music]
Loading video analysis...