Lab-in-the-Loop: Smarter AI for Antibody Design
By Twist Bioscience
Summary
## Key takeaways

- **Lab-in-the-Loop accelerates antibody design**: The 'Lab-in-the-Loop' framework integrates AI-driven design with targeted lab testing, creating an iterative workflow that streamlines antibody optimization and reduces experimental burden. [01:35]
- **AI models predict antibody properties**: Custom-built generative AI models produce millions of antibody designs, which are then evaluated by 'pseudo-oracle' models predicting properties like binding affinity and expression levels, replacing slow experimental assays. [09:44]
- **Active learning optimizes antibody candidates**: An active learning framework ranks AI-generated antibody designs to maximize expected improvement over existing candidates, guiding the selection of designs for laboratory synthesis and testing. [10:37]
- **Iterative optimization improves antibody variants**: Across four key targets, iterative application of the 'Lab-in-the-Loop' process led to significant improvements, with up to 27% of designs in the fourth iteration showing three times better binding affinity than the original seed antibodies. [30:36]
- **Multi-objective optimization ensures therapeutic viability**: The 'Lab-in-the-Loop' system employs multi-objective optimization to simultaneously improve multiple antibody properties, such as binding affinity and expression yield, while maintaining therapeutic constraints and avoiding issues like non-specificity. [20:25], [34:33]
- **Propen method enhances AI antibody design**: The Propen method, a discriminator-free generative model, is particularly effective in low-data scenarios, enabling direct learning of property improvement gradients and generating significantly improved antibody variants. [38:18]
Topics Covered
- Lab in the Loop: Accelerating Antibody Design to Weeks
- Multi-Objective Optimization: Designing Better Therapeutic Antibodies
- Propen: AI Designs Better Antibodies in Low-Data Settings
- De Novo Design: Antibodies From Scratch, Faster
- Streamlining Wet Lab Data: The Speed Challenge
Full Transcript
Hello and welcome to the latest Technology Networks webinar, "Lab in the Loop: Smarter AI for Antibody Design." I'm today's moderator, Dr. Steven Gibney, science writer for Technology Networks, and I'm excited to host today's session. We have a fantastic presenter, Dr. Vladimir Gligorijević, who will be sharing some of his valuable insights with us. Vladimir is a Senior Director of AI and Machine Learning, leading a team of ML scientists and structural computational biologists who focus on large-molecule drug discovery problems. His team focuses on developing AI and ML tools for the optimization and design of both therapeutic and diagnostic antibodies, bringing the lab-in-the-loop approach to the large-molecule drug discovery portfolio. After Vladimir's presentation, we'll have a short Q&A session. We encourage you to submit your questions at any point during the presentation; just type your question into the box on the right-hand side of the screen and click send. We'll do our best to answer as many questions as we can in the time available today. If you encounter any technical difficulties during the webinar, click the chat box on the right-hand side of your screen to request assistance from our support team. Without further introduction, I'll now hand over to our speaker, Dr. Vlad. Over to you.
Hello everyone. Thanks so much for the introduction, and thank you for the opportunity to speak here today. I'll be talking about our lab-in-the-loop framework for large molecules and how we use it to accelerate antibody design.

First, let me say a little about why antibody design matters and why antibodies are our molecules of obsession. We build lab-in-the-loop systems in the company for different drug modalities, but in this presentation I'll focus only on antibodies. On the left-hand side you can see a cartoon of an antibody. We often describe it as a Y-shaped molecule. It is the molecule that protects us from the outside world; it is generated by our immune system to block foreign pathogens. On the right-hand side you can see statistics from last year's report on FDA-approved antibodies: almost 20% of FDA-approved drugs are antibodies, and in the previous year the figure was even higher, around 30%. They are one of the fastest-growing classes of drug molecules of the past decade, mainly because they can be used to target very complex diseases such as autoimmune diseases, cancers, and, very recently, Alzheimer's disease. For cancer, one of the very first antibodies designed by my company, Genentech, was trastuzumab, used to treat breast cancer. And the promising results for anti-amyloid antibodies reported over the past couple of years show that these antibodies can effectively bind the amyloid-beta plaques that accumulate in the brain, remove them, and improve clinical outcomes for patients.
There are typically two ways to discover antibodies. The first is to use some kind of library design together with display technologies; the other, which is very commonly used in my company, is animal immunization. Usually, we start with a specific target in mind that is implicated in some disease — we know the pathway and the mechanism of action — then we inject the target into an animal and let the immune system generate antibodies, which we can then test for function and other properties. The mechanism underlying this process is V(D)J recombination.
Since I'm a machine learning scientist, the way I like to think about it is that our immune system is a very large generative model that can produce an enormous number of antibody variants. There are three different types of gene segments, called V, D, and J, that build the variable region of an antibody, and through DNA rearrangements one segment of each type ends up in a given B cell. As you can imagine, the combinatorial space of combining these different gene segments is vast, and we end up with what we call a repertoire, which can contain on the order of 10^11 to 10^12 different antibodies that can be extracted from these immunization campaigns. The antibodies we extract from immunization campaigns are very diverse in sequence and structure, which makes them a very nice dataset for training machine learning models.
So that's the basic mechanism: V(D)J recombination plus somatic hypermutation and affinity maturation are the two processes that generate the large pools of antibodies in these immunization campaigns, which we then extract and test for various purposes.

A little bit about our team. We are Prescient Design; for a brief period we were a startup, which was then acquired by Genentech, and we are now part of a much bigger computational department called AI4DD, or AI for Drug Discovery. Broadly, we are interested in developing machine learning tools that make use of all the available data across Genentech and Roche to accelerate drug discovery. As I said at the beginning, we do work on other drug modalities — we have lab-in-the-loop systems for small molecules and for peptides — but in this presentation I'll be talking only about antibodies.
Antibody design was the very first problem we worked on when we landed in Genentech almost four years ago. When we started this journey, we were given three problems: binder identification, binder optimization, and de novo design. In the first problem, the goal is to develop machine learning models that can effectively mine the huge immune repertoires of NGS sequences to find better hits and identify new potential lead antibody molecules. In the second problem, we are given an antibody — we also call it a seed antibody — that usually has some property we are trying to improve, or that is limited in some way. For example, it may have low affinity and we want to improve the affinity. The goal is to develop machine learning models and use the lab-in-the-loop process to improve that property, or a whole range of properties, in what we call multi-objective optimization. The third problem is the one that is now very popular — many other companies are making announcements about it — and that is de novo design. It is a very challenging problem: you are given only a target, and maybe a location on that target, an epitope, and the idea is to develop an antibody from scratch, starting from the target alone, with no data available for the model at that stage. Our mission is to transform the conventional drug discovery process by developing machine learning tools that unite wet-lab and computational scientists in what we call lab in the loop, and that can be used to address these three problems.
This is how we bring lab in the loop to a typical portfolio project. As I said at the beginning, once you have a specific target that has gone through target assessment — one that sits in some pathway or is implicated in some disease — we run a discovery campaign, which could be immunization or something else. That gets us a hit, and we then apply lab in the loop to improve that molecule and optimize it for other properties, including affinity and developability, by running a cycle of design, screen, analyze, and make.

Going into a bit more detail on the lab-in-the-loop framework: the first component is a set of custom-built generative models that produce antibody sequence designs — they can produce millions of different designs. These models usually start from a seed molecule, or, if we are running a de novo model, they can start from just a target. Starting from a seed, the generative models produce many different variants of that seed. The designs are then fed into property predictors. We have different kinds of property predictors — we also call them pseudo-oracles — because they can predict properties of antibodies, including expression level, non-specificity, binding affinity, and so on. These predicted properties are then used to rank designs in an active learning step, such that we maximize the expected improvement over the seed antibody. After ranking, we decide how many designs to send to the lab. We first send the designs to Twist to get the molecules and measure expression, and after that we measure binding affinity against the target using SPR, or surface plasmon resonance; of course, we can also run additional functional screening and other characterization of these antibodies. The most critical part is getting the data back from the lab and feeding it back into the generative models and property-predictor oracles, so that these models improve over iterations of lab in the loop.
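To make the design–screen–analyze–make cycle concrete, here is a minimal Python sketch of one iteration, roughly as described above. All object and function names (`generators`, `oracles`, `rank`, `measure_in_lab`) are hypothetical placeholders for the components described in the talk, not Genentech's actual code.

```python
# Minimal sketch of one lab-in-the-loop iteration (placeholder names; the real
# system uses an ensemble of generative models, several pseudo-oracles, and a
# multi-objective acquisition function rather than these stand-ins).

def lab_in_the_loop_round(seed, generators, oracles, rank, lab_budget, measure_in_lab):
    # 1. Design: each generative model proposes variants of the seed.
    designs = [d for g in generators for d in g.sample(seed, n=30_000)]

    # 2. Predict: pseudo-oracles score properties (affinity, expression, ...).
    scores = {d: {name: o.predict(d) for name, o in oracles.items()} for d in designs}

    # 3. Rank / select: active learning picks designs expected to improve on the seed.
    selected = rank(designs, scores, reference=seed)[:lab_budget]

    # 4. Make & screen: synthesis, expression, SPR, etc. (the ~4-week wet-lab step).
    measurements = measure_in_lab(selected)

    # 5. Learn: feed results back so the oracles and generators improve next round.
    for o in oracles.values():
        o.update(measurements)
    for g in generators:
        g.update(measurements)
    return measurements
```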
There are a couple of things I want to emphasize here. First, our approach takes real steps toward respecting important therapeutic constraints, because we optimize antibodies for several different types of properties — you'll see that in a few slides. It also demonstrates generality: I'll show you how it works on different antigens, different epitopes, and different disease programs on which we have run lab in the loop. And it enables autonomous antibody engineering — we can automate a lot of this. The second thing I want to highlight is the timeline written right below the experimental characterization. We now get results back from the lab in about four weeks, and with a lot of streamlining the whole loop — starting from the seed and getting back measured antibodies — takes about four to five weeks. As you know, machine learning models can generate designs in anything from milliseconds to a few hours, but the most important part is also making the experimental characterization very fast, so that we can run multiple rounds of lab in the loop.
Now, I mentioned at the beginning that we use an ensemble of different generative models. The main reason is that we don't want to be limited by any one particular model or its limitations; we really want to diversify the sequences at the start, and for that purpose we use different types of models. Sometimes the choice of models is also dictated by the needs of the project. So we have a number of generative models that do unguided sampling — such as the Walk-Jump Sampler, CPDM, or protein language models — and others that do guided multi-property optimization, such as LaMBO, Property Enhancer, and other methods. In many portfolio projects, if we are given only a sequence, we run sequence-based models. If we have a structure, or perhaps a co-crystal structure of a potential lead antibody with its target, we can couple these generative models with physics-based models — we call these hybrid models. We also have a range of structure-based generative models, which are the ones we also use for de novo design, such as AbDiffuser and others.
The reason I'm mentioning all these method names is that they are published — we like to publish a lot at Genentech — and we have a number of machine learning papers that came out over the last couple of years. They are custom-built in the sense that we design our own methods, always taking into account the needs of each individual project and an understanding of the data and of how to model these complicated distributions of antibody sequences and structures. These are some of the papers; the full list is much bigger. If I have time, I'll focus on Propen toward the end of this presentation, because it has been one of the most promising methods in our lab in the loop, with the highest binding rate and the highest expression among the antibodies produced by our models. Many of these models are also part of a big publication that is right now in the late stages of peer review, which focuses on the lab-in-the-loop procedure with all of these methods. It is a huge collaboration between Prescient Design and antibody engineering at Genentech, and many of the results I'm going to show you come from that paper — so if I don't mention something, you can certainly check it out in that publication.
So, going into a bit more detail on each individual component of our lab-in-the-loop process. As I said, the first component is the set of custom-built generative models, and as you saw on the previous slides, we have a number of different models. In every iteration of lab in the loop, starting from a specific target and its seeds, we generate a batch of proposals. We restrict each model to about 30,000 potential designs — so depending on how many models we run, we end up with more designs in total, but each individual model is limited to roughly 30,000 candidates. These designs are then fed into the oracles, which are very useful because they replace slow and expensive functional screening assays.
These oracles are models trained on in-house data; in many cases they are built from historical data collected across many different projects, which we use to train the different kinds of oracles. The combined library that comes from all of these generative models is first filtered and ranked by our active learning framework, which is a very important piece of our lab in the loop — I'll come back to it in the next few slides. To focus on the oracles themselves: we have oracles for bind/no-bind, which are classifiers; oracles that predict the affinity level, i.e., the KD or dissociation constant; and regression models that predict the expression yield. In addition to the oracles that estimate these different antibody properties, we also have quality-control filters that check for chemical liability motifs and similar features — things we usually want to eliminate if we see them in the sequences.
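As a rough illustration of what such a sequence-level quality-control filter can look like, here is a small Python sketch that flags a few widely used chemical-liability motifs. The motif set is generic and illustrative, not the specific filter set used in this lab-in-the-loop system.

```python
import re

# Illustrative chemical-liability motifs (a generic set, not Genentech's actual filters):
#   N[GS]     - asparagine deamidation hotspots
#   D[GP]     - aspartate isomerization hotspots
#   N[^P][ST] - N-linked glycosylation sequon
LIABILITY_PATTERNS = {
    "deamidation": re.compile(r"N[GS]"),
    "isomerization": re.compile(r"D[GP]"),
    "n_glycosylation": re.compile(r"N[^P][ST]"),
}

def liability_flags(variable_region: str) -> dict:
    """Return positions of liability motifs found in a variable-region sequence."""
    flags = {name: [m.start() for m in pat.finditer(variable_region)]
             for name, pat in LIABILITY_PATTERNS.items()}
    # Heuristic: an odd number of cysteines suggests an unpaired cysteine.
    if variable_region.count("C") % 2 != 0:
        flags["unpaired_cysteine"] = [i for i, aa in enumerate(variable_region) if aa == "C"]
    return {name: pos for name, pos in flags.items() if pos}

def passes_qc(vh: str, vl: str) -> bool:
    """A design passes QC only if neither chain carries a flagged motif."""
    return not liability_flags(vh) and not liability_flags(vl)
```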
After selection and ranking, the variable-region sequences are synthesized as linear DNA fragments — this is the job of Twist Bioscience. This linear-DNA-fragment-based expression workflow trims multiple cloning steps, saving a lot of time and cost, while also being very amenable to automation. After expression, the antibodies are affinity-purified, their concentration is measured by optical density, and then the binding affinities of the designs and of the lead antibodies are measured by surface plasmon resonance, or SPR. Once we have all these measurements, the most important step, as I said, is putting them back into the models, retraining, and repeating.
Of course, designing an antibody with generative AI is, I would almost say, easy. Designing a therapeutic antibody, however, is very difficult. There is a whole list of properties whose values need to fall within specific ranges for the project requirements to be satisfied, and a lot of the time we are given a molecule in early-stage, or even late-stage, development that has some problem — it could be an affinity problem or a developability problem. Here you can see a number of the characteristics we measure in addition to expression and binding: off-target specificity, thermal stability, immunogenicity, self-association, and high-concentration behavior such as aggregation at high concentration. All of these are very important to measure and to include in the lab in the loop, so that we can actually optimize antibodies for multiple properties at once — and this is where we need something called multi-property optimization. How can this lead to better molecules? First, it can reduce time to clinic, because we can optimize these properties at a very early stage of the portfolio project. It can increase the clinical success rate — for example, if our generative models design something that indicates a high immunogenicity or ADA risk, we can immediately eliminate it during the optimization steps. And with some of the generative models, including the de novo models, we can also go after previously undruggable diseases.
So we've covered the sampling part, the prediction part, and the lab part. The other essential piece is the active learning part: how are designs actually ranked? First, we take the predictions from the oracles and use them to rank the designs so as to maximize the expected improvement over the seed antibody, and also to ensure that the models themselves get better in the next round — the data we get back from the lab is reused for retraining, and that's how we aim to achieve both goals.
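As a rough illustration of ranking by expected improvement for a single property — assuming a probabilistic oracle that returns a mean and a standard deviation for each design — a minimal sketch might look like this (the production system uses a multi-objective acquisition function, described below):

```python
from math import erf, exp, pi, sqrt

def expected_improvement(mu: float, sigma: float, best_so_far: float) -> float:
    """Expected improvement of a design over the current best (e.g., the seed's pKD),
    assuming the oracle's prediction is Gaussian with mean `mu` and std `sigma`."""
    if sigma == 0.0:
        return max(mu - best_so_far, 0.0)
    z = (mu - best_so_far) / sigma
    cdf = 0.5 * (1.0 + erf(z / sqrt(2.0)))          # standard normal CDF
    pdf = exp(-0.5 * z * z) / sqrt(2.0 * pi)        # standard normal PDF
    return (mu - best_so_far) * cdf + sigma * pdf

def rank_designs(predictions: dict, seed_value: float) -> list:
    """predictions: {design_id: (mu, sigma)} from the pseudo-oracle.
    Returns design IDs sorted by expected improvement over the seed."""
    return sorted(predictions,
                  key=lambda d: expected_improvement(*predictions[d], seed_value),
                  reverse=True)
```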
In many cases we are given an antibody with a specific problem — it could be affinity, or it could be some developability issue. The important point is that we optimize the antibody for the property that is problematic while maintaining the other properties as they are. For example, if you have a high-affinity antibody with developability issues, the whole idea is to use lab in the loop, or multi-objective optimization, to introduce mutations and changes that improve the problematic property — but never at the cost of, say, losing affinity.
Now, as you may know, drug discovery is not usually done like that. It is done very sequentially: you start from a number of designs and then check one property after another, and that introduces issues later on that have to be engineered out, because you will have introduced problems into the antibody. For example, an antibody that initially had no high-concentration issues can develop them at some stage of the process if you don't take all of these objectives and properties into account from the beginning. That's why we like to say we want to stay at the Pareto frontier: we need to make sure we are not making changes that destroy some properties while improving others. For that purpose we use black-box Bayesian optimization with a multi-objective acquisition function, and that acquisition function — I'll explain how we define it — is what we optimize so that we stay on the Pareto frontier of the property space. We want to avoid spending expensive experimental assays on designs that are strictly dominated, meaning some other candidate design is at least as good in every property and strictly better in at least one — that is what it means to stay on the Pareto frontier. The acquisition function we use for this purpose is called expected hypervolume improvement; this is the indicator we are trying to improve from round to round.
Here is an easy way to explain this — this is work by our colleagues at Prescient, Ji Won and Nataša. The different properties — affinity, expression, developability — form a space, and if we place our designs in that space they enclose something called a hypervolume. This hypervolume is spanned by the designs coming from our generative models, using the property values predicted by our pseudo-oracles. Since we need to look at all the objectives at the same time — the figure here shows just two properties, which could be expression and affinity, for example — the blue area is the region dominated by the current designs. The two orange points and the one blue point together form the hypervolume, which in this two-dimensional case is simply an area, and this is what we are trying to maximize as we run the lab-in-the-loop process. Concretely, suppose we are in round two of lab in the loop and have three candidate designs, shown on the left-hand side: two orange and one blue. How do we decide which one to measure? These three designs form the hypervolume shown as the blue area, but we already know that the previous round's designs form the hypervolume shown in orange. By subtracting one hypervolume from the other, we can see where the hypervolume actually improves — that is, which designs specifically contribute to increasing it. You can see that it's only the blue point, and that is the one that would be selected for ranking and further experimentation.
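For two objectives, the geometry described above can be sketched as follows. This is a minimal, deterministic hypervolume-improvement calculation for illustration only; the actual acquisition function is the *expected* hypervolume improvement, which averages this quantity over the oracles' predictive uncertainty, and the numbers below are invented toy values.

```python
def hypervolume_2d(points, ref):
    """Area dominated by `points` (two objectives, both maximized) relative to a
    reference point `ref` that every point is assumed to dominate."""
    best_y = ref[1]
    area = 0.0
    for x, y in sorted(points, reverse=True):   # sweep from the largest first objective down
        if y > best_y:
            area += (x - ref[0]) * (y - best_y)
            best_y = y
    return area

def hypervolume_improvement(candidate, frontier, ref):
    """How much adding `candidate` would grow the hypervolume spanned by `frontier`.
    Strictly dominated candidates contribute exactly zero and can be skipped."""
    return hypervolume_2d(frontier + [candidate], ref) - hypervolume_2d(frontier, ref)

# Toy example: objectives are (expression yield, pKD), both to be maximized.
previous_round = [(0.8, 8.0), (1.2, 7.5)]
candidates = {"design_A": (1.0, 7.0), "design_B": (0.9, 8.6)}
ref_point = (0.0, 6.0)
for name, point in candidates.items():
    print(name, hypervolume_improvement(point, previous_round, ref_point))
# design_A is strictly dominated by (1.2, 7.5), so its improvement is 0.0;
# design_B expands the frontier (improvement 0.59) and would be selected.
```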
In the specific case of looking at, say, expression yield and affinity — the KD values we get from SPR measurements — you can see here, for example, that the black points are the ones that improve the hypervolume in the second round of the iteration. The mathematics behind this whole optimization process is described in these two papers; it's great work by Ji Won and Nataša.
So where do we believe this multi-objective optimization can bring the most value to a portfolio project? As I said, the typical drug discovery process is mostly sequential. A year or a couple of years down the road, after spending some large amount of money, you may realize that you don't have your lead molecule. That's because you started from a pool of antibodies discovered in, say, an immunization campaign, selected some, then did humanization, then saw a drop in affinity, then tried to improve affinity but introduced other mutations that caused other problems, which you only tested for later — and eventually you might realize that you picked the wrong clones from the very beginning, because they never fell inside the set of truly designable antibodies. So how can machine learning help? The first point is that we always want to run lab in the loop as early as possible in the drug discovery process, because by the time we get to lead optimization, with all of these other problems present, we may be very limited in how much we can actually fix the antibody. The whole idea is not to introduce these problems in the first place, by running AI and lab in the loop as early in discovery as we can. The typical ways we do this are by diversifying the hit-expansion step with our generative AI models, and by doing constrained multi-objective optimization as early as possible — checking all these properties, running all these filters, and tracking any biochemical features that could later be indicators of problems for other biological or developability properties of the antibodies.
Let me now show you some results. Over the years since we joined Genentech, we have been running this lab in the loop on multiple different targets, and here you can see some of the targets on which we have tested it. Right now, lab in the loop is used in almost every active portfolio project, and instead of optimizing just one or two properties, it is being used to optimize multiple properties at once. What we have observed — and why this framework is so successful — is that as you run it over and over again, you keep improving the fraction of antibodies that express, the fraction that bind, and the fraction that are better than the seed. That is exactly what I'm going to show you in the next few slides.
For example, for the four targets I showed you on the previous slide, we ran roughly four iterations of lab in the loop and then measured how well it performed in each iteration. We introduced some metrics of success. The first, of course, is the improvement in affinity, which we compute as delta pKD. We first compute the KD — the dissociation constant — from the kinetic parameters of the SPR curves, and then log-transform it, so everything I'm showing you here is on a log scale. We therefore have a pKD value for the seed molecule and pKD values for each of our designs, and we can measure the improvement between every design and the seed. From that we can also count things like how many designs have at least three-fold better affinity than the seed.
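In code, the affinity metric works out as below: pKD is the negative log10 of KD, delta pKD is the design's pKD minus the seed's, a delta pKD of log10(3) ≈ 0.48 corresponds to a three-fold tighter binder, and a delta of 2 corresponds to 100-fold. The example KD values are invented.

```python
from math import log10

def pkd(kd_molar: float) -> float:
    """pKD = -log10(KD), with KD in molar units from the SPR kinetic fit."""
    return -log10(kd_molar)

def delta_pkd(kd_design: float, kd_seed: float) -> float:
    """Positive values mean the design binds more tightly than the seed."""
    return pkd(kd_design) - pkd(kd_seed)

THREEFOLD = log10(3)   # ~0.48 on the log scale; a delta of 2 means 100-fold better

# Example: seed KD = 10 nM, design KD = 1 nM  ->  delta pKD = 1.0 (10-fold better)
print(delta_pkd(1e-9, 10e-9))                  # 1.0
print(delta_pkd(1e-9, 10e-9) >= THREEFOLD)     # True: counts as a 3x-better binder
```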
The histogram on the left-hand side measures exactly that: in each iteration, how often did we design binders at least three times better than the seed? You can see that by iteration four this number reaches about 25–27%. The middle plot pools all the designs from all of these targets and campaigns and shows the difference in pKD between design and seed; the dashed vertical line marks three-fold better, and a value of 2 corresponds to 100-fold better — remember, this is on a log scale. So not only did we manage to design a lot of antibodies that are three times better, we are also good at designing antibodies that improve well beyond three-fold. We also measure the expression yield, with a similar histogram showing the difference in expression yield, in milligrams, between the seed and the designed antibody; here the dashed vertical line marks a 0.01 mg improvement over the seed. As you can see, we also produced some designs that are worse than the seed over the course of the process, but we were able to rank or filter those out as we selected designs for the next round.
Here is another very promising result. On the left-hand side are different seeds from different targets, and you can see all the designs with improved affinity — only the designs with a delta pKD above zero. For some seeds we achieve designs that are two or three times better, but there are cases where, in just a few iterations of lab in the loop, we designed antibodies that are 100 times better in affinity. You can also see how these numbers improve over rounds for specific seeds: for these four targets and their different starting antibodies, you can track the maximum improvement in pKD over rounds, and it climbs well beyond the three-fold-better threshold, again shown as the horizontal dashed line.
Another very important thing to check when running lab in the loop is whether we actually achieve the multi-objective optimization. As I said, multi-objective optimization constrains our designs to account for non-specificity and for expression yield — expression yield, for example, can impact downstream characterization assays that require specific amounts of protein material — as well as some in silico developability risks that we also compute. The way we usually test this: first, we have models that check for non-specificity, based on the BV ELISA assay, shown here in the bottom-left panel. To estimate the non-specificity risk we run this BV ELISA oracle, and a score above 0.1 would indicate a non-specificity risk. We confirmed that all of our designs fall below that predicted threshold. For a subset of the designs, we were also able to engineer out some chemical liabilities while maintaining binding affinity. Another way we check whether we are within the range of therapeutic antibodies is to run something called the Therapeutic Antibody Profiler, or TAP. This computes a set of biochemical parameters from our designed antibodies and compares them to the same parameters from therapeutic antibodies — antibodies that are already in use or that have gone successfully through the clinical stages — and we can then look at where our designs sit in that distribution relative to therapeutic antibodies. The plot at the top shows exactly that: the majority of our designs fall within the right range for therapeutic antibodies.
In the next five or ten minutes I want to focus on one of our most promising methods and dive a little deeper into the machine learning. As you know, if you want to do some kind of conditional design, the common practice nowadays is guided design, which consists of two parts: a generative model that produces proposals, and a discriminative model that checks different properties and filters out designs that don't satisfy specific requirements. Most setups have these two separate models. However, in low-data settings these models do not do the right job. The discriminator cannot be reliable if you don't have many data points, and a low-data regime is often exactly what we face in these projects. The other issue is that the discriminative model guiding the generative model can steer it into regions where there are no data points at all — what we call out of distribution, or off-manifold. And, of course, with too few data points the discriminator cannot learn the very complex distributions of some of these properties, which is often the case for developability properties. To avoid these problems, we introduced a model called Propen.
We realized that the discriminator was the bottleneck here, and we decided to find a workaround and completely sidestep this explicit guidance. We do that by incorporating a technique from causal inference called matching, which lets us train a discriminator-free model. So we have only one model, which is very well suited to the low-data regime; it does not propose out-of-distribution designs, and it has no problem with rugged property landscapes. It is also a very general approach — you can train a similar model for other drug modalities, not just antibodies. We call it Property Enhancer because it explicitly learns to approximate the gradient of improvement. It consists of just three steps.
The first step is matching. We split the data: say we have a dataset of antibodies with affinity measurements. We split it into antibodies with low property values — low affinity — and antibodies with high values, and then for each antibody in the low-affinity set we find a matching partner in the high-affinity set. These are usually antibody pairs with a low edit distance — very similar sequences — but with a drastically higher value of the specific property, which could be affinity or some other property. We call this the matched dataset.
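A minimal sketch of that matching step, assuming a plain edit-distance cutoff and a minimum property gap as the pairing criteria (both thresholds here are illustrative, not the values used for Propen; the low/high split is implicit in the required gap):

```python
from itertools import product

def edit_distance(a: str, b: str) -> int:
    """Plain Levenshtein distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def build_matched_dataset(data, max_edits=5, min_gap=1.0):
    """data: list of (sequence, property_value). Pair each lower-property antibody
    with a very similar partner whose property value is substantially higher."""
    pairs = []
    for (seq_lo, y_lo), (seq_hi, y_hi) in product(data, data):
        if y_hi - y_lo >= min_gap and edit_distance(seq_lo, seq_hi) <= max_edits:
            pairs.append((seq_lo, seq_hi))   # train the model to map seq_lo -> seq_hi
    return pairs
```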
In the next step, we train an encoder–decoder model that maps each antibody to its matched partner, which implicitly learns the gradient of improvement. Training on pairs also effectively increases the amount of data: your dataset may initially contain only a certain number of antibodies, but once you form pairs the training set grows, and you can train a model that learns directly how to improve an antibody. Once the model is trained, you use it at inference time: you start from your seed molecule, plug it into the model, and sample antibodies that should have higher affinity. You can then repeat this by recycling — feeding the generated antibodies back into the encoder and generating another batch of antibodies, and so on.
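The inference-with-recycling loop then looks roughly like this. The `propen` object and its `improve` method are hypothetical placeholders for the trained encoder–decoder; downstream oracle scoring and ranking are omitted.

```python
def recycle(propen, seed_sequence, n_rounds=3, n_samples=100):
    """Iteratively push a seed through the trained encoder-decoder.
    Each round's outputs become the next round's inputs ("recycling")."""
    current = [seed_sequence]
    all_designs = []
    for _ in range(n_rounds):
        batch = [variant
                 for seq in current
                 for variant in propen.improve(seq, n=n_samples)]  # sample improved variants
        all_designs.extend(batch)
        current = batch                # feed generated antibodies back into the encoder
    return all_designs                 # downstream: oracle scoring, ranking, wet-lab testing
```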
One very interesting observation, if you look at the table, is that in many cases this method produces really high binding and expression rates. For example, in three rounds of lab in the loop we were able to reach binders that are as much as 40 times better than the seed molecules we started from. So this is a very interesting and promising method. We also have a publication about it, where you can read more about the technical details, the math behind it, and how we train the model and use it on some of our projects.
projects. Uh maybe last thing I want to
mention is the novo. This is something
we're also working on. Uh I just want to
uh say how this would be very
transformative for basically u uh
antibody design. So right now the
procedure I described
with lab in the loop um that uh uh are
used in lead optimization are basically
that we from a repertoire set right
after we do immunization we train some
models and then we use it for basically
optimization and getting better binders.
Um so we use it for binder discovery and
optimization. basically these two
challenges that I mentioned in the
beginning. The the Novo uh long-term
goal of course is that we do want to
start the let's say a portfolio project
with just a target. We don't need to do
the immunization or any library design
and we just run our denovo model with
let's say a painted epitope for which we
can design an antibbody from scratch and
this could drastically increase the the
speed in uh of generating antibodies and
optimizing them because we can then in
that step also include all these
multi-objective optimization and so the
way we think that the novel will enable
antibody design is that yes it will be
used for binary optimization
diversification the same way we use lab
in the loop but We can start from an
antibbody scratch antibbody basically
that is is from scratch. We can design
an antibody without any of the discovery
campaigns. Uh uh it could be used for
binded identification. So we'll be able
to generate antibodies let's say that
could diversify for different epitopes.
It could be very diverse right and
that's very important for hypothesis
testing and for other things as well. We
could easily design different parts of
an antibbody and then assemble them into
complex formats and we can also run
denovo for that as well.
We are hoping this will drastically shorten the time from a target to a lead molecule. And there is a lot more we could do with it: we could test structure–activity hypotheses, because we can generate multiple designs, and maybe even simulate some dynamics. All of that can come from training these models, perhaps augmented with molecular dynamics data to enlarge the training set and improve the models. I'm going to stop here and acknowledge the support from multiple teams at Genentech. These are all the people involved — predominantly it was the collaboration between antibody engineering and Prescient Design that drove all of this, but other departments have also contributed significantly to building different parts of our lab-in-the-loop system. Thank you very much for listening, and I'm happy to take any questions.
[Music]
High-throughput screening is fundamental to biomedical research, particularly for the development of protein-based therapeutics. By iteratively screening protein variants, researchers explore a protein's vast sequence space in search of a novel variant with the desired, optimized properties. However, finding the right protein variant is difficult, even with the latest AI tools. Large-scale screens and training datasets for machine learning require pools of DNA encoding thousands of defined protein variants to effectively sample the protein sequence space. There are many ways researchers can generate their screening libraries, but current technologies leave much of the sequence space out of reach and underexplored. If your protein's coding sequence is longer than 300 nucleotides, your options for building screens are either very expensive, very limited, or very biased. As a result, the sequence space of most known proteins cannot be effectively studied in high throughput. To advance therapeutic protein development and AI-driven screening, researchers need tools to economically synthesize longer sequences at scale. At Twist, we constantly ask, "How can we give more scientists access to longer synthetic DNA on a large scale?" Our team used our proprietary silicon-based platform, optimized chemistry, and automation to become the only company able to synthesize precisely designed pooled sequences that push the boundary beyond 300 nucleotides. This innovative new product, called Multiplex Gene Fragments, consists of uniform pools of double-stranded DNA designed to your sequence specifications. In just one week, you can receive pooled, lyophilized gene fragments up to 500 base pairs, ready for use in your high-throughput screen. With access to significantly more sequence space, you could precisely synthesize a library to generate training datasets that improve your machine learning model, or screen single-domain heavy-chain antibody variable regions and make discoveries that will lead to the next blockbuster protein therapeutic. So, what will you do with significantly more sequence space? Get started with Multiplex Gene Fragments today.
[Music]
Thank you everyone for your attention, and thank you, Vlad, for that incredible presentation. We're now going to move on to the Q&A session. Feel free to submit your questions using the box on the right-hand side of your screen and click send. We've got just under ten minutes — maybe five to ten — so we'll do our best to address as many questions as we can. We've already had quite a few, so if it's all right with you, Vlad, I think we'll dive straight into answering them.
So, the first question we've had is: what were some of the key challenges in integrating wet-lab data with the machine learning models during iterative optimization?
>> Yes — I would say there was a challenge even before that, which was first gathering all the historical data so that we could build an initial set of models, and then using those models for the first version of the lab-in-the-loop system. After that, the challenges were mainly about speed: we really wanted to run this fast and get validation of our models. In the first year of building the system, the main questions were how to quickly test hypotheses and how to do these things quickly, so there was a lot of work to streamline the whole process. The experimental part also had to be sped up — how we submit designs, building the platform that routes them to antibody engineering for further testing, then to Twist, and then back to the lab for SPR measurement. That process was quite a bit longer at first; we then streamlined it and got it down to about four weeks, which is amazing, and now we can run this really quickly. So there were several difficulties, not all of which were solved by me — my job was more to select good generative models we could use and to put together the initial datasets we could train these models on. And the linear DNA fragment work, done by members of antibody engineering at Genentech, significantly improved the turnaround time for getting results back. So the challenge has always been speed — it still is today — but a lot of it has been automated since we first started doing this.

>> That makes a lot of sense — clearly a lot of work went into that. We've had a similar question, again about challenges. Someone has asked: what are the major challenges when applying AI models to predict bacterial resistance mechanisms?
>> I don't specifically work on bacterial resistance mechanisms. We mostly test antibodies, and those involve a completely different set of functions and properties than the ones relevant there, so I can't really answer that question. There are other teams at Genentech working on that problem — I know that for sure — but unfortunately it's not something I focus on.
>> That's fine. It actually leads into another question: what other projects do you plan, or think, could use the lab-in-the-loop system?
Oh, there are many projects in the company that are using lab in the loop. There may well be a project on bacterial resistance that I'm simply not aware of, and I'm pretty sure you could use it for that as well. In my group we use it for small molecules, for large molecules — antibodies — and, of course, for peptides, and there are some projects on RNA design as well. It's pretty much drug-modality agnostic: you set up your generative models, you set up the oracles, and you can have some active learning approach on top; your parameters and your parameter space will change depending on the field you're working in, but I know for sure that this whole concept is used on many different problems across the company. We are mostly using it for drug discovery right now, for small and large molecules.
>> Fantastic. We've had two more questions that are kind of linked. The first one is: what is the success rate of AI-derived antibodies from design through to clinical translation?
>> I think we are still in the process of evaluating that. What I showed you here is that you can improve an antibody — you can save a portfolio project that has got stuck at some stage because you need to engineer out some part of the antibody related to developability issues. You can certainly use this approach for those kinds of problems: engineering out features that cause poor clearance, high-concentration issues, stability issues, or affinity problems. We do have a number of antibodies approaching that point, but I wouldn't say they have reached the preclinical stage yet — honestly, it's too early to evaluate this. We have a number of projects at the stage where we are doing well on functional assays that are well optimized and that we use in animal studies, and they continue to do well, but until we get real feedback on, for example, immunogenicity issues or ADA risks, it will take more time to see how this performs in the preclinical stages. So I would say the antibodies we have developed are still at a point where we can measure success only up to the preclinical stage. It will take some more time, but we have a lot of ongoing work where we will soon get that feedback on how well lab in the loop did at improving an antibody all the way through clinical development.
>> Well, one of the questions we've had is actually a two-parter, and you've answered part of it there. Someone asks how many real-world, working drugs have been designed by lab in the loop — but obviously, as you've said, it's still very early days. The second part is: what makes the system so precise? What is it that makes lab in the loop so precise?
I'm not sure exactly what is meant by precise — precise in designing or improving antibodies, I suppose. As I said, the metrics we used here were about whether we actually improved the antibody. In the early days we focused mostly on expression and affinity, and we wanted to see the limits of how much we could improve these properties within a few iterations of lab in the loop using generative models, oracles, and active learning. Other properties are also extremely important, and we test for them constantly, but with lower-throughput assays, so I'm still not sure how to fully measure success there. As for what makes it precise — maybe the question is really why this works. There are many reasons. First, in each iteration you're typically dealing with a low-data regime: around the seed molecule you want to improve, you don't have many data points. What lab in the loop gives you — and of course you don't expect to design the best antibody immediately, which is why we run it iteratively — is that over these iterations you collect more and more data points around a very valuable seed molecule, because you keep measuring. The other important part is that the generative and discriminative models help push this in the right direction: your models get better, and you move toward an improved antibody, because the guidance in the generative models is designed to go exactly in the direction you want, while you're also accumulating data points and increasing your chances for the next round. That's why I believe this so often leads to success — and the examples I showed you here are the ones where we really did see improvement in the antibodies: improvement in affinity and improvement in expression. So those are some of the reasons, I guess.
>> We've got time for one more question, so let's finish on something a bit bigger-picture: how generalizable is the lab-in-the-loop approach to other types of biologics or therapeutic targets beyond antibodies?
>> That's a very good question. As I said, we run separate lab-in-the-loop systems for different modalities. I don't think you can generalize a single loop across modalities, but some of the models that are building blocks of the lab in the loop can generalize to other drug modalities. You could have a generative model — a de novo model — trained on essentially everything, both large and small molecules, and as it sees more data it becomes better and better, so you can run the same model for both small and large molecules, for example. The more problematic — though also interesting — issue is that some models, say property predictors or oracles, don't even generalize between different projects within the antibody space. I would say that's the bigger issue: if I'm running this for antibodies, I still need to train some models specifically for each project. Many models are generalizable and can be used to predict different things — affinity, for example — but some projects are very different. Functional assays in particular depend heavily on the project or the disease program, so you cannot reuse the same model you built for one project; it does not generalize well to another. So there are some generalizability issues within the antibody space across projects, but there are also components that work well and do generalize, such as affinity prediction. For other modalities, you essentially need a separate lab-in-the-loop system before you can start optimizing those molecules.
>> Fantastic. That seems like a perfect opportunity to wrap up — that's all the time we have for questions today. Just remember that any questions we didn't get to will be answered offline as soon as possible, and you can continue to ask questions even if you're watching this webinar on demand. Don't forget to download your certificate of attendance, which is available from the handouts tab on the right-hand side of the platform. Once again, a big thank you to everyone for listening, and Vlad, a big thank you for your time today — it's been really interesting talking to you. Thank you and goodbye.

>> Thank you for inviting me.