
Lab-in-the-Loop: Smarter AI for Antibody Design

By Twist Bioscience

Summary

Key Takeaways

  • **Lab-in-the-Loop accelerates antibody design**: The Lab-in-the-Loop framework integrates AI-driven design with targeted lab testing, creating an iterative workflow that streamlines antibody optimization and reduces experimental burden. [01:35]
  • **AI models predict antibody properties**: Custom-built generative AI models produce millions of antibody designs, which are then evaluated by "pseudo-oracle" models that predict properties such as binding affinity and expression level, replacing slow experimental assays. [09:44]
  • **Active learning optimizes antibody candidates**: An active learning framework ranks AI-generated antibody designs to maximize expected improvement over existing candidates, guiding the selection of designs for laboratory synthesis and testing. [10:37]
  • **Iterative optimization improves antibody variants**: Across four key targets, iterative application of the Lab-in-the-Loop process led to significant improvements, with up to 27% of designs in the fourth iteration showing three-fold better binding affinity than the original seed antibodies. [30:36]
  • **Multi-objective optimization ensures therapeutic viability**: The system employs multi-objective optimization to simultaneously improve multiple antibody properties, such as binding affinity and expression yield, while maintaining therapeutic constraints and avoiding issues like non-specificity. [20:25], [34:33]
  • **PropEn enhances AI antibody design**: PropEn (Property Enhancer), a discriminator-free generative model, is particularly effective in low-data scenarios, enabling direct learning of property-improvement gradients and generating significantly improved antibody variants. [38:18]

Topics Covered

  • Lab in the Loop: Accelerating Antibody Design to Weeks
  • Multi-Objective Optimization: Designing Better Therapeutic Antibodies
  • Propen: AI Designs Better Antibodies in Low-Data Settings
  • De Novo Design: Antibodies From Scratch, Faster
  • Streamlining Wet Lab Data: The Speed Challenge

Full Transcript

Hello and welcome to the latest Technology Networks webinar, "Lab in the Loop: Smarter AI for Antibody Design." I'm today's moderator, Dr. Steven Gibney, science writer for Technology Networks, and I'm excited to be here to host today's session. We have a fantastic presenter, Dr. Vladimir Gligorijević, who will be sharing some of his valuable insights with us. Vladimir is a Senior Director of AI and Machine Learning, leading a team of ML scientists and structural computational biologists who focus on large-molecule drug discovery problems. His team focuses on developing AI and ML tools for the optimization and design of both therapeutic and diagnostic antibodies, bringing the lab-in-the-loop approach to the large-molecule drug discovery portfolio.

After Vladimir's presentation, we'll have a short Q&A session. We encourage you to submit your questions at any point during the presentation. To do so, just type your question into the box on the right-hand side of the screen and click send. We'll do our best to answer as many questions as we can in the time available today. If you happen to encounter any technical difficulties during the webinar, click the chat box on the right-hand side of your screen to request assistance from our support team. Without further introduction, I'm now going to hand over to our speaker, Dr. Vlad. Over to you.

Hello everyone. Thanks so much for the introduction, and thank you very much for the opportunity to speak here today. I'll be talking about our lab-in-the-loop framework for large molecules and how we use it to accelerate antibody design.

First of all, let me tell you a little bit about why it is important to design antibodies and why antibodies are our molecules of obsession. We have lab-in-the-loop frameworks in the company for different drug modalities, but in this presentation I'll be focusing only on antibodies.

On the left-hand side here you can see a cartoon picture of an antibody. We often say it's a Y-shaped molecule. It is, as I said, the molecule of our obsession: the molecule that protects us from the world, generated by our immune system to block foreign pathogens. On the right-hand side, you can see statistics from last year's report on how many antibodies were FDA approved, and you can see that almost 20% of FDA-approved drugs are antibodies. In the previous year it was even more, maybe about 30%. So they are one of the fastest-growing classes of drug molecules over the past decade, mainly because they can be used to target very complex diseases such as autoimmune diseases, cancers, and, very recently, Alzheimer's disease as well. For cancer, for example, one of the very first antibodies designed by my company, Genentech, is trastuzumab, which was used for breast cancer. And the promising results for, for example, lecanemab, reported a year or two ago, show that this antibody can effectively bind to the amyloid beta plaques accumulated in neurons and remove them, improving the clinical outcome for patients.

There are basically two typical ways to discover antibodies. The first is to use some kind of library design and display technologies, and the other way, which is very commonly used in my company, is immunizing animals. Usually that's done by having a specific target in mind that is implicated in some disease: we know the pathway, we know the mechanism of action. Then we inject the target into an animal and let the immune system generate the antibodies, which we can then test for other functions and properties. The mechanism underlying this process is called V(D)J recombination.

Since I'm a machine learning scientist, the way I like to think about it is that our immune system is a very big generative model that can give you a huge number of antibody variants. There are three different types of gene segments, called V, D, and J, and these gene segments build the variable region of an antibody; through DNA rearrangements, one of each of these gene segments ends up in a B cell. As you can imagine, the combinatorial space of combining these different gene segments is vast, so we end up with something we call a repertoire, which can contain on the order of 10^11 to 10^12 different antibodies that can be extracted from these immunization campaigns. The antibodies we extract from immunization campaigns are very diverse in sequence and structure, which makes them a very nice dataset for training machine learning models.
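As a back-of-the-envelope illustration of that combinatorial space, the sketch below multiplies out segment counts; the counts used are rough, commonly cited approximations for the human loci, not figures from the talk:

```python
# Illustrative V(D)J diversity estimate; segment counts are rough,
# commonly cited approximations, not exact figures from the talk.
V_H, D_H, J_H = 40, 25, 6   # heavy-chain V, D, J gene segments
V_L, J_L = 70, 9            # combined kappa/lambda light-chain V, J

heavy = V_H * D_H * J_H     # heavy-chain V(D)J combinations
paired = heavy * V_L * J_L  # with heavy/light-chain pairing

print(f"heavy-chain combinations: {heavy:,}")   # 6,000
print(f"paired combinations:      {paired:,}")  # 3,780,000
# Junctional diversity at the segment joints and somatic hypermutation
# multiply this by many orders of magnitude, toward the 1e11-1e12
# repertoire sizes mentioned above.
```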

So that's basically the mechanism: V(D)J recombination plus somatic hypermutation and affinity maturation are the two processes that generate the large pools of antibodies in these immunization campaigns, which we then extract and test for various purposes. Now, a little bit about our team. We are Prescient Design; for a very brief period of time we were a startup company, which was then acquired by Genentech. Now we're part of a much bigger computational department, called AI for DD, or AI for Drug Discovery. Broadly, we are interested in developing machine learning tools that make use of all available data across Genentech and Roche to accelerate drug discovery.

We do focus on other drug modalities, as I said in the beginning; we have lab in the loop for small molecules and for peptides, but in this presentation I'll be talking only about antibodies. This was the very first problem we were working on when we first landed in Genentech, almost four years ago now. When we started this journey, we were given three problems: the first was called binder identification, the second binder optimization, and the third de novo design. In the first problem, the goal is to develop machine learning models that can effectively mine these huge immune repertoires of NGS sequences to find better hits and identify new potential lead antibody molecules.

In the second problem, we are given an antibody, which we also call a seed antibody, that usually has some property we're trying to improve, or is limited in some way. For example, it may have low affinity and we want to improve the affinity. The goal is to develop machine learning models and use the lab-in-the-loop process to improve that property, or a range of different properties, in what we call a multi-objective optimization process. The third problem is the one that is now very popular; many other companies are making announcements about this. That's the de novo design problem, and it is very challenging: you are given only a target, and maybe a location on that target, an epitope, and the idea is to develop an antibody from scratch, starting from the target only, with no data available for the model at that stage. So our mission is to transform the conventional drug discovery process by developing machine learning tools that unite wet-lab and dry-lab scientists in what we call lab in the loop, which can be used to address these three different problems.

This is how we bring lab in the loop to a typical portfolio project. As I said in the beginning, after you have a specific target in mind that has gone through target assessment and is implicated in some pathway or disease, we run a discovery campaign, which could be immunization or something else. That gets us a hit, and then we apply lab in the loop to improve that molecule and optimize it for other properties, including affinity and other developability properties, by running this design, screen, analyze, and make cycle. So,

going into a little more detail on this lab-in-the-loop framework: the first component is a set of custom-built generative models that produce antibody sequence designs; they can produce millions of different designs. These models usually start from some seed molecule, or, in the case of a de novo model, from just the target. If you start from a seed, you can use these generative models to generate multiple different variants of that seed. The designs are then fed into property predictors. We have different kinds of property predictors, which we also call pseudo-oracles, because they can predict properties of antibodies including expression levels, non-specificity, binding affinity, and so on. These predicted properties are then used to rank designs, in a process we call active learning, such that we maximize the expected improvement compared to the seed antibody. After ranking, we decide on the number of designs we want to send to the lab. First we send the designs to Twist to get the molecules and measure expression; then we measure the binding affinity against the target using SPR, or surface plasmon resonance, and of course we can also do additional functional screening and other characterization of these antibodies. The most critical part is getting the data back from the lab and feeding it back into the generative models and property-predictor oracles, so that these models can improve over iterations of lab in the loop.
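To make the flow concrete, here is a minimal, purely illustrative sketch of one such iteration. Every function here is a hypothetical stand-in (random point mutations and a noisy fake oracle), not the actual Prescient/Genentech models:

```python
import random

def generate_designs(seed, n=30_000):
    """Stand-in generative model: single random point mutations of the seed."""
    aa = "ACDEFGHIKLMNPQRSTVWY"
    out = []
    for _ in range(n):
        pos = random.randrange(len(seed))
        out.append(seed[:pos] + random.choice(aa) + seed[pos + 1:])
    return out

def pseudo_oracle(design):
    """Stand-in property predictor: a noisy fake pKD score, higher is better."""
    return 8.0 + random.gauss(0.0, 0.5)

def one_iteration(seed, batch_size=100):
    """Design -> predict -> rank -> pick a batch for the lab."""
    designs = generate_designs(seed)
    ranked = sorted(designs, key=pseudo_oracle, reverse=True)
    return ranked[:batch_size]  # batch sent for synthesis and SPR testing

batch = one_iteration("EVQLVESGGGLVQPGGSLRLSCAAS")
print(len(batch))  # 100
```

After the lab measures such a batch, the measurements would be appended to the training data, and both the generative models and the oracles retrained before the next iteration.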

One important thing I want to emphasize is that, first of all, our approach takes steps toward respecting important therapeutic constraints, because we optimize antibodies for different types of properties, as you'll see in a few slides. It also demonstrates generality: I'll show you how it works on different antigens, different epitopes, and different disease programs on which we have been running this lab in the loop. And it enables autonomous antibody engineering; we can automate a lot of this. The second thing I want to highlight, written right below the experimental characterization, is the timeline. We now get results back from the lab in about four weeks, and with a lot of steps streamlined, the whole process, the entire loop from seed to antibody, now takes about four to five weeks. As you know, machine learning models can generate designs in anywhere from milliseconds to a few hours, but the most important part is also making the experimental characterization very fast, so that we can actually repeat the loop multiple times.

I mentioned in the beginning that we use an ensemble of different generative models. The main reason is that we don't want to limit ourselves to any particular model, or to the limitations of one particular model; we want to really diversify the sequences at the start, and for that purpose we use different types of models. Sometimes the models we use are also dictated by the project's needs. So we have a number of generative models that can do unguided sampling, such as the Walk-Jump Sampler, CPDM, or protein language models, and models for guided multi-property optimization, such as LaMBO, Property Enhancer (PropEn), and other methods as well. The other point is that in many cases in a portfolio project, if we are given only the sequence, we run sequence-based models. If we have a structure, maybe a co-crystal structure of a lead or potential lead antibody with a target, then we can couple these generative models with physics-based models; we call these hybrid models. We also have a range of structure-based generative models, which are the ones we also use for de novo design, such as AbDiffuser, among others.

The reason I'm mentioning all these method names is that they are published; we like to publish a lot at Genentech, and a number of our machine learning papers have come out in the last couple of years. They are custom built in the sense that we design our own methods, always taking into account the needs of each individual project and an understanding of the data and of how to model these complicated distributions of antibody sequences and structures. These are some of the papers here; the full list is of course much bigger. If I have time, I will focus on PropEn toward the end of this presentation, because it has been one of the most promising methods in our lab in the loop, with the highest binding rate and highest expression among the antibodies produced by a model. Many of these models are part of a big publication that is right now in the late stages of peer review, which focuses on the lab-in-the-loop procedure with all of these methods. It is a huge collaboration between Prescient and Antibody Engineering at Genentech, and many of the results I'm going to show you here come from this paper; if I don't mention something, you can certainly check it out in that publication. Now, going into the details of each individual component of our lab-in-the-loop process.

As I said, the first component is the set of custom-built generative models, and as you saw on the previous slides, we have a number of different models. In every iteration of lab in the loop, starting from a specific target and seeds, we generate a batch of proposals. We restrict each model to about 30,000 potential designs, so depending on how many models we have, we end up with more designs in total, but each model is restricted to generating about 30,000 possible candidates.

These designs are then fed into the oracles, which are very useful because they replace very slow and expensive functional screening assays. These are models trained on in-house data; sometimes on historic data from many different projects, which we use to train different kinds of oracles. The combined library that comes from all of these models is first filtered and ranked by our active learning framework, which is a very important piece of our lab in the loop; I will come back to it in the next few slides. But just to focus here on the oracles: we have oracles that predict bind versus no-bind, which are basically classifiers; oracles that predict the affinity level, that is, the KD or dissociation-constant values; and regression models that predict the expression yield. In addition to these oracles measuring different properties of antibodies, we also have quality-control filters, which check for chemical liability motifs and similar things that we want to eliminate if we see them in the sequences. So,

after passing selection and ranking, the variable-region sequences are synthesized as linear DNA fragments; this is the job of Twist Bioscience. This linear-DNA-fragment-based expression workflow trims away multiple cloning steps, saving a lot of time and cost, while also being very amenable to automation. After expression, antibodies are affinity purified, concentration is measured by optical density, and then the binding affinity of these designs and the lead antibodies is measured by surface plasmon resonance, or SPR. After we get all these measurements, as I said, the most important step is putting the data back into the models, retraining, and repeating. Of course, designing an

antibody with generative AI is, I would say, easy. However, designing a therapeutic antibody is very difficult. There is a whole list of properties whose values need to fall in specific ranges for the project requirements to be satisfied, and a lot of the time we're given a molecule in some early, or even late, stage of development that has problems; it could be affinity problems, or developability problems. Here you can see a number of the characteristics we measure in addition to expression and binding: off-target specificity, thermal stability, immunogenicity, self-association, and high-concentration properties such as high-concentration aggregation. These are all very important to measure and include in the lab in the loop, so that we can optimize these antibodies for multiple properties at once, and this is where we need something called multi-property optimization. So how can

this lead to better molecules? First of all, it can reduce time to clinic, because we can optimize these things in the very early stages of a portfolio project. It can increase the clinical success rate: for example, if our generative models design something indicating, say, high immunogenicity risk or anti-drug-antibody (ADA) risk, we can immediately eliminate those designs in the optimization steps. And with some of the generative models, the novel models, we can also target previously undruggable diseases.

Now, we've covered the sampling part, the inference part, and the lab part as well. The next important piece is the active learning part: how are designs actually ranked? First, we take the predictions from the oracles and use them to rank the designs to maximize the expected improvement compared to the seed antibody, and also to ensure that the models are improving in the next round: the data we get back from the lab is reused for retraining, and that's how we hope to achieve that goal. So,

in many cases we're given an antibody where there is, for example, a problem; it could be the affinity, or some developability issue. The important part is that we optimize that antibody for the one property that is problematic while maintaining the other properties as they are. For example, suppose you have a high-affinity antibody that has some developability issues. The whole idea is that we use the lab in the loop, or multi-objective optimization, in a way that introduces mutations and changes to the antibody that improve the problematic property, but never at the cost of, say, losing affinity. Now,

as you know, drug discovery is not really done like that; it's actually done very sequentially, where you start from a number of designs and then check one property after another. This introduces issues later on that need to be engineered out somehow, because you will have introduced problems into the antibody. For example, an antibody that initially didn't have high-concentration issues can acquire them at some stage of the process if you don't take all of these objectives and properties into account at the same time, from the beginning. So that's the

way we like to put it: we want to stay at the Pareto frontier, ensuring that we are not making changes that destroy some properties while improving others. For that purpose, we use black-box Bayesian optimization with a multi-objective function. Our acquisition function, which I'll explain shortly, is the one we optimize with this black-box optimization such that we stay at the Pareto frontier of the property space. We want to avoid expensive experimental assays for designs that are strictly dominated, that is, designs for which some other candidate is as good or better in all respects; avoiding those is essentially what it means to stay at the Pareto frontier. The acquisition function we use for this purpose is called expected hypervolume improvement (EHVI). This is the indicator, the thing we're trying to optimize from round to round.
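A minimal sketch of what "strictly dominated" means in code, with hypothetical property values and both objectives oriented so that larger is better:

```python
def dominates(a, b):
    """True if design a is at least as good as b in every objective
    and strictly better in at least one (larger = better)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

def pareto_front(points):
    """Drop strictly dominated designs before any expensive assay."""
    return [p for p in points
            if not any(dominates(q, p) for q in points if q != p)]

# (pKD, expression yield) for four hypothetical designs
designs = [(8.0, 1.2), (8.5, 0.9), (7.5, 1.0), (9.0, 1.5)]
print(pareto_front(designs))  # [(9.0, 1.5)] -- it dominates all the others
```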


Here is a very easy way to explain it. This is work by our colleagues at Prescient, Ji Won and Natasa. The different properties, like affinity, expression, and developability, form a space, and when we place our designs in that space they define something called a hypervolume: the volume occupied by the designs we get from our generative models, with property values coming from our pseudo-oracles. Since we need to look at all the objectives at the same time, the figure here shows just two properties; they could be expression and affinity, for example. The blue area is the one formed by the added designs, the two orange dots and the one blue dot; that is the hypervolume, in this case an area in the plane, that we're trying to maximize when running the lab-in-the-loop process. The way this is

done in practice is, for example: suppose we are running round two of lab in the loop and have, say, three designs, shown on the left-hand side, two orange designs and one blue one. How do we decide which ones to measure? These three designs form the hypervolume shown here as the blue area, but we know that the previous round forms the hypervolume shown in orange. By subtracting these two hypervolumes, we can see where the hypervolume is improving, or which designs specifically are contributing to increasing it. You can see that it's only the blue dot here, and that's the one that will be selected for ranking and for further experimentation.
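For two maximized objectives, the hypervolume and a candidate's contribution to it can be computed with a simple sweep. This is only an illustrative sketch with made-up numbers, not the EHVI machinery from the papers, which also accounts for oracle uncertainty:

```python
def hypervolume_2d(points, ref=(0.0, 0.0)):
    """Area dominated by `points` relative to a reference point,
    with both objectives maximized."""
    hv, y_covered = 0.0, ref[1]
    # Sweep from largest x; each point adds a rectangle above what is
    # already covered in y.
    for x, y in sorted(points, key=lambda p: p[0], reverse=True):
        if y > y_covered:
            hv += (x - ref[0]) * (y - y_covered)
            y_covered = y
    return hv

previous_round = [(2.0, 1.0), (1.0, 2.0)]   # e.g. (affinity, expression)
candidate = (1.8, 1.8)

gain = hypervolume_2d(previous_round + [candidate]) - hypervolume_2d(previous_round)
print(round(gain, 2))  # 0.64 -- the candidate enlarges the hypervolume, so measure it
```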

In the specific case of looking at, say, expression yield against affinity, the KD values we get from SPR measurements, you can see here that the black points are the ones improving the hypervolume in the second round of the iteration. This whole process, the mathematics and the optimization behind it, is described in these two papers, and it's great work by Ji Won and Natasa. So where do we believe

that this multi-objective optimization can bring the most value to a portfolio project? As I said, the typical way to run a drug discovery process is mostly sequential. And maybe a year or a couple of years down the road, after spending some amount of dollars, you realize that you don't have your lead molecule. That's because you started from, say, a pool of antibodies discovered in an immunization campaign, selected some, then did humanization, then saw a drop in affinity, then tried to improve affinity but introduced other mutations causing other problems, tested for those problems later on, and eventually you might realize that you picked completely the wrong clones from the beginning, because they did not fall into the set of designable antibodies. So how can machine learning help? Well, the first step is that we always want to run lab in the loop as early as possible in the drug discovery process, because by the time we come to lead optimization, when these other problems appear, we may be very limited in how much we can actually fix the antibody once the problems have already been introduced. The whole idea is that we don't introduce these problems in the first place, by running AI and lab in the loop as early in the discovery process as we can. The typical ways we do this are by diversifying the hit expansion step with our generative AI models, and by doing constrained multi-objective optimization as early as possible: checking all these properties, running all these filters, and tracking any biochemical features that could later be indicators of problems for other biological or developability properties of the antibodies.


Let me show you some results now. Over the years, since we joined Genentech, we've been running this lab in the loop on multiple different targets; here you can see some of the targets we tested it on. Right now, lab in the loop is used in almost every active portfolio project, and instead of optimizing just one or two properties, it's being used to optimize multiple properties at once. What we observed, and why this framework is very successful, is that as you run the loop over and over again, you keep improving the number of antibodies that will express, the number of antibodies that will bind, and the number of antibodies that are better than the seed, and that's exactly what I'm going to show you in the next few slides. For example, for these

three targets we've been running or

sorry for these four targets that I

showed you in the previous slide we've

been running lab in the loop uh uh like

we had like four maybe iterations of lab

in the loop uh and then we uh uh

measured uh uh how well lab in loop does

basically uh in each of these uh

iterations and basically how many times

We introduced some metrics of success. The first, of course, is the improvement in affinity, computed as delta pKD. First we compute the KD, the dissociation constant, from the kinetics parameters of the SPR curves, and then we log-transform that value, so everything I am showing you here is on a log scale. We have a pKD value for the seed molecule and pKD values for our designs, so we can measure the improvement between our designs and the seed.
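As a concrete sketch of this metric (the KD values below are illustrative, not from the talk): pKD is the negative log10 of KD, so fold-changes in affinity become additive differences, and the "three times better" threshold corresponds to a delta pKD of about 0.48.

```python
import math

def delta_pkd(kd_seed_m: float, kd_design_m: float) -> float:
    """Affinity improvement on the log scale.

    pKD = -log10(KD in molar); higher pKD means tighter binding,
    so a positive delta pKD means the design beats the seed."""
    return -math.log10(kd_design_m) + math.log10(kd_seed_m)

# A design whose KD is three times smaller than the seed's:
d3 = delta_pkd(kd_seed_m=1e-8, kd_design_m=1e-8 / 3)
assert abs(d3 - math.log10(3)) < 1e-9   # ~0.477, the dashed "3x" line

# 100-fold tighter binding sits at delta pKD = 2 on the same axis.
assert abs(delta_pkd(1e-8, 1e-10) - 2.0) < 1e-9
```

On this scale, the value 2 mentioned for the middle plot is exactly the 100-fold-improvement mark.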

Of course, we can also count how many designs have three-times-better affinity. The histogram on the left-hand side measures exactly that: in each iteration, how many times did we design three-times-better binders? You can see that by iteration four this number reaches 25 to 27%. The middle plot pools all of the designs from all of these targets and campaigns and measures the difference in pKD between the designs and the seed; the dashed vertical line marks the designs that are three times better, and the value 2 on this axis indicates designs that are 100 times better. Remember, this is on a log scale. So not only did we manage to design a lot of antibodies that are three times better, we are also good at designing antibodies that improve on the seed by more than three-fold. We also measure the expression yield. Here the same kind of histogram shows the difference in expression yield, measured in milligrams, between the seed and the designed antibody, and the dashed vertical line marks roughly a 0.01 mg improvement over the seed. As you can see, some designs produced over the process came out worse than the seed, but we were able to rank or filter those out when selecting designs for the next round.

Here is another very promising result. On the left-hand side are different seeds from different targets, and you can see all the designs with improved affinity, that is, only the designs with delta pKD greater than zero. For some seeds we achieve designs that are clearly two or three times better, but there are cases where, in just a few iterations of Lab-in-the-Loop, we designed antibodies that are 100 times better in affinity. You can also see how these numbers improve over rounds for specific seeds: for these four targets and their different starting antibodies, we track the maximum improvement in pKD over rounds, and you can see it climbing beyond the three-times-better mark, again shown as a horizontal dashed line.

Another very important thing to check when running Lab-in-the-Loop is whether we achieve the multi-objective optimization. As I said, multi-objective optimization constrains our designs to account for non-specificity; for expression yield, which can impact downstream characterization assays that require specific amounts of protein material; and for in silico developability risks that we also compute.

The way we usually test this: we have models that check for non-specificity, trained on data coming from the BV ELISA assay, shown here in the bottom-left panel. To estimate the risk of non-specificity we run this BV ELISA oracle, and a score above 0.1 would indicate a non-specificity risk. We confirmed that essentially all of our designs fall below that predicted threshold.
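A minimal sketch of this screening step, assuming the oracle returns a scalar score per design (`bv_elisa_score` and the example designs are illustrative; the 0.1 cutoff is the threshold quoted above):

```python
# Hypothetical non-specificity screen: keep a design only when its
# predicted BV ELISA score stays below the 0.1 risk threshold.
RISK_THRESHOLD = 0.1

def filter_nonspecific(designs, bv_elisa_score):
    """Drop designs whose oracle score signals a non-specificity risk."""
    return [d for d in designs if bv_elisa_score(d) < RISK_THRESHOLD]

# Toy stand-in oracle mapping design IDs to predicted scores.
scores = {"design_a": 0.02, "design_b": 0.31, "design_c": 0.07}
kept = filter_nonspecific(list(scores), scores.get)
assert kept == ["design_a", "design_c"]
```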

For a subset of the designs, we were also able to engineer out some chemical liabilities while maintaining our binding affinity. Another way we check whether we are in the range of therapeutic antibodies is to run something called the Therapeutic Antibody Profiler, or TAP. This extracts a set of biophysical parameters from our designed antibodies and compares them to the same features and parameters computed from therapeutic antibodies, for example antibodies that are already in use or that went successfully through the clinical stages. We can then look at the statistics of where our designs sit in the distribution relative to those therapeutic antibodies, and the plot at the top shows exactly that: the majority of our designs are in the right range for therapeutic antibodies.
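A simplified sketch of this kind of profiling check (the real TAP tool uses percentile-based guideline flags against clinical-stage antibodies; here a design parameter is merely flagged when it falls outside the range observed in a reference set, and all names and numbers are illustrative):

```python
# Illustrative TAP-style screen: flag any biophysical parameter of a design
# that falls outside the range spanned by clinical-stage reference antibodies.
def profile_flags(design_params, reference):
    """reference maps parameter name -> values observed in clinical-stage
    antibodies; return the names of out-of-range parameters."""
    flags = []
    for name, value in design_params.items():
        ref = reference[name]
        if not (min(ref) <= value <= max(ref)):
            flags.append(name)
    return flags

reference = {
    "total_cdr_length": [43, 48, 53, 58],       # toy reference values
    "surface_hydrophobicity": [95, 110, 125],
}
design = {"total_cdr_length": 50, "surface_hydrophobicity": 150}
assert profile_flags(design, reference) == ["surface_hydrophobicity"]
```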

In the next five or ten minutes I want to focus on one of our very promising methods and dive a little deeper into the machine learning. If you want to do some kind of conditional design, the common practice nowadays is guided design, which consists of two parts: a generative model that produces proposals, and a discriminative model that checks for different properties and can filter out designs that do not satisfy specific requirements. So the most common setup is to have these two separate models. However, in most low-data settings these models do not do the right job. The discriminator cannot be reliable if you do not have many data points, and a low-data regime is often the case in many of these projects. The other issue is that the discriminative model guiding the generative model can steer it into regions where there are no data points at all, which we call out-of-distribution, or off-manifold. And with few data points, the discriminator cannot learn the very complex distributions of these properties, which is often the case for developability properties.

So, to avoid problems like that, we introduced a model called Propen. We realized that the discriminator was the bottleneck here, and we decided to find a workaround and completely sidestep this explicit guidance. We do that by incorporating a technique from causal inference called matching, which lets us train a discriminator-free model: a single model that is very well suited to the low-data regime.

It does not propose designs that are out of distribution, it has no problem with rugged property landscapes, and it is a very general approach: you can use it not only for antibodies but also train a similar model for other drug modalities. We call it "property enhancer" because it explicitly learns to approximate the gradient of improvement. It consists of just three steps. The first step is matching: say we have a dataset of antibodies with affinity measurements. We split the data into low and high property values, that is, antibodies with low affinity and antibodies with high affinity, and then for each antibody in the low-affinity set we find a matching partner in the high-affinity set. These are usually pairs with a low edit distance, very similar antibodies, but with drastically higher property values; that could be affinity or other properties as well. We call this the matched dataset.
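The matching step above can be sketched roughly as follows, assuming equal-length sequences and a simple Hamming-style edit distance (all names, sequences, and the affinity threshold are illustrative):

```python
# Minimal sketch of Propen's matching step over a toy antibody dataset.
def hamming(a: str, b: str) -> int:
    """Edit distance for equal-length sequences: count mismatched positions."""
    return sum(x != y for x, y in zip(a, b))

def match_pairs(dataset, threshold):
    """Split antibodies by an affinity threshold, then pair each low-affinity
    sequence with its most similar high-affinity sequence."""
    low = [(s, v) for s, v in dataset if v < threshold]
    high = [(s, v) for s, v in dataset if v >= threshold]
    pairs = []
    for seq, _ in low:
        partner, _ = min(high, key=lambda sv: hamming(seq, sv[0]))
        pairs.append((seq, partner))
    return pairs

# (sequence, affinity) toy data; pairs become the matched training set.
data = [("AQYW", 0.2), ("AQYA", 0.9), ("TQYW", 0.3), ("TQFW", 0.8)]
pairs = match_pairs(data, threshold=0.5)
assert pairs == [("AQYW", "AQYA"), ("TQYW", "TQFW")]
```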

In the next step we train an encoder-decoder model that maps one antibody to its matched partner, which implicitly learns the gradient of improvement. Training this way also effectively increases the number of data points: your dataset initially has some number of antibodies, but when you form pairs you enlarge the training set, and you can train a model that learns directly how to improve an antibody. Once you have trained such a model, you can use it at inference: start from your seed molecule, feed it into the model, and sample antibodies that should have higher affinity. You can then repeat this step by recycling: put the generated antibodies back into the encoder, generate another batch of antibodies, and so on.
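The recycling loop can be sketched like this, with `propen` standing in for the trained encoder-decoder (a hypothetical interface, shown here with a toy stand-in model):

```python
# Sketch of Propen-style recycling at inference: repeatedly feed the
# model's output back in, collecting the design from each round.
def recycle(seed, propen, n_rounds):
    """Run the improver model iteratively, starting from the seed."""
    designs, current = [], seed
    for _ in range(n_rounds):
        current = propen(current)
        designs.append(current)
    return designs

# Toy stand-in model: "improves" a sequence by appending a marker each round.
toy_propen = lambda s: s + "*"
assert recycle("SEED", toy_propen, 3) == ["SEED*", "SEED**", "SEED***"]
```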

One very interesting point: if you look at the table, in many cases this method generates really high binding and expression rates. For example, in three rounds of Lab-in-the-Loop we were able to get binders up to 40 times better than the seed molecules we started from. So this is a very interesting and promising method. We also have a publication about it, where you can read more about the technical details, the math behind it, and how we train this model and use it on some of the projects.

The last thing I want to mention is de novo design, which is something we are also working on; I just want to say how transformative this would be for antibody design. Right now, the procedure I described with Lab-in-the-Loop is used in lead optimization: starting from a repertoire set after immunization, we train some models and then use them for optimization and for getting better binders. So we use it for binder discovery and optimization, the two challenges I mentioned in the beginning.

The long-term de novo goal, of course, is to start a portfolio project with just a target. We would not need immunization or any library design; we would simply run our de novo model on, say, a painted epitope and design an antibody from scratch. This could drastically increase the speed of generating and optimizing antibodies, because we could then fold all of the multi-objective optimization into that step as well.

The way we think de novo design will enable antibody design is this: yes, it will be used for binder optimization and diversification the same way we use Lab-in-the-Loop, but we can start from an antibody built from scratch, designed without any discovery campaign. It could be used for binder identification, so we would be able to generate antibodies that diversify across different epitopes; that diversity is very important for hypothesis testing, among other things. We could also easily design different parts of an antibody and then assemble them into complex formats, and we can run de novo design for that as well.

We are hoping that this will drastically shorten the time from a target to a lead molecule. It also opens up many ways to do structure-activity hypotheses, because we can generate multiple designs and maybe even simulate some dynamics with them. All of that can come from training these models, perhaps augmented with MD-derived data that can enlarge the training set and improve the models.

I am going to stop here and just acknowledge the support from multiple different teams at Genentech. These are all the people involved; predominantly, the collaboration between Antibody Engineering and Prescient Design is what drove all of this, but of course other departments contributed significantly to building different parts of our Lab-in-the-Loop system.

Thank you very much for listening, and I am happy to take any questions.


High-throughput screening is fundamental to biomedical research, particularly for the development of protein-based therapeutics. By iteratively screening protein variants, researchers explore a protein's vast sequence space in search of a novel variant with the desired, optimized properties. However, finding the right protein variant is difficult even with the latest AI tools. Large-scale screens and training datasets for machine learning require pools of DNA encoding thousands of defined protein variants to effectively sample the protein sequence space. There are many ways researchers can generate their screening libraries, but current technologies leave much of the sequence space out of reach and underexplored.

If your protein is longer than 300 nucleotides, your options for building screens are either very expensive, very limited, or very biased. As a result, the sequence space of most known protein sequences cannot be effectively studied in high throughput. To advance therapeutic protein development and AI-driven screening, researchers need tools to economically synthesize longer sequences at scale. At Twist, we constantly ask: how can we give more scientists access to longer synthetic DNA on a large scale? Our team used our proprietary silicon-based platform, optimized chemistry, and automation to become the only company able to synthesize precisely designed pooled sequences that push the boundary beyond 300 nucleotides.

This innovative new product, called Multiplex Gene Fragments, consists of uniform pools of double-stranded DNA designed to your sequence specifications. In just one week, you can receive pooled, lyophilized gene fragments of up to 500 base pairs, ready for use in your high-throughput screen. With access to significantly more sequence space, you could precisely synthesize a library to generate training datasets that improve your machine learning model, or screen single-domain heavy-chain antibody variable regions and make discoveries that lead to the next blockbuster protein therapeutic. So what will you do with significantly more sequence space? Get started with Multiplex Gene Fragments today.


Thank you, everyone, for your attention, and thank you, Vlad, for that incredible presentation. We are now going to move on to the Q&A session. Feel free to submit your questions using the box on the right-hand side of your screen and just click send. We have maybe five to ten minutes, so we will do our best to address as many questions as we can. We have already had quite a few questions, so if it is all right with you, Vlad, I think we will dive straight into answering them.

The first one we have had: what were some of the key challenges in integrating wet-lab data with the machine learning models during iterative optimization?

>> I would say there was a challenge even before that, which was gathering all the historic data so that we could have an initial set of models, and then using those models for the first version of the Lab-in-the-Loop system. After that, the challenge was speed: we really wanted to run this fast and get validation of our models. So in the first year of building the system, the challenges were mainly about how to quickly test hypotheses, and a lot of improvements went into streamlining the whole process. The experimental side was sped up too: how we submit designs, building the platform that hands off to antibody engineering for further testing, then out to, say, Twist, then back to the lab for SPR measurement. That process was initially a bit longer, but we streamlined it and got down to four weeks, which is amazing, and now you can run this really fast. So there were a few different difficulties, not all solved by me; my job was more to filter out the good generative models we could use here and to get the initial datasets we could train these models on. The linear DNA fragments, handled by some members of antibody engineering at Genentech, also significantly improved the turnaround time for getting results back. So the challenge was always speed, and it still is today, but a lot of it has been automated since we first started doing this.

>> That makes a lot of sense.

Clearly a lot of work went into that. We have had a similar question, also about challenges. Someone has asked: what are the major challenges when applying AI models to predict bacterial resistance mechanisms?

>> I don't specifically work on bacterial resistance mechanisms. We mostly test things for antibodies, which involve a completely different set of functions and properties, so I cannot really answer that question. There are teams at Genentech working on this problem, I know that for sure, but unfortunately it is not something I am focusing on.

>> That's fine. It actually leads into another question: what other projects do you plan, or think, could use the Lab-in-the-Loop system?

There are many projects in the company that are using Lab-in-the-Loop. There may be a project on bacterial resistance that I am not that aware of, but I am pretty sure you could use it for that as well. As for other projects: in my team we use it for small molecules, for large molecules, that is, antibodies, and of course for peptides, and there are some projects on RNA design as well. So it is pretty much drug-modality agnostic: you set up the generative models, you set up the oracle or oracles, and you could have some active-learning approach; your parameter space will change depending on the field you are working in, but I know for sure that this whole concept is used for many different problems in the company. Right now we are mostly using it for drug discovery, for both small and large molecules.

Fantastic. We have had two more questions that are linked. The first one: what is the success rate of AI-derived antibodies from design through to clinical translation?

>> I think we are still in the process of evaluating that. What I showed you here is that you can improve an antibody; you can save a portfolio project that has been stuck at some stage where, say, you want to engineer out a part of the antibody related to developability issues. You can use this approach for exactly those problems, specifically for engineering out things that cause poor clearance, high-concentration issues, stability issues, or affinity issues. We do have a number of antibodies approaching the pre-clinical stage, but the honest answer is that it is too early to evaluate this. We have a number of projects at the stage where we are doing well on functional assays that are really well optimized and used in animals, and they are still doing well; but until we get real feedback on, for example, immunogenicity issues or ADA risks, it will take more time to see how this plays out in the pre-clinical stages. The antibodies we developed are still at the stage where we can measure success only up to the pre-clinical stage, so it will take more time, but we have a lot of ongoing work where we will soon get feedback on how well Lab-in-the-Loop did at improving an antibody all the way through clinical development.

>> Well, one of the questions we have had is actually a two-parter, and you have answered part of it already. Someone asks how many real-world working drugs have been designed by Lab-in-the-Loop, but obviously you have said it is still very early days. The second part is: what makes this system so precise? What is it that makes Lab-in-the-Loop so precise?

>> I am not sure what "precise" refers to; precise in designing, or in improving antibodies. Like I said, the metrics we used here look at whether we actually improved the antibody. In the early days we focused only on expression and affinity, and we wanted to see the limits: how much we could actually improve these properties with generative models, oracles, and active learning over a few iterations of Lab-in-the-Loop. Other properties are of course also extremely important, and we constantly test for them, but with lower-throughput assays, so I am still not sure how to fully measure success there. What makes it precise? Maybe the question is really: why does this work? There are many reasons. First of all, in each iteration you are mostly dealing with a low-data regime: around the seed molecule you want to improve, you do not have many data points. What Lab-in-the-Loop helps with is this: you do not expect to immediately design the best antibody, which is why we run this iteratively, but through the process you are collecting more data points around a very valuable seed molecule, because you are measuring. The important part is that the generative and discriminative models help push this in the right direction, so your models get better and you keep moving toward an improved antibody; the guidance in the generative models is designed to go exactly in the direction you want, while you are also accumulating data points and increasing the chances for the next round. That is why I believe this often leads to success, and the examples I showed you here are the ones where we really saw the improvement in antibodies: improvement in affinity, improvement in expression. So those are some of the reasons, I guess.

>> Yeah, we have got time for one more question, a bigger-picture one to finish on: how generalizable is the Lab-in-the-Loop approach to other types of biologics or therapeutic targets beyond antibodies?

>> That is a very good question. Like I said, we run separate Lab-in-the-Loops for different modalities. I do not think you can generalize the whole loop across modalities, but some models that are building blocks of this Lab-in-the-Loop can generalize to other drug modalities. You could have a generative model, a de novo model, trained on everything, both large molecules and small molecules, and as it sees more data it becomes better and better; you can run the same model for both small and large molecules, for example. But the interesting, and problematic, issue here is that some models, say property predictors or oracles, do not generalize even between different projects within the antibody space, and I would say that is the bigger issue: if I am running this for antibodies, I need to train some models specifically for each project. We do have many models that are generalizable and can be used for predicting different things, say affinity and so on. But some projects are very different; functional assays, for example, depend heavily on the project or disease program, and what you use for one project does not generalize well to another. So there are some generalizability issues between different projects even within the antibody space, but there are also things that work well and generalize, like affinity prediction. For other things you might need a somewhat different Lab-in-the-Loop system so that you can actually start optimizing them.

>> Fantastic. That seems like a perfect opportunity to wrap up; that is all the time we have today for questions. Just remember that any questions we did not get to will be answered offline as soon as possible, and you can continue to ask questions even if you are watching this webinar on demand. Don't forget to download your certificate of attendance, which is available from the handouts tab on the right-hand side of the platform. Once again, a big thank you to everyone for listening, and Vlad, a big thank you for your time today; it has been really interesting talking to you. Thank you and goodbye.

>> Thank you for inviting me.
