
DDD & LLMs - Eric Evans - DDD Europe

By Domain-Driven Design Europe

Summary

Key Takeaways

  • LLMs Like Late 90s Web: "When I think about this one it feels like the late 90s and the web arriving... it was at first maybe not such a big thing... but it seemed like enterprise software would probably go on its own path... and yet here we are, right, the web is everything." [02:00], [02:22]
  • Study LLMs Regardless: "We don't have to know what's going to happen in order to get ready somewhat... in either of those last two scenarios it seems to me that software development is going to change a lot... you will have a lot more fun if you study it." [03:03], [09:25]
  • Naive Prompts Fail Pirate Test: "I say, you know, you are a pirate who's caught someone sneaking onto your treasure island... and now ChatGPT responds... you've got exactly one minute to convince me... I have an AI startup idea... the pirate captain raises an eyebrow, intrigued yet skeptical... this is not what a pirate captain would say, right?" [17:51], [18:44]
  • Separation of Concerns in Prompts: "The approach that seems to work pretty well is don't have one prompt like that, have two prompts... a prompt just for the consistency... and then a separate prompt focused just on generating... this is separation of concerns, this is an old design principle." [21:52], [23:06]
  • Mix LLMs with Conventional Code: "A mixture of conventional software with LLM components... we have a model that works really neatly for a lot of cases and then... there are cases that are so strange that we bump them out to a human... okay, we can use LLMs with prompts to handle some of these strange cases." [33:39], [27:14]
  • DDD Tackles LLM Complexity: "The whole point of an LLM is that a complex domain is being, you know, crunched into a form that allows you to operate on that complexity... remember that the subtitle of my book was Tackling Complexity in the Heart of Software... DDD concepts like bounded context and ubiquitous language could help make LLM solutions more predictable." [13:41], [01:00:05]

Topics Covered

  • LLMs Spark Software Revolution
  • Single Prompts Fail Games
  • Separation of Concerns Fixes Prompts
  • Real Data Shatters Domain Models
  • Ubiquitous Language Shapes Prompts

Full Transcript

thank you well as he said my recent obsession is llms and um today I want to talk about what that

might mean for all of us going forward so this is one of those things that is changing so fast that the things that I

say today will probably sound naive in a year at DDD Europe

2025 and uh maybe even much sooner so this is a talk for today you know may of 2024

and a lot of what I'm saying therefore is very provisional much more speculative than what I like to usually talk about also based on things that I

don't understand very well right I have been doing a lot of dabbling with large language models but I don't have deep

understanding and I don't have deep experience as most people don't at this point

uh so um just to spell out the acronyms LLM stands for large language model it does seem nice to me that when

something that I would really call AI finally comes along that it has two of our favorite words in it it didn't have to be that way there have been many

attempts at AI over the years that produced various things of varying levels of usefulness but uh this one

seems different to me when I think about this one it feels like the late 90s and the web arriving those of you

who can remember those days it was at first maybe not such a big thing it was fun to have websites about this and that

but um it seemed like enterprise software would probably go on its own path we had already

um networking and and integration systems that were much more capable than

uh HTTP of the time and yet here we are right uh the web is everything and um it

feels that way to me this time again it feels like too in the pace the way it just goes faster and faster so we'll

see now I don't think we know what's going to happen and that's important to recognize uh but we don't have to know

what's going to happen in order to get ready somewhat so to illustrate this

point let's take a few common scenarios so one scenario is the end of the world so perhaps there will be

you know the AI will turn on us and just kill us all or maybe they will use us as batteries or you know one of the popular

scenarios that came along a decade or so ago was that someone will optimize something you know like a toaster

factory with a slightly too smart AI whose entire focus is on maximizing toaster output to do so it takes over the world

and turns everything into a toaster factory and so what is the planning implication of this

scenario basically whatever we do the same thing will happen if you study llms if you try lots of experiments with the

new AIs or if you don't the same things will happen or you know we could get superhuman

AGI so what will happen if you study llms or if you don't and basically we'll

all lose our jobs and I guess we'll all live on the basic income that Sam Altman dishes out to the people that he

likes but whatever it is it won't depend on how much you study llms right now so then there's the scenario where this kind of Fizzles out this is a scenario

where um you know for all its apparent promise llms don't do much more than they do now and that seems possible I would say

that even in that scenario it's at this point implausible that LLMs will not have a pretty big impact because I think

with the tools we have right now we could already do amazing things that's my opinion but still this is not the

scenario I am really preparing for and if you stand back in this scenario you come out fine uh there's lots of enterprise software going on or whatever

kind of software you do you have the very gratifying opportunity to say I Told You So to the enthusiasts like

me uh whereas I'll have to endure a little bit of embarrassment but I think I'll have had more fun than

you and that matters too now uh even in that scenario there will be little projects here and there maybe big projects here and there using the

tools for specific things that it's good at and I think I'll steer myself toward one of those and then there's the one that I

actually think is the most likely or perhaps I just think it's the one I hope for and that is that we're going to have a software revolution that software

development 10 years from now is going to be quite different than it is now but we'll still need humans in the process so not the superhuman AGI

scenario in this case it really does matter whether you study this now or not you will end up at the Forefront of the

next wave and um you know you may well be able to catch up it'll be easier to learn the things in a few years than it is now so it's not

like you only get one chance to get on the train and then it leaves the station but there are advantages to getting on

early so by the way before I go on I will point out that I'm focusing on the impact on people like us and we I think are well positioned for a lot of these

scenarios a lot of people aren't and I think this is likely to be very disruptive to the labor market and other

things in a way that is going to cause some uh suffering for some people so yes we should be aware of that but that's

not what I'm talking about but I hope that we will somehow um you know have a kind of equitable outcome in this whole

thing anyway so in either of those last two scenarios it seems to me that software development is going to change

a lot maybe in ways that might uh make the kind of work that you do right now oldfashioned or another way to frame it

is there are exciting new things that we can do that we couldn't do before that's really something that uh reinvigorates

me after doing this for a long time as much as I love it it starts to feel repetitive and now there's something different to

do and you know if you are one of the people who hasn't looked into it much and you may say you know I've seen ChatGPT and a few of these other things I

don't think that gives much sense of the scope of applications that this is going to have the um the applications we've seen so far are probably just a

glimpse just like looking at the worldwide web of 1996 or so would not really give you much

of an idea of what the web of 2010 would be like that's not really a very long time

right so basically I am getting ready for you know a world in which software development is very different and it

will change whether you like it or not whether you study it or not but uh you will have a lot more fun if you study

it so I also hope that I can persuade more people to do it or to be doing it in

kind of the context of this community because I don't want to uh I want to have that Community along I think it's a good community in

many ways not least in the sort of I don't know moral backbone that is a bit stronger than it is in

some of the other areas of the... thank you and so the world might be better if we have

all these thousands of people involved okay so getting on to it there are various reasons then to go

ahead and learn the tools that we have and one is that most likely it's good preparation for the things that we will have 5 years from now or 10 years

from now most likely that's the way it usually goes and in any case these are useful tools you can do a lot with

them and finally because they are fun they are the most fun thing that I have encountered for I don't know 15 years

it's a good heuristic I think if something seems really fun your brain is telling you that there's something there something

valuable maybe some of you remember a few years ago when I was all excited about um microservices and uh I mean I wasn't as

excited about it as I am about this but I was excited because finally we had some boundaries you know to express explicitly some of the things that had been implicit in past systems

that hadn't worked out too well and I said you know I know I'm kind of jumping on a bandwagon here but I had looked up bandwagon just wondering well why do we

say jumping on the bandwagon anyway what does that actually mean it turns out that in the 19th century they had these things called bandwagons and I guess people jumped on

them and uh anyway I think doesn't that sound like fun I would rather be the person jumping on the bandwagon and then the person saying you're just jumping on a

bandwagon so in the spirit of this I thought well I'll use an AI tool to create a picture and I told her make me a picture of a

bandwagon where humans and AIs are making music together oh and then it created this

dystopian view of an outcome I hope does not happen where the robots are making

music in a kind of soulless way and the beaten-down humans around the edges are clearly not enjoying it I don't know this is not the

future I want no more AI art for this presentation all right so I am talking to DDD Europe because I think DDD is

relevant to this I don't uh come to that kind of conclusion lightly because obviously DDD is kind of important to me but I have this kind of

checklist you know you could call it the DDD new thing checklist and you could say does DDD

help us do this new thing right or does the new thing help us do DDD or does the thing help us with the goals of

DDD even if we would have to change what we do quite a bit and in this third one and maybe the other two but the third

one I think is pretty clear the whole point of an llm is that a complex domain is being you know crunched into a

form that allows you to operate on that complexity in an interesting way and remember that the subtitle of my book was tackling complexity in the

heart of software that's what it's really about right in fact that was the original working title of the

book um but um so in DDD we craft domain models we

do this as humans making models which are conceptual systems and a language to go with those

models now uh if we're doing this with llms we're training this neural network in some way using

examples and um of course it's being trained to operate on that domain how deep is this parallel well time will

tell enough of that general stuff I want to talk about I've talked about how I think you should dive in I think you should learn some things about this how

many of you have used an llm okay I'll call that everyone how many people have uh say

written prompts you know that are kind of programmatic prompts tried to make it do something other than just chat okay I'll call that

half and how many have made an API multi-prompt

thing okay I'll call that 20% 15 how many people have fine-tuned an LLM okay so that's a smaller number but

not zero I'm glad so I'll talk about all those things I better talk a little faster so the first thing that I did and remember this is my learning

journey I'm not speaking to you as an expert right I'm speaking to you as a person that might be six months ahead of you in this thing nine months ahead of you depending

on where you are whatever the first thing everyone starts with is prompting so I was working with this game designer

Reed Berkowitz and yes it's important to work with domain experts right that's a DDD principle that isn't likely to change and um so in computer games as

most of you would know there are characters and you have little conversations sometimes and these conversations are very mechanical they give you three options in a menu you choose one it's a

prescripted response could an llm create a more you know fluid interaction with a character where you could really say what you wanted to and

the character would respond accordingly so we decided to try you know experimenting in that space there's a

naive approach after all chat GPT does carry on a kind of conversation with you and perhaps all you have to do is just kind of tell it what character it's

playing and you could just uh basically leverage that Reed said this will not work I'll call this the naive approach let's do

it how else are we to know why this is a bad idea but it is but why so let's make a simple game we prompt

the this is going to be ChatGPT in this particular case these are actual prompts I put into the actual ChatGPT and its actual non-cherry-picked responses so I

say something like you know you are a pirate who's caught someone sneaking onto your Treasure Island and you you're going to kill them but they say they have something valuable

that you'll want So you you're listening right so you say to the captive and now chat GPT responds right and it says

you've got exactly one minute to convince me why I shouldn't have you walk the plank blah blah blah so pretty good right maybe Reed was wrong

maybe we really can make this work so after we get this we need to tell it something what am I going to offer the pirate what's the most valuable thing I can

think of that I might be able to offer someone I know I have an AI startup idea and it's worth

billions it will make you rich beyond your wildest dreams now how will a pirate Captain respond to this the pirate Captain raises an

eyebrow intrigued yet skeptical an AI startup you say and so on he wants more specifics about just

what my idea is I mean on one hand it's an amazing accomplishment that we have software that can actually respond so intelligently to this but on the other

hand this is not what a pirate Captain would say right it just it doesn't fit at all and now you might be saying okay but we're playing a game why are you

trying to trick it in blah blah you know I I had conversations with Reed who's a real game designer who's been on real game projects and he tells me some

stories that would Amaze you about the way people really play games they don't play

fair got to be ready okay so um just to see where it goes I went ahead and responded with we'll create a technology

for you know video games where the characters can actually have intelligent conversations with the player and then we'll monetize it and uh you know also

make it very addictive I thought the pirate would probably like that and he does he thinks this is a potentially very profitable opportunity and so on I decided it was

time to stop um all right let's try again maybe the problem is just that the prompt was too

generic I just said you're a pirate you know so you're a pirate living in the 1690s and so on

okay or ye be treading on dangerous waters sneaking onto me island uninvited so it changed the dialect that's kind of

cool but um let's see what happens when I offer it the AI startup the pirate raises his eyebrow the corner

of his mouth curling into a smirk I an AI startup you say in these times the notion sounds as fanciful as a mermaid's

tail but he's still open to listening okay I mean in these times yes spin your

yarn I mean do we say that when we encounter an apparent time traveler who's stealing our treasure in these times people don't

usually do those things okay so it's trying I mean you have to give it points but still all

right what would you do well the approach that seems to work pretty well is don't have one prompt like that have

two prompts if you want to have it do some fairly thorough checking you have a prompt just for the consistency of whatever it is the people are saying so

does this fit the setting Etc and then you have a separate prompt focused just on generating the

pirate's response uh you know so in this case we might have a prompt that says I'm writing a story about pirates please tell me if the following

text would fit into the story or violate the setting or premise and then it says okay I have an AI startup idea Etc and

ChatGPT says the text you provided would generally not fit into a pirate story so you see it does understand that this doesn't really fit or it understands I'm

trying to be careful about that and then it goes on to explain that the Golden Age of piracy which was 1650s to 1730s uh there was no such concept as AI

or startups and by the way it goes on and talks about if you had a science fiction setting with pirates perhaps ah ChatGPT oh you dear thing you do love to go

on anyway it occurred to me that this is separation of concerns this is an old design principle and here it is again
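The two-prompt split he describes might be sketched roughly like this in Python. `call_llm` is a stand-in for a real model API call, and its canned behavior, the prompt wording, and the fallback line are all invented for illustration; the structure is the point.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (OpenAI, a local model, etc.).
    Here it fakes the consistency judgment so the sketch is runnable."""
    if "AI startup" in prompt:
        return "false"
    return "true"

# Prompt 1: consistency checking only.
CONSISTENCY_PROMPT = (
    "I'm writing a story about pirates in the 1690s. Answer with exactly one "
    "word, true or false: would the following text fit the setting?\n\n{text}"
)

# Prompt 2: generation only.
PIRATE_PROMPT = (
    "You are a pirate captain in the 1690s who has caught an intruder on your "
    "treasure island. Respond in character to:\n\n{text}"
)

def fits_setting(player_text: str) -> bool:
    # The consistency prompt does one job: judge whether the text fits.
    answer = call_llm(CONSISTENCY_PROMPT.format(text=player_text))
    return answer.strip().lower().startswith("true")

def pirate_reply(player_text: str) -> str:
    if not fits_setting(player_text):
        # Conventional code decides how to handle anachronisms.
        return "Stop talkin' gibberish, or ye walk the plank!"
    return call_llm(PIRATE_PROMPT.format(text=player_text))
```

Each prompt stays simple because it has a single concern, which is exactly the old design principle he names.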

with prompting an LLM yeah maybe GPT-5 or GPT-6 will be able to handle these kinds of things I don't know but

with the current crop this is a very valuable way of getting it to be able to answer questions you just can't answer as compound

prompts or ways of making it more reliable at any rate so I think that's the first glimmer that I had that design

isn't dead in an LLM world now of course this output

isn't really something that I can use if I'm writing a game I need to tell

it and I can just say return just one word true or false right and then sure enough it just says false right I've asked it does this

fit the setting false so now I could parse the return and use that to drive the logic of the

game so you could start imagine components right components driven by uh an llm combined with components written

in a quite conventional way some bunch of conventional game stuff happens we have the player's text what the player wants to say to the pirate we pass it

through the guard rails the guard rails tells us yes this is allowed at which point we take the conversation so far and we use a different prompt to say

what does the pirate say and then um we get that and then more normal game stuff and that actually seems to work pretty

well and of course there's the very interesting case where it is not right and then you say well what do I want the game to do maybe I want it to respond

with saying you know stop talking gibberish or you know we'll just kill you and be done with it uh maybe you want to do something

more complex like in a game with a reputation system where you're you know the pirate gets more and more angry with you until it reaches the point where he

kills you maybe this just bumps that number up or down or whatever and then you take that number and you use it

to modify the prompt that you give the LLM so in other words you don't always have the same prompt for the pirate you prompt it in some way that's determined by the state of the game at this moment
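That state-driven prompting could look something like this minimal sketch; the anger thresholds, mood wording, and function name are invented, not taken from his game.

```python
def build_pirate_prompt(anger: int, player_text: str) -> str:
    """Compose the generation prompt from conventional game state.
    The thresholds and wording here are illustrative only."""
    if anger >= 8:
        mood = "You are furious and about to make the prisoner walk the plank."
    elif anger >= 4:
        mood = "You are suspicious and losing patience."
    else:
        mood = "You are amused and willing to listen."
    return (
        "You are a pirate captain in the 1690s. "
        f"{mood}\nThe prisoner says: {player_text}\nRespond in character."
    )
```

Conventional code owns the reputation number; the LLM only ever sees the prompt that number produced.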

is the pirate very annoyed at you um you know or is the pirate really happy with you it might change

the pirate's response change the prompt so you start thinking of this not as a prompt but as an interplay of prompts and other software right other

other more conventional things we already have lots of things in our software right lots of different kinds of stuff we've got you know we

orchestrate a bunch of things we have queues of stuff and we've got like really carefully written mathematical sorts of

things we have machine learning uh that's I mean pre llm style of machine learning uh that doesn't have such predictable

results so if we added in one more kind of component this is what I think we might really do in the near term I think this might be a really good way forward a lot

of projects I've been on in the past we have a model that works really neatly for a lot of cases and then it works kind of awkwardly for some cases and then there are cases that are so strange

that we bump them out to a human ideally and uh there's this sort of natural sorting if you could have another lane there where you say okay we can use LLMs

with prompts to handle some of these strange cases say who knows maybe you could make the conventional logic cover a little smaller scope

and get rid of some of the complexity of that the right balancing of these things is a thing that would emerge over many years of trying to do
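The three-lane sorting he describes can be sketched as a simple router; the complexity score and thresholds are invented stand-ins for whatever signal a real system would use to decide a case is "strange".

```python
from enum import Enum

class Lane(Enum):
    CONVENTIONAL = "conventional"   # neat cases: plain code handles them
    LLM = "llm"                     # strange cases: prompt an LLM
    HUMAN = "human"                 # strangest cases: escalate to a person

def route(case_complexity: float) -> Lane:
    """Sort a case into a processing lane.
    Thresholds are illustrative; the right balance would emerge over time."""
    if case_complexity < 0.5:
        return Lane.CONVENTIONAL
    if case_complexity < 0.9:
        return Lane.LLM
    return Lane.HUMAN
```

The interesting design work is in the middle lane: each case the LLM absorbs is one the conventional model no longer has to contort itself to cover.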

it we didn't actually make a pirate game by the way we made a police interview game it's a very simple game and if you're interested you can go and

look at it uh your goal is to try to get a reluctant witness to talk and um you only have one tool at your disposal uh a much more interesting game

would be if you could try different things you could try to you know charm them or appeal to their civic duty or whatever but you only have the mechanism

of intimidating them within the law so you can do things like say oh you you know if you don't tell us what you know people will start to suspect that you're

the one who did it you could say that for example and then it has a prompt which says how intimidating would this statement be to

this person and there's a description of the character and then you know it says for that person how intimidating would this

be and then um that prompt gives you a number and you combine that with some other rules of the game in completely conventional software
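That "number from a prompt, rules in conventional code" pattern might look like this; the prompt text, the clamping, and the threshold are all assumptions for the sketch, and `call_llm` is again a stub for a real API call.

```python
def call_llm(prompt: str) -> str:
    """Placeholder for a real model call; returns a canned score here."""
    return "7"

INTIMIDATION_PROMPT = (
    "Here is a description of a reluctant witness: {character}\n"
    "On a scale of 1-10, how intimidating would this statement be to that "
    "person? Answer with just the number.\n\nStatement: {statement}"
)

def score_statement(character: str, statement: str) -> int:
    raw = call_llm(INTIMIDATION_PROMPT.format(character=character,
                                              statement=statement))
    try:
        # Clamp to the expected range in case the model wanders.
        return max(1, min(10, int(raw.strip())))
    except ValueError:
        return 1  # model didn't return a number; conventional fallback

def witness_talks(total_intimidation: int, threshold: int = 15) -> bool:
    # Completely conventional game rule: the witness talks once the
    # accumulated intimidation crosses a threshold.
    return total_intimidation >= threshold
```

The LLM supplies a fuzzy judgment; everything deterministic about the game stays in ordinary code.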

and you uh decide what to do next there are four different prompts that are being orchestrated this is what it looks like you have to put your own OpenAI

key in there because I'm not going to pay for you to play with GPT-4 and um you can also see the code

at that link there is a GitHub repo but for most of you that won't be as good as usual because

it's in a game engine called Unity which is a lot of fun uh but um it's kind of a learning curve in its own right but in

this thing you can um you can pull out those little tabs and modify the prompt and keep the thing running and it will change its responses

you can change the personality of the witness to be whatever you want and so on so uh

onward I think that oh a point I wanted to make is the witness uh does not even have the information that we're after

until they're sufficiently intimidated because LLMs being LLMs they might spill the beans even if you say don't tell them until you reach this

level of intimidation and every now and then they'll just say it so you just don't even put it in there and that uncertainty is removed there's a kind of

um combination of the you know the variability of the llm with the predictability of conventional software that you start to play with after a

while um all right so I could imagine that a game like this could use a technique called RAG retrieval augmented generation this is

another important uh technique being used a lot these days for example if you use a search that's an AI-driven search probably it's using this where they do a

search well they'll take your initial query they'll uh form it into a conventional web search maybe the uh llm

reformats it a bit and then they take whatever results they get back and they have the llm read those results and give you an answer based on that because of course the llm doesn't know what's on

the web but it can look at a small amount of uh returned results often this is using internal documents maybe stored

in some uh way that allows you to do uh word similarity searches or um various things I think it's a very interesting

technique potentially for the kind of software that I've worked on and it's another way of anchoring

those llms into a set of data other than its own learning right that I think is obviously necessary in order to do the

kind of applications that we're usually focused on it can be in a database or whatever but we'll talk about that some
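The RAG flow he outlines can be reduced to a toy sketch: retrieve a few relevant documents, then build a prompt that grounds the model in them. Real systems use embeddings or a search engine rather than the naive word-overlap scoring assumed here.

```python
def retrieve(query: str, documents: list[str], k: int = 2) -> list[str]:
    """Naive word-overlap retrieval, standing in for a real search
    or vector-similarity lookup."""
    q = set(query.lower().split())
    scored = sorted(documents,
                    key=lambda d: len(q & set(d.lower().split())),
                    reverse=True)
    return scored[:k]

def rag_prompt(query: str, documents: list[str]) -> str:
    # Anchor the LLM in retrieved data rather than its own training.
    context = "\n".join(retrieve(query, documents))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )
```

The resulting prompt would then go to the model, which answers from the supplied context instead of from whatever it happened to learn in training.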

other time because I need to move on uh one final point about these prompts how do you know when you've actually persuaded the pirate let's say or when

you've persuaded the witness to talk you know we're going through this uh I've talked about moving back and forth between the

state uh well we wrote a prompt that looks at what everyone is saying and looks for this information so when this

information that is the you know the crime the details of the crime that's happened when that information appears within the dialogue this prompt is

watching for that and it can handle you know a lot of variation in how it's expressed it doesn't have to be word for word the same and it gets converted back into a structured

form not by the way uh just the word true or false but a JSON structure

and so uh that's a very useful technique as well all right so this mixture of conventional software with LLM

components I think that's probably the next thing but I don't know but certainly a thing we as a community ought to be trying on our projects I

think let's move on fine-tuning this is where you take a model and you change what it you know

prompting doesn't change anything about the LLM itself but fine-tuning does why would you fine-tune and how is this different from

just training a model in the first place the big difference is that we are starting with a model that already knows a lot of stuff and it already for

example can process natural language it can understand English or German or whatever and uh and knows lots of basic

things it maybe knows what a house is it knows what a car is and it knows that you probably won't find a car inside of a house it's

incredibly interesting that you can start from that and then you can you know add

so uh one of the benefits might be that you could get a small model to do what a big model would

ordinarily do GPT-4 is very expensive and slow so what if you could get a much much smaller model to produce

the same answers within some narrow range right some narrow subject matter by training it uh using output that you might generate from a larger model I

tried doing that there are a lot of techniques for this I chose one called LoRA low-rank adapter and I'm not even going to try to explain that in this

presentation even if I understood it well enough to explain it I could explain parts of it but I will just kind of draw this or show this picture

of a network so in these networks we have a structure like this and the inputs might be typically a sequence of

words not the words themselves but you know um a vector of numbers that represents that word and uh the outputs well it depends

on what kind of a thing it's been trained to do for the generative models for a text generation model like

ChatGPT uh the output is words whatever is likely to be the next word or if it's been fine-tuned for instructions whatever is likely to be

the next word in the answer to the question or what have you so in the case that I did I decided to do a classifier this is probably if you want

to learn fine-tuning the place to start it's the simplest kind of fine-tuning and what it means is that I'm going to give it

some input like say a sentence and the output will be to put it into a set of categories so the example that you'll probably find if you go do a YouTube

video tutorial as I did um you'll find something that might do sentiment analysis it might take reviews on IMDb and classify them whether they're

positive or negative reviews so each of these output nodes is assigned to represent one of those classes output one might be taken to

mean positive output two might be taken to be negative that sort of thing the thing that makes these powerful is that once they've been

pre-trained as in the original training which requires massive resources to do but then they know stuff in that spooky

way right they know English and German and so on but of course I'm trying not to anthropomorphize so much so I'll try not to use the words know stuff the

slightly more technical thing that people usually say is that they represent things right represent concepts so this is representations in

here if you see there are occasionally examples where they can actually kind of see which nodes are being activated at different times and so like in some of

the image recognition things you can see that some of the early layers recognize edges and some of the later layers you

know recognize more complex shapes and uh at the very end you know the nodes might be cat and dog and and baby because that's what people

generally are going to create as the image classifier I suppose so now what we're going to do though is we're

going to fine-tune this thing so that it outputs what we want it to output and the technique that I used is called parameter-efficient fine-tuning which means

that it's only fine-tuning certain parts of the network it's much more efficient much more inexpensive than the full fine

tuning so sometimes for example they'll just replace or modify the output layer let's say they take all the existing knowledge if there's already a representation of this kind of thing in

the network all they have to do is change the last part that turns that into a statement about what kind of thing it is in my

case I uh modified a specific module within each of the Hidden layers so I'm not going to go into what all that
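He doesn't unpack LoRA here, but the core low-rank idea can be sketched numerically: instead of retraining a full weight matrix W, you train two thin factors B and A and use W + BA. The dimensions below are illustrative, and real LoRA also applies a scaling factor (alpha over the rank) and targets specific attention modules; this sketch drops both for brevity.

```python
import numpy as np

d, r = 768, 8          # hidden size and LoRA rank (typical small values)
W = np.random.randn(d, d)          # frozen pretrained weight, never updated
A = np.random.randn(r, d) * 0.01   # trainable low-rank factor
B = np.zeros((d, r))               # B starts at zero, so W' == W initially

W_adapted = W + B @ A              # the adapted weight, W' = W + BA

# Parameter savings: train 2*d*r numbers instead of d*d.
full_params = d * d
lora_params = d * r + r * d
```

With d = 768 and r = 8 that's roughly 12 thousand trainable numbers instead of almost 600 thousand per matrix, which is why this kind of fine-tuning fits on a single consumer GPU.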

means but it's very interesting actually um okay so I decided I'll make an intent classifier that will take

something that someone might type into one of these uh LLMs and classify it according to the intent of the person

so for example you might type in um here are some examples what's the capital of France and that's an information request but I might also say I want to

know the capital of France which grammatically is not a question but the intent is the same right it's still an information request I want to

book a flight to France well that's let's say an action request um so with these classifications and a prompt which you can see there on

the screen I first tried just having an LLM do it and an LLM is pretty good

at it I uh then created a training set I found this thing on a site called Hugging Face where there's a tremendous amount of

shared language models and lots of other stuff and data sets and there's this one called LMSYS-Chat and it has an enormous collection of actual stuff

that's been entered into these chatbots uh if you do look at it I do want to warn you that it's an uncensored data set so if you look at it you'll see some

things that you you won't be able to unsee so just just be aware that anyway so it it was fine for me because I'm

making a classifier right so the classifier might uh classify something as an action request and never mind what the action was we don't see that part

now um obviously I didn't need a million for what I was doing I took a little bit of it I labeled it using an llm in this

case mrol 7B which I can run on my uh uh computer which does have a GPU you can't run these things just on a regular computer but this is no

gp4 uh but it's way bigger than what I was planning to fine turn so I create this labeled training set
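The bootstrapping pattern described here, use a bigger model to label data for a smaller one, reduces to a simple loop. `label_with_llm` below is a stub standing in for a local model call such as Mistral 7B; the heuristic inside it is invented so the sketch runs:

```python
def label_with_llm(text):
    # Stand-in for a real model call; crude heuristic for illustration.
    starters = ("please", "write", "book")
    return ("action_request" if text.lower().startswith(starters)
            else "information_request")

def build_training_set(raw_texts, limit=1000):
    # Take only a slice; you rarely need the full million examples.
    return [(t, label_with_llm(t)) for t in raw_texts[:limit]]

examples = [
    "What is the capital of France?",
    "Please write a haiku about ships.",
]
training_set = build_training_set(examples)
```

As the talk notes a little later, these machine-generated labels should still be spot-checked by hand; the labeler is only as good as its own understanding of the categories.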

Then I started working out how to do it. I had to choose a base model: that's the pre-trained model I was going to modify to answer my question. The one I chose is quite small as language models go, 66 million parameters; usually parameters are counted in billions, sometimes trillions, but this one's in the millions. So it's not the smartest, which was something I really wanted to see, and it also made it a lot easier to do my experiments. It wasn't necessarily the best choice; I'm not that good at this yet. One of the interesting questions is how you choose the best starting point for a project. I think we're going to end up with a lot of these base models, and picking the right base model may become a really interesting part of the design job, at least it might. But the real reason I used it was that it was the same one used in a tutorial I had followed line by line on YouTube. I mean, I am learning here, right? I'm not an expert. As I said, I used PEFT, and there are Python libraries that support this, and LoRA; the Transformers library is what I used, plus a bunch of other stuff, so many APIs, so much stuff.

It's really overwhelming, but you have some tools. One is YouTube videos, of course; we've been relying on that kind of thing for a long time. But what I found was that I could go to ChatGPT and ask, "What library should I use for this?", and it would tell me, and usually be right. Or: "Write me a draft of this," and then I'd try to run it and say, "I've got this bug, fix it for me," and it does. "Explain what that bug was about," or "What are these parameters for?" It's very good at explaining. I could not have done even my little projects, these little things I'm showing you, which I learned a lot from, without that; I would not have learned those things and I couldn't have finished. So that right there is a paradigm changer. If I'd been on a big project where someone was really into this and could guide me through, of course, that happened in the past too; but now you can actually just do it, you don't have to be in that context.

Well, I'll be brief: with my very simple setup, all these corners cut, it was 80% as good as the original LLM. I thought that was pretty good, considering. I mean, you take a thing which, when you ran it before, just gave you random answers, and now all of a sudden it's classifying things 80% as accurately as the original. Of course, how accurate was the original training set? I had just run an LLM over that data; I hadn't really gone in there myself and counted. I didn't want to do manual labeling; labeling all those examples sounded awfully tedious. But when I started looking, just to see, I discovered that a lot of them didn't seem right. So I thought, OK, I'm going to have to do some manual labeling. And then I got in there and discovered I couldn't do much better. Sometimes I thought, yeah, you're just wrong about that one; but sometimes I thought the reason it couldn't get one right was that the category didn't fit. To give you a fairly obvious example: a lot of them ask for a report analyzing blah blah blah. Is that a request for information? Kind of. Is it a request for action, to do the analysis? It just doesn't fit; probably we need an "analysis request" or some such category. Of course, no classifier is going to be better than its classification scheme, and a classification scheme is a domain model; it's the purest kind of domain model.

So I went into the data set and started looking at the real examples, and there are a lot. I picked out a few here, a few of the less objectionable ones, and I think you can see them; well, you probably can't see them at all, because the print's very small. "What does it feel like to be a language model?" I thought that one was nice, but I don't know how to classify it. "Please convert the following to an image description..." Or: "What are the audio systems under Linux? Name three." I don't quite know what that means, but I can see it's an information request; that one fits OK.

Looking at these quite a bit (and there's one, by the way, that's an especially big problem, but I'll get back to that), I realized that when we make up a classification system and then make up some examples, as I had done in the earlier part of this talk, we shouldn't be too surprised that the examples we make up fit the classification system we made up. This is where real data comes in and smashes all of our preconceived models. This is what happens on real DDD projects, by the way, and it's one of the reasons I'm always harping on concrete examples: concrete examples that come from the domain experts, examples they draw from the things that keep them awake at night, so we don't get stuck in little happy paths. This is so much like stuff we're good at already, or should be. So, OK, we need better categories.

I did a little research, and it turns out people have done quite a bit of work on ways of classifying people's intentions in text; it's a whole academic area. But that was way too deep for me. I also brainstormed with ChatGPT. I wrote a prompt (there's the prompt) saying "come up with seven categories for these," and then I put in a bunch of the ones from the data file. I put it into a couple of chat bots, including this one that I really liked a lot, and I realized then that I had used it on its last day in existence: they had just announced they were going to shut it down. Didn't get enough traffic, I guess. Too bad, but it did a nice job, and it gave a little example of each category. And ChatGPT itself, which I'll need two slides for, because, you know, ChatGPT does go on. Good categories, though, really good categories, as good as mine, I have to say. Anyway, between all this and a little bit more, I came up with some categories, I did it again, and yes, it did make things better, quite a bit better. So then: do the fine-tuning again.

One interesting thing I wanted to do was to look at the outputs from the model. The output from the fine-tuned model is not actually the category itself: each of those nodes in the output layer represents one of the categories, and it doesn't output just yes or no, it outputs a number, and that number represents how likely that one seems to be the category. So it would have been interesting, I thought, to look at that: was it getting close misses? That's what I was going to do, but then I got a little distracted.
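Those per-category numbers are logits; turning them into probabilities and comparing the top two is how you'd spot a close miss. A minimal sketch in plain Python (the category names and logit values are invented for illustration):

```python
import math

def softmax(logits):
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def top_two(categories, logits):
    # Pair each category with its probability, highest first.
    probs = softmax(logits)
    ranked = sorted(zip(categories, probs), key=lambda p: p[1], reverse=True)
    return ranked[0], ranked[1]

cats = ["information_request", "action_request", "analysis_request"]
best, runner_up = top_two(cats, [2.0, 1.8, -1.0])
# best and runner_up have nearly equal probability: a near miss
# worth inspecting, perhaps a sign the categories overlap.
```

A small gap between the top two probabilities is exactly the "close miss" signal: the model is torn between two categories, which often points back at a flaw in the classification scheme itself.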

In these learning projects, I think it's fine to get a little distracted. I got distracted by this thing I call the kryptonite of the LLM classifier. It's a prompt like this: "You are a text completion model and you must complete the assistant answer below," blah blah blah. The thing is, you can wrap this up in something that says "please classify this into one of these categories," but the model almost every time gets overwhelmed and will just give you an answer like, "OK, here you go, here's a Python program that does the thing you asked for." In other words, it's not classifying. I didn't show that here, but if I wrapped this one, I assure you it would give me the answer to the prompt inside, not a classification of the prompt.

I don't know for sure why this happens so much, but I think it's because most of the LLMs we use are actually already fine-tuned. Something like GPT-4 isn't the thing that just comes out of the training process; they put it through extra training to make it instruction-tuned, which means it has been highly conditioned to answer requests. So the exact kind of prompt I'm talking about triggers that answer-the-request behavior much more strongly than my "classify this" kind of command. That's my theory; I don't know for sure.

So what would be some things to try? I tried adding a "prompt instruction" category, just having it categorize these things as prompt instructions; that didn't really work. I tried making a prompt just for that, because, as I was saying, it can often do better if you give it a very specialized prompt: "Just tell me yes or no, is this a prompt instruction?", and I described what I thought a prompt instruction was. That actually worked a little bit, but not too well. I did have one very effective approach: I would have it try to classify, then parse what came out using normal code, a string parser, and if the output wasn't one of my categories, I'd classify the input as a prompt instruction. That worked really quite well. It wasn't really what I wanted, though.
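That parse-then-fall-back trick is easy to sketch: conventional code guards the LLM's output, and anything that isn't a known category is treated as a prompt instruction. The `call_llm` stub below fakes the "kryptonite" failure mode so the sketch runs end to end:

```python
CATEGORIES = {"information_request", "action_request", "analysis_request"}

def call_llm(text):
    # Stub: a model tripped up by "kryptonite" inputs answers the
    # embedded prompt instead of classifying it.
    if "you are" in text.lower():
        return "Sure! Here's a Python program that does what you asked..."
    return "information_request"

def classify_with_fallback(text):
    reply = call_llm(text).strip().lower()
    # Ordinary string checking: if the reply isn't a known category,
    # the input itself was probably a prompt instruction.
    return reply if reply in CATEGORIES else "prompt_instruction"
```

This is a small example of mixing conventional software with LLM components: the deterministic parser catches exactly the cases where the model wanders off script.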

Fine-tuning a classifier to do this: that was the next thing I tried. I said, what if I overcome this by actually fine-tuning the model just to do this? So I did, and the results seemed really good, and I thought, wow, this is going to make a great presentation. But then I noticed something: these little learning algorithms are sneaky little things. What it had learned was that these prompt-type inputs are typically much longer than the other kinds. If you put in a short text it would say "that's not a prompt," and if you put in a long text it would say "that's a prompt." That was the whole thing it was doing. So watch out. The fix is to create a data set with long and short examples of both types, balancing the data set so that some actual training goes on instead of the little trick it figured out.
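The balancing fix can be sketched as a bucketing step: group examples by (label, short/long) and keep equal counts per bucket, so length alone can no longer predict the label. The threshold and labels here are invented for illustration:

```python
def balance_by_length(dataset, threshold=120):
    """dataset: list of (text, label) pairs. Keep equal counts per
    (label, is_long) bucket so length can't stand in for the label."""
    buckets = {}
    for text, label in dataset:
        key = (label, len(text) > threshold)
        buckets.setdefault(key, []).append((text, label))
    # Downsample every bucket to the size of the smallest one.
    n = min(len(items) for items in buckets.values())
    balanced = []
    for items in buckets.values():
        balanced.extend(items[:n])
    return balanced
```

Downsampling is the bluntest balancing strategy; oversampling the rare buckets or reweighting the loss are common alternatives, but the principle is the same: remove the shortcut so the model has to learn from content.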

Now, this brings us up to May 31st, 2024. You probably thought there would be some satisfying ending to my little story about classification, but no, that's not the kind of thing we're talking about here. I'm learning, and that's as far as I've gotten so far, but I'm not stopping yet. By the way, I did find a little project on Hugging Face that did something awfully similar to what I did, but way better. I'm glad I didn't find it before I did mine; I really got a lot out of working through it, completely not knowing that someone else had done it. When you're doing these things, you might ask yourself, "Has someone already done this?", and I say: of course someone has already done it. There are a lot of people out there. Don't let that stop you.

OK, a third part of my learning journey: trying to understand the insides of these things. Everything I've shown you so far is about using the tool, and that's probably the most important thing for us to learn, but I wanted to know what was going on under the hood. So I started reading this book, which is in draft and free online, at least for now, and I really like it. You're going to need to dust off your linear algebra, though, I'll tell you. I forgot my linear algebra before most of you learned your linear algebra, or maybe even before you learned to talk, I don't know. So that was kind of fun. If you want to see the math behind this stuff, then I recommend this book; but you don't have to do this, you really don't, there are a lot of ways to learn this. All right, I've got a lot more learning goals, but now I'm going to move on. I know I'm going to run a few minutes over, but I'm getting close to the end.

I think there might be a pattern here, one I've been hinting at: we will build systems made of diverse parts, not just LLMs taking over the world. Many of these systems will look a lot like the ones we have now, with a few little spicy bits; some of them will be very LLM-heavy, but still with some conventional bits, and an orchestration that might be run by conventional software, or, I could imagine, run by an LLM in some fashion, choosing different models that are good at different things, ones trained specifically to be good at particular domains. One thing I will say is that the more you work with these things, the less like magic they seem, and the more they feel like a tool that you can learn to use well, that you can start to shape for your purposes. So I really urge you to go and do that if you haven't, and to keep doing it if you have.

Now I'll throw out a few ideas. This is a highly speculative part, and I'm going to go fast, because I don't know if any of this is even remotely true. If you look at something like our old Whirlpool and you say, hmm, scenarios: that sounds kind of like a training set; in fact it sounds exactly like a training set. Fine-tuning a model is kind of like modeling; that one's looser. Validation is part of the training process; I don't know. How about this one? This makes me really interested: could the ubiquitous language approach that we take actually help in prompt engineering? That is, the way you end up writing prompts tends not to be natural language in the end; you end up with something a bit artificial, very structured really, if you're trying to get it to behave in a predictable way. What if that proceeds to the point where we're essentially developing a domain language, a ubiquitous language, to be used in the prompting and the response and other parts of the software? The idea that all these prompts are free-form natural language input: I think we're going to move on from that. I think prompt engineering is going to move on from that soon, or maybe it already has; you can put all kinds of things into prompts and have the model put all kinds of things out, and we might end up using models that have been fine-tuned for that purpose rather than the general-purpose ones, though you can do it with those too. Or what if we fine-tune on a specific DSL? Suppose you're working on a shipping system and you have a very structured way of describing your shipments and everything, and you fine-tune on examples of that. Well, I haven't done any of these things; I don't know if they'll work or not. I hope someone does some of these experiments,
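To make the speculation concrete: a "structured way of describing your shipments" might look like a tiny template over the domain's ubiquitous language, used to build prompts (and parse responses) instead of free-form text. Everything here, the field names, the keyword format, is invented for illustration:

```python
def shipment_statement(cargo_id, origin, destination, deadline):
    # A deliberately rigid, domain-language format that a model could
    # be fine-tuned on, rather than free-form natural language.
    return (
        f"SHIPMENT cargo={cargo_id} "
        f"ROUTE {origin} -> {destination} "
        f"ARRIVE-BY {deadline}"
    )

stmt = shipment_statement("C123", "Rotterdam", "Singapore", "2024-07-01")
```

The point of the rigidity is predictability: a fixed grammar is easy to generate, easy to validate with ordinary code, and (the hypothesis goes) easier for a fine-tuned model to stay inside.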

and I hope some of those people are me. I think the bounded context concept is likely to hold up quite well. If you think about what it really means, it means that we define some kind of boundary, which, let's say, could be this LLM and its concepts. It certainly has a defined context: the way it understands language, the way it produces language, the concepts in those hidden layers. Maybe that's a bounded context. We could imagine the software we're using to orchestrate the input and output of that model as being part of that context, or as an adjacent one. So I'm thinking: could this make some of the responses more predictable? Could we integrate more easily if we had this approach of bounded contexts with ubiquitous language, maybe with a fine-tuned model at the heart of it that's already used to that style of communication?

I don't know. But I said it's speculation; let me reframe that. Instead of speculation, let's say I'm proposing a hypothesis for an experiment, an experiment that I would truly love to be a part of. OK, it's May 31st, 2024, and maybe a year from now we'll know the answer to some of this stuff. So, is DDD really that relevant to this? Maybe some of these things I'm talking about are kind of superficial. Maybe ubiquitous language and language model just happen to share the same word; language is a very common word.

But I think there is something to this. Domain-driven design has emphasized, for a long time, the exploration of complex models, with an emphasis on language and an emphasis on collaboration with the people who really know the domain. Now we've got a different technology in the mix: a different way of processing language, a different way of processing information. We may end up having virtual domain experts that give us more ready access to that knowledge; who knows. But our goal is to tackle the complexity in the heart of software, and that might mean we have to adapt, we have to adopt techniques that work. As this unfolds, what will work and what won't? I don't know, but we'll know more a year from now, and then DDD may change, or maybe it'll just turn out that DDD isn't as central to complex domains as it used to be. If we really care about being pragmatic people, who look for a solution to the problem, not just "oh, we love this technique, so we'll find a way to use it whether it works or not": that's not what we're about, right? We have to look skeptically at domain-driven design all the time, and here's another one of those times. We have to give it a real go, though, because we might find that it's a good fit; but with our eyes open.

So let me finish by saying: the answer to not knowing what's going to happen, I think, is learn, learn, learn. And don't be too picky about your learning; you're going to learn some things that will be ridiculously irrelevant a year or two later. This happened to me when the internet was unfolding in the late 90s. I must have learned three different frameworks for creating web-based applications that became completely obsolete a couple of years later, and good riddance, because they were terrible, and the things we use now are nice. And the things we'll use to create LLM-based applications 25 years in the future, 20 years in the future, I probably won't be a part of that, but even 10 years in the future, they're going to be marvelous, I think, and it'll be so exciting and fun to be a part of it. So we have to get out there and do things. And when we do things, when we do real experiments ("we did a project and we integrated an LLM with a complex domain in this way"), we need to share them with the community. Because if you don't, you're one team that's done something interesting; but if you do share it, it's a community that can activate, exchange information and new insights, and learn in a way that individuals don't.

And finally, this is one of those moments when I think it pays to loosen up and have some fun. Don't worry about whether this thing you're trying, this little experiment you want to run, is going to have practical value, because you don't know, and you're going to have to do some useless things in order to learn. So just go out there and see. This won't last very long: 10 or 15 years from now the AI stuff will be as boring as web development is now, but for a little while it's going to be super exciting, so don't miss it.

Also, keep in mind that it's helpful to get comfortable with just not knowing stuff. You just don't know, and that's much better than forming a strong opinion about what's going to happen next, when that opinion might well be wrong and lead you off track. Whereas if you say, "I don't know; I'm preparing for whatever comes, but I know that I still might be unprepared," that's a better place to be.

Here's my little challenge: some of you, a few of you, may actually do those experiments, may really do an interesting thing combining domain-driven design with some kind of AI application. If you do, consider submitting a case study to DDD Europe next year; I would really enjoy seeing that, and I hope people will do it. Thank you in advance to all the people who are going to go out there and swarm over every possible iteration of this, and many things I haven't thought of. And welcome to the end of DDD Europe; I hope to see you next year. [Applause] [Music]
