Why You Should Build Agents on the JVM by Rod Johnson
By Devoxx
Summary
## Key takeaways
- **GenAI Fails Enterprises**: Gen AI projects in enterprises tend to fail, as shown by surveys like the recent MIT one. Personal assistants like ChatGPT work with human oversight, but business automation has no rollback like git, as Air Canada's chatbot court loss showed. [04:26], [03:51]
- **Non-Determinism Inherent Challenge**: Interacting with LLMs makes everything non-deterministic, whereas non-determinism used to be confined to corner cases like race conditions. Hallucinations persist, and prompt engineering is alchemy, not engineering. [05:05], [05:34]
- **Avoid Green Field Fallacy**: Building agents while ignoring existing Java assets and databases is doomed. One AI leader did not know whether their company had any Java in production, although about 70% of what it does is in Java. Integrate with what's there instead of starting green field. [07:29], [10:23]
- **Embabel Tackles Non-Determinism**: Embabel uses GOAP (goal-oriented action planning) for deterministic yet smart planning, and encourages breaking tasks into steps with smaller prompts and preferring code over LLM calls. It emphasizes domain modeling to expose existing assets. [16:36], [16:07]
- **JVM Beats Python for Enterprise**: Python suits prototyping but not enterprise apps. In ports of Crew AI and Pydantic AI examples to Java with Embabel, the first Crew AI port had fewer lines of code and YAML than the original. The next AI phase won't be written in Jupyter notebooks. [15:06], [19:03]
- **Domain-Integrated Context Engineering**: Make domain models central to LLM work, with tools on domain objects for type-safe integration. Start agent design with domain objects; agents fall out naturally. [12:52], [19:52]
Topics Covered
- GenAI Fails Enterprises
- Non-Determinism Now Everywhere
- Avoid Central AI Silos
- Domain-Integrated Context Engineering
- Embabel Tackles Non-Determinism
Full Transcript
[Music] Hi. Well, it's good to be here. It's quite a few years since I spoke here at Devoxx and I'm just reminded what an amazing, amazing show it is. So, what I want to do now is jump several levels up. We've heard about some really, really cool things that are happening in the JVM and are also relevant to AI. That of course gives us the underpinning that means we are able to run much of the world's business logic on Java, on the JVM. But I'm going to go up several layers and I'm going to talk about agents, and in fact really multi-agent systems, so complex agents, and talk about why you should build those on the JVM.
I probably don't need this slide after what we've already heard this morning. Gen AI is not hype. I know there are a minority of developers who believe that they can just put their heads down and it will go away and eventually their managers will get sick of talking about it. It's not going to happen. It changes the nature of how we work. If it hasn't already changed how you work, you really should learn more about the tools that are available to you. Stefan obviously gave us some great examples of that. So, you know, first thing, it is the elephant in the room. I really love what Stefan has done with this conference, making it about agents and AI. It is what we all need to care about most at this time.
So really the first killer use case with Gen AI was the personal assistant, right? ChatGPT, we could go and ask it things. Then we could connect ChatGPT or Claude or other LLMs with tools and it could do web research and all these kind of things in real time for us. That all falls really in the use case of personal assistants.
It's fundamentally human in the loop, right? You are interacting with a model and the tools the model is using, but really you're directing it. So, it turns out that that use case is very broad. So, Claude Code is basically just a fancy personal assistant. It's very, very capable in how it works, but it's just another example of this human-in-the-loop case. And the technology behind it is really very clever. I'd encourage you to read up on how Claude Code works.
It very much relies on LLMs calling tools, but also on LLMs creating dynamic to-do lists and then checking them off through tool calls. Extremely powerful, very powerful tool, not very predictable, but you know, for this task, that's fine. I use Claude Code quite a bit. Anything that I don't like never gets committed in git. I roll back lots and lots and that's great. What I commit is valuable to me, saves me time, makes me move faster than I would otherwise. However, this is not the reality in enterprise applications. So, for example, you've got this magic thing. Imagine you're working with git. You've got this magic thing called a rollback. Bad thing gone. Bad thing never happened. No record exists of bad thing.
That is not the case when you're trying to automate business processes using Gen AI. For example, you've probably heard about Air Canada. A year or two ago, they offered somebody an absolutely amazing fare, or their chatbot did. Air Canada made the mistake of not honoring that fare, which ended up in a court case which they lost. But that's an example, you know: that incorrect message to a customer, that mistake, it's not going to go away.
You cannot cope with the level of unpredictability that is perfectly fine for coding agents. This obviously isn't just my opinion. There are a bunch of surveys. The particularly dire one was the MIT survey recently. I don't know how accurate any of these are, but the fact is it is overwhelming. Gen AI projects in enterprises tend to fail. So why is this?
There are some unavoidable challenges. Working with this technology is hard. So once upon a time, things that were known to be non-deterministic were just the nasty corner cases that we dreaded having to deal with, like, you know, race conditions, maybe some of the issues in distributed systems. They were typically corner cases and they accounted for a lot of our time.
You could write most of your code pretending that it would execute in a predictable, deterministic manner. That is no longer true. If you're interacting with LLMs, everything you do is going to encounter non-determinism and therefore become less predictable.
So that is a genuine hard problem that is inherent in the technology. Similarly, we all know about hallucinations. Hallucinations are bad. LLMs do love making stuff up. Things are getting a bit better over time, but you know, there are quite a lot of techniques that you need to use to mitigate that. Prompt engineering is a complete misnomer because it's not really engineering. It's really alchemy. Prompt engineering is essentially throwing things at the wall and hoping: you know, you add something like "take a deep breath" or "think step by step", or put something in big capital letters to say DO NOT do whatever you don't want it to do. Inherently nasty and messy. And similarly, obviously, the cost and environmental implications are also a problem.
There are also some avoidable challenges that we make for ourselves, and many of these are organizational. So one of the reasons Gen AI projects tend to fail is they're driven very often from the top down. The board wants Gen AI and they want it now. And of course we know how well that works. You know, before open source really fixed the problem in the early 2000s, J2EE was very much top down. And we know how well that era of technology worked. Similarly, from an organizational point of view, you get siloing. And this is really dangerous. I've seen what I think is actually an anti-pattern where you have a central AI function, central AI group, and it's essentially disconnected from the rest of the business.
I actually recently spoke to one of the AI leaders at a large organization in Australia. They had been in the job for 10 and a half months. They were unaware of whether the company had any Java in production.
I've never worked for that company and I happen to know about 70% of what they do is in Java. So you know, this by definition will not work.
Related to that is the green field fallacy. So you get people trying to build agents, trying to execute on Gen AI, imagining that they're in green field. Like all those blogs you read: they use a few MCP tools to do web search or the like. They virtually never talk to an existing database or enterprise system. So, you know, if people start feeling that Gen AI is green field inherently and ignoring what's there, they are bound to fail. So you know, everything in software tends to stick around. We need to build on what's there. You know, every time someone says "this time it's different", typically they're going to lose a lot of money in the stock market or they're going to make some other appalling mistake. This time it's different in the sense that the technology is quasi-miraculous in some ways, but it's not different in that it is our responsibility to take forward what we have, what works, and bring the new features to it. Turns out that there's a pretty big and important division here between the personal assistant scenario and what you need to automate business processes. So for example, Claude Code is great for what it does, but you cannot use that approach, which is just based on giving LLMs lots of agency and lots of tools, to automate business processes.
Okay, I've told you what is wrong and what's scary. How do we fix it?
Well, the first thing that we need to do if we want to make our business processes more agentic is we need to attack non-determinism.
We're not going to be able to declare complete victory because LLMs are inherently unpredictable, but we are going to be able to put a lot of runs on the board. And the way in which we can do that, for example, is we break complex tasks into multiple steps. We use smaller prompts. We give each of those LLMs that we invoke fewer tools. And where possible, if we can do something in code, we do it in code, because if we can do it in code, it will be quicker, cheaper, and more reliable. It will also be better for the planet. So, you know, I think one of the key things is really fight the battle as best we can fight it to make our systems as deterministic as they can be.
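As an illustration of the advice above (not code shown in the talk), here is a minimal Java sketch of splitting one task into narrow steps and keeping everything computable in plain code. The `LanguageModel` interface, the `Order` record, and the 30-day refund policy are all hypothetical names and rules invented for this example; a real system would plug in an actual LLM client.

```java
import java.util.Locale;

final class StepwiseRefund {

    /** Hypothetical stand-in for any LLM client; not a real library API. */
    interface LanguageModel {
        String complete(String prompt);
    }

    record Order(String id, double amountPaid, int daysSincePurchase) {}

    enum Intent { REFUND_REQUEST, OTHER }

    // Step 1 (LLM): one small prompt, no tools, answer restricted to two values.
    static Intent classify(LanguageModel llm, String message) {
        String answer = llm.complete(
                "Answer with exactly REFUND_REQUEST or OTHER.\nCustomer message: " + message);
        return "REFUND_REQUEST".equals(answer.trim()) ? Intent.REFUND_REQUEST : Intent.OTHER;
    }

    // Step 2 (plain code): the made-up 30-day policy is computed, never left to the model.
    static double refundAmount(Order order) {
        return order.daysSincePurchase() <= 30 ? order.amountPaid() : 0.0;
    }

    // Step 3 (LLM): wording only; the amount is supplied by code.
    static String draftReply(LanguageModel llm, Order order, double refund) {
        String amount = String.format(Locale.ROOT, "%.2f", refund);
        String reply = llm.complete("Write one polite sentence telling the customer that order "
                + order.id() + " will be refunded " + amount + ".");
        // Guardrail in code: if the reply omits the computed amount, use a fixed template instead.
        return reply.contains(amount)
                ? reply
                : "Order " + order.id() + " will be refunded " + amount + ".";
    }

    static String handle(LanguageModel llm, String message, Order order) {
        if (classify(llm, message) != Intent.REFUND_REQUEST) {
            return "Routing to a human agent.";
        }
        return draftReply(llm, order, refundAmount(order));
    }

    public static void main(String[] args) {
        // Canned stub so the sketch runs without any model or network access.
        LanguageModel stub = prompt -> "REFUND_REQUEST";
        System.out.println(handle(stub, "I want my money back", new Order("A1", 100.0, 10)));
    }
}
```

The model is only asked to classify and to phrase a sentence; the money calculation and the final check on the reply stay deterministic, which is the point being made about doing things in code wherever possible.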
This is something that in Embabel, which I'll get to, we have built very deeply into our concepts. We also can introduce guardrails, build reliable testing frameworks, and bring a lot of well-known good practices. Second, we could integrate with what works. Do the opposite of the green field fallacy. Start by saying: okay, we are, as I imagine most of you are, working for fairly large companies. Our problem is leveraging the promise of Gen AI technology in the context of this company's existing business and assets. Well, a lot of those assets are written in Java and we need to be able to connect to them in a very natural way. So, you know, firstly I think we massively mitigate our risks if we adopt incrementally, but secondly we build out of what already works.
In order to achieve both these goals we need to bring more structure into how we work with LLMs. LLMs have this, you know, almost magical facility in natural languages. So, you know, not just English, any language. Like in a workshop yesterday, I got the Embabel write-and-review story to review the story in Dutch. And well, I can't read Dutch, but no one complained. So, you know, they have this amazing, freakish ability, but it doesn't mean that we should talk to them in English. Take for example, let's roll back the clock. Let's imagine we're talking to a customer support agent and we're not thinking about Gen AI. We've called up, say, our insurance company and we're talking about our policy. The person that we're talking to isn't relying on their memory.
They are relying on structure. They're sitting in front of a keyboard. The keyboard's probably connected through some Java middleware to an Oracle or other database. And the things all the way down are structured. They're objects. They're tables. They're structure. It's not just English. So, you know, that person would not be very popular with their shift supervisor if at the end of the shift they said, "Well, I didn't bother entering any forms, but this is what I know. I can tell you in, you know, 700 words the key things that happened today." I don't think that person would be popular. So, we bring as much structure to LLM interactions as we can. And this means structure in terms of object types. This brings us to a term that I introduced a couple of months ago called domain-integrated context engineering, which I think is really, really important. So this is the idea of taking our domain model and making it central to what we do with LLMs. In fact, we can even put tools for our LLMs to use selectively on our domain objects, and it works. It works beautifully and it enables us to integrate with the domain models we've already got. Okay, what can we do as
Java developers? Well, the first thing we could do would be imitate Python frameworks. So, you know, just look at what's out there in Python and try to do that in Java. Obviously, that's a pretty poor promise for us, because what does it mean? Does it mean that we're downstream of where innovation comes from? Does it mean also that we're going to suffer from, you know, essentially the fact that a lot of things in Python are effectively untyped? As I think you can guess, I don't think this is very exciting and it wouldn't get me out of bed in the morning. What I think we need to do, can do, and are doing is build better. Look, absolutely look at what Python frameworks have to offer. Be very familiar with that, but do better. Build better frameworks in Java. Aim to lead, not to follow. And aim to bring the skills that we have as enterprise developers. Remember, we built the core business apps. So really we are uniquely placed to bring them into the world of Gen AI. Guess what? Everything we know about building robust software, and look at this room, there's a lot of knowledge about building robust software in this room, everything we know still matters. It's not different. It's different in the sense that it's incremental and an important new thing has emerged, but it's not fundamentally different. The next phase of the AI revolution won't be written in Jupyter notebooks.
Python is an important language. I think every developer needs to be familiar with Python. Believe it or not, when I first started working on Embabel a couple of years ago, I was significantly more fluent in Python than Java because I hadn't done Java for a number of years. Python's great for data science, scripting, and prototyping, but it is not great for enterprise applications. And remember, Gen AI is quite different from data science. A lot of people make this mistake. Gen AI is really about application development skills. Data science: different skill set. So, you know, your organization very likely has zero enterprise apps in Python.
Probably a pretty good number.
So, okay. Now, on to what I am personally endeavoring to do about this. And I would like to introduce my new framework, Embabel. Embabel is a framework that is directly intended to address the key failure points of Gen AI.
So obviously it's on the JVM. So I think, as you know, one of the key reasons for failure is distance from the critical technology that runs the business. But it also directly tackles the problem of non-determinism.
It really emphasizes domain modeling heavily, which helps you expose your existing assets. And it's designed around toolability and testability. As I said, the goal is not to just copy what exists in Python. So whereas, for example, you know, LangGraph4j basically takes the finite state machine approach of LangChain for Python, Embabel introduces a new dynamic planning approach using a non-LLM AI algorithm called GOAP, goal-oriented action planning. It has really interesting benefits and I don't have time to go into it here, but it gives you deterministic planning that's nevertheless smart. So you can add more actions and goals to your system and it can learn to do additional things, but do them in a predictable way.
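To make the idea of goal-oriented action planning more concrete, here is a minimal Java sketch of the general technique. It is not Embabel's implementation or API: the `Action` record, the string-based world state, and the example action names are assumptions made for illustration. GOAP planners commonly use A* with action costs; this sketch swaps in a plain breadth-first search with uniform cost to stay short.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Optional;
import java.util.Set;

final class GoapSketch {

    /** An action is usable when its preconditions hold; running it adds its effects. */
    record Action(String name, Set<String> preconditions, Set<String> effects) {
        boolean applicableIn(Set<String> state) {
            return state.containsAll(preconditions);
        }

        Set<String> apply(Set<String> state) {
            Set<String> next = new HashSet<>(state);
            next.addAll(effects);
            return Set.copyOf(next);
        }
    }

    private record Node(Set<String> state, List<String> plan) {}

    /** Breadth-first search over world states; returns a shortest plan, if one exists. */
    static Optional<List<String>> plan(Set<String> start, Set<String> goal, List<Action> actions) {
        ArrayDeque<Node> queue = new ArrayDeque<>();
        Set<Set<String>> visited = new HashSet<>();
        Set<String> initial = Set.copyOf(start);
        queue.add(new Node(initial, List.of()));
        visited.add(initial);

        while (!queue.isEmpty()) {
            Node node = queue.poll();
            if (node.state().containsAll(goal)) {
                return Optional.of(node.plan());
            }
            // Actions are tried in list order, so the same inputs always give the same plan.
            for (Action action : actions) {
                if (!action.applicableIn(node.state())) {
                    continue;
                }
                Set<String> next = action.apply(node.state());
                if (visited.add(next)) {
                    List<String> extended = new ArrayList<>(node.plan());
                    extended.add(action.name());
                    queue.add(new Node(next, List.copyOf(extended)));
                }
            }
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<Action> actions = List.of(
                new Action("researchTopic", Set.of("topic"), Set.of("research")),
                new Action("writeStory", Set.of("research"), Set.of("draft")),
                new Action("translateStory", Set.of("draft"), Set.of("translation")),
                new Action("reviewStory", Set.of("draft"), Set.of("reviewedStory")));
        System.out.println(plan(Set.of("topic"), Set.of("reviewedStory"), actions).orElseThrow());
    }
}
```

With the actions above, the planner chains `researchTopic`, `writeStory`, and `reviewStory` to reach the goal and ignores `translateStory`. Adding a new action or asking for a different goal (for example `translation`) changes the plan without any hand-written workflow, and no LLM is involved in choosing the steps.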
Compared to other frameworks, Embabel is really more a server than a framework. So for example, it builds on Spring AI. But if you look at Spring AI, Spring AI is about taking processes and enabling them to invoke LLMs. Embabel is a server that is managing what we call agent processes, and these potentially can be long running. So, compared to other frameworks, the server knows about all the capabilities that were deployed to it, which means you can extend capabilities by adding actions and goals. So it's a pretty ambitious project. Today, I would say that it probably is the nicest way to do Gen AI on any platform, but tomorrow I think it truly can extend to be the fabric with which you Gen AI enable your JVM-centric enterprise.
Well, actually this screen's so big you can probably see it. We're very proud of our API. It is a very modern API, and actually it was great seeing some of those examples of Java 25. You know, the way Java itself is changing and getting better, and the way people write Java APIs is getting better. So you know, this is a really, really nice API, brilliant tool support, and a pleasure to program with. So compared to Python frameworks, I've done a series of blogs where I'm taking Python frameworks and taking some of their examples and writing them in Java with Embabel. So far I've done three and I'll do many more. Two of them were Crew AI examples. Crew AI is a very popular framework in the Python space. The third one was Pydantic AI. I would strongly encourage you to look at those blogs because, for example, the first one I did with Crew was a moderately complex example.
The Java version has significantly fewer lines of Java code than of Python code and it also has significantly fewer lines of YAML than the Crew AI example.
So, you know, when people complain that Java is verbose, with a well-designed API and modern Java, it's not your
grandfather or grandmother's Java.
So, what I would like to leave you with is: Gen AI needs to grow up. It's not working, right? It's working for the personal assistants. It's not working in enterprise. It needs to grow up. And really it is JVM developers who have the skills to do this, because bringing domain integration is absolutely critical to success. Now, when I'm writing a new agent, I start by designing the domain objects that we'll use and then the agents fall out naturally. So Embabel aims to bring the JVM strengths to Gen AI and it is genuine innovation.
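The domain-first approach described here (start from domain objects, then expose selected operations on them as tools) can be sketched in plain Java. This is a hedged illustration only: the `@LlmTool` annotation, the `Customer` and `Policy` records, and the helper methods are hypothetical and defined inside the example itself; they are not Embabel's or Spring AI's API.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

final class DomainToolsSketch {

    /** Hypothetical marker for methods an LLM is allowed to call. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface LlmTool {
        String description();
    }

    record Policy(String number, boolean active) {}

    /** An existing-style domain object; only the annotated method is exposed. */
    record Customer(String name, List<Policy> policies) {

        @LlmTool(description = "Count the customer's active policies")
        public int activePolicyCount() {
            return (int) policies.stream().filter(Policy::active).count();
        }

        // Deliberately not annotated: the model never sees or calls this.
        public void cancelAllPolicies() {
            throw new UnsupportedOperationException("not available in this sketch");
        }
    }

    /** Lists the tools a domain object offers, sorted by name for stable output. */
    static List<String> describeTools(Object domainObject) {
        return Arrays.stream(domainObject.getClass().getDeclaredMethods())
                .filter(m -> m.isAnnotationPresent(LlmTool.class))
                .sorted(Comparator.comparing(Method::getName))
                .map(m -> m.getName() + ": " + m.getAnnotation(LlmTool.class).description())
                .toList();
    }

    /** Invokes a no-argument tool by name, rejecting anything not marked as a tool. */
    static Object callTool(Object domainObject, String toolName) {
        try {
            Method method = domainObject.getClass().getDeclaredMethod(toolName);
            if (!method.isAnnotationPresent(LlmTool.class)) {
                throw new IllegalArgumentException("Not exposed as a tool: " + toolName);
            }
            return method.invoke(domainObject);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Tool call failed: " + toolName, e);
        }
    }

    public static void main(String[] args) {
        Customer customer = new Customer("Ada",
                List.of(new Policy("P-1", true), new Policy("P-2", false)));
        System.out.println(describeTools(customer));
        System.out.println(callTool(customer, "activePolicyCount"));
    }
}
```

The design choice illustrated is that the typed domain object stays the source of truth: the model can only reach behavior that was explicitly marked as a tool, and everything else on the object remains ordinary Java.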
So finally I would say the future is up to you. I would strongly encourage you to learn as much as you can about Gen AI. No single framework is going to solve your problems. You need to educate yourself. Look, for example, at what Stefan has been doing, how he's been exploring. You really need to be doing that kind of thing for yourselves. You need to be reading blogs. You need to understand best practices. But then once you've got yourself up to speed, you should be able to pitch your boss on doing Gen AI in Java. For example, this slide deck was largely generated by an Embabel agent. These are the steps that it went through. Our travel planner application is one of the nicest and most sophisticated Gen AI agent samples I've seen anywhere. So you know, not only can you hopefully persuade your boss that you can incrementally Gen AI their existing applications, don't be shy, tell them: hey, have you looked at this Java thing? You know, it's better than what they have on Python. Okay, so this was all slides, no code. Please come on Thursday afternoon to my session and I guarantee you there will not be a single slide. There will be nothing but code and I will demonstrate how to get started with Embabel. Thank you. Great.
>> Thank you.
[Music]