Why You Should Build Agents on the JVM by Rod Johnson

By Devoxx

Summary

## Key takeaways

- **GenAI Fails Enterprises**: GenAI projects in enterprises tend to fail, as shown by surveys like the recent MIT one. Personal assistants like ChatGPT work with human oversight, but business automation has no equivalent of a Git rollback, as Air Canada's chatbot court loss shows. [04:26], [03:51]
- **Non-Determinism Inherent Challenge**: Interacting with LLMs makes everything non-deterministic, whereas non-determinism used to be confined to corner cases like race conditions. Hallucinations persist, and prompt engineering is alchemy, not engineering. [05:05], [05:34]
- **Avoid Green Field Fallacy**: Building agents while ignoring existing Java assets and databases is doomed. One AI leader did not know whether their company ran any Java, although about 70% of what it does is in Java. Integrate with what's there instead of starting green field. [07:29], [10:23]
- **Embabel Tackles Non-Determinism**: Embabel uses GOAP (goal-oriented action planning) for deterministic yet smart planning, and encourages breaking tasks into steps with smaller prompts and preferring code over LLM calls. It emphasizes domain modeling to expose existing assets. [16:36], [16:07]
- **JVM Beats Python for Enterprise**: Python suits prototyping but not enterprise apps; the Java/Embabel ports of CrewAI and Pydantic AI examples have fewer lines of code and YAML than the originals. The next AI phase won't be written in Jupyter notebooks. [15:06], [19:03]
- **Domain-Integrated Context Engineering**: Make domain models central to LLM interactions, with tools on domain objects for type-safe integration. Start agent design with domain objects; agents fall out naturally. [12:52], [19:52]

Topics Covered

GenAI Fails Enterprises
Non-Determinism Now Everywhere
Avoid Central AI Silos
Domain-Integrated Context Engineering
Embabel Tackles Non-Determinism

Full Transcript

[Music] Hi. Well, it's good to be here. It's quite a few years since I spoke here at Devoxx, and I'm reminded what an amazing show it is. What I want to do now is jump several levels up. We've heard about some really cool things that are happening in the JVM and are also relevant to AI. That, of course, gives us the underpinning that means we are able to run much of the world's business logic on Java, on the JVM. But I'm going to go up several layers and talk about agents, and really multi-agent systems, so complex agents, and about why you should build those on the JVM.

I probably don't need this slide after what we've already heard this morning. GenAI is not hype. I know there is a minority of developers who believe that they can just put their heads down and it will go away, and eventually their managers will get sick of talking about it. It's not going to happen. It changes the nature of how we work. If it hasn't already changed how you work, you really should learn more about the tools that are available to you. Stefan obviously gave us some great examples of that. So, first thing: it is the elephant in the room. I really love what Stefan has done with this conference, making it about agents and AI. It is what we all need to care about most at this time.

So really the first killer use case with GenAI was the personal assistant, right? ChatGPT: we could go and ask it things. Then we could connect ChatGPT or Claude or other LLMs with tools, and it could do web research and all these kinds of things in real time for us. That all falls in the use case of personal assistants.

It's fundamentally human in the loop, right? You are interacting with a model and the tools the model is using, but you're directing it. It turns out that this use case is very broad. Claude Code is basically just a fancy personal assistant. It's very capable in how it works, but it's just another example of this human-in-the-loop case. And the technology behind it is really very clever. I'd encourage you to read up on how Claude Code works. It very much relies on LLMs calling tools, and on LLMs creating dynamic to-do lists and then checking them off through tool calls. It's an extremely powerful tool, not very predictable, but for this task that's fine. I use Claude Code quite a bit. Anything that I don't like never gets committed in Git. I roll back lots and lots, and that's great. What I commit is valuable to me, saves me time, and makes me move faster than I would otherwise.

However, this is not the reality in enterprise applications. Imagine you're working with Git. You've got this magic thing called a rollback. Bad thing gone. Bad thing never happened. No record exists of bad thing.

That is not the case when you're trying to automate business processes using GenAI. For example, you've probably heard about Air Canada: a year or two ago, their chatbot offered somebody an absolutely amazing fare. Air Canada made the mistake of not honoring that fare, which ended up in a court case that they lost. That's an example: that incorrect message to a customer, that mistake, is not going to go away.

You cannot cope with the level of unpredictability that is perfectly fine for coding agents. This obviously isn't just my opinion. There are a bunch of surveys. A particularly dire one was the MIT survey recently. I don't know how accurate any of these are, but the evidence is overwhelming: GenAI projects in enterprises tend to fail. So why is this?

There are some unavoidable challenges. Working with this technology is hard. Once upon a time, things that were known to be non-deterministic were just the nasty corner cases that we dreaded having to deal with: race conditions, maybe some of the issues in distributed systems. They were typically corner cases, and they accounted for a lot of our time.

You could write most of your code pretending that it would execute in a predictable, deterministic manner. That is no longer true. If you're interacting with LLMs, everything you do is going to encounter non-determinism and therefore become less predictable.

So that is a genuinely hard problem that is inherent in the technology. Similarly, we all know about hallucinations. Hallucinations are bad. LLMs do love making stuff up. Things are getting a bit better over time, but there are quite a lot of techniques that you need to use to mitigate that. Prompt engineering is a complete misnomer, because it's not really engineering. It's really alchemy. Prompt engineering is essentially throwing things at the wall and hoping: you add something like "take a deep breath" or "think step by step", or you put something in big capital letters to say "do not do" whatever you don't want it to do. It's inherently nasty and messy. And similarly, the cost and environmental implications are obviously also a problem.

There are also some avoidable challenges that we make for ourselves, and many of these are organizational. One of the reasons GenAI projects tend to fail is that they're very often driven from the top down. The board wants GenAI and they want it now. And of course we know how well that works. Before open source really fixed the problem in the early 2000s, J2EE was very much top down, and we know how well that era of technology worked. Similarly, from an organizational point of view, you get siloing, and this is really dangerous. I've seen what I think is actually an anti-pattern, where you have a central AI function, a central AI group, and it's essentially disconnected from the rest of the business.

I recently spoke to one of the AI leaders at a large organization in Australia. They had been in the job for ten and a half months. They were unaware of whether the company had any Java in production.

I've never worked for that company, and I happen to know that about 70% of what they do is in Java. So this, by definition, will not work.

Related to that is the greenfield fallacy. You get people trying to build agents, trying to execute on GenAI, imagining that they're in a greenfield. In all those blogs you read, they use a few MCP tools to do web search or the like. They virtually never talk to an existing database or enterprise system.

If people start feeling that GenAI is inherently greenfield and ignore what's there, they are bound to fail. Everything in software tends to stick around. We need to build on what's there. Every time someone says "this time it's different", typically they're going to lose a lot of money in the stock market or make some other appalling mistake. This time it's different in the sense that the technology is quasi-miraculous in some ways, but it's not different in that it is our responsibility to take forward what we have, what works, and bring the new features to it. It turns out that there's a pretty big and important division here between the personal assistant scenario and what you need to automate business processes. For example, Claude Code is great for what it does, but that approach is just based on giving LLMs lots of agency and lots of tools. You can't use that approach to automate business processes.

Okay, I've told you what is wrong and what's scary. How do we fix it?

Well, the first thing that we need to do if we want to make our business processes more agentic is attack non-determinism.

We're not going to be able to declare complete victory, because LLMs are inherently unpredictable, but we are going to be able to put a lot of runs on the board. The way we can do that, for example, is to break complex tasks into multiple steps. We use smaller prompts. We give each of the LLMs that we invoke fewer tools. And where possible, if we can do something in code, we do it in code, because it will be quicker, cheaper, and more reliable. It will also be better for the planet. So I think one of the key things is to fight the battle as best we can to make our systems as deterministic as they can be.
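A minimal Java sketch of that advice, not taken from the talk: one task is split into three steps, and only the middle step touches a model. Everything here is an assumption made for illustration, including the `SmallPromptModel` interface, the `RefundRequest` record, and the 50.0 auto-approval limit. The model is a stub so the example runs offline.

```java
import java.util.Locale;

class RefundPipelineSketch {

    /** Hypothetical seam for an LLM call: one small prompt in, raw text out. */
    interface SmallPromptModel {
        String complete(String prompt);
    }

    /** Hypothetical domain input, invented for this illustration. */
    record RefundRequest(String orderId, double amount, String customerMessage) {}

    /** Assumed business rule: small refunds are approved without review. */
    static final double AUTO_APPROVE_LIMIT = 50.0;

    private final SmallPromptModel model;

    RefundPipelineSketch(SmallPromptModel model) {
        this.model = model;
    }

    /** Step 1 (code): validation is deterministic, so no LLM is involved. */
    static boolean isValid(RefundRequest r) {
        return r.orderId() != null && !r.orderId().isBlank() && r.amount() > 0;
    }

    /** Step 2 (LLM): one narrow question, no tools, output normalized in code. */
    String classifyTone(RefundRequest r) {
        String raw = model.complete(
                "Answer with exactly one word, ANGRY or CALM. Customer message: "
                        + r.customerMessage());
        String normalized = raw == null ? "" : raw.trim().toUpperCase(Locale.ROOT);
        // Anything other than the expected token falls back to a safe default.
        return normalized.equals("ANGRY") ? "ANGRY" : "CALM";
    }

    /** Step 3 (code): the decision is ordinary, unit-testable business logic. */
    String decide(RefundRequest r) {
        if (!isValid(r)) return "REJECT";
        if (r.amount() <= AUTO_APPROVE_LIMIT) return "APPROVE";
        return classifyTone(r).equals("ANGRY") ? "ESCALATE_PRIORITY" : "ESCALATE";
    }

    public static void main(String[] args) {
        // A stub stands in for the real model so the example runs offline.
        RefundPipelineSketch pipeline = new RefundPipelineSketch(prompt -> "angry");
        System.out.println(pipeline.decide(new RefundRequest("A-1", 20.0, "Thanks")));
        System.out.println(pipeline.decide(new RefundRequest("A-2", 200.0, "This is outrageous")));
    }
}
```

The model never sees the amount or makes the decision. It answers one narrow question, and whatever it returns is mapped onto a closed set of values before any business logic runs.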

This is something that in Embabel, which I'll get to, we have built very deeply into our concepts. We can also introduce guardrails, build reliable testing frameworks, and bring a lot of well-known good practices.

Second, we can integrate with what works. Do the opposite of the greenfield fallacy. Start by saying: okay, we are, as I imagine most of you are, working for fairly large companies. Our problem is leveraging the promise of GenAI technology in the context of this company's existing business and assets. Well, a lot of those assets are written in Java, and we need to be able to connect to them in a very natural way. So firstly, I think we massively mitigate our risks if we adopt incrementally, and secondly, we build out of what already works.
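One way to read "adopt incrementally" in code, as a hedged Java sketch rather than anything shown in the talk: existing rule-based logic keeps deciding every case it already handles, and a model suggestion is consulted only for the remainder, then accepted only if it names a known category. The `Category` enum, the `existingRules` method, and the `CategorySuggester` interface are all hypothetical names invented for this example.

```java
import java.util.Locale;
import java.util.Optional;

class IncrementalRoutingSketch {

    enum Category { BILLING, CLAIMS, OTHER }

    /** Stands in for routing logic that already exists and already works. */
    static Category existingRules(String text) {
        String t = text.toLowerCase(Locale.ROOT);
        if (t.contains("invoice") || t.contains("payment")) return Category.BILLING;
        if (t.contains("claim")) return Category.CLAIMS;
        return Category.OTHER;
    }

    /** Hypothetical seam for an LLM call that proposes a category name. */
    interface CategorySuggester {
        String suggest(String text);
    }

    /** Guardrail: a suggestion counts only if it names a known category. */
    static Optional<Category> parse(String raw) {
        if (raw == null) return Optional.empty();
        try {
            return Optional.of(Category.valueOf(raw.trim().toUpperCase(Locale.ROOT)));
        } catch (IllegalArgumentException e) {
            return Optional.empty();
        }
    }

    /** The proven path decides first; the model only fills the gap it leaves. */
    static Category route(String text, CategorySuggester suggester) {
        Category known = existingRules(text);
        if (known != Category.OTHER) return known;
        return parse(suggester.suggest(text)).orElse(Category.OTHER);
    }

    public static void main(String[] args) {
        // A stub suggester replaces the real model so the example runs offline.
        CategorySuggester stub = text -> "claims";
        System.out.println(route("Where is my invoice?", stub));
        System.out.println(route("My car was damaged in a storm", stub));
    }
}
```

Because the old path is untouched, removing the suggester restores the previous behavior exactly, which is what keeps the rollout low risk.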

In order to achieve both these goals, we need to bring more structure into how we work with LLMs. LLMs have this almost magical facility in natural languages. Not just English, any language. In a workshop yesterday, I got the Embabel write-and-review story example to review the story in Dutch. I can't read Dutch, but no one complained. So they have this amazing, freakish ability, but it doesn't mean that we should talk to them in English.

Let's roll back the clock. Let's imagine we're talking to a customer support agent and we're not thinking about GenAI. We've called up, say, our insurance company, and we're talking about our policy. The person that we're talking to isn't relying on their memory.

They are relying on structure. They're sitting in front of a keyboard. The keyboard's probably connected through some Java middleware to an Oracle or other database. And the things all the way down are structured. They're objects. They're tables. They're structured. It's not just English. That person would not be very popular with their shift supervisor if at the end of the shift they said, "Well, I didn't bother entering any forms, but this is what I know. I can tell you in 700 words the key things that happened today." I don't think that person would be popular. So we bring as much structure to LLM interactions as we can, and this means structure in terms of object types. This brings us to a term that I introduced a couple of months ago, domain-integrated context engineering, which I think is really important. This is the idea of taking our domain model and making it central to what we do with LLMs. In fact, we can even put tools for our LLMs to use selectively on our domain objects, and it works. It works beautifully, and it enables us to integrate with the domain models we've already got.

Okay, what can we do as Java developers? Well, the first thing we could do would be to imitate Python frameworks: just look at what's out there in Python and try to do that in Java. Obviously, that's a pretty poor promise for us, because what does it mean? Does it mean that we're downstream of where innovation comes from? Does it also mean that we're going to suffer from the fact that a lot of things in Python are effectively untyped? As I think you can guess, I don't think this is very exciting, and it wouldn't get me out of bed in the morning. What I think we need to do, can do, and are doing is build better. Absolutely look at what Python frameworks have to offer. Be very familiar with that, but do better.

Build better frameworks in Java. Aim to lead, not to follow. And aim to bring the skills that we have as enterprise developers. Remember, we built the core business apps, so we are uniquely placed to bring them into the world of GenAI. Guess what? Everything we know about building robust software (and look at this room, there's a lot of knowledge about building robust software in this room), everything we know still matters. It's not different. It's different in the sense that it's incremental and an important new thing has emerged, but it's not fundamentally different. The next phase of the AI revolution won't be written in Jupyter notebooks.

Python is an important language. I think every developer needs to be familiar with Python. Believe it or not, when I first started working on Embabel a couple of years ago, I was significantly more fluent in Python than Java, because I hadn't done Java for a number of years. Python's great for data science, scripting, and prototyping, but it is not great for enterprise applications. And remember, GenAI is quite different from data science. A lot of people make this mistake. GenAI is really about application development skills. Data science is a different skill set. So your organization very likely has zero enterprise apps in Python.

Probably a pretty good number.

So, okay. Now on to what I am personally endeavoring to do about this. I would like to introduce my new framework, Embabel. Embabel is a framework that is directly intended to address the key failure points of GenAI. Obviously, it's on the JVM, because, as you know, one of the key reasons for failure is distance from the critical technology that runs the business. But it also directly tackles the problem of non-determinism.

It emphasizes domain modeling heavily, which helps you expose your existing assets. And it's designed around toolability and testability. As I said, the goal is not to just copy what exists in Python. Whereas, for example, LangGraph4j basically takes the finite state machine approach of LangChain for Python, Embabel introduces a new dynamic planning approach using a non-LLM AI algorithm called GOAP, goal-oriented action planning. It has really interesting benefits, and I don't have time to go into it here, but it gives you deterministic planning that's nevertheless smart. So you can add more actions and goals to your system, and it can learn to do additional things, but do them in a predictable way.

Compared to other frameworks, Embabel is really more a server than a framework.

For example, it builds on Spring AI.

But if you look at Spring AI, Spring AI is about taking processes and enabling them to invoke LLMs. Embabel is a server that is managing what we call agent processes, and these can potentially be long running. So, compared to other frameworks, the server knows about all the capabilities that were deployed to it, which means you can extend capabilities by adding actions and goals. So it's a pretty ambitious project. Today, I would say that it is probably the nicest way to do GenAI on any platform, but tomorrow I think it truly can extend to be the fabric that you GenAI-enable your JVM-centric enterprise with.

Well, actually, this screen's so big you can probably see it. We're very proud of our API. It is a very modern API, and it was great seeing some of those examples of Java 25.

Java itself is changing and getting better, and the way people write Java APIs is getting better too. So this is a really nice API, with brilliant tool support, and a pleasure to program with. Compared to Python frameworks: I've done a series of blogs where I'm taking Python frameworks, taking some of their examples, and writing them in Java with Embabel. So far I've done three, and I'll do many more. Two of them were CrewAI examples. CrewAI is a very popular framework in the Python space. The third one was Pydantic AI. I would strongly encourage you to look at those blogs. For example, the first one I did with CrewAI was a moderately complex example.

The Java version has significantly fewer lines of Java code than the original has of Python code, and it also has significantly fewer lines of YAML than the CrewAI example.

So when people complain that Java is verbose: with a well-designed API and modern Java, it's not your grandfather's or grandmother's Java.

So what I would like to leave you with is that GenAI needs to grow up. It's not working, right? It's working for personal assistants. It's not working in the enterprise. It needs to grow up. And really it is JVM developers who have the skills to do this, because bringing domain integration is absolutely critical to success. Now, when I'm writing a new agent, I start by designing the domain objects that we'll use, and then the agents fall out naturally. So Embabel aims to bring the JVM's strengths to GenAI, and it is genuine innovation.
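Starting from the domain objects, with tools placed selectively on them, might look like the following Java sketch. It illustrates the idea only and does not use Embabel's actual annotations or API: the `@ExposedToModel` annotation, the `Policy` record, and the `exposedTools` helper are all hypothetical and defined inside the example itself.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

class DomainFirstSketch {

    /** Hypothetical marker: only methods carrying it are offered to a model. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface ExposedToModel {
        String description();
    }

    /** Hypothetical domain object; the typed behavior lives on the type itself. */
    record Policy(String id, double coverageLimit, boolean active) {

        @ExposedToModel(description = "Remaining cover after a claimed amount")
        public double remainingCover(double claimed) {
            return Math.max(0.0, coverageLimit - claimed);
        }

        // Deliberately not annotated: a model should never be able to call this.
        public Policy cancel() {
            return new Policy(id, coverageLimit, false);
        }
    }

    /** Lists the annotated methods of a domain type, sorted for stable output. */
    static List<String> exposedTools(Class<?> domainType) {
        List<String> tools = new ArrayList<>();
        for (Method method : domainType.getDeclaredMethods()) {
            ExposedToModel marker = method.getAnnotation(ExposedToModel.class);
            if (marker != null) tools.add(method.getName() + ": " + marker.description());
        }
        Collections.sort(tools);
        return tools;
    }

    public static void main(String[] args) {
        System.out.println(exposedTools(Policy.class));
        System.out.println(new Policy("P-1", 1000.0, true).remainingCover(400.0));
    }
}
```

The design choice is that exposure is opt-in per method, so existing domain classes can be reused without handing a model every operation they support.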

So finally, I would say the future is up to you. I would strongly encourage you to learn as much as you can about GenAI. No single framework is going to solve your problems. You need to educate yourself. Look, for example, at what Stefan has been doing, how he's been exploring. You really need to be doing that kind of thing for yourselves. You need to be reading blogs. You need to understand best practices. But then, once you've got yourself up to speed, you should be able to pitch your boss on doing GenAI in Java. For example, this slide deck was largely generated by an Embabel agent. These are the steps that it went through. Our travel planner application is one of the nicest and most sophisticated GenAI agent samples I've seen anywhere. So not only can you hopefully persuade your boss that you can incrementally GenAI-enable their existing applications; don't be shy, tell them, "Hey, have you looked at this Java thing? It's better than what they have in Python." Okay, so this was all slides, no code. Please come on Thursday afternoon to my session, and I guarantee you there will not be a single slide. There will be nothing but code, and I will demonstrate how to get started with Embabel. Thank you. Great.

>> Thank you.

[Music]
