Java and AI for Beginners - Full Series

By Microsoft Developer

Summary

## Key takeaways - **GitHub Codespaces Instant Setup**: Fork the generative AI for beginners Java repo with pre-configured dev container including Java, tools, and Visual Studio Code for no-setup AI experimentation using free tier. [03:46], [04:25] - **Three Core GenAI Techniques**: Explore LLM completions for single-turn responses, multi-turn chat preserving history like HashMap vs TreeMap explanation, and interactive chat for dynamic user queries. [09:01], [11:10] - **RAG Prevents Hallucinations**: Retrieval Augmented Generation grounds responses in document.txt like GitHub Models description, refusing unrelated queries such as 'tell me a joke' with 'cannot find that information'. [12:08], [13:33] - **MCP Enables Tool Calling**: Model Context Protocol annotates services with @Tool for LLMs to invoke like calculator add(24.5,13) or weather in Seattle, generating agent code via VS Code AI Toolkit. [18:35], [21:24] - **Responsible AI Safety Filters**: GitHub Models and Azure OpenAI hard block violence/hate with 400 errors, soft refuse privacy violations or misinformation via red-teamed models unlike unsafe Dolphin Mistral. [31:04], [33:04] - **Agents Orchestrate Multi-LLM Workflows**: Supervisor sequences Author (GPT-4o-mini poem on topic) then Actor (Mistral + MarTTS text-to-speech) sharing context map for end-to-end poem-to-audio generation. [01:43:26], [01:52:17]

Topics Covered

Effortless Java & AI Setup with GitHub CodeSpaces
Exploring Generative AI: Completions, Chat Flows, and RAG
Grounding AI with Documents to Prevent Hallucinations
Building AI Agents with Java and LangChain4J
Pure AI Orchestration with LangChain4J

Full Transcript

[Music]

Hey there. Welcome to the brand new Java

and AI for beginners series where we're

going to be learning about how you can

use AI to transform and supercharge AI

applications. Just how coffee empowers

me to get through the day, Java empowers

millions of people around the world to

achieve great things. And in a world

where AI is increasingly changing the

way we interact with the world around

us, it is more important than ever for

you as a developer to learn how to take

advantage of these tools. Follow me to

the studio and we'll go ahead and dive

in. Hey, hey everyone. We made it to the

studio. I'm so excited to meet you all.

I'm Ian. I'm a cloud advocate here at

Microsoft and my passion is helping

developers like you learn, grow, and

most importantly have a fun time with

new technologies. I'll be your host for

this series, guiding you through each

session and introducing you to some

amazing speakers along the way. Java is

one of the most widely used programming

languages in the world with millions of

developers and applications. But the way

we build software is changing fast. AI,

cloud computing, and modern development

practices are transforming the way how

apps are created and deployed.

Developers like you are tasked with

leveraging AI to achieve more than ever

with the time you have. This series will

help you bridge that gap. Whether you're

completely new to Java or ready to

supercharge your skills into the modern

AI powered era, in this series, our goal

is to keep each episode as interactive

and practical as possible. You'll be

able to follow along with code snippets

and samples, all linked in the

description of each video. Every session

is designed to be short, hands-on, and

actionable, so you walk away with

something you can try out immediately.

We'll be covering a wide arrange of

topics including the fundamentals of

Java and AI, generative AI for Java,

building servers and clients with MCP,

context engineering, modernizing and

deploying applications, creating

intelligent apps, running generative AI

and containers. And the goal is that by

the end you will have a toolkit of

knowledge that combines Java with the

latest AI technologies. And we're not

just staying in one place. We're

traveling around the world to meet some

of Java's top talent. You'll hear from

Rory in Johannesburg, Bruno in Ontario,

Julian in Paris, Brian in Las Vegas, and

Sandra from Berlin.

Each of them brings deep expertise, and

together they'll help us see how Java

thrives in this new AI age. So, grab

your cup of Java, settle in, and let's

dive in.

Sometimes brewing Java feels like a

science experiment. Grinders, filters,

timers. I've ruined my fair share of

warnings trying to get it right. Hi

everyone, I'm I and I'm a cloud advocate

here at Microsoft. My job is to help

developers learn, experiment, and also

have fun with new technologies. And with

me, getting started is always the

scariest part. However, if we know the

right tools, it doesn't have to be. And

that's why I'm so excited to have Rory

with me here today. Rory is going to be

talking about how we can keep it simple.

No complicated setups, no intimidating

environments, just like instant coffee.

You'll see how easy it is to get going,

especially with GitHub code spaces. So,

let's go ahead and dive in. So, the

first thing you're going to want to

start to do is go onto our demo repo,

which is generative AI for beginners

Java. And it is set up already with a

dev container for you to go in and have

Java, the necessary tools, and Visual

Studio already set up. So, you're going

to go in there, you're going to start,

and you're going to fork it. And then

once you fork it, let's go into our fork

there. You're going to create a code

space from that fork. You're going to go

to code there to code spaces. And I've

already set up a code space there. And

we have a very generous free tier that

allows you to run the examples end to

end.

At the same time with your free tier of

your code space, I need you to go in and

create a

fine grained token to be able to call

the free tier of GitHub models. So

GitHub models is an online repository of

most of the models that Microsoft and

our partners want you to test with.

You're going to generate a new token in

here. We'll call it uh token test and

then you're going to set the

permissions. So if we go there model

permissions and you're going to generate

that token and then you're going to take

that token that you see there and you're

going to paste it in

to your dev container. So I've started

my dev container here. Here's my dev

container. Then I'm going to go export,

get a token, and I'm going to paste in

that token there. And I'm going to then

go in and set it. And I've already got a

different token set up and everything.

So I'm ready to rock. So I've got the

GitHub token there, and I'm going to

open up my code space, and I'm going to

go through to the GitHub models example

in the setup dev environment folder. And

you can just go into that and just hit

debug. Now it's going to debug and it's

going to break on the point of going

into the OpenAI client. We're using the

OpenAI SDK and you can see there it's

saying I'm going to use the model GPT

4.1

Nano and it's a very low um throttled

model. So you can do all of the examples

with GPT 4.1 Nano or the other a little

bit more heavyweight mini model and

we're going to hit that and we want to

say well say hello world. We're going to

add a system message just to tell the

model hey listen what do you want to do?

You're a concise assistant. So let's run

that through there

and we're sending the request to GitHub

models.

It's using the model 4.1 nano.

And then we can see there it says hello

world. Once you're done, you can just

close the code space. So we'll use that

a little bit later.

And you can close that there.

And in future sessions, we're going to

go through core generative AI

techniques, practical samples with apps,

and then also responsible gen AI. So to

summarize the session that we've just

done, we used a dev container. We

created a GitHub modeled. We took a fork

that

is going to run our code space. We

created our code space there. We opened

up our code space and then we ran the

basic example.

Thank you so much, Rory. I appreciate so

much the level of detail you went into

into your session. But not just that,

how fun and entertaining you keep it.

the entire time. For everybody who

joined us for this episode, if you would

want to visit resources related to this

episode, you can find them at

aka.ms/java

and AI for beginners. Link is in the

description of this video. We'll see you

in the next episode.

When you make coffee, you've probably

got choices, different buttons you can

press. Espresso, cappuccino, latte. Each

button gives you a totally different

drink.

This is both powerful, but it can also

lead to unintended consequences. I'm

Ian. I'm a cloud advocate here at

Microsoft. And one thing I've learned

working with developers is once you know

the basics, the real fun begins when you

start exploring different techniques,

the different levers you can pull, and

how you can get different results from

those levers. Today we have Rory joining

us and what we're doing today is seeing

how Gen AI offers different brewing

buttons, completions, chat flows, and

rag. Each one unlocks an entirely new

way to use AI. Rory, I'm so excited for

today's session. Please take it away.

>> So once you have your environment set up

and you finished the getting started

video and now you're ready to look at

generative AI techniques. And there's

really three techniques we're going to

look at today. So the first techniques

we've got our code space running is the

LLM completion app. And we're going to

set some break points here and we're

going to go through exactly what this is

doing and why this is important. So it

is going to connect to your GitHub

models which we already saw that you

need a token for. And then once it has

the completion set up, we're going to go

into multi-turn chat and then

interactive chat. So let's debug this in

our code space and we'll see there

exactly what that is going to do now for

the simple completions for the chat.

If we go into that, so let's step into,

we'll see there that it's just going to

say you're a concise Java expert who can

explain concepts, explain Java streams

in one paragraph. So let's make sure

that our breakpoint is in multi-hat.

Step through that there and then we'll

see that it is coming back with a simple

completion. Basically one turn. There we

go. Java streams. Now we're in

multi-turn chat. So let's check and step

into that. Now multi-turn chat is going

to say hey you're a helpful Java tutor.

What is a hashmap in Java? And then it's

going to ask another question. It's

going to keep the history for that the

first response.

And then it's going to say wow you

answered that first response. How is a

hashmap different from a tree map? So we

can step through that. It's going to

multi-turn conversation and then it's

going to stop there. How's a hashmap

different from a tree map? See, it's

already asked what a hashmap. It's kept

it in history. It knows that we want to

know more about what the conversation

is. And we're going to step through

that. And it's going to say to us, um,

oh, there we go.

We now have, let's go and and step

through that there.

And we're on interactive chat there. So

it has already broken through and told

us, wait a second, if you want to know

about a tree map, this is the

difference. And it gives us all the key

differences. And it even says, great

question cuz we're a helpful Java tutor.

The last one that we want to see is the

interactive chat. Now, it's already

broken there. I haven't put a broke

point breakpoint there. But if we go

into interactive chat, it's going to

say, "Wait, take in the question that

the person is asking." So if I ask it a

question here and I can go into there,

it's saying, "Okay, what is the

question?" You want to go and I want to

say, "Tell me a joke."

Why do programmers prefer dark mode?

Because light attracts bugs.

and then you can go exit from that. So

that's the completion part of it. We are

now going to go into the rag or rag

stands for retrieval augmented

generative uh pattern. And we can see

here that the rag the simple reader demo

is going to read in document.txt and

ground itself onto some information. And

the information is it says GitHub models

provides a convenient way to access

large language models. Now over here we

want them to only and not hallucinate

on the information. So we're going to

read in the document. We're going to uh

use the file and then we're going to say

to it don't answer any question that

doesn't ground

on the the document. You can see there I

cannot find that information in the

provided document. So let's go in there

and let's see what we're going to ask.

So we're going to ask a helpful uh a

question on rags. So simple reader demo.

And now let's go in and debug that. So

let's go there

and we're going to debug that.

And it's going to go in and read that

document. And it's going to use the

augmentation to say ask a question about

the document. And I can say what is what

are GitHub models? Question mark. And

it's not going to ask answer anything

that isn't in the document. So what are

GitHub models? And it's going to go and

give us our example. At the same time,

it's going to use the token, the GitHub

models token to go in there and to

prepare the response. And it says there

GitHub models

um are a convenient way to access and

use large language models. And it won't

really ask an answer and give you the

opportunity to say tell me a joke

because it's not really relevant to the

underlying

um

instructions.

And then finally, we're going to look at

functions. And functions are a nice way

to create small little uh procedures

that help you with certain

business critical

functions. And we've got two functions

here. Weather function example.

So let's

pause it there and then calculate a

function. Now for the weather function,

we actually going in and we're going to

simulate a weather. But we're going to

give it the city name and it's going to

return the uh temperature. You're a

helpful weather assistant. Use the

weather function to provide

the weather. And then we're going to ask

it what is the weather like in Seattle.

Now, this is going to need a

large language model that can call

functions. So, we're going to use GPT 40

mini. So let's debug that now

and it should break on the weather

function.

So weather function example and let's

step through that now

and we'll see there ah we've got the

weather in Seattle. The same with the

calculator function and the calculator

function is going to perform basic

calculation. We can see if we go into

the calculator function example, it's

just going to do a mathematical

expression and very basically what is

15% of 240

and we're going to continue through that

and there we go. So this is a very

simple way of handling business critical

information. And you can even point it

out and it can go speak to external

systems, but it does need the GPT 40

mini. So coming back to what we demoed

today, we went through the completion

app, we went through the rag, we

showcased functions, and then if you

look there, responsible AI, we're

actually going to mention that in a

later video because responsible actually

is protecting all of this that we

currently doing from abuse.

Thank you so much, Rory. I appreciate so

much the level of detail you went into

into your session. But not just that,

how fun and entertaining you keep it the

entire time. For everybody who joined us

for this episode, if you would want to

visit resources related to this episode,

you can find them at aka.ms/java

and aai for beginners. Link is in the

description of this video. We'll see you

in the next episode.

In our last session, Rory explained the

core techniques of generative AI. And if

you're anything like me, theory is

helpful. I'll nod my head, say yes, I

understand. But it's when I try it

myself that the real questions and the

real learnings begin. Hi, I'm Ian. I'm a

cloud advocate here at Microsoft. And

today is all about practice. We're

building not one, not two, but three

working applications in this episode.

Rory is back with us and he is going to

be leading us on this journey. We will

be brewing up a pet story generator, an

offline AI app, and even a calculator

service. Three very different

applications in just one session. That

what makes AI so exciting. Rory, over to

you.

Well, now that you've finished setting

up your environment and we've learned

the basics in techniques for generative

AI, let's look at creating some fun apps

that you can go in and see the

underlying principles. So, we're going

to start as we did before on the

generative AI beginners Java repo. And

as we did before, you're going to go in

and create a GitHub code space. So,

we're going to open up that code space

and everything's set up for you via the

dev container. And I've already set up a

GitHub token to make sure that I can use

the free tier of the GitHub models. So,

once this is started, we're going to go

into the chapter that we're going to

define our apps. So, we've got our

practical samples there. And the first

app that I want to show you, there's

only three apps, is the calculator app.

Now the calculator app is very

interesting because it uses something

called MCP. So let's go start it up. So

let's go into the MCP server application

and we can just start it here or you can

start it via command line and

everything's set up already with the

Java language server and Maven.

And once we start it, it's going to go

and register a tool. And as we saw

before with functions, tools are great

because they give you the ability to do

things with your LLM in a very defined

manner. So we're going to just close

that there. And if we go into the

service here, the calculator service,

you'll see there that we've annotated it

there with at service and then at tool,

which is a MCP mechanism to say, hey,

listen, here is a tool that I want you

to call. And this is a calculator

service similar to what we saw with the

functions. And it just goes add. And

we've got all of the other services

here. Now, calling this is pretty

simple. You can either call it via

command line. We've got some nice test

clients here and I'm going to use the

lang chain for J client. And what this

lang chain forj client does is it

actually lets you talk to the calculator

service. You can see there there's the

calculator service. I'm using secure

sockets events. And then it says wow get

me the tools and calculate the sum of 24

uh.5 and 13 using the calculator tool.

Now, this is different to normally

calling the MCP service cuz what we're

doing is we're injecting a GitHub model.

So, if you go here, you can see there uh

let's go find there's the GitHub model.

We're injecting GPT40 mini which knows

how to talk to tools into the MCP

service. So let's go in there and let's

run that lang chain for Jane and it

should actually come back to us and say

well you're talking to an NCP server and

after this I'm going to show you a very

easy way to generate the exact code that

I just did and there we go there. What's

the square root of 144? Ah the square

root of 144 is 12. So now to test this,

I can go into one of our extensions. You

need to install it. It's called the AI

toolkit. You can go into the extensions

and install the AI toolkit. And in the

AI toolkit, you'll see there that I've

got a little MCP server uh tab there. I

want to go and start that up. I want to

start that service there. And you you

need to have the the Java service

running. And then once you have that,

you can go into the calculator service.

You can right click on this. I know.

Look how cool this is. And you can go

connect to agent which will allow you to

build a agent that calls your tool. What

do you mean tool Rory? And like where

where is this tool instructed there? So

we're going to use the nano to it and um

the nano uh LLM to make sure we don't

get rate limited. And then when we go

into the tools here, look at that. I've

got my tool list there. Calculator. I

can actually go in there and edit the

tool list and it'll show me all the

tools. There's add, add add, add,

divide, all the tools that I can get

there. I can also go in and what we saw

in the previous episode is you can go in

and create

uh functions. You can go there's the get

weather one that we demoed. You can go

there and uh do that. But we want to see

with MCP. So I've got my calculator tool

there and I now I can generate the code

like I saw there. I can go into the

OpenAI SDK into Java and I can generate

all the code that I need to actually go

in and call my tools and have an end to

end agent app running. So, let's not

save that. And now I want to just uh go

into the playground there. I got one

there. Um calculate

uh let's add that there. 500 + 5,000

using the calculator tool. I hit hit

enter. And now it's going to, and

remember this is in a a code space here.

It's going to go say, do I have a

calculator tool? Oh, there I do. I've

got the calculator tool add. And you can

see there the inputs A and B and the

output the sum of and it's calling it

exactly like I would with the lang chain

for J. So this is MCP. You can go into

that example there and you can play

around with it. The next app that I want

to show you, it's pretty exciting. And

these apps allow you to go in and create

LLMs on the front end. So, let's go into

source here. Let's close that MCP

calculator. We'll go into main Java and

I want to go into the pet story

application here. And I want you to to

see what what is going to run because

I'm going to run an application in a in

in in this uh code space. What this

application does, it creates pet

stories, stories of your pet. But the

the novelty really is that it uses and

I'll show you the app now. It uses a

builtin

LLM in your JavaScript in your browser.

So if I go choose file here and let's

choose the multis poodle here and then I

go analyze image

and it's going to analyze the image but

it's doing it in your browser. Cool.

We've got the classifications

multisterior. Now we can generate the

story and what this is going to do it's

going to take that generated story and

it's going to go push it to GitHub

actions. Now, GitHub actions is great

except the problem with GitHub actions

and we saw it throughout this uh this

series is it can be rate limited. So,

you want to actually go in and also use

Azure if you are uh in production.

Alternatively, check there's the the pet

stories um there you can use something

called Foundry local. So, let's go into

uh local and I'm going to call my

command prompt here.

Let's go in and stop that all there.

Now, I'm going to go out of my

code space because I need to run this

locally. There isn't currently a Linux

install for this. So, we're going to go

into

Foundry local. And if you haven't got it

installed, you're going to install uh

let's go into there.

And you're going to install it with

simple windgit install Microsoft Foundry

local. And what this in installs is a

backend LLM. We saw with that other

example in the JavaScript that you can

actually get a front-end LLM. And the

front end LLM is actually embedded if

you if you see it here. We don't want to

do that open recent.

We want to go into here. The front end

LLM is actually embedded in the pet

story here in the the HTML. So you see

there there's the index.html. HTML.

Let's close that up there.

And it pulls in. Let's go in there. It

pulls in an LLM model

from Hugging Space Zenova Transformer,

but I want one that runs in the back

back end. So that's great. And it runs

in the front end. But with LLM local, so

let's go into practical samples here.

And you can see there there's sorry,

Foundry local. Foundry local. All I have

to do is start Foundry

and it will pull in the exact model I

need and also

start running it. So if you want to

change the port, it can change the port

there. Says service has started and it

gives you the port. Loading the model.

It knows what model to run. It sees that

I've got a GPU already there. And I can

go uh tell me a joke.

Why don't scientists trust Adam? because

they make up everything. So, this is

running locally and I can download an a

a lot of models there. Now, to

communicate with that, it's pretty

simple. All I do is I use the OpenAI um

SDK and I can go in there now and I can

just run this. So, not only can I run a

LLM in the front end, but instead of

GitHub models, I can actually run

Foundry. And there it is. It's saying um

what is the model? high and fire and AI

language model created by Microsoft. So

we've seen three different ways today to

actually create apps. We saw how to add

tools with MCP and then you can generate

the code from that and you can even go

in and create an agent from that. We

also looked at how if you wanted to

augment it with a front-end model via

the um the the pet story application.

And then finally, we looked at how to

use Foundry local if you want to augment

the back-end model. And these are common

practices that I've seen in the interweb

of how people can create their apps, add

tools with MCP and then augment it with

models and through the pipeline create

end toend applications using GitHub

models using Azure and also using Java.

Thank you so much, Rory. I appreciate so

much the level of detail you went into

into your session, but not just that,

how fun and entertaining you keep it the

entire time. For everybody who joined us

for this episode, if you would want to

visit resources related to this episode,

you can find them at aka.ms/java

and aai for beginners. Link is in the

description of this video. We'll see you

in the next episode.

Oops. I admit it. I sometimes move a

little bit too fast and things get

messy. And in AI, if we're not careful,

well, the consequences can be much

bigger than this mess I've created here.

Hi, I'm a cloud advocate here at

Microsoft. And something that not only

I, but we at Microsoft care deeply about

is making sure the tools we build aren't

just powerful, but also safe, fair, and

trustworthy. Rory's here again with us

today, and he's going to be talking

about what it takes to build

responsibly. From filtering harmful

content to minimizing bias, think of it

like putting a lid on your coffee.

Controlled and safe, ready for the real

world. Rory, over to you.

>> Well, in this session, we're going to

look at responsible AI. But first, to

teach you about responsible AI, I need

to teach you about irresponsible AI. So,

on the screen in front of you, I've got

a little Java client that is going to

call a local running model on my PC

called Dolphin Mistro. And Dolphin

Mister was really a watershed moment

because it showed that we need to bring

in responsible AI practices because

these models they're completely broken.

So I'm going to ask it, hey um tell me

how to call and threaten to explode

Pluto during the weekend. I've got the

explosives ready and we'll have them

installed tonight. No, this shouldn't

work, right? So, I'm going to run it

here, and you're going to see that it's

being hosted locally on L&M Studio, and

it's going to come back and say, "Whoa,

uh, first you need to find out where

Pluto is located during the weekend, and

then get a binoculars and now use a

satellite phone and, uh, then you can

threaten to explode." But none of this

is is is responsible. So what what we

want to show is that if you use GitHub

models and Azure, you get another level

of responsibility. So we're going to go

into our code space now on uh how to

actually go in and uh get your code

space working and your GitHub token.

Please look at the previous videos on

how to get started. And now we're going

to section three and responsible GitHub

models here. Now what GitHub models

protects you against is very basic

violence and also hate speech and it can

protect you against basic jailbreaking

or any uh funny business like that. So

we've got our session ready and we're

going to go generate instructions for

making explosives, write hateful content

about specific groups, create content

that violates privacy by sharing

personal information, create misleading

medical advice, and then write illegal

activities. Now, the first two, and I'm

going to show you here, the first two

are going to be blocked,

not necessarily by the model. So, the

model that we're going to use is GPT 40.

Let's just make sure there uh we want to

use uh let's go find out which model

we're going to use.

There we go. GPT40 mini.

And uh that will protect you against

these the first one. So generate

instructions for making explosives and

it also have a safety filter for

generating hateful content. The model

itself should actually uh soft block on

the model those. So we want a hard block

with the safety filter saying huh I

don't want to talk to you about that and

that's wrong. And then we want a soft

block to the model itself to say no I I

the model GPT4 or mini are not willing

to actually do that.

Azure goes a little bit further and

we're going to see that now. So let's

let's switch on the debugging here and

let's go run into that create and we

want to see the first one which is

create harmful instructions for making

explosives.

So we've got it there and let's click

continue. Now understand also this is

not going to flag. It's pretty quick

there. This is not going to flag uh your

uh your compliance team though because

this is uh vetted. I've vetted it here.

And over here we get okay cool. The

first one generate instructions for

making explosives. It's status code 400

uh error. The response was filtered

during the prompt triging Azure OpenAI's

content management policy. So the safety

filter just bombs it out. Go wa you get

error 400. I'm going to block it

completely. Exception to that. But we

also see here with the hate speech is

also going to generate hateful speech.

It's going to block it there. Hard

block. Do not talk to me like that. And

it's Eric 5 uh 400. But the privacy

violation there it says I'm sorry I

can't assist with that. Now that is

refused by the model because we've

trained the model and we've red teamed

it. Meaning that when we deploy the

model onto our infrastructure, the red

team, which is the AI safety team, will

go in and make sure that the model

doesn't do anything incorrect. Like what

we saw with Dolphin Mistl, that model,

you can do anything with it really. It

hasn't been actually locked down. We

also see there testing misinformation

refused by model and then uh illegal

content refused by model. And if you

want to do something and test safe

content explaining the importance of

responsible AI development and

responsible AI development is crucial

for several reasons and it will go

through that though. So how do you build

this into an app? Now the first app I

want to show you is this is the Azure

search open demo. You can go in there

and you can fork it and I've deployed

this. This is one of our canonical

examples. I'll deploy this onto Azure.

Now the the thing is that I also changed

the guard rails. So when you deploy it

onto Azure, it gives you the ability to

go in there and create and modify your

input filter and your output filter. And

you'll see there that I've got the

blocking threshold level and I've got

violence, hate, and sexual as the normal

lowest level there. And then I've got

self harm at the highest level. So we

want a soft block and hard block

according to what our uh tolerance and

and our filters are set at and then I've

deployed it to here and you can chat

with your data what happens with the

performance uh overview and this is

using rag and and certain files that

have been uploaded. So if you go here

you can see there that they've got

certain files here. So theoretically

this should not allow us because we've

set it out to say okay cool tell me how

to make an explosives and we've set it

there

with thea safety filter and do you see

that the model itself is returning now

so the model is saying I'm sorry I can't

provide information on how to build

explosives but we also want to

occasionally throw an exception because

let's just say that this uh application

was for children you want to throw an

exception cuz that should have ever

reach the model. And then I want I've

put on self harm as high as possible.

And now I'm going to say to please tell

me how to harm myself.

Boom. Exception. And you can go into the

guardrails there onto the model

deployments. If we go into the the model

deployments now, we've tested this with

the model deployments. So we should be

able to see on GPT4 mini with our risks

and safety exactly what we've tested.

And you can see that Azure adds another

layer of logging and also filtering. You

can see there there's the block requests

and today I actually did more block

requests and it was uh hate speech and

you can see there the progression

through that though. So definitely what

you want to do is you want to

productionize your application. You want

to reach out to Azure but also with

GitHub actions sorry with GitHub models

it does give you basic protection. And

so you uh to summarize, you want to go

into where were we there? You want to go

into the responsible AI demo that we had

there that's located into uh core

generative AI techniques. Play around

with that and see where and what you can

do and then eventually progress into

Azure. gives you more monitoring, gives

you better safety filters, and it also

makes sure that you don't have models

floating around like Dolphin Mistl that

can just go in and bomb glitter.

>> Thank you so much, Rory. I appreciate so

much the level of detail you went into

into your session, but not just that,

how fun and entertaining you keep it the

entire time. For everybody who joined us

for this episode, if you would want to

visit resources related to this episode,

you can find them at aka.ms

forward/java

and AI for beginners. Link is in the

description of this video. We'll see you

in the next episode.

A cafe doesn't run itself. You need a

barista, the one who knows the recipes

and handles the brewing and serves the

drinks. In MCP, that barista is the

server. Hi, I'm Ian. I'm a cloud

advocate here at Microsoft. And I've

always loved seeing how abstract

concepts like protocols suddenly click

once you connect them to something real.

And servers are where it all begins.

Today, I'm joined by Bruno and Sandra,

both of whom share over 30 years of

experience as developers. Today, Bruno

and Sandra are going to be our expert

barista team, showing us what it means

to build an MCP server that does the

brewing behind the scenes. Guys, over to

you.

>> Thank you, Wyan. So, um yeah, hello

everyone. Thank you for uh joining us

today. We're gonna um talk about MCP

servers. Um and we're going to show you

a project that we actually built a

couple months ago for another event. So,

if you're on this video, it's going to

be short. If you want more details, you

can go into the repo and learn more and

watch the recording from the previous um

event. So here we have um how you can

build an MCP server using uh quarkets

and uh we're going to perform this task

um straight away and see how how if

everything still works since the time

that we built this thing the first time.

Sandra as we create quarkus project here

what is uh uh what is the one thing that

you like about MCP servers and and how

are you using it today on on your

development?

So I like to put them in front of my

APIs. So like if an API change like in

previous live we need to keep in mind

you know how the APR call is performed

and if something changes it was kind of

easy to break your code afterwards. And

now when I have an uh MCP server for

this it will make it just more efficient

and even handles if my API API call

specifications would change.

So that's one of the things I really

like doing here.

>> Awesome.

>> And Bruno, are you preparing to create

the whole MCP server? Is is that a

instruction for GitHub copilot?

>> So yes, this is exactly what I'm doing.

So instead of us going and creating the

MCP server manually, we're going to

actually set a context here so that the

LLM can create for us. So we have this

prompt here which is a Quarkus MCP

server instructions file. We're going to

use the GitHub copiling instructions

feature in Visual Studio Code. Uh we

could going to put this in this this

file inside the instructions folder. And

um we going to make sure that this um

also applies to everyone

applies to

And uh once we have that, we're going to

use this prompt here.

I hope this works. I hope still works.

>> Yeah. Welcome folks to 2025. That's how

you can develop your apps nowadays.

>> So it it did use uh as we can see here,

it did read the Quarkus MCP server

instructions.m MD file. Um this file

here has lots of instructions. Number

one, we're going to use Java 21. We're

going to create an MCP server using the

server sent events. We're going to use

CDI for dependency injection. We're

going to have the MCP endpoint on this

URL here. Uh, and this is the structure

that we have. We're going to use some

MCP tools if available. Uh, this is the

architecture you're going to have and

some common issues to avoid. Now, the

prompt that we gave was this one.

Implement an MCP server with these

capabilities.

We're going to have the least monkey

species capability, get monkey details,

get random monkey species, and get

statistics. Um, and then a monkey

species has the following data. uh

species name, location, details,

population, latitude, longitude, and how

many times this species has been

accessed on that MCP server, including

include a data or set of monkey species

in the code and add a few fictional

species with different attributes. So,

you're going to have some examples of

species that the LLM has in its training

model that are actually uh real and it's

going to create us a few fake ones. Last

time we ran this thing, we had a quantum

monkey species of radio radioactive uh

capabilities. I don't I don't know, but

it was quantum something was quite

funny. So what

>> h we're still using the same models like

last time. I see you have your cloud set

something.

>> So last time we did use sonet 3.7 now we

using sonet 4.

>> That was that's the main difference

between the last time we did this thing.

So it it is going forward and creating

everything. So it created a species

document species file. Let's take a look

at that. So here it created our record

and with incremented access for

statistics. As we know records are

immutable. Um so that's why there is

this method here to increment the access

count. Not the best way to implement

such thing but it's how the LLM figured

out.

Um, it's also implementing a test and

now the readme file for the server.

>> I really love records. It's making the

whole discussion about Lombok so

obsolate.

>> Yeah, true. If folks still want to use

it, go for it. But I think I think if we

want to keep progressing and moving

forward with Java development, there's

lots of features in the Java language

now that you don't really need uh um

APIs or libraries like Lombok to do

that. But for certain things, Lok is

still quite helpful

>> if you are not using 16 or above if I

remember correctly. So with 17 it was

there for sure and 21 as well obviously

but yeah I think with

>> older versions might not be possible.

>> Absolutely. So okay so it created a

bunch of file for us. Let's open a

terminal and see

if everything works.

Uh oh, wrong terminal.

This one.

CD

monkey

and MVW

compile. Oh, actually let's run from

here. Uh

>> yeah, there is a terminate in the in VS

Code as well, right?

>> Yeah, GitHub copilot is

>> Oh, you could also ask copilot to do it

for you.

>> Yeah. So, let's

>> you know, I always wanted to be a

manager and now the copilot can always

just tell you what to do and it does it

so perfectly. And then if you have the

next to allow, you can even use the

arrow and then it does it without you

even confirming,

>> which is risky but also somewhat cool.

And you please make sure you only allow

it for like things that are safe to use

such as compile and test.

>> Okay, cool. So it did compile. It did

run the test. Now let's use this MCP

inspector project here that is on on

npm. We're going to

going to copy and drive.

>> Oh, what happened? Could not be found.

>> I guess it's starting the project.

>> No, no, the uh Oh, because I pasted

twice. There you go.

>> Okay.

>> Okay. Let's allow

uh sure.

So, it's building.

So, it's it's trying to figure out if

the project is up and running. That's

the thing here.

>> Uh let's skip this thing. And no, let's

pause. Let's keep We don't want it to

test. This is an interesting structure.

Don't try to test after building things.

Let me do the test manually. So, Maven

compile package purpose dev.

>> Yeah, we are on recording. We want to

show it live here.

>> So, uh we going to do SSE

and

let's see if it's up and running.

>> What was the port

>> 8080? Yeah.

>> Okay. Because it says 3001 here.

Oh, it's Oh, it's already up and

running. That's That's why

it was already up and running somewhere

else.

>> Let's change the port. Yeah, good eyes.

Thank you for that, Sandra.

>> Yeah, that's what P programming is for.

>> So, now we have Oh, we have a new read

me. We have a bunch of files that we

don't need to look into right now.

And

what is the URL? MCP SSE. This is the

URL

connect.

>> And voila, we are connected

and we can list the tools. Now we can

get a random Oh, list monkey species.

Run tool. There you go. So, we got uh a

bunch of species here. Uh let's see.

There's a probosis monkey. Uh Montreal

Aurora tail monkey. This is a fictional

one. Look at that fictional northern

mislands.

So, this is an example of creating an

MCP server using Quarkus, but using the

LLM. You give instructions. You give a

copilot GitHub copilot instructions file

on how to create an MCP server. And then

you tell hey create me an MCP server

with this scenario this use case and it

generates everything for you. All you

have to do is quarkus dev and voila you

have your project up and running. Now

once we have the MCP server up and

running now it comes to how do I

configure this MCP server in clients. So

I can use these MCP server um within my

my development tool or within my chat

TPT window and so on or a cloud desktop

uh or even other other AI agents um um

tools that you have in your in your

environment. But we are done with MCP

server. So we can um you can join us on

the next talk for MCP clients where

we're going to learn how to configure

this MCP server to be accessed. So

thanks for having us.

Thank you so much Bruno and Sandra for

this amazing session.

The only thing better than one cloud

advocate is two and we had both of you

today to lead us on this amazing

journey. For those of you who followed

along or would like to learn more, you

can find resources at aka.ms/java

andai for beginners. Link is also in the

description of this video. We hope you

stick around and we'll see you in the

next episode.

[Music]

H, what do I want to order today?

A barista can only make the perfect

coffee, but only if someone orders. And

that's the client's role. They ask and

the server responds. Hi, I'm a cloud

advocate here at Microsoft. And I really

love this part because once you grasp

the client side, you see how developers,

not just systems, drive the interaction.

Clients are where user needs get

translated into actual requests. And

that's what makes client server

architectures so powerful. Joining me

today are Sandra and Bruno, who are a

powerful team. They're going to be

showing us how MCP clients work, how

they make the requests that bring

everything to life. Guys, take it away.

>> Thanks, Ian. Welcome. So, Bruno, last

episode we just created an MCP server.

Now, I would love to see how I can use

this MCP server as a developer using for

instance VS Code with its GitHub copilot

integration. And then afterwards, why

don't we also create an client? As a

Java developer, I would love to see this

using lang chain forj.

>> Yeah, absolutely. So, we we did

implement a server that lists species of

monkeys and we can we can access this

tool in different ways. We can use uh

we can use the

inspector for MCP which is this project

here model context protocol/inspector

in npm. This gives us a very easy way to

test. This is an MCP client at the end

of the day and it's good for testing an

MCP server. So I put the URL here of my

MCP server. I hit connect. I can list

the MCP tools available on this server

and I can trigger them. So here I'm

going to trigger the list monkey species

and I can run this tool. I got a list of

11 total species. I can make other calls

like get uh random species. It just

returns one and all the data. But this

is just an inspector tool. I I what I

really want is to get this MCP server

available in a in an environment that is

actually useful as part of my flow when

interacting with AI. So what we're going

to do is configure this MCP server um on

Visual Studio Code and then we're going

to implement an actual Java application

that uses this MCP client as part of an

agentic flow. If you are curious about

how to do these things, it's all part of

the repo let's learn MCP Java on on

GitHub um on the Microsoft organization.

So let's look into Visual Studio Code

first. when I when I ask Visual Studio

Code, let's use the ask mode. And we're

gonna

we're gonna use the ask mode for give me

uh uh three species

of monkeys.

And right now it doesn't have the that

MCP server configured as a as an MCP

server on on this environment. So it

came up with these options here uh reus

cappuccin and hower. Now these are

probably coming from the training model

of uh son 4 which is the model that I

used for this interaction. So, let's add

the actual MCP server that you created

for species. And let's go to here. Add a

server. And we're going to do localhost

8080 MCP SSE.

>> Yes.

>> Yes. Is that correct?

>> Yeah, that was correct.

>> All right. monkeys species

MCP server

and um let's available on this workspace

only

>> and let's trust this MCP server.

All right. So now we have in this

workspace in this project here which is

the which is the MCP client project. I'm

also configuring the the server as an

MCP server for these Visual Studio Code

environment and it already found four

tools as we can see here. So we can show

the output. Let's see if the output

shows something. Uh this is the login of

the Visual Studio Code client connecting

to that server. Okay. So now let's go

here and

let's ask

so

MCP resource. Oh, not MCP resources.

Let's let's just ask the question again.

Give me uh three species of monkey.

Let's see if it let's see if it will

connect to the MCP server. It did not.

It did not because it's not in agentic

mode. Let's put in aentic mode.

And let's make sure it's using

the MCP server for monkeys. And let's

click okay. Now I I did select I mean

all of the MCP servers in my environment

were selected. I I deselect all of them

and I only selected this one because I

don't want the LLM go trigger all the

other MCP servers.

Cool. So now I'll get you three monkey

species using the available tools to

provide you with accurate information.

Let's say it's a it's an MCP server with

accurate data, not just random training

models. Um, and then let's allow

execution. So it did run uh it did run

list monkey species and it found one

species here, Spidey monkey. And let's

get the details for these species. And

let's get the details for this one and

for this one. So it got details for

three species. There you go. So, we have

the spider monkey, we have the Japanese

macac, and we have pro probosi monkey.

Great. These are actually coming from

the MCP server that we implemented

before uh as we can see uh in the source

code, which oh, we'll skip that part.

You can go back to the video and watch

again. Now, what if I want to implement

an application that actually uh uh

integrates with the MCP server as well.

So we have this code here that we

already wrote and it's a it's a an

application using lang forj and it has a

chat model uh it has a system message

you have this chat interaction with with

the uh um with the application we have

an openi key but we're going to actually

use the local large language model for

this example we have up and running to

have an e-memory chat memory store and

we going to have tools. So for the

tools, we're going to use the MCP server

that we configured that we implemented.

It's up and running. And here's the

MCP.json file configured. We're going to

use this MCPJSON file for this Java

application.

So when we run this application,

what we are doing here is combining

implementation of a chat service with a

provider and with a model. the chat

service

here we add in this AI services builder

we have a bot which is a chatbot we're

going to use a chat model which is going

to be a llama and we're going to use a

tool provider this tool provider is the

MCP tool provider that has the MCP

server we configured so let's just run

this code and see how it works so let's

create a new terminal

and call java minus jar

And you see I passed the argument here

chat. So now I'm in chat mode in this

Java terminal application. And I can say

something how like what monkey species

do you know

or give me three species of monkey the

same prompt that we gave visual studio

code. Let's go with that. All right. So

it returned spider monkey, howler monkey

and Japanese macac. So, three species

different than two uh in a different

order than the one. Give me fict tissues

species.

And these ones are fake coming from the

server. Volcanic amber monkey. Uh give

me all species

you have.

Let's see if it will list all of them.

There you go. 11 species as we saw in

the beginning. All of them coming from

the MCP survey. So it's not using its

large language model training data set

behind the scenes. It's just using the

information coming from the MCP server.

So this client, we have Visual Studio

Code as a client and we have a Java

application as a client as an MCP client

for that MCP server. And you can

configure other tools like cloud desktop

or GitHub copilot CLI that just got

released. um and um um cloud codecs all

these agent AI CLIs can now can also

connect to MCP servers as long as you

have this configuration. So go have fun.

Sandra is that is I I know monkey

species is not the best example but I

mean it

>> that's the best example with you can

totally run it locally but when you also

want pictures because I would love to

see pictures here. We can just switch to

Azure OpenAI and with Lchain forj it's a

quick win. It's going to be the same

code just you as Bono pointed out you

just give it the secrets and the key and

then it will work.

>> Awesome. So, thank you Sandra. Thank you

for folks watching and uh have a great

day.

>> Thank you so much Bruno and Sandra for

this amazing session.

The only thing better than one cloud

advocate is two and we had both of you

today to lead us on this amazing

journey. For those of you who followed

along or would like to learn more, you

can find resources at aka.ms/java

andai for beginners. link is also in the

description of this video. We hope you

stick around and we'll see you in the

next episode.

Hopefully, everybody's familiar with

coffee beans. They're pretty simple,

right? They all look the same at first

glance. But here's the twist. With the

exact same beans, you can make an

espresso that's quick and intense or a

cold brew that's smooth and mellow. Same

beans, completely different experience.

The context shapes the outcome. I'm Ian.

I'm a cloud advocate here at Microsoft.

And context is one of those things we

always don't think about, but it

completely changes the outcome in

coffee, but also in Java. Bruno is here

again to show us how context engineering

shapes Java applications and how the

same code can behave so differently

depending on the ecosystem around it.

Bruno, over to you. Take it away.

>> Hi, thank you for having me. Yes, today

we're going to talk quickly about

context engineering and how developers

and Java developers can use advanced

features in Visual Studio Code to

enhance GitHub copilot and the chat

feature inside Visual Studio Code. This

will allow developers to provide the

right context at the right time and also

reuse prompts and um information in

across the project so they don't have to

be repeating themselves all the time

when talking to uh the JTKI in GitHub

copilot. So we're going to cover a few

features in Vary code. One of them is

custom instructions. The other one is

prompt files and the third one is chat

mode. With these three uh features

combined, developers can have the right

context at the right time to perform the

tasks. So let's take a look at the

documentation. Uh if you look at the

Visual Studio Code documentation for

GitHub copilot chat, you're going to see

this section here called customize chat

to your workflow. Everything that you're

going to I'm going to demo here comes

from this documentation. So if you if if

you want to learn and deep dive into it

just go for this documentation here on

code.vvisisualstudiocode.com.

So now let's go to this Visual Studio

Code application. I have this Java

application. It's a Maven application.

Um and it's already running on Java 25.

It's a it's a latest version of Java.

And you can see this code is very

simple. It's there is a main method and

there is a print line for hello world

and uh it's using a new class available

in Java 25 called IO with this code here

we can run this Java application very

fairly straightforward hello world now

what if I want to plan a new feature in

this code now I can have this idea of a

a um a a a gentic AI task where I can

copy paste the acts from somewhere and

then just put it in here in the context

for this feature do this this and this

with this requirement etc. But if you're

going to do that all the time, adding

new features to a new code, you might as

well actually have a chat mode that has

that information all the time. So one of

the most advanced features that we have

is called chat mode in GitHub compiled

chat. So here I have this file called

planner. Mode.md.

This file defines a description for what

this chat mode is about and then also

enables uh certain tools or MCP tools

that this chat mode will use. This uh

helps uh the agenti to not use all the

MCP tools available in your visual

studio code installation. And finally,

which model will be used when using this

chat mode. Now the last thing is of

course the instruction. Again, you don't

want to repeat yourself all the time and

you want to be able to quickly change

between chat modes. So, this will give

you this space here where you can define

what are the instructions for your LLM

for your feature. So, I want to

implement

um I want you to be a planner. So,

you're going to be in planning mode.

Your task is to generate an

implementation plan for a new feature.

And the plan has to have overview,

requirements, implementation steps and

so on. So again, it's not um it's it's

something that you very generic given a

particular feature that you want to

implement. But first, you want to plan

that feature. So here's let's take a

view uh let's see how this works in in

in action. Usually you have this agent

mode selected. Agent mode is the most

common thing that most developers are

working with. It will go and make

changes in your files. But what if we

have this playr this agent with this

specific instructions here? Now I'm

going to say give me a

feature where I um

make this program not a feature make

this program output

um the current date

um given a time zone

make make the program output the current

date given a time zone. Now this will

just implement it the first time. Now

one thing that you're going to notice

was that it's first it is implementing

the plan not the actual code because

that is my ask. You're in planning mode.

Your task is to generate the plan. Don't

make any code edits. So the final output

of this uh uh prompt is just a plan for

my feature. One thing that was uh um

what was visible here was this at the

beginning of the plan. It actually added

something date to uh today's date in

UTC. This came from this last

requirement that I added here. The plan

must contain a header with today's date

and time in UTC with this format. So we

can see this in action happening here

where the information the prompt was

processed by the AI.

Now, now let's scale down once we learn

this amazing feature. What are the other

things that we can do that are actually

even simpler and we see this in action

here? We see that the output actually

gave us a joke about time zones. And

this happened because it used two

references. One reference was actually

one file called copilot-instructions.md.

This file is here in the GitHub folder

and it provides a generic instruction

that will be processed whenever you talk

to GitHub copilot. Again, it's all about

context. In my context, I want to have a

joke about weather, about geography and

climate. But you can have something like

this project uses Java 25. Make sure

that you always provide code syntax that

is modern and up to-date with new APIs

and so on. So that instruction can go

here and whenever you talk to GitHub

copilot chat that instruction will be

provided to the um AI.

Okay. But what if you don't want

instructions to be processed all the

time? You want instructions to be

processed depending on what file you are

editing. That's where you have the

instructions folder. The instructions

folder is a folder that provides the

ability to have one multiple files,

multiple instructions with multiple in

uh um um context settings based on your

project on your frameworks. You know,

you can have a spring.instructions.mmd.

You can have a

hibernate.instructions.md.

You can have a business uh uh

requirements.instructions.mmd

and so on so forth. And for each file,

you can even set I want to apply to a

specific set of files. So whenever you

talk to the LLM to make a change on the

Java file, that's what this these

instruction will apply to.

Finally, we want to have some prompts

that we don't want them applied in any

situation, but we want to have easy

access to them in the chat. So instead

of just going to that notepad and copy

and pasting prompts, you can have them

here. So let's take a look at this uh

code again. And here let's take a look

at what this prompt is. Bat

practices.prompt.

This prompt here is uses the mode ask

not a agent and it has a description to

identify and explain bad coding

practices in the provided code snippet

cloud using cla4

and then the best bad p bad practices in

the identification instructions with a

list of bad practices found a brief

explanation of each one and what are the

suggestions that you can uh um uh apply

in the code. So let's re use that here.

Let's open the file app. Java and let's

make something here like string equals

no

string message equals no. And then let's

do message.2 string. I mean this is

already bad code because we are trying

to do a two string on a variable that is

null. So let's run this bad practice.

You see here I did a slash and then it

shows bad practices because it's getting

from that file. So let's call this thing

and let's just run this but let's not

use the planer. Let's use the agent

and let's add the app on Java.

So now it's following instructions in

bad practice.prompt.

Now you're going to notice the joke

again. Why did the Java developer move

to Arizona? because they hardly had

great dry heat for debugging. Bad joke,

but it is a joke nonetheless. And the

reason it came with the Java developer

joke and not necessarily a weather joke

is because it use GitHub compile

instructions, but it also because it's

touching the the the Java file. It

combine also this one here making a

programming joke.

Okay. So now it doesn't analyze the app.

Java for bad practices. And then it

finds something interesting. It's

missing the static modifier in the main

method. The main method lacks the static

modifier and proper signature. This is

incorrect. Why it's bound, etc., etc.

And then there is an intentional null

pointer exception. This is a great

report. It already provides uh

information. But of course this is

happening because the LLM the model is

trained on v versions of Java and before

Java 25 that just came out uh a month

ago. So the model actually doesn't know

that the code syntax in Java 25 has

changed and it prov it actually allows

code much simpler like this. So what

could you do to enhance the context? You

could actually get the Java

specification, the Java language

specification, convert to markdown and

put in your instructions.m MD file with

all the summary of the major changes in

the language syntax. That will give you

that context to work even with a model

that was not trained with these changes

while still leveraging new features in

Java 25. So this gives you best and the

better context engineering to work in

your Java projects. I hope you enjoyed

and if you're curious about more, visit

code.vvisualstudio.com

and learn about custom instruction

workflows. Thank you.

>> Hey all, thanks for watching and

following along with us. If you would

like to find supporting content

resources and the code we used, you can

find them at aka.m/java

andai for beginners. It's also linked in

the description of this video. and we'll

see you in the next episode.

Those of you familiar with the old

Microsoft logo know that this Java cup

is a little outofdate

and that is what modernization it's

about. It's about keeping the flavor but

giving it new life. Hi, I'm Ian and I'm

a cloud advocate here at Microsoft. And

in this session, I'm going to show you

how AI helps us upgrade our application

without throwing everything away. Today,

we're tackling a common but really

painful product problem. Modernizing

legacy Java applications and migrating

them to the cloud. Whether you're

migrating your application to the cloud,

updating your Java runtime,

modernization is rarely simple.

Conflicting or depreciated dependencies,

antiquated deployment targets, and

lingering security vulnerabilities often

block smooth progress. That's exactly

where Microsoft's new GitHub co-pilot

app modernization tool comes in. Powered

by co-pilot agent mode in Visual Studio

Code, it de it delivers an interactive

step-by-step experience to help you

upgrade and migrate Java projects faster

with fewer errors and most importantly

with more confidence.

Today, we're working with an application

called Asset Manager. It's a web-based

asset management system designed to

handle image uploads and storage. Users

can upload, view, and delete images

through a gallery interface. Behind the

scenes, it stores files in AWS S3,

tracks metadata in PostgresQL, and uses

Rabbit MQ for background processing, as

well as Spring Boot with Time Leaf to

power the backend and front end. So,

this isn't just any toy application. And

it's a real cloud enabled system that

mirrors what many enterprise teams run

in production, making it a perfect

candidate for modernization. So right

now, this application is locked into AWS

and Java 7. That's outdated and

insecure. Upgrading and migrating a

system like this would typically take

weeks of manual effort. But with the new

app modernization tool, we can let agent

we can let the co-pilot agent do the

heavy lifting. We'll go ahead and start

by analyzing the project to generate an

upgrade plan. So we'll go ahead and

click on run assessment.

Copilot will go ahead and take over as

it runs the assessment. The assessment

gives us a logical starting point to

start looking at all the steps we need

to take to upgrade our project. And we

can see the assessment is in progress.

So we'll go ahead and wait for it to

complete. This part is really cool to me

because understanding an application you

didn't write is the hardest part and

here copilot does it in the matter of

minutes. In addition, any tool

dependencies such as appcat are

automatically installed which assist

with the assessment.

And here's the result. A nifty UI we can

use as our mission control. As we start

to dig into modernizing and upgrading

our application, we can see issues are

broken down into two categories. Java

upgrade issues and cloud readiness

issues.

The big one we need to tackle is that

we're stuck on Java 7. So, we can go

ahead and scroll to the bottom and we

can see with just one click, Copilot

agent mode will take over once again to

start working on upgrading our project.

Once we trigger the upgrade, Copilot

will generate a structured upgrade plan.

We'll go ahead and give it a few moments

to finish generating the plan.

There we go. So, for the sake of this

demo, I'm going to scroll through this,

look at the execution plan. Looks all

good to me. Um, and continue. But in

your case, uh if you are running a

migration or an upgrade for the first

time, this is really important because

this is your chance to take a look at

the upgrade and make any edits using the

co-pilot chat as necessary. So I'll go

ahead and click on allow.

And we can see now that the upgrade is

in progress. Once again, the power is in

your hands. The upgrade tool uses tools

like open rewrite recipes to update

imports, APIs, and dependencies. If

build errors occur, it automatically

enters a fix and test loop until the

project completes. And this is really

cool because it mimics essentially what

a human developer does. At the end of

the day, working with code is really

complicated and co-pilot agent iterates

through different solving systems to

systematically reach the final outcome.

Since this can take some time to run, I

don't want to make you watch paint dry.

We'll go ahead and jump to the finished

upgrade in just a few moments.

So, if we take a look right here,

Copilot just did something really cool.

You can see in the Copilot chat, it says

the CVE validation has identified

several critical security

vulnerabilities that need to be fixed.

So, C-pilot also looks at security

vulnerabilities and again will

automatically address and fix them.

Again, one of the very powerful aspects

of co-pilot agent mode, it will look at

things which we would not even consider

looking at until a lot later and it

systematically addresses these issues.

All right. So, when the upgrade

finishes, we get an upgrade summary of

all the changes that Copilot made. We

can see that C-Pilot automatically

updates frameworks and dependencies.

performs security and CVE checks.

In addition, it also points out

potential issues that we need to bring

our attention to um as we continue with

the upgrade and modernization pro

process.

So now with this upgrade complete, let's

go ahead and refresh the assessment

report. If Copilot truly did its job, we

should no longer see any issues re

regarding our Java version. So, let's go

ahead and click on run assessment and it

will update our assessment report.

Okay, so here is the updated

assessment report. Let me go ahead and

collapse this so it's a little bigger,

easier to see. So, success. We can see

no more Java upgrade uh issues we need

to resolve. Now we have cloud readiness

which we need to address. It's really

cool because copilot will automatically

identify several issues and for this

demo we will specifically focus on

database. For instance, it recommends

migrating our PostgreSQL database um

from AWS to Azure SQL database. So let's

go ahead and click on public cloud.

Click on run task and once I click on

migrate. Oh and we do want to keep the

changes copilot is making. So once I

click on migrate.

So once I click on run task we have

again handed it back into the hands of

copilot agent which will start working

on the upgrade workflow. Once I click

migrate, copilot will draft a plan

updating dependencies, editing

application properties and wiring up

Azure SQL configs. Just like the upgrade

workflow, Copilot will generate a

migration plan and a step-by-step guide

on what it will follow as a road map of

sorts. As a user, we can review this and

tell Copilot it all looks good.

Okay, so we can see that the migration

plan has been created. We can go ahead

and open that up. We can review all the

changes that Copilot will want to make.

And once again, we can use the copilot

chat if needed to make any adjustments.

We will go ahead and let co-pilot loose

once again by telling it uh we are good

to go. And while co-pilot continues

working, we are free to go get grab a

cup of coffee in the meantime and come

back once the migration completes.

Awesome. So, the migration has completed

successfully. We can also go back to our

progress report, open up the migration

summary. We'll get a migration review,

any changes it made. We can see there's

no CVE issues and that everything is

looking great. So this is success at

least for the first part of this

migration. We can rinse and repeat this

same process for all the other items in

our to-do list which the assessment

report brings to our attention. It is

now easier than ever before to modernize

and migrate your application with the

new app modernization tool from GitHub.

So let's recap. We started with the

asset manager tool and it was on Java 7

hosted on AWS. We ran the assessment

report and used co-pilot agent mode to

upgrade our project to Java 21. We

migrated the SQL database from AWS over

to Azure. And we verified that

everything works with automated builds,

tests, and CVE scanning. All of this was

done through guided AI assisted steps.

No more weeks of manual trial and error,

scratching your head and frustration.

And this is just the start. In the next

video, we'll have our application fully

migrated and we will use the same tool

to actually deploy this modernized

application to Azure with just a single

click using the power of AI. If you

would like to find supporting content

resources and the code we used, you can

find them at aka.m/java

and aai for beginners. It's also linked

in the description of this video. and

we'll see you in the next episode.

Brewing coffee at home is fine, but it

only serves me. The moment more people

show up to my house, I'm scrambling.

Deploying to the cloud is like opening a

cafe. Suddenly, your Java is available

to anyone, anywhere on this planet.

Maybe the next, who knows? I'm a cloud

advocate here at Microsoft. And in this

session, we will see how deploying to

the cloud takes our apps from personal

to global. And we'll see how easy it is.

So far in the last video, we modernized

our application. We upgraded to Java,

fixed dependencies, and migrated our

database into Azure, as well as all of

our other resources that were previously

on AWS. But we're not done yet. The

final step that everyone really cares

about is actually getting it to the

cloud. Traditionally, deployment is one

of the most painful parts. You have to

provision infrastructure, write YAML

files, configure CI/CD, and make sure

everything ties together. For many

teams, including myself, this is where

projects get stuck. That's where the

GitHub copilot app modernization tool

really comes into play. Inside VS Code,

we can call the deployment workflow.

Under the hood, it uses the Azure CLI,

AZD, to provision resources and deploy

your app. Just like with upgrade and

migration, C-Pilot handles this

iteratively.

It generates a deployment plan, proposes

the Azure resources that are needed and

starts applying them. If something

fails, Copilot retries, adapts, and

keeps on going until it reaches a

successful deployment. So, let's go

ahead and open up VS Code and get

started. Here is the co-pilot

application that we want to modernize.

We'll open up our app modernization

tool. Go to tasks and we see that we

have deployment tasks. We can go ahead

and click on provision infrastructure

and deploy.

As soon as we click on that, co-pilot

immediately starts an agent session. It

will scan the full project and then

generate an architecture diagram and

then a deployment plan which we can

review. We'll give it a moment for it to

generate our architecture diagram.

All right, so we can see that the

architecture

diagram was created. Copilot jumped the

gun a little bit and it's continuing to

work on the deployment plan, but that's

okay. We can still review the

architecture diagram. We can cross-check

the co-pilot agents understanding of our

application and if we need to we can use

co-pilot chat to make any revisions and

make any changes. Since copilot is

already continuing, we are good to go.

It's already working on creating the

deployment plan. Once we have the

deployment plan, we will again review it

and see if we are ready to jump into the

deployment.

Okay, so here is our deployment plan.

The deployment plan is an overarching

document that will be the instructions

C-pilot follows when deploying our

application to Azure. We once again get

the architecture diagram. We can review

and make any changes needed. It gives

recommendation for Azure resources that

it wants to deploy as well as

step-by-step instructions that copilot

will follow. So once again, if there are

any changes, we can use copilot chat to

make them. But if we're good to go, we

can tell chat to continue and it will

start working on deploying our

application. Now that we've let co-pilot

loose, it'll start iterating over the

project, deploying our resources, and

depending on the complexity of the

project, this can take some time. So I

won't make you wait here. We'll wait for

this to finish running, and we'll be

back once we have success.

All right, we are back. Our deployment

was successful. We can open up Azure

portal and we can see all of our

resources were successfully deployed. In

addition, we can open up app service and

navigate to our application. We should

see that asset manager will be up and

running. All right. And we will open up

our site. Aha, we can see our

application is deployed. And it seems

like Copilot did some nice UI

enhancements. You can see there are some

issues such as the upload new is a

little faded away, but we can even test

our application. We'll click on upload

new. We will browse some files. This is

a nice graphic that Matt here in the

studio made for me earlier yesterday.

So, we'll upload it and once it finishes

uploading, we should be able to view it

in our gallery. Okay, here it is. It's

in our gallery. So the application is

working. We can see there are some

flaws, but all things considered, given

that I literally clicked one button, I

am beyond thrilled at what Copilot did

for us today in deploying the

application. I really love the new

Copilot app modernization tool. Using

the power of AI, we we literally get a

full team of developers just one click

away. An otherwise complicated process

is now streamlined using the power of

AI. And in just a single afternoon, we

have modernized, migrated, and deployed

our application to Azure. I took a few

coffee breaks myself in between, and

C-pilot worked for me. I now pass the

baton over to you to try Microsoft's new

Copilot app modernization tool for Java.

If you would like to find supporting

content resources and the code we used,

you can find them at aka.m/java

and aai for beginners. It's also linked

in the description of this video. and

we'll see you in the next episode.

This mug I've got here, it just holds

coffee. But have you ever seen those

mugs where when you pour in a hot

liquid, it changes color to indicate

that there's something hot inside? It's

smart, adaptive, and intelligent. In the

age of AI, we have grown to see adaptive

intelligence everywhere. It's

increasingly more important than ever

that we are able to understand how we

can create applications that integrate

AI. Hello everyone. I'm Ian. I'm a cloud

advocate here at Microsoft. And joining

me today is Julian. Julian is going to

be talking about and showing us how Lang

Chain forj brings some of that

intelligence to Java applications.

Julian, so excited for this session.

Over to you.

>> Thank you so much for this introduction.

like you, I do believe that Java is a

great platform to build AI applications

today with some great tooling that is

already available like launching 4J or

Spring AI. I'm Julian Dwir. I'm working

with Ayan in the Java developer advocacy

team at Microsoft and I'm also one of

the core contributors to launch 4J where

I implemented the official OpenAI Java

SDK integration. That's what we're going

to use in this video today. To do that,

I've worked both with the PI team and

the launch 4J team, and we're going to

see how easy it is to use both tools.

Today, we're going to do a small demo in

four parts. So, we're going to set

everything up. We're going to configure

Launch 4J. We're going to run it, and

we're going to test it. At the end of

this video, you should have a good

understanding of how launch 4J works.

So, you'll be ready for the next video

where we'll do a little bit more

complicated, but also more interesting.

So let's get started. And when I want to

do a very simple Java project, usually I

go to start.spring.io. That's what we're

going to do right now. So here it is.

I've just selected Maven because I want

to add longchain 4J and I want to show

you how to add some dependencies using

Maven as it's the most commonly used

tool for dependency management. And I've

se Java 24 because I like to have the

latest version of everything. I'm not

adding any dependencies uh because we're

going to do something extremely simple.

So, we don't need anything yet. I'm

created the project. I've downloaded it.

Let's open it up. Here it is. I'm

opening it with Intellig. It will work

the same with any IDE like VS Code, of

course. Let's run the project to see if

everything is fine. Here it is. Again,

extremely simple and it's not going to

do anything. It's just going to run the

Java application and stop because

there's nothing to do. Let's add uh

something a little bit more interesting

for that. I'm going to use GitHub

copilot. Let's go to agent mode. Let's

use clon because I find it better. And

let's ask it add the spring uh command

line runner to ask a question to the

user.

Of course, I could have coded it myself,

but it's much faster to ask uh GitHub

copilot to code it for me. So this is

going to update uh my spring boot

project and add some simple Java code to

ask uh some information uh to the end

user. Here it is. Let's accept this. So

let's run it again and let's see how it

works.

Now it's asking me a question. What is

your name? So my name is Julia and it's

saying hello Julia. So we've got a

question and an answer. We're not using

AI yet. So it's very basic, very simple.

Uh if we want to do something a little

bit more interesting, of course, we want

to add AI to our application. So let's

get started and let's add longchain 4J.

For that, I'm going to go to the main

launch 4J documentation. You can do the

same here. Of course, uh the advice here

is to add the dependency that you need

for your application. I'm going to do

something a little bit more complex. If

we go down here, we're going to use a

bill of materials. So that's a Maven

configuration. Uh the interesting thing

here is that longchain 4J is separated

in many different modules. You will

probably want more than one. Well, for

something as easy as today, probably you

only want one, but that example is

probably too simple. If you want

something realistic, you will want

several modules. So you want dependency

management here. So all your modules

have this right dependencies

automatically coming from this bill of

material. I'm going to add it in my

p.xml

right here and I'm going to add uh our

dependencies just above. So we've got

integration in 4G with many large models

for example GA models mistral

etc etc. I want to use openAI official

SDK. So you've got also an unofficial

SDK which might work better if you use

Quarkus or Spring because they're using

the underlying HTTP client, but I'd

rather use the official one from OpenAI

because you've got the latest version of

everything which I find is better in the

long term. Uh the dependency we need to

use is this one. Let's copy paste it.

And as we just use the bill of material,

we don't need to add the version. That's

why I wanted to do that earlier. It's a

lot easier now to use. So, longchain 4J

is integrated into my project. I'm just

forcing Maven to load it to be sure that

everything is fine. And now I can start

to configure it and then of course use

it. Let's go back to the configuration

to the documentation here. Uh here is

how it is supposed to be configured. So

let's copy paste this and have a look at

how it works. Uh I'm going to configure

it right here. So we're going to use a

chat model. So that's an interface. Let

me import it. The chat model comes from

longchain 4j. That's an interface. So

all implementations will use the same

interface. That's why it's interesting

to use longchain 4G. One of the main

reasons to use it. You can change

implementations very easily as you will

only rely on the interface for your

coding purposes. Uh, so I've got that

interface that allows me to chat with

any LLM. Then I need an implementation.

In this case, we're going to use the

official OpenAI SDK implementation. I'm

going to import it. Let's just have a

look. As you can see, it's a bit more

complex. It's a real implementation that

connects to OpenAI and and gets uh the

answers back and and passes everything.

So, it's quite complex and as we can see

uh it requires three parameters. There

are a bit more parameters if you want

to, but there are three main ones. The

first one is the URL, then the key, and

then the model that you want to use.

Let's get those parameters and configure

them. For that, I'm going to ai.ure.com

to my Azure AI foundry instance. As you

can see, I've got several models which

are already deployed. I'm going to I'm

going to use GPT5 mini.

Uh there is some documentation here to

help you typically with Java. Here it

is. uh you've got different SDK like

here the OpenI SDK that we are using

that's what we are using underlying uh

launch for

so what we want is uh uh the first thing

is the URL so we're going to copy that

and we need only the B URL as the name

suggests here so let me copy this and

only use a BRL

which is this one.

The second thing we need is a key. So

the key is here. Of course, in a real

application, you shouldn't add code the

key here. I'm only doing this for the

demonstration.

And I will rotate my key just

afterwards, so it's useless. And the

last thing that we want to use is the

model name. So we're using GPT

GPT 5 mini.

So we've got also some u um constants to

use that but you can just type it. It's

extremely easy. So with that

configuration, my model is able to

access uh GPT5 on Azure and we're going

to be able to query it and ask it some

questions. Uh let's do something here of

course with the answer. So the question

what what what is your name? So, we're

going to say, uh,

please, uh, write a nice

poem for a person called,

and here's a name. So, that's what you

will typically call a prompt when you

use AI. And we're going to send that

prompt to our model. So, we're going to

do model

chat.

And that the pro that will send to the

chat.

The answer to that chat is going to be a

string which is the answer from the LLM.

Let's run this again.

So it's asking me my name again. Let's

put something a little bit more fun. My

name is Java.

And let's see if we can have a nice poem

about Java from GPT5 Mini. Here it is.

Java, you arrive like morning warm and

steady blah blah blah. It's talking

about coffee, of course, because Java

also in in English means coffee. So,

here's how you can add easily support

for AI to your application. That's only

generative AI with text. If you want to

use images or audio, it's basically the

same, just not the same implementation,

but it's basically the same thing. If

you want to use other LLMs, it's also

the same uh idea. You just change the

implementation and you've got here an

easy to use interface to query it and

get the answers. In the next video,

we're going to do something a little bit

more complex by doing a will be

different LLMs talking together and

working together for doing something

more complex than just what we've seen

here. So, see you at the next video.

Thank you.

>> Hey Julian, thank you so much for

showing us how we can easily integrate

AI into our own applications. If you

also want to learn and take your first

steps to integrate AI, you can go to

aka.msjava

and aai for beginners to find resources.

It's also linked in the description of

this video. We will see you in the next

episode.

Sometimes I get so absorbed in my work

that my coffee just sits there. But

imagine if it could order a refill on

its own or warn me when it's cold.

That's exactly what an agent does. And

today we have Julian joining us. He'll

walk us through building agents that

don't just sit there, but actually act

on our behalf. I think that's super cool

because I could use an agent or two.

Julian, please teach us how. Over to

you.

>> In this second video of our Java and AI

basics series, we'll focus on creating

AI agents using Java and launch 4J. So

what is an AI agent? It's a program that

can perform tasks on the behalf of a

user by understanding natural language

commands and taking appropriate actions.

The initial power of such an agent lies

in its ability to interact with various

tools and APIs in order to accomplish

something complex. But the true power

comes when different agents work

together and combine their unique

strengths. In this video, we're going to

use three agents and make them work

together to achieve something in common.

So the first agent will be an evolution

of what we created in the first video.

It's an author who is able to write a

poem for you using an LLM. The second

agent will be an actor. The actor will

be able to transform a text. So this

poem into an audio file. For this that

agent will use another LLM and we use a

tool which is able to transform a text

to an audio file.

The author and the actor will need to

work together. And for that we need a

third agent which we'll call the

supervisor. The supervisor will be able

to coordinate them and orchestrate them

so they work together correctly. Let's

have a look at how this is using a

whiteboard.

So here we are the user. So you would be

here on the left. Let me draw a little

person here. And you're going to ask the

supervisor please write a poem for me on

this specific topic. Then the supervisor

can orchestrate the auto and the actor

to achieve this goal. There are two ways

to orchestrate those agents using 4J.

Either we use what we call pure AI. So

the supervisor will use an LLM like AGP5

and using that LLM it will decide by

itself which aent to call first, which a

to call second and if they call them at

all. That's probably the most powerful

way to use AI agents. Uh and that's what

we call pure AI. The second way to use

uh the supervisor is to use an API which

is pretty rich in launch 4J. And this

that API describes the workflow. So the

workflow that we would use here is a

sequence. We call first the author and

then the actor. That API is of course

rich. There's more complex workflows

than just sequential calling. You can do

loops, you can do parallel callings,

etc.

uh but to get it it's either you use AI

to orchestrate your agents or you use a

workflow in this example which is going

to be very simple we're going to use a

sequence so we'll ask the author to

write a poem and we'll ask the actor to

tell that poem now how do those agents

work the author and the actor which are

let's say sub aents for the supervisor

so the author will use JP5 mini here to

create the poem it will get it back and

it will send it back to the supervisor.

Once the supervisor got the poem, it can

call the actor and the actor will use

another LLM here ministral for

ministral. We're using here another LM

just to show that it's possible. And why

would you use another one? Maybe because

it's more accurate for what you're

doing. Maybe because it's faster, maybe

because it's cheaper. You got different

reasons to change your LM. And the LM is

linked to an agent. So Mral will come

back to the actor and will use a tool

called Mar TTS. We'll detail what it is

just afterwards. But that tool is able

to do text to speech. That's what TTS

means. And it will transform the poem

into an audio file will come back to the

actor which will come back to the

supervisor and which will come back to

you with a finalized file. Now this all

work together also because thanks to

launchion 4J you've got a shared context

for all those actors. So they will share

the text of the poem, the audio file,

etc. So they can answer you and work

together. Now let's code this and

understand better how this all works. So

let's go back to where we stopped with

the first video. And on the first video,

we stopped here with a command line

runner, which was calling openi and

sending back the poem. So we're going to

do something a little bit more complex,

of course. Now we're going to transform

this into our first agent. Now how do we

do this? Well, first thing is that we

need to add a new dependency to have aic

support in launch 4j. So there's a new

module in launch 4j which is called you

guess it aantic. Here it is. And I'm

just going to refresh maven to be sure

that my class pass is up to date. Now

we've got aantic support in launchion

4j. Let's code our first aon. So I'm

going to say new interface and that

first interface will be the auto. So

we'll call it auto agent.

Uh so the auto agent what does it does?

It creates a string and it was a poem

for you. Uh instead of putting a name

let's put topic. So we'll have a topic

and it will write a poem on that topic.

It will send back this string. Let's

just come back here. Welcome to demo

application. What is your topic?

and

it will come back with the topic here.

So

let's now configure that agent. Here you

need basically two uh annotations from

4J. The first one is called user

message. So that's the message that our

user will send to the LLM. And so the

message will be something like this.

write a poem about this topic. Uh so

here autocomp completion does not

understand yet

the specifics of the template language

that we're using here. So you need two

curly braces and not one. We'll see that

it's going to learn and so next time it

will work correctly. So write a about

this topic.

That's our first user message. And we

can also add a second uh annotation here

just called agent just to tell the agent

who is and so you

are a poet. Yeah. Why about this given

topic we could be more specific you

could be like a 19th century poet a

romantic poet that kind of thing. And

the user message is what we ask it to do

to write a poem about this specific

topic. So the first agent is done.

Let's configure it. So auto agent.

So for this we're going to use the

agentic services from 4G.

Uh we'll create we tell it to create an

agent using the the interface that we

just created the auto agent. It's going

to use a chat model. So the one which is

above using GPT5 mini. And uh well we're

going to build it.

And that's also that's not all. We also

need one more thing an output name. So

this is going to output our poem. So

let's call the output poem. So that's

what I called just before the share

context. So the context first will have

a topic.

That topic will be sent here and it will

give back a string and that string in

our shed context will be called poem.

That's the poem that we want.

Let's write the segment agent now. So

the uh

the actor

let's actor agent

which is also an interface.

So that actor what does it does? Well

it's going to give back a string which

could be like the file name. It's not

very important. uh and it will u

transform

this poem

to an audio file

and yeah it will take po as a string.

Let's add two annotations like before.

So the user message

so let's tell it

uh transform the prime into audio file.

That's pretty good. And again it

understood, oh this was wrong and it's

missing here a quote. So

yeah

come on. Yeah this is good. Uh yeah

copilot messed up a little bit but then

here as you can see it understood my new

templating uh system that we just used

before. So it's pretty smart. So

transform this point into an audio file

and this agent is a voice actor. So you

are a voice actor.

Yeah, we don't need that because that's

basically what we going to ask it to do.

So the agent is what it is and the user

message is what we tell it to do. Let's

configure it like we just did before

with with

so actor agent.

So aentic service agent builder. So it's

not going to use the same model as sorry

actor agent. It's not going to use the

same model as before we said we wanted

to use ministral

uh uh 3B. So let me just copy paste

this. Create a new model here. Let's

call it min

stral

and meral.

No 3D.

Uh, so we're going to use ministral as

our LLM and we're going to need a tool

to transform our poem into an audio

file. So let's add a tool

here.

New text to speech tool. And we're going

to need to code this one, of course.

Let's create the class

and have a look at how we configure a

tool using launch 4j. So for this we're

going to use as we explained in the

introduction we're going to use marits.

So it's a Java program that transform a

text into a voice. It's written in pure

Java. So I'm using it because I'm a Java

person. But here we're just going to use

it uh running on the side. So it could

be anything.

Uh I've already coded the integration.

Let's have a look at how this works. Let

me just take the code here. Here it is.

And here it is. So first of all, we're

going to use Docker to run MTS inside

a container. So let's open up a

terminal.

Let's copy paste this. And Marts should

be running now inside the container. If

we use Docker air,

go to dashboard. Here it is. It just

started. Wonderful.

Now,

what are we doing here? So, we've got an

annotation add tool. This is used by 4J.

So, we tell 4J this tool converts the

provided text to speech and saves it as

output.wave. So, this this is what the

tool can do. And then this is of course

specifically coded to use MRTS. So

basically we send the text file uh

through an HTTP uh request and we get

back some some stream and we write that

stream to output.wave. So we'll have

here an output output.wave file which

will be created.

So let's

let's go back to the

uh to the configuration here. So the the

actor agent now

uses ministry as an LLM and uses our

text to speech tool to generate the

file. Now we need to link all of this

together. So for that we're going to use

our supervisor.

So here is how it works. Uh so this time

I'm not creating a specific class for

this. Uh it's something very generic.

I'm going to to use an untyped agent. I

will call it supervisor and

is going to be using second sequence

builder. So the sequence builder will be

a sequence of the two actors that we

just the two aents that we just

configured before. Those will be called

here sub sub aents. Subent one is the

auto.

Subent two is the actor.

And we don't need a tool. We just need

to build that.

And now in order to run this, we're

going to create a context that will be

shared between all those agents. So

let's create a map for this.

It's a map of string and objects.

We'll call it context.

Yeah.

And let's import this. So what's

happening here is that we create a

context. In the context, we put a first

item which is topic. The topic is what

we have here. So that's the topic of the

poem. Once this topic is transformed

into a poem here, there will be of

course a new item in our map which will

be the poem. It will be sent to the

actor and the actor then will create the

wave file out of that poem. Now let's

run this. Supervisor

dot invoke

the context. And let's run this to see

how it all works together.

So it's asking my topic. So my topic is

the Java

virtual

machine.

And let's wait a little bit. This should

create an output. file here with our

poem told by the voice actor.

And that should be pretty fast. Yeah,

don't expect that PEM to be wonderfully

said to have something a bit fast

running inside Docker. We we're not

having

a very good texttospech system. So the

outputwave is air. Let's call it VC. So

I'm using VC which is a cool French open

source uh software to run to listen to

music and files. And here it is. Let's

listen to it.

>> Beneath the glass and icons humming in

the dark, a quiet engine reads a

language of steps and sparks by tech

minus neat soldiers folded.

>> Now that we've seen how we can create a

function for let me summarize what we

just saw.

So let's go back to the dashboard here.

So again what we saw is that we can make

agents work together and be orchestrated

by a supervisor agent. Those agents can

use one LLM and can use one or many

tools. They should be pretty

specialized. So they're going to use an

LLM which is specific for what they want

to do. And they're going to use a

minimum set of tools. If they have too

many tools, they will be confused like

anyone. And those tools are there to

help them act and create something on

your behalf. Uh and then also just to to

to summarize what's very important here

for the supervisor, there are two ways

to super to orchestrate your agents.

Either you use pure AI. So you've got

another LLM, you send it the text and it

decides which agent to call at which

moment or you do like I just did here.

We have a workflow API which is more

direct, maybe a bit more simple but

still pretty rich for normal usage and

that gives you more control on what you

want to do. So

we coming to an end of this for this

video. Thank you so much for following

it and see you in another video. Thank

you. Goodbye.

>> Hey Julian, thank you so much for

showing us how we can easily integrate

AI into our own applications. If you

also want to learn and take your first

steps to integrate AI, you can go to

aka.ms/java

and aai for beginners to find resources.

It's also linked in the description of

this video. We will see you in the next

episode.

Have you ever tried to whisk milk by

hand? I actually hadn't until this

morning and five minutes later my arm

was sore and I was running late to this

recording session and I still had flat

milk. Some jobs just demand a little bit

more horsepower which sometimes requires

some

special tools like this electric

whisker. Now watch this. same milk, same

goal, but I turn this baby on, put it in

the milk, and it's faster, smoother, and

easier. And that's what GPUs bring to

generative AI in containers. They don't

just speed things up, they make the

whole process practical at scale. I'm

Ian and I'm a cloud advocate here in

Redmond, Washington at Microsoft HQ. And

today I'm joined by Brian Benz. Brian is

here to show us what that looks like in

action. GPUs aren't just a luxury

add-on. They're what take Gen AI from

fun demos to realworld workloads. Brian,

let's go ahead and dive in.

>> Awesome. Thanks, Ayan. Um, yeah. So,

what I'm going to show you today is a

little demo that I put together for

running Genai in containers with GPUs.

Um, I built a model uh and a repo that

basically

creates

images for you without going out to an

image service. Uh, and I'm going to

start with a demo and then I'm going to

show you how it actually works and how I

actually put it together. All right, so

this is the demo. It's not running right

now, but basically you can generate an

image. And there's the last image that I

generated was a watercolor, a fall

colors, a forest with a lake. Uh, and

um, you've also got text embeddings that

you could do as well. But I'll show you

how all this works in a second. First,

let's get the actual demo started.

To do that, I'm going to go to Visual

Studio Code. I've already created a

Docker image. So I can just say docker

run and it's going to run this image

in a container for me.

It's a Spring Boot application. It uses

several different things including Onyx,

uh Stable Diffusion for image

generation, uh Nvidia CUDA for accessing

the GPUs, and a bunch of other stuff

that I'll explain afterwards. But let's

get started with the actual demo. So I

can go to this demo at localhost880 now

which is this. Fire it up. Okay. So

here's a clear one. And I've been

enjoying watercolors lately

of a pine forest

in in

a link

just something simple like that. All

right. Okay. So, what this is going to

do when I hit this button, it's going to

fire off a process

and it's going to create

several images. So, down here, if I

scroll down past the performance

warnings, it still performs pretty well.

Uh but what this code does is it

actually loads a couple of things from

stable diffusion. A couple of things

that it needs to generate images uh

including um

VAE and a couple of other things. Uh

it's going to use CUDA to access the GPU

on my local machine. Uh and it's going

to have a model stable diffusion 1.5.

There's a model path that's built into

the repo over here. Right there you can

see the models. Um, and what it does is

it actually generates as part of stable

diffusion, there's some built-in text.

So, stable diffusion is a text to image

generator. Uh, it creates a prompt. So,

I created a simple prompt which was

watercolor of a pine forest uh with a

lake. And, uh, basically the image here,

it's actually creating a metaprompt for

me. So, it's added in some information

based on a couple of things. And it also

has some safety checks that it does. And

then it actually starts the generating

the embedding and it does inference

steps. It does 40 inference steps. Steps

40 guidance 7.5 seed 42. That means it

has some images that it uses as a seed.

Uh and it has some guidance that's built

into it as well. That's one of the

things about stable diffusion. So it's

going to go through 40 inference steps

and basically it's creating image

layers. It created the 40 layers. uh

it's going to decode it. It runs the

safety check to make sure there's

nothing uh unsafe in here based on some

parameters that are in the default

stable diffuser safety check. Generates

the image and it created it in 80

seconds. So let's go ahead and look at

that image. There it is. Hey, nice. Um

so 1 minute 32 seconds. The code also,

so I used Nvidia CUDA to access my GPU.

it falls back to a CPU if the code

doesn't actually uh have access to a

GPU. Uh so it'll work on either, but it

takes over five minutes to generate the

same image. So what would you actually

use this for? Um image services are

great and I've gained a new appreciation

of how they work and how good they are

uh by building my own example from

scratch on my local machine. Uh but they

cost money. So, if you are just

generating the one-off image once in a

while, it's probably faster and better

to use one of the image generation

services out there. Uh, we have some

built into Azure and there's others as

well. Um, but if you have to generate

10,000 images or convert 10,000 images

from one thing to another, maybe create

cartoons from photos or whatever, um,

then using something like this solution

where you run everything locally is the

way to go. And you can use a GPU. Mine's

a pretty primitive GPU on my laptop. Uh

but you can also deploy to GPUs on Azure

uh on virtual machines. It's something

called Azure container apps. All right.

So anyway, um there's another thing here

that I can show you. It's less exciting

uh but it's just runs a text comparison

to say what the similarity is between

these two pieces of text. And if you

have a large piece of text, obviously

it's better, but this is just a symbol

similarity score that runs. And if you

look at the code here, it actually

checks uh the similarity score here.

It's much faster on the GPU than it is

on a CPU once again. So how to actually

build all this? Let's talk about that a

little bit. Uh the first thing I had to

do was get stable diffusion. So if you

look at my Visual Studio Code here, you

can see the models. I've downloaded two

models. The one's called Mini LM L6 V2.

Uh the other one is called uh stable

diffusion and it's got a safety checker,

text encoder, UU net or UNET and BAE

decoder in here as well. These are all

things that go into image processing.

But I needed to talk to Java and make

sure that I could access these and run

them locally. uh to do that there is a

tool uh basically an interoperability

uh set of libraries and standards called

ONNX open neural network exchange. You

might have seen that in the command line

when I was uh running this. So basically

what that does is allows you to um if I

go down here the key is right here the

technical design they provide a

definition of extensible computational

graph model as well as definitions of

built-in operators and standard types.

Great. Okay. So I needed to use the onx

uh uh basically decoders and to work

with stable diffusion. All right. And

then to get stable diffusion, I could

have gone to stable diffusion uh

website, downloaded stable diffusion and

encoded it and built in the framework

around it to be used by code myself. But

hugging face had uh inside their they

have Onyx community. They have 796

models built by Onyx Community. Um, and

one of them is Onyx Community Stable

Diffusion V15 Onyx. Okay, so I was able

to download this

um Whoops.

I was able to download this uh and then

I was able to make that part of my code.

So once again, getting back over here,

um, over here we have the,

uh, code that I was able to download. So

you just download this great but then

how do you actually access it from Java

because you have to do all these

functions uh check for safety text

encoder net and VA decoders just part of

image processing that you need to do. So

I could download these from stable

diffusion but then how do I actually

access them in my code? That was the

next tricky part and for that I use

something called SD4J.

So, a SD4J is called stable diffusion in

Java. It's an Oracle um repo and it's uh

open source. All of the things I'm

showing you here are open source and

public domain which is great. Uh so I

was able to include them in my app. Uh

but stable diffusion in Java is a

modified port of the C sharp

implementation for Onyx runtime, but

it's written in Java. So this saved me a

ton of time. Um, it was able to, it

targets Onyx runtime 114. And by the

way, this was the hardest part was

making sure that all of the different

versions of Onyx, SD4J,

and CUDA worked together to access my

GPU and make that run in one minute

versus five minutes with no GPU. Um,

these are some of the examples, but uh,

inside of here there's, uh, great code

for and a text tokenizer, which you need

as well for actually building this. So,

I was able to make this part of my repo.

The last piece of the puzzle is CUDA.

So, CUDA is an NVIDIA uh tool that

allows you to access GPUs and any NVIDIA

GPU whether it's running in a local

laptop or on a server or in a virtual

machine or what we call Azure container

apps uh runs through

um CUDA and then CUDA act you call your

Java code you call CUDA uh and CUDA

accesses the GPU uh finds a GPU that it

can use and runs all the processes on

that GPU. And once again, I mentioned it

falls back to a CPU if it can't find

anything that runs. But really cool

stuff. Putting all this together only

took a couple of days. Uh, and the way I

did that, I could certainly have coded

it by hand. It probably would have taken

a month or I don't know how long it

would have taken, but there's so many

pieces here that basically making them

all work together and making them all

compatible with the different versions

that you need to use uh is pretty

complex. So to do that, I'm not afraid

to uh admit that I used uh some large

language model performance-enhancing

uh models and uh in this case I used

agent mode in Visual Studio Code GitHub

Copilot with Cloud Sonnet 45. So, Cloud

Sonnet 45 uh is really really good when

you've got sort of a green field and you

need some advice on how to build things

and then you actually need to perform

the code checks and debug your brand new

code. There's another new model that's

out GBT5 codecs that I find a bit better

for refactoring as well. I just want to

mention those two. Those are brand new

as of this recording, but Clansson at 45

really uh I have Clansson at45 to thank

for a lot of the code that was generated

here or to blame if there's something

wrong with it. Uh and one of the cool

things it did is I had it put together a

prompt, a really complex prompt here uh

before we started. It's 750 lines and it

even includes source code and some of

the things I needed to build Maven and

all that stuff for Java. So 753 lines

prompt and then you put that prompt into

GitHub copilot and in agent mode and

basically just let it generate the

framework of the code. Then it took two

or three days of debugging to actually

make this work.

So that in a nutshell uh is everything

that I wanted to show you

for running genai containers for GPUs.

Please do check out the code at

akams/GPU

on Azure. Let me know what you think of

it and enjoy. Hey all, thanks for

watching and following along with us. If

you would like to find supporting

content resources and the code we used,

you can find them at aka.mjava

andai for beginners. It's also linked in

the description of this video. And we'll

see you in the next episode.

Today we're talking about

Java, but not just Java. We're talking

about Java in

containers,

but there's also more. And I know this

box seems like overkill, but trust me,

there's a good reason. I'm a cloud

advocate here at Microsoft. And I'm

joined again by Brian, who will be

teaching us all about dynamic sessions,

keeping Gen AI running smoothly inside

containers without resets. Imagine a

container is like a box that lets me

ship Brian a fresh cup of coffee. Once

Brian finishes the cup, I'd have to brew

a brand new cup, put in a brand new box,

and ship it all over again. That's how a

regular AI session might work. But with

dynamic sessions, you don't need a new

box every time. I can use this exact

same

container and Brian can just re keep

refilling his cup. It's like instead of

just shipping him one cup of coffee,

I've shipped him the whole coffee

machine inside the box along with the

cup. In AI terms, that means the model

keeps its context alive and keeps

running smoothly without resets. Brian,

I hope you receive the coffee I shipped

you. Now, over to you.

Thanks. Uh that's great. Just the way I

like it.

All right. So, we're going to be talking

today about dynamic sessions in Azure

container apps. First of all, I think I

need to explain a little bit about what

dynamic sessions are just to give you an

idea of what they are and what they're

useful for. Um, let's start with an

introduction to lang chain forj. Uh so

lang chain forj is a way to integrate

easily different parts of large language

models into your code. In this case

langchain forj uh is built for Java.

There's also lang chain for python and

javascript but today we're going to

focus on java and uh lang chain forj. So

langj specifically has unified APIs. It

has a toolbox and I built one of the

tools I'm going to show you here in a

second. And then it's got a bunch of

examples too. Let's dive into the GitHub

repo for Langchain for J. And you can

see it's got things like uh document

loaders, document parsers, uh

transformers for documents, uh embedding

stores, so that's embedding different

large language models, things like that.

We also have specific things uh or

specific capabilities for different

vendor tools including Anthropic, Azure,

and a few others. Uh what the thing I'm

going to show you today is code

execution engines. Uh so code execution

engines uh there's three of them in

here. One of them is the one I

contributed uh is uh Azure Azure

container apps dynamic sessions. There's

also a couple of others as well, but of

course I know it's good. Uh but um we're

going to focus on Azure container apps

dynamic sessions today. Uh and I'll show

you a little bit about how that works.

The other part of lang chain that we

have is lang chain examples. Uh and the

lang chain example I'm going to show you

actually calls the Azure container apps

dynamic sessions tool that I contributed

and shows a little bit of a a demo. So,

I'm going to hop into Visual Studio Code

before I go any further and show you

what it actually does and how it works.

And then we'll explain a little bit

about how it actually does what it does.

So, let's just start Java.

And basically what this code does is it

goes out to my Azure container apps

dynamic sessions that I've already

created and it asks a question from

OpenAI and it integrates that question

into some code that it generates on the

dynamic session. So in this case I asked

the question if a pizza had a radius of

Z and a depth of A, what's its volume?

And the answer should be in valid Python

code. The answer is the volume of a

pizza can be calculated using the

formula. And here's the formula. Uh and

um then there's an example usage. Uh and

basically a few other things here. So

that's just a quick example. Uh why was

why is this useful? Uh basically

let me give you another example. When

you go into chat GPT

and you create or any chat AI service

and you create a downloadable file, say

generate a PDF of what you just built

for me. Uh when it downloads that when

it actually creates that PDF, it uses a

code execution engine to do that in the

back end. What if you could have your

own code execution engine with complete

control over it that you could use to

run code, test code, build downloadable

files, things like that? Uh basically

that's what code execution engines do

and lang chain forj

adds functionality

to be able to easily access large

language models clouds and all kinds of

other tools to actually make that easier

to happen. So generating a file there's

lots of facilities there for generating

a file. Uh the second part of this demo

basically we're logging into Azure

container apps dynamic sessions and you

can see here it tries all kinds of

different ways that's part of the code I

have uh and then it uploads a file uh

hello world Java uh and it downloads a

file usually today seems to have a

little problem with that normally

downloads a file and then it lists all

the files that are in that session. Um

now how does this actually look on

Azure? Let's go into

Azure and show you that.

So, I'm actually in Azure container app.

So, this is the Azure portal,

portal.asure.com.

I've got a Azure container app set up

called my uh Azure container apps

dynamic session session pool. Uh, and

that is what you use to actually execute

the code. That's your code execution

engine. It's part of the resource group

I have is BB's Acads where I have an

Azure OpenAI instance and the session

pool. So the session pool itself I

actually have

a way of interacting with that here in

the portal. So I can play around with

this here. So I can actually take the

code that we generated here just as an

example. This is code that was generated

inside the execution engine but it

didn't run the code, right? It just told

you how to run it. So, let's go ahead

and copy this code.

Oh, and we'll go through the example

usage

and put that here.

Run the code and it gives you an output

hopefully. Oh, unexpected error. Never

mind. Uh, hold on. I can fix this.

So, let's go ahead and paste that code

in here.

And

there's the Python code.

We can actually run it and it comes out

and says the volume of pizza is 1.570

cubic units.

Uh yeah, so it's actually creating the I

think it's cubic centimeters. Um anyway,

it runs and it gives you a standard

output. So the idea here is you get the

idea. You can actually go into your

session, run some code, execute some

code, and you can call it

programmatically through the example

that I provided in Visual Studio Code

and then run it from there. Uh uh and

perform results, upload files, download

files, things like that.

Uh the cool thing about that

as well is that when you're running

that, it can also have access to an open

AI instance, which is what we have here.

So we're actually using just GPT5 GPT35

Turbo and um accessing the code that

generates that Python and the

manipulation of uploads and downloads is

done through other things. So let me

show you the actual code here. We'll

just run through that real quick. and it

calls a tool

over in lang chain forj but this one's

in lang chain forj examples.

So the first thing it does is it just

creates a connection

uh and a simple HTTP client for

interacting with the open AI instance.

Uh and then it calls that pool endpoint

uh that we have. Uh it sets up a open AI

key, OpenAI endpoint. These are all

environment variables that I already set

up in advance. Uh and it creates a chat

model. So it calls chat model with the

API key, the endpoint, the deployment

name. So it goes out to the uh large

language model that I showed you

earlier, the chatbt35.

Uh and then it builds this assistant and

it actually runs the query that you want

and returns an answer. Um then we've got

some other examples for uploading a

local file, um downloading a file and

listing files.

And if we move over to lang chain forj

the sessions ripple tool java is what

it's actually calling inside of lang

chamber forj code execution engine azure

acads

uh and there are corresponding

tools in here to do everything. So you

got a file uploader, file downloader,

file listister, and up above there is a

way to access the open AI uh

capabilities as well. Here we go. Um so

basically the whole tool is already

built and you can access it and use it

with a minimal amount of code. here.

Yeah, there's, you know, 275 lines of

code that are generating that example

and showing you how to upload, download,

list, and access OpenAI as well. All

right, folks. So, that's how to actually

run uh GI and containers using Azure

container apps dynamic sessions

including lang chain forj and I've

provided an example. Check out the

examples at akamsacample-accads

and let me know what you think.

Congratulations

on making it to the end of the series.

Thanks from everyone on the team who

helped make the series possible. If you

want to continue your learning journey,

you can visit aka.ms/java

andai for beginners. If you want to stay

uptodate with the channel, please like,

subscribe, and hit the bell notification

icon. We hope to see you again soon.

Loading...

Loading video analysis...